[2026-04-15 15:00:45,120] [DEBUG] [axolotl.utils.config.resolve_dtype:66] [PID:2788] bf16 support detected, enabling for this configuration.
[2026-04-15 15:00:45,223] [DEBUG] [axolotl.utils.config.log_gpu_memory_usage:127] [PID:2788] baseline 0.000GB ()
[2026-04-15 15:00:45,223] [INFO] [axolotl.cli.config.load_cfg:248] [PID:2788] config:
{
  "activation_offloading": false,
  "adapter": "lora",
  "axolotl_config_path": "config.yaml",
  "base_model": "Qwen/Qwen2.5-Coder-3B-Instruct",
  "base_model_config": "Qwen/Qwen2.5-Coder-3B-Instruct",
  "batch_size": 8,
  "bf16": true,
  "capabilities": {
    "bf16": true,
    "compute_capability": "sm_90",
    "fp8": false,
    "n_gpu": 1,
    "n_node": 1
  },
  "context_parallel_size": 1,
  "dataloader_num_workers": 1,
  "dataloader_pin_memory": true,
  "dataloader_prefetch_factor": 256,
  "dataset_processes": 24,
  "datasets": [
    {
      "chat_template": "tokenizer_default",
      "ds_type": "json",
      "field_messages": "messages",
      "message_property_mappings": {
        "content": "content",
        "role": "role"
      },
      "path": "dria_pythonic_fc_chatml.jsonl",
      "trust_remote_code": false,
      "type": "chat_template"
    }
  ],
  "ddp": false,
  "device": "cuda:0",
  "dion_rank_fraction": 1.0,
  "dion_rank_multiple_of": 1,
  "env_capabilities": {
    "torch_version": "2.7.1"
  },
  "eval_batch_size": 1,
  "eval_causal_lm_metrics": [
    "sacrebleu",
    "comet",
    "ter",
    "chrf"
  ],
  "eval_max_new_tokens": 128,
  "eval_table_size": 0,
  "experimental_skip_move_to_device": true,
  "fp16": false,
  "gradient_accumulation_steps": 8,
  "gradient_checkpointing": false,
  "include_tkps": true,
  "learning_rate": 0.0002,
  "lisa_layers_attribute": "model.layers",
  "load_best_model_at_end": false,
  "load_in_4bit": false,
  "load_in_8bit": true,
  "local_rank": 0,
  "lora_alpha": 32,
  "lora_dropout": 0.05,
  "lora_r": 16,
  "lora_target_modules": [
    "q_proj",
    "v_proj",
    "k_proj",
    "o_proj",
    "gate_proj",
    "down_proj",
    "up_proj"
  ],
  "loraplus_lr_embedding": 1e-06,
  "lr_scheduler": "cosine",
  "mean_resizing_embeddings": false,
  "micro_batch_size": 1,
  "model_config_type": "qwen2",
  "num_epochs": 2.0,
  "optimizer": "adamw_bnb_8bit",
  "output_dir": "./outputs/Qwen2.5-Coder-3B-Instruct-coding-agent",
  "pretrain_multipack_attn": true,
  "profiler_steps_start": 0,
  "qlora_sharded_model_loading": false,
  "ray_num_workers": 1,
  "resources_per_worker": {
    "GPU": 1
  },
  "sample_packing_bin_size": 200,
  "sample_packing_group_size": 100000,
  "save_only_model": false,
  "save_safetensors": true,
  "sequence_len": 2048,
  "shuffle_before_merging_datasets": false,
  "shuffle_merged_datasets": true,
  "skip_prepare_dataset": false,
  "streaming_multipack_buffer_size": 10000,
  "strict": false,
  "tensor_parallel_size": 1,
  "tiled_mlp_use_original_mlp": true,
  "tokenizer_config": "Qwen/Qwen2.5-Coder-3B-Instruct",
  "tokenizer_save_jinja_files": true,
  "torch_dtype": "torch.bfloat16",
  "train_on_inputs": false,
  "trl": {
    "log_completions": false,
    "mask_truncated_completions": false,
    "ref_model_mixup_alpha": 0.9,
    "ref_model_sync_steps": 64,
    "scale_rewards": true,
    "sync_ref_model": false,
    "use_vllm": false,
    "vllm_server_host": "0.0.0.0",
    "vllm_server_port": 8000
  },
  "use_ray": false,
  "val_set_size": 0.0,
  "vllm": {
    "device": "auto",
    "dtype": "auto",
    "gpu_memory_utilization": 0.9,
    "host": "0.0.0.0",
    "port": 8000
  },
  "weight_decay": 0.0,
  "world_size": 1
}
[2026-04-15 15:00:45,611] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:278] [PID:2788] EOS: 151645 / <|im_end|>
[2026-04-15 15:00:45,616] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:279] [PID:2788] BOS: None / None
[2026-04-15 15:00:45,617] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:280] [PID:2788] PAD: 151643 / <|endoftext|>
[2026-04-15 15:00:45,618] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:281] [PID:2788] UNK: None / None
[2026-04-15 15:00:45,618] [INFO] [axolotl.utils.data.shared.load_preprocessed_dataset:476] [PID:2788] Unable to find prepared dataset in last_run_prepared/8f7eb77e9223d8e0f9d67ceaf2fe66aa
[2026-04-15 15:00:45,619] [INFO] [axolotl.utils.data.sft._load_raw_datasets:320] [PID:2788] Loading raw datasets...
[2026-04-15 15:00:45,619] [WARNING] [axolotl.utils.data.sft._load_raw_datasets:322] [PID:2788] Processing datasets during training can lead to VRAM instability. Please pre-process your dataset using `axolotl preprocess path/to/config.yml`.
Generating train split: 81819 examples [00:01, 53985.94 examples/s]
[2026-04-15 15:00:47,413] [INFO] [axolotl.utils.data.wrappers.get_dataset_wrapper:87] [PID:2788] Loading dataset: dria_pythonic_fc_chatml.jsonl with base_type: chat_template and prompt_style: None
[2026-04-15 15:00:47,422] [INFO] [axolotl.prompt_strategies.chat_template.__call__:969] [PID:2788] Using chat template:
---
{%- if tools %}
    {{- '<|im_start|>system\n' }}
    {%- if messages[0]['role'] == 'system' %}
        {{- messages[0]['content'] }}
    {%- else %}
        {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}
    {%- endif %}
    {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
    {%- for tool in tools %}
        {{- "\n" }}
        {{- tool | tojson }}
    {%- endfor %}
    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
    {%- if messages[0]['role'] == 'system' %}
        {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
    {%- else %}
        {{- '<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n' }}
    {%- endif %}
{%- endif %}
{%- for message in messages %}
    {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
        {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
    {%- elif message.role == "assistant" %}
        {{- '<|im_start|>' + message.role }}
        {%- if message.content %}
            {{- '\n' + message.content }}
        {%- endif %}
        {%- for tool_call in message.tool_calls %}
            {%- if tool_call.function is defined %}
                {%- set tool_call = tool_call.function %}
            {%- endif %}
            {{- '\n<tool_call>\n{"name": "' }}
            {{- tool_call.name }}
            {{- '", "arguments": ' }}
            {{- tool_call.arguments | tojson }}
            {{- '}\n</tool_call>' }}
        {%- endfor %}
        {{- '<|im_end|>\n' }}
    {%- elif message.role == "tool" %}
        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
            {{- '<|im_start|>user' }}
        {%- endif %}
        {{- '\n<tool_response>\n' }}
        {{- message.content }}
        {{- '\n</tool_response>' }}
        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
            {{- '<|im_end|>\n' }}
        {%- endif %}
    {%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n' }}
{%- endif %}

---
Tokenizing Prompts (num_proc=24):   0%|                                                         | 0/81819 [00:00<?, ? examples/s]
[2026-04-15 15:00:49,658] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:49,660] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:49,686] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:49,692] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:49,713] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:49,733] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:49,734] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:49,749] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:49,758] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:49,772] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:49,783] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:49,807] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:49,834] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:49,835] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:49,861] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:49,870] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:49,911] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:49,919] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:49,933] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:49,956] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:49,962] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:49,983] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:49,989] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,028] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,033] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,046] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,055] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,072] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,075] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,109] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,115] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,142] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,143] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,166] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,171] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,210] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,216] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,236] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,253] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,261] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,273] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,285] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,317] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,327] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,358] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,369] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,393] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,411] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,440] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,452] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,468] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,481] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,490] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,529] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,534] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,561] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,563] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,572] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,619] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,621] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,648] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,655] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,674] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,683] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,717] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,743] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,752] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,775] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,786] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,817] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,838] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,842] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,866] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,867] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,909] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,915] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,937] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,945] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,970] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:50,977] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,011] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,044] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,071] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,113] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,116] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,140] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,146] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,164] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,182] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,209] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,230] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,245] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,256] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,265] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,310] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,310] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,341] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,355] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,368] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,409] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,417] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,455] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,458] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,550] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):   1%|▌                                            | 1000/81819 [00:04<05:25, 248.22 examples/s]
[2026-04-15 15:00:51,575] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,579] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,596] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,613] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,639] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,645] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,665] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,667] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,681] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):   2%|█                                            | 2000/81819 [00:04<02:17, 578.47 examples/s]
[2026-04-15 15:00:51,714] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,721] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,748] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,748] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,774] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,783] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,823] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,824] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,848] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,851] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):   7%|███▏                                        | 6000/81819 [00:04<00:32, 2328.47 examples/s]
[2026-04-15 15:00:51,872] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,877] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,929] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,929] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,943] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,956] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,972] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:51,989] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,010] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,031] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,039] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,055] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,079] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,081] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,116] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,122] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,145] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,166] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,175] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,189] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,220] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,230] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,247] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,267] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,272] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,291] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,314] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  10%|████▎                                       | 8000/81819 [00:04<00:26, 2771.68 examples/s]
[2026-04-15 15:00:52,336] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,347] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,362] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,370] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,416] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,419] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,438] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,447] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,464] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,483] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,492] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,540] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,546] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,564] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,581] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  11%|████▊                                       | 9000/81819 [00:05<00:24, 2941.80 examples/s]
[2026-04-15 15:00:52,590] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,629] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,631] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,651] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,669] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,673] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,690] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,714] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,730] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,746] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,754] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  13%|█████▊                                     | 11000/81819 [00:05<00:17, 3990.96 examples/s]
[2026-04-15 15:00:52,780] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,782] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,812] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,834] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,844] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,868] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,879] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,909] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,932] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,940] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,969] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:52,977] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,019] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  15%|██████▎                                    | 12000/81819 [00:05<00:17, 4012.43 examples/s]
[2026-04-15 15:00:53,031] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,052] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,057] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,084] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,088] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,126] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,133] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,147] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,162] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,181] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,212] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,229] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,235] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,261] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,269] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,286] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,287] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,312] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,333] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,338] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,361] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,379] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,381] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,424] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,425] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,450] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,461] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,462] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,479] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,487] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,515] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,520] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,536] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,543] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,558] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  17%|███████▎                                   | 14000/81819 [00:06<00:17, 3897.43 examples/s]
[2026-04-15 15:00:53,569] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,584] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,612] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,628] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,637] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,643] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,663] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,667] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,680] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,684] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,714] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,723] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,746] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,748] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,769] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,770] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,786] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,808] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,817] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,841] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,845] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,861] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,865] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,883] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,893] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,923] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,950] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,950] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:53,978] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,010] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,040] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,044] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,059] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,078] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,089] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,091] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,124] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,143] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,145] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,159] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,168] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,173] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,208] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,221] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,230] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,240] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,243] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,255] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,269] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,277] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,317] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,326] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,332] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,344] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,351] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,369] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,384] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,410] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,418] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,426] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,435] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,445] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,449] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,461] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,474] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,478] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,488] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,508] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,512] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,526] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,540] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,554] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,559] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,574] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,586] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,609] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,614] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,641] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,649] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,649] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,665] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,674] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,682] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,691] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,717] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,727] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,734] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,753] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,763] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,765] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,782] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,793] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,828] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,830] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,844] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,860] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,871] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,881] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,883] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  18%|███████▉                                   | 15000/81819 [00:07<00:32, 2078.89 examples/s]
[2026-04-15 15:00:54,890] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,923] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,924] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,937] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,954] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,955] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,972] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,986] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:54,990] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,018] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,023] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,048] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  20%|████████▍                                  | 16000/81819 [00:07<00:26, 2445.25 examples/s]
[2026-04-15 15:00:55,058] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,059] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,077] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,080] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,111] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,114] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,120] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,150] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,153] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,155] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,179] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,190] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,207] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  24%|██████████▌                                | 20000/81819 [00:07<00:12, 5019.25 examples/s]
[2026-04-15 15:00:55,227] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,238] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,247] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,265] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,269] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,278] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,313] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,313] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,330] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,343] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,358] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,369] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,373] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,377] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,414] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,424] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,432] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,443] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,449] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  27%|███████████▌                               | 22000/81819 [00:07<00:10, 5671.80 examples/s]
[2026-04-15 15:00:55,474] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,475] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,488] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,522] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,532] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,544] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,553] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,564] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,571] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,614] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,615] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,616] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,639] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,647] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,668] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,673] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,692] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,716] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,716] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,741] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  29%|████████████▌                              | 24000/81819 [00:08<00:09, 5953.86 examples/s]
[2026-04-15 15:00:55,752] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,754] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,775] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,790] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,807] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,821] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,840] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,847] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,862] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,870] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,877] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,890] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,914] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,933] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,942] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,943] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,971] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,974] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:55,976] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,014] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,014] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,029] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,048] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,056] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,074] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,076] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,076] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,115] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,122] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,127] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,149] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,161] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,172] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,176] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,191] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,216] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  31%|█████████████▏                             | 25000/81819 [00:08<00:12, 4535.95 examples/s]
[2026-04-15 15:00:56,235] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,243] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,251] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,261] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,276] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,284] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,313] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,321] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,342] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,344] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,353] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,372] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,376] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,413] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,423] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,424] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,449] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,461] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,462] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,479] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,485] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,513] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,523] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,532] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,552] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,556] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,562] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  32%|█████████████▋                             | 26000/81819 [00:09<00:13, 4060.81 examples/s]
[2026-04-15 15:00:56,580] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,607] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,621] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,625] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,658] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,665] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,673] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,692] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,707] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,728] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,738] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,744] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  34%|██████████████▋                            | 28000/81819 [00:09<00:10, 5235.86 examples/s]
[2026-04-15 15:00:56,770] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,770] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,785] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,812] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,824] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,832] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,834] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,848] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,849] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,875] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,879] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,880] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,881] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,927] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,928] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,939] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,954] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,955] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,967] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,978] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,980] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:56,991] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,021] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,033] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,042] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,046] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  37%|███████████████▊                           | 30000/81819 [00:09<00:09, 5683.38 examples/s]
[2026-04-15 15:00:57,061] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,068] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,086] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,108] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,111] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,125] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,135] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,153] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,156] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,164] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,178] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,184] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,208] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,214] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,219] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,251] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,260] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,264] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,279] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,285] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,319] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,322] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,327] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,330] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,357] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,360] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  40%|█████████████████▎                         | 33000/81819 [00:09<00:07, 6886.79 examples/s]
[2026-04-15 15:00:57,363] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,382] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,415] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,419] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,428] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,451] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,456] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,462] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,480] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,484] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,489] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,528] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,529] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,538] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,554] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,562] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,563] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,581] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,583] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,615] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,624] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,624] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,631] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,648] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,650] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,657] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,676] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,682] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,686] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,713] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,725] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,736] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,746] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,751] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,755] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,777] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,784] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,791] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,831] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,831] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,837] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,855] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,857] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,860] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,875] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,886] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,912] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,919] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,924] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,942] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,944] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,949] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,969] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,969] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:57,980] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,007] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,013] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,031] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,031] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,042] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,051] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,068] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,081] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,084] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,089] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,123] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,136] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,144] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,153] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,159] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,178] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,183] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,186] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,210] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,228] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,231] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,239] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,257] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,257] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,262] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,280] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,286] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,287] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,322] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,326] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,328] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,346] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,357] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,363] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  43%|██████████████████▍                        | 35000/81819 [00:10<00:11, 4030.97 examples/s]
[2026-04-15 15:00:58,372] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,385] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,386] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,415] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,419] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,424] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,442] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,443] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,447] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,461] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,464] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,474] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,475] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,476] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,513] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,514] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,518] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,533] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,544] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,545] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,565] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,576] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,578] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,613] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,614] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,624] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,638] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,639] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,656] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,668] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,669] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,678] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  46%|███████████████████▉                       | 38000/81819 [00:11<00:08, 5157.41 examples/s]
[2026-04-15 15:00:58,710] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,712] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,722] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,739] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,742] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,756] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,763] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,768] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,785] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,789] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,792] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,828] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,830] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,843] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,845] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,848] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,864] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,868] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,878] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,879] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,883] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,916] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,920] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,928] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,944] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,947] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,956] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,967] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,969] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,988] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:58,990] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,010] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,021] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,034] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,041] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,042] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,051] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,072] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,075] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,078] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,111] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,131] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,134] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,145] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,159] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,165] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,173] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,182] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,210] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,214] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,218] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,237] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,238] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,242] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,263] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,266] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,272] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,272] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,287] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,312] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,317] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,331] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,339] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  48%|████████████████████▍                      | 39000/81819 [00:11<00:11, 3754.86 examples/s]
[2026-04-15 15:00:59,359] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,367] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,369] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,408] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,413] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,413] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,439] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,455] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,458] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,475] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,486] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,493] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,515] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,524] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,534] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,546] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,554] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,559] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,580] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,581] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,592] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,629] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,631] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,635] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,652] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,664] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,683] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,684] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,692] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,733] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,736] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,740] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  50%|█████████████████████▌                     | 41000/81819 [00:12<00:09, 4084.26 examples/s]
[2026-04-15 15:00:59,758] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,774] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,775] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,780] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,821] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,826] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,830] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,847] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  52%|██████████████████████▏                    | 42230/81819 [00:12<00:08, 4748.40 examples/s]
[2026-04-15 15:00:59,863] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,870] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,872] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,897] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,921] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,923] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,946] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,951] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,957] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,973] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,984] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:00:59,993] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,000] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,015] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,021] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,069] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,085] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,092] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,093] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  53%|██████████████████████▉                    | 43639/81819 [00:12<00:07, 4879.25 examples/s]
[2026-04-15 15:01:00,119] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,128] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,129] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,130] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,139] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,154] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,164] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,172] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,177] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,197] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,198] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,200] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,210] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,231] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,252] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,255] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,263] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,279] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,288] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,290] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,313] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,318] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,321] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,337] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,355] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,355] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,363] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  55%|███████████████████████▍                   | 44639/81819 [00:12<00:07, 4668.59 examples/s]
[2026-04-15 15:01:00,390] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,392] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,392] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,424] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,430] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,435] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,456] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,459] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,486] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,489] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,489] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,506] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  57%|████████████████████████▌                  | 46639/81819 [00:12<00:05, 6109.19 examples/s]
[2026-04-15 15:01:00,520] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,529] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,544] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,548] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,558] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,576] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,578] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,584] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,605] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,615] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,625] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,638] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,647] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,666] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,677] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  59%|█████████████████████████▎                 | 48048/81819 [00:13<00:05, 6645.55 examples/s]
[2026-04-15 15:01:00,690] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,703] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,712] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,726] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,743] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,746] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,756] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,767] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,780] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,790] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,803] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,805] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  61%|██████████████████████████▎                | 50048/81819 [00:13<00:03, 8250.23 examples/s]
[2026-04-15 15:01:00,823] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,838] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,848] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,855] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,870] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,881] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,885] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,898] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,914] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,915] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,926] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,937] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,948] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,956] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,966] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,971] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:00,973] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  63%|███████████████████████████                | 51457/81819 [00:13<00:03, 8332.89 examples/s]
[2026-04-15 15:01:01,000] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,004] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,008] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,039] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,043] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,045] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,055] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,075] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,078] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,080] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,106] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,107] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,127] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,131] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,154] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,161] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,171] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,190] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,197] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,201] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,215] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,229] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,237] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,239] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,254] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,265] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,282] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,282] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,294] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,299] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  65%|███████████████████████████▊               | 52866/81819 [00:13<00:04, 6718.31 examples/s]
[2026-04-15 15:01:01,300] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,333] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,342] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,348] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,355] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,373] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,380] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,391] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,412] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,422] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,431] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,432] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,448] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,457] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,472] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,475] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,482] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,486] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,508] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,511] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,513] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,537] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,547] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,547] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,561] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,576] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,579] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,586] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,609] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,611] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,612] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,640] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,641] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,644] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,667] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,669] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,673] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,681] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,694] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,700] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,704] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,721] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  66%|████████████████████████████▏              | 53684/81819 [00:14<00:06, 4616.16 examples/s]
[2026-04-15 15:01:01,744] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,746] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,748] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,769] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,776] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,780] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,795] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,805] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,808] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,822] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,837] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,842] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,844] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  68%|█████████████████████████████▏             | 55502/81819 [00:14<00:04, 6093.69 examples/s]
[2026-04-15 15:01:01,867] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,872] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,874] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,898] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,898] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,905] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,926] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,929] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,934] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,960] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,965] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,971] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:01,979] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,001] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,004] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,007] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,038] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,042] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,051] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,061] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,070] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,089] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,090] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,095] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,115] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,122] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,130] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,144] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,152] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,153] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,166] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,177] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,179] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,195] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,206] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,208] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,225] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,242] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,249] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,256] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,272] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,279] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,283] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,291] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,304] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,311] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,318] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,332] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,338] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,342] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,361] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,367] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,379] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,390] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,390] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,402] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,415] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,418] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,437] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,448] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,454] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,466] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,492] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,493] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,498] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,518] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,523] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,526] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,545] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,549] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,569] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,569] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,573] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,595] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,601] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,602] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,626] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,645] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,646] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,649] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,673] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,675] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,675] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,701] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,707] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,719] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,739] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,751] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,751] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,775] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,780] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,784] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,796] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,809] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,811] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,825] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,836] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,853] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,853] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,869] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,870] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,875] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,905] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,907] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,908] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,937] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,954] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,958] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,962] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,978] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,989] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:02,989] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,017] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,021] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,032] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,046] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,061] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,067] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,069] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,095] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,101] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,104] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,134] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,144] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,147] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,157] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,168] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,189] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,195] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,196] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,223] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,227] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,227] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,237] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,261] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,263] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,267] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,287] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,299] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,308] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,316] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,338] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,340] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,342] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,368] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,374] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,385] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,400] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,404] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,423] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,437] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,438] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,448] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,458] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,469] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,474] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,490] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,496] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,502] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,526] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,528] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,532] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,540] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,560] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,562] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,567] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,591] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,597] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,601] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,618] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,620] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,632] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,638] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,655] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,659] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,670] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,675] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,684] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,704] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,708] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,709] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,728] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,731] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,740] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  70%|██████████████████████████████             | 57320/81819 [00:16<00:11, 2137.16 examples/s]
[2026-04-15 15:01:03,756] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,768] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,775] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,787] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,794] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,797] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,811] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,819] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,824] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,839] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,843] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,864] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,868] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,889] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,890] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,891] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,914] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,921] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,925] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,945] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,947] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,960] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,979] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:03,980] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,004] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,011] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,013] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,038] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,040] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,047] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,065] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,069] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,076] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,096] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,105] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,107] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,122] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,139] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,140] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,154] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,164] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,177] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,186] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,199] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,207] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,216] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,236] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,242] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,248] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,252] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,270] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,275] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,279] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,295] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,300] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,305] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,320] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,338] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,351] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,359] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,375] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,383] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,384] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,400] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,415] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,419] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,438] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,444] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,450] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,461] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,465] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,475] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,485] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,493] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,500] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,509] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  71%|██████████████████████████████▋            | 58320/81819 [00:16<00:12, 1873.84 examples/s]
[2026-04-15 15:01:04,531] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,538] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,549] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,568] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,576] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,580] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,590] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,606] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,611] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,615] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,637] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,652] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,655] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,662] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,678] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,686] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  73%|███████████████████████████████▏           | 59320/81819 [00:17<00:10, 2240.89 examples/s]
[2026-04-15 15:01:04,689] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,707] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,714] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,724] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,742] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,747] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,760] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,778] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,786] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,791] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,810] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,821] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,833] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,841] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,847] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,865] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,867] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,878] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,891] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,905] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,909] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,909] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,941] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,944] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,956] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,967] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,969] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:04,978] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,001] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,001] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,010] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,040] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,043] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,048] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,068] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,074] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,074] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,089] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,094] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,110] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,115] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,122] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  74%|███████████████████████████████▋           | 60320/81819 [00:17<00:09, 2255.07 examples/s]
[2026-04-15 15:01:05,132] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,133] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,147] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,157] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,159] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,175] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,178] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,189] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,205] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,205] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,209] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,239] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,252] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,256] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,274] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,275] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,277] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,296] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,297] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,302] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,321] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,332] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,340] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,346] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,359] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,367] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,376] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,386] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,386] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,396] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,409] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,419] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,423] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,435] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,440] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,460] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,461] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,473] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,488] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,489] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,506] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,508] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,520] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,541] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,553] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,555] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,570] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,579] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,589] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,591] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  75%|████████████████████████████████▏          | 61320/81819 [00:18<00:09, 2212.88 examples/s]
[2026-04-15 15:01:05,602] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,619] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,629] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,643] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,660] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,663] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,668] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,682] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,684] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,698] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,699] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,708] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,746] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,748] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,748] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,767] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,772] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,779] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,793] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,808] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,818] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,820] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,835] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,848] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,862] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,864] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,882] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,893] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,902] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,918] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,919] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,947] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,952] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,956] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,975] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,979] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,993] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:05,995] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,013] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,019] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  76%|████████████████████████████████▊          | 62320/81819 [00:18<00:08, 2250.49 examples/s]
[2026-04-15 15:01:06,030] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,044] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,051] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,066] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,075] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,079] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,095] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,102] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,112] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,130] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,140] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,142] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,173] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,181] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,183] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,202] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,222] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,228] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,236] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,252] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,256] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,262] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,276] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,279] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,291] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,301] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,312] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,321] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,336] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,348] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,349] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,359] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,370] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,376] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,390] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,406] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,413] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,422] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,426] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,446] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,448] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,453] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,464] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,481] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,482] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,495] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,505] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,517] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,530] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,546] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,547] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,557] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,569] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,572] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,576] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,593] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,594] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,606] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,615] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,619] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,644] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,644] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,649] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,664] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,689] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,691] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,691] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,718] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,726] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,729] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,741] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,751] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,766] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,774] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,776] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,796] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,798] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,805] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,815] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,838] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,841] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,842] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,863] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,873] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,881] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,894] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,905] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,906] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,908] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,940] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,943] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,952] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,966] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,971] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,978] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,987] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:06,993] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,010] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,019] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,032] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,048] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,050] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,066] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,069] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,084] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,089] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,094] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,112] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,114] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,122] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,150] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,152] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,165] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,172] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,178] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,197] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,201] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,203] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,227] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,239] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,248] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,266] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,269] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,282] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,294] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,300] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  77%|█████████████████████████████████▎         | 63320/81819 [00:19<00:12, 1471.78 examples/s]
[2026-04-15 15:01:07,319] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,326] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,347] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,348] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,352] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,376] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,377] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,390] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,401] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,404] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,429] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,438] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,443] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,468] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,472] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,472] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,503] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,509] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,535] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,550] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,563] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  78%|█████████████████████████████████▍         | 63729/81819 [00:20<00:12, 1476.89 examples/s]
[2026-04-15 15:01:07,587] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,595] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,614] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,623] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,636] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,650] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,650] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,681] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,690] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,704] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,717] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,742] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,757] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,775] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,781] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,809] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,811] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,845] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,860] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,876] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,897] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  79%|██████████████████████████████████         | 64729/81819 [00:20<00:09, 1758.08 examples/s]
[2026-04-15 15:01:07,933] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,936] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,962] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,970] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:07,999] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,010] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,040] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,041] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,049] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,069] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,069] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,079] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  80%|██████████████████████████████████▏        | 65138/81819 [00:20<00:09, 1848.22 examples/s]
[2026-04-15 15:01:08,097] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,100] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,115] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,122] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,138] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,149] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,154] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,155] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,161] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,177] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,181] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,187] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,203] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,209] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,220] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,244] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,248] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,252] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,272] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,273] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,290] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,297] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,304] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,327] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,328] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,336] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,358] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,364] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,369] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,394] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,395] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,408] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,431] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,436] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,443] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,455] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,466] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,470] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,497] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,497] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,501] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,524] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,540] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,545] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,561] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,580] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,581] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,612] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,618] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,621] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,638] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,640] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,648] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,667] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,672] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,681] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,690] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,703] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,710] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,712] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,740] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,740] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,743] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,746] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,765] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,771] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,779] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,788] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,804] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,808] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,809] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,834] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,841] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,847] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,853] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,865] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,875] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,876] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,901] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,907] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,910] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,933] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,938] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,950] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,954] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,966] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,967] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,985] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,992] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:08,997] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,011] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,018] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,026] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,038] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,040] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,052] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,066] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,070] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,076] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,084] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,087] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,106] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,109] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,116] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,136] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,148] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,162] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,169] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,171] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,188] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,198] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,207] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,208] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,237] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,243] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,256] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,262] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,280] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,281] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,290] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,307] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,309] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,320] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,333] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,342] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,353] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,353] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,369] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,380] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,382] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,387] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,403] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,407] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,407] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,440] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,442] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,453] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  81%|██████████████████████████████████▊        | 66138/81819 [00:21<00:13, 1186.29 examples/s]
[2026-04-15 15:01:09,466] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,472] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,479] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,498] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,509] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,510] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,533] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,542] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,543] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,563] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,565] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,569] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,593] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,599] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,608] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,617] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,639] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,643] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,647] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,657] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,662] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,679] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,685] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,686] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,710] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,716] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,717] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,739] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,751] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,751] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,766] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,773] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,779] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,794] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,798] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,802] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,824] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,830] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,843] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,845] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,855] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,868] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,870] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,887] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,902] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,906] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,911] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,935] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,942] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,945] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,965] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,968] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,979] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:09,994] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,002] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,010] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,017] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,035] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,045] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,052] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,068] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,077] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,084] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,093] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,110] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,112] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,124] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,139] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  82%|███████████████████████████████████▎       | 67138/81819 [00:22<00:11, 1270.51 examples/s]
[2026-04-15 15:01:10,145] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,154] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,165] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,167] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,178] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,193] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,195] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,206] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,220] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,231] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,248] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,250] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,252] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,283] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,284] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,284] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,309] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,312] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,315] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,340] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,350] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,351] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,359] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,374] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,376] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,380] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,387] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,402] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,405] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,409] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,435] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,439] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,447] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,457] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,464] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,466] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,471] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,490] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,491] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,493] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,512] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,517] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,525] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,543] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,544] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,547] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,568] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,569] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,573] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,595] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,597] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,602] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,619] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,623] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,626] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,633] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,655] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,657] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,660] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,681] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,689] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,690] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,715] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,717] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,718] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,738] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,743] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,743] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,762] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,769] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,779] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,784] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,793] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,797] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,807] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,819] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,820] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,843] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,850] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  84%|████████████████████████████████████       | 68547/81819 [00:23<00:08, 1485.03 examples/s]
[2026-04-15 15:01:10,863] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,865] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,881] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,886] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,902] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,908] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,919] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,935] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,944] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,949] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,953] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,969] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,974] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,989] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,994] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:10,994] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,007] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,032] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,034] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,043] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,056] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,060] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,071] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,081] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,087] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,087] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,105] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,110] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,116] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,140] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,143] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,150] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,160] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,174] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,184] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,190] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,206] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,206] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,226] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,228] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,238] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,242] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,256] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,264] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,264] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,287] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,291] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,293] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,305] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,323] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  85%|████████████████████████████████████▌      | 69547/81819 [00:23<00:07, 1630.61 examples/s]
[2026-04-15 15:01:11,326] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,336] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,349] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,349] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,356] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,370] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,377] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,379] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,394] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,403] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,404] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,413] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,438] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,438] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,441] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,457] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  86%|█████████████████████████████████████      | 70547/81819 [00:23<00:05, 2121.35 examples/s]
[2026-04-15 15:01:11,463] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,467] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,481] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,488] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,489] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,501] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,503] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,524] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,533] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,539] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,544] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,557] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,564] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,584] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,587] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,589] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,615] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,618] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,623] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,641] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,643] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,649] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,673] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,674] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,674] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,689] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,700] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,707] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,714] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,741] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,746] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,747] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,749] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,770] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,770] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,770] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,786] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,793] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,801] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,801] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,816] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,835] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,837] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,840] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,858] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,864] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,878] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,883] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,893] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,902] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,904] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,925] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,929] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,937] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,945] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,951] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,956] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,965] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,972] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,985] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:11,985] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,002] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,008] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,010] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,028] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,036] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,039] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  87%|█████████████████████████████████████▎     | 70956/81819 [00:24<00:06, 1623.30 examples/s]
[2026-04-15 15:01:12,056] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,066] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,066] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,089] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,094] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,098] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,108] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,117] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,141] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,148] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,155] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,166] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,183] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,185] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,196] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,202] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,217] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,225] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,243] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,248] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,250] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,273] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,274] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,281] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,297] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,302] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,317] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,330] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,338] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,352] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,356] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,358] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,383] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,393] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,395] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,400] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,419] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,422] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,437] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,447] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,461] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,472] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,475] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,494] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,500] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,503] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,524] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,547] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,559] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,562] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,572] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,586] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,589] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,596] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,616] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  87%|█████████████████████████████████████▌     | 71365/81819 [00:25<00:07, 1337.53 examples/s]
[2026-04-15 15:01:12,616] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,634] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,640] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,643] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,659] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,666] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,680] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,684] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,691] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,707] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,712] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,716] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,727] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,736] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,740] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,754] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,760] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,773] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,777] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,787] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,791] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,799] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,816] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,817] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,840] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,842] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,843] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,869] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,869] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,872] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,887] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,899] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,909] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,910] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,938] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,946] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,947] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,959] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,975] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,980] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:12,980] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,003] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,007] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,025] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,043] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,048] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,050] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,071] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,074] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,080] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,105] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,107] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,115] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,133] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,137] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,139] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,156] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,171] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,172] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,185] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,191] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,197] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,211] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,220] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,221] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,242] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,245] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,249] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,265] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,273] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,282] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,293] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,297] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,311] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  88%|█████████████████████████████████████▋     | 71774/81819 [00:25<00:09, 1067.86 examples/s]
[2026-04-15 15:01:13,321] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,323] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,327] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,345] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,348] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,356] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,369] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,383] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,384] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,392] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,408] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,415] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,425] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,439] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,443] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,444] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,465] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,468] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,474] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,486] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,497] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,499] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,511] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,518] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,535] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,540] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,544] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,562] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,567] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,571] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,586] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,603] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,604] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,618] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,635] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,636] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,637] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,661] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,663] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,665] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,687] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,694] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,695] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,710] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,720] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,728] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,737] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,741] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,749] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,763] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,765] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,780] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,791] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,796] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,809] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,812] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,815] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,836] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,842] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,845] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,858] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,866] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,871] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,889] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,895] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,896] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  88%|██████████████████████████████████████▊     | 72183/81819 [00:26<00:10, 953.90 examples/s]
[2026-04-15 15:01:13,904] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,921] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,927] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,928] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,937] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,959] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,974] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,975] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,975] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:13,995] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,001] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,001] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,008] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,033] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,038] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,044] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,058] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,062] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,071] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,085] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,090] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,100] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,110] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,140] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,146] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,148] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,165] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,174] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,181] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,184] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,204] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,206] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,208] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  89%|██████████████████████████████████████▏    | 72592/81819 [00:26<00:09, 1022.79 examples/s]
[2026-04-15 15:01:14,230] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,235] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,249] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,254] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,262] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,271] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,286] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,290] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,294] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,303] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,311] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,327] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,334] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,349] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,350] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,366] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,371] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,380] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,385] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,405] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,406] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,414] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,441] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,442] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,444] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,462] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,467] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,478] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,487] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,499] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,504] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,521] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,538] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,549] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,558] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,566] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,580] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,591] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,598] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,605] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,609] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,624] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,635] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,642] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,644] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,659] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,670] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,677] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,681] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,702] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,711] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,712] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,733] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,739] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,746] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,759] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,760] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,777] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,785] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,789] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,801] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,802] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,815] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,835] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,838] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,847] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,866] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,875] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,882] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,891] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,899] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,904] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,921] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,935] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,941] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,954] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,966] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,971] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,975] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:14,993] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,007] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,010] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,012] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,035] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,036] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,041] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,053] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,058] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,077] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,078] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,080] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,103] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,104] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,113] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,131] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,140] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,144] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,165] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,167] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,177] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,198] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,201] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,206] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,238] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,244] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,246] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,262] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,269] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,281] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,286] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,292] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,300] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,327] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,329] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,336] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,346] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,356] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,367] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,374] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,376] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,389] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,400] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,405] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,410] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,440] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,441] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,455] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,470] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,472] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,484] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,502] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,503] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,518] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,545] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,546] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,555] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,572] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,572] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,586] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,598] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,607] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,637] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,640] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,641] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,662] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,669] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,685] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,688] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,696] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,712] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,718] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,727] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,739] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,757] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,762] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,763] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,773] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,783] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,795] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,797] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,803] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,828] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,829] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,843] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,849] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,855] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,856] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,880] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,881] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,885] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,903] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,905] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,916] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,938] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,943] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,944] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,975] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,977] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:15,984] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,000] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,012] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,032] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,033] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,043] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,054] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,068] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,071] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,080] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,096] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,096] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,104] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,126] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,130] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,144] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,156] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,160] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,172] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,186] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,188] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,197] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,208] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,213] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,219] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,246] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,246] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,256] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,270] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,272] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,286] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,300] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,305] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,307] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,337] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,343] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,347] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,368] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,368] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,374] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,398] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,399] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,405] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,443] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,447] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,457] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,468] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,479] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,487] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,491] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,502] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,518] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,523] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,549] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,552] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,557] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,576] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,579] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,582] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,605] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,613] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,621] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,626] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,649] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,652] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,654] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,679] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,685] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,685] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,703] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,718] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,730] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,747] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,750] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,764] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,773] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,781] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,797] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,798] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,820] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,830] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,833] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,846] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,860] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,860] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,879] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,889] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,893] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,909] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,926] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,937] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,945] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,952] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,966] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,976] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,978] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:16,989] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,003] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,013] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,029] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,043] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,049] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,060] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,071] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,086] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,095] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,112] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,113] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,127] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,145] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,158] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,167] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,174] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,194] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,206] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,206] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,216] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,236] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,245] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,248] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,267] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,275] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,292] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,303] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,308] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,317] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,335] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,356] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,358] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,362] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,385] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,386] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,390] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,403] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,414] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,424] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,437] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,445] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,446] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,453] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,478] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,481] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,485] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,510] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,523] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,528] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,553] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,553] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,553] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,578] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,588] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,608] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,614] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,617] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,640] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,648] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,652] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,678] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,682] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,687] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,701] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,705] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,721] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,730] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,745] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,748] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,755] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,774] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,781] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,787] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,808] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,818] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,821] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,849] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,854] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,856] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,874] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,892] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,893] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,900] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,919] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,921] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,945] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,949] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,963] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,972] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,978] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,994] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:17,998] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,008] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,026] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,036] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,059] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,061] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,074] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,086] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,087] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,112] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,113] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,125] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,146] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,155] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,156] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,173] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,175] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,191] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,201] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,202] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,223] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,243] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,246] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,246] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,267] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,271] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,273] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,292] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,308] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,312] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,325] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,343] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,346] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,350] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,364] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,372] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,391] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,395] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,407] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,415] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,417] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,439] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,440] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,452] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,470] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,474] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,486] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,487] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,503] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,505] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,531] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,539] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,548] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,564] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,567] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,576] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,596] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,601] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,602] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,630] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,638] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,649] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,659] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,667] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,674] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,690] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,696] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,702] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,715] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,719] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,740] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,748] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,765] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,772] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,773] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,787] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,789] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,795] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,823] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,823] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,831] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,844] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,850] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,855] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,863] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,874] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,883] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,902] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,908] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,926] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,934] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,950] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,959] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,961] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,976] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,986] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:18,994] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,006] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,016] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,020] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,040] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,040] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,055] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,060] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,061] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,083] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,084] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,092] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,109] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,109] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,116] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,134] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,141] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,144] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,166] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,168] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,172] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,193] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,197] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,200] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,214] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,225] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,240] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,241] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,246] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,258] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,268] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,274] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,284] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,292] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,299] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,330] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,340] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,348] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,349] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,365] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,370] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,378] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,393] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,397] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,414] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,423] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,424] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,436] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,447] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,458] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,466] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,480] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,480] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,494] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,504] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,507] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,516] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,532] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,540] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,551] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,555] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,558] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,584] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,587] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,588] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,598] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,611] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,615] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,636] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,641] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,653] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,665] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,667] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,674] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,693] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,694] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,708] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,724] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,725] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,730] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,739] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,749] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,750] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,768] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,768] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,772] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,790] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,793] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,803] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,815] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,829] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,846] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,851] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,855] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,876] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,878] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,881] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,899] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,907] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,913] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,938] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,940] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,946] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,967] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,973] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,974] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:19,999] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,002] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,005] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,043] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,045] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,045] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,066] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,073] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,075] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,091] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,097] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,115] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,119] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,127] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,144] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,149] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,150] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,172] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,181] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,184] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,203] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,207] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,210] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,234] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,238] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,249] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,264] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,265] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,277] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,294] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,308] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,317] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,328] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,344] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,358] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,372] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,390] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,406] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,413] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,432] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,442] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,459] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,476] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,484] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,503] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,509] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,545] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,558] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,573] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,582] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,605] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,608] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,646] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,646] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,670] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,686] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,702] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  90%|███████████████████████████████████████▌    | 73592/81819 [00:33<00:28, 286.21 examples/s]
[2026-04-15 15:01:20,727] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,743] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,756] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,764] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,788] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,792] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,813] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,814] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,814] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,839] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,842] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,853] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,862] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,867] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,889] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,890] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,899] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,916] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,924] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,938] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,941] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,948] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,965] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,974] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,976] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,993] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:20,995] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,007] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,017] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,028] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,038] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,054] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,068] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,074] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,082] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,100] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,113] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,139] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,143] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,168] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,171] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,196] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,197] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,224] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,227] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,247] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,266] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,268] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,301] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,302] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,338] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,346] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,369] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,374] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,398] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,406] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  91%|████████████████████████████████████████    | 74592/81819 [00:33<00:17, 416.11 examples/s]
[2026-04-15 15:01:21,444] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,446] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,468] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,485] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,501] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,518] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,538] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,539] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,547] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,564] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,565] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,584] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,592] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,598] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,613] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,629] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,633] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,642] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,649] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,659] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,671] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,678] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,689] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,698] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,699] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,707] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,732] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,739] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,747] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,762] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,762] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,784] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,784] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,789] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,814] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,815] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,826] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,834] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,843] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,865] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,872] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,873] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,885] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,903] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,906] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,912] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,936] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,941] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,952] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,964] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,969] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,982] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,994] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:21,994] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,014] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,017] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,021] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,038] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,057] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,058] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,063] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,082] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,086] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,095] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,109] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,116] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,134] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,139] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,144] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,159] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,164] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,169] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,177] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,188] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,206] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,208] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,209] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,231] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,236] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,240] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,257] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,261] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,274] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,281] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,288] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,300] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,301] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,309] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,336] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,336] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,344] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,361] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,364] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,372] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,385] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,394] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,404] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,405] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,419] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,434] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,449] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,454] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,457] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,478] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,480] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,486] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,505] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,506] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,513] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,539] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,545] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,551] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,567] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,574] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,584] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,599] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,610] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,617] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,627] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,629] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,642] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,660] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,665] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,676] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,687] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,692] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,706] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,715] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,717] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,737] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,740] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,740] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,761] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,767] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,768] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,785] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,794] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,795] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,811] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,820] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,832] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,847] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,854] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,856] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,880] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,880] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,882] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,888] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,906] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,911] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,912] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,934] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,937] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,948] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,955] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,967] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,975] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,978] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:22,992] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,001] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,004] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,008] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,041] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,047] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,050] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,054] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,073] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,082] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,099] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,099] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,100] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,140] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,148] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,154] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,164] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,180] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,200] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,210] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,214] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,236] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,240] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,253] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,259] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,260] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,286] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,290] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,290] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,314] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,316] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,326] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,337] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,348] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,363] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,367] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,379] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,386] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,386] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,403] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,410] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,412] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,439] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,443] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,443] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,449] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,458] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,459] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,466] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,487] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,489] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,494] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,507] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,513] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,524] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,534] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,538] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,554] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,558] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,560] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,574] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,580] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,586] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,605] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,616] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,621] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,641] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,651] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,652] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,666] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,675] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,689] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,693] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,697] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,718] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,719] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,724] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,736] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,743] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,749] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,766] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,766] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,777] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,790] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,799] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,818] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,819] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,847] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,848] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,860] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,865] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,878] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,879] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,884] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,895] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,903] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,906] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,921] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,940] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,942] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,945] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,965] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,970] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,976] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,998] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:23,999] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,000] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,024] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,041] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,048] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,049] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,062] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,076] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,079] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,104] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,107] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,112] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,132] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,139] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,147] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,153] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,161] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,177] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,183] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,201] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,210] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,216] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,233] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,238] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,247] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,249] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,264] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,264] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,272] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,280] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,291] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,300] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,306] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,317] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,339] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,341] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,343] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,364] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,367] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,377] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,387] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,400] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,406] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,413] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,427] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,437] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,441] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,444] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,471] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,472] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,473] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,486] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,497] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,511] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,513] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,523] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,537] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,543] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,552] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,569] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,574] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,583] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,600] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,611] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,621] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,655] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,657] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,667] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,679] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,681] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,689] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,697] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,710] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,727] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,729] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,735] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,754] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,756] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,758] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,778] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,779] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,780] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,786] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,806] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,807] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,818] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,834] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,840] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,842] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,865] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,868] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,874] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,889] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,899] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,905] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,910] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,941] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,942] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,942] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,961] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,967] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,968] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,987] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,994] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:24,999] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,018] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,025] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,054] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,055] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,058] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,076] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,076] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,086] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,096] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,101] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,104] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,105] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,138] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,142] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,153] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,165] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,168] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,178] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,193] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,203] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,208] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,215] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,243] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,244] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,252] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,262] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,278] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,281] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,292] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,304] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,305] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,324] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,336] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,350] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,353] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,363] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,379] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,380] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,390] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,404] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,407] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,413] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,440] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,443] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,450] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,468] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,469] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,475] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,494] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,496] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,510] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,522] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,526] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,543] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,548] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,559] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,567] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,574] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,580] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,585] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,592] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,612] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,615] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,618] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,637] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,640] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,643] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,667] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,668] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,668] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,687] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,693] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,697] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,710] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,712] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,722] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,728] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,738] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,755] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,756] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,770] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,778] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,791] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,795] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,797] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,816] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,820] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,821] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,848] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,849] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,852] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,876] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,879] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,880] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,901] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,904] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,906] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,937] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,939] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,939] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,960] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,964] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,966] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,987] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:25,997] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,002] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,011] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,018] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,043] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,048] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,048] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,061] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,074] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,079] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,091] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,106] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,106] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,124] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,133] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,142] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,145] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,172] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,172] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,173] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,196] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,198] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,203] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,220] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,225] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,245] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,249] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,255] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,270] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,274] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,292] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,298] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,310] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,317] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,328] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,341] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,343] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,364] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,369] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,369] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,395] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,397] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,405] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,418] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,432] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,444] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,446] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,457] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,466] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,479] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,486] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,495] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,505] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,513] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,525] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,547] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,549] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,560] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,572] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,579] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,588] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,610] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,610] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,621] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,628] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,642] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,644] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,652] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,668] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,677] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,685] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,698] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,704] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,709] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,724] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,730] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,748] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,749] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,750] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,777] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,780] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,786] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,809] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,813] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,814] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,837] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,844] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,848] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,868] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,869] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,876] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,896] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,906] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,911] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,924] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,939] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,946] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,946] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,967] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,970] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,975] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:26,988] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,000] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,004] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,012] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,044] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,045] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,045] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,068] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,074] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,081] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,094] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,096] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,115] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,117] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,122] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,142] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,145] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,146] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,161] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,166] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,174] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,178] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,188] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,199] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,207] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,212] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,242] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,245] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,252] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,276] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,277] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,285] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,307] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,315] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,325] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,343] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,356] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,359] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,365] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,384] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,387] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,398] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,412] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,421] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,425] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,447] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,453] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,458] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,473] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,480] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,488] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,502] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,509] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,513] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,538] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,544] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,548] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,569] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,572] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,574] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,597] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,608] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,613] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,623] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,642] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,653] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,658] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,670] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,678] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,695] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,696] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,708] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,729] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,736] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,753] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,754] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,763] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,777] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,783] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,786] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,801] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,806] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,828] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,838] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,840] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,859] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,863] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,874] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,892] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,897] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,902] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,925] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,930] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,933] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,959] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,968] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,969] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,992] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:27,993] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,007] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,011] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,029] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,043] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,046] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,052] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,068] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,074] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,092] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,099] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,104] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,117] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,139] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,145] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,152] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,162] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,164] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,177] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,184] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,191] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,196] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,218] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,223] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,224] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,250] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,255] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,258] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,282] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,286] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,286] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,313] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,319] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,321] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,344] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,347] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,350] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,365] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,372] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,379] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,391] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,414] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,416] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,426] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,442] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,451] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,455] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,482] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,485] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,491] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,510] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,511] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,513] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,553] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,560] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,561] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,567] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,585] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,596] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,598] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,620] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,621] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,624] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,637] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,654] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,654] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,667] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,684] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,686] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,704] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,715] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,716] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,731] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,737] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,742] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,761] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,762] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,769] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,775] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,789] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,790] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,795] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,814] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,815] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,837] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,843] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,850] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,874] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,877] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,880] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,904] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,909] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,911] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,940] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,942] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,948] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,968] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,968] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,972] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,986] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:28,999] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,003] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,013] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,039] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,043] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,044] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,062] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,070] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,072] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,096] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,102] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,108] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,120] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,128] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,145] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,147] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,154] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,175] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,179] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,191] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,200] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,205] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,228] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,231] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,240] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,259] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,269] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,271] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,286] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,295] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,303] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,314] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,320] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,334] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,354] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,355] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,356] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,379] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,390] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,403] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,413] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,422] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,430] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,449] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,453] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,462] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,475] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,478] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,486] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,500] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,503] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,516] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,531] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,541] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,541] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,557] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,564] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,577] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,585] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,587] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,603] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,606] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,621] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,634] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,640] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,640] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,658] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,660] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,666] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,688] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,691] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,697] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,713] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,719] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,727] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,738] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,742] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,753] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,762] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,771] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,784] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,796] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,813] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,815] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,818] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,835] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,839] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,863] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,863] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,879] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,888] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,892] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,904] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,910] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,922] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,938] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,941] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,952] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,969] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,971] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,972] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,991] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:29,997] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,002] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,011] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,020] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,045] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,045] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,045] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,069] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,070] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,079] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,093] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,103] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,111] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,115] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,137] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,142] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,150] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,164] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,171] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,186] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,189] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,201] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,217] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,217] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,230] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,243] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,254] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,257] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,270] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,280] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,294] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,301] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,306] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,327] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,341] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,342] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,366] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,370] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,377] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,394] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,395] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,406] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,407] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,429] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,443] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,443] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,458] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,466] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,469] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,487] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,491] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,506] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,515] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,524] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,525] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,545] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,551] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,567] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,571] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,576] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,601] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,601] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,608] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,644] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,647] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,655] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,656] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,678] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,685] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,687] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,704] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,708] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,723] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,736] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,742] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,757] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,759] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,771] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,781] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,791] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,795] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,801] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,810] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,820] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,829] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,838] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,843] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,851] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,864] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,865] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,865] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,886] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,896] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,901] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,911] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,923] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,942] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,944] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,966] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,966] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,967] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,991] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:30,996] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,000] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,015] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,020] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,034] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,048] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,051] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,061] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,081] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,084] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,086] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,091] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,111] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,113] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,118] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,141] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,146] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,150] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,168] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,179] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,183] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,197] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,215] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,215] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,221] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,223] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,237] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,252] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,264] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,265] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,282] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,292] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,299] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,300] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,326] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,339] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,347] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,358] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,366] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,377] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,383] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,387] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,402] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,408] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,423] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,436] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,440] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,456] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,456] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,479] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,485] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,485] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,519] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,519] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,520] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,550] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,551] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,566] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,567] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,585] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,595] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,604] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,623] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,628] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,645] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,649] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,668] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,672] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,686] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,695] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,705] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,708] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,715] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,735] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,748] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,751] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,769] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,783] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,796] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,808] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,808] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,833] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,841] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,843] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,865] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,868] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,868] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,893] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,904] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,904] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,920] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,939] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,945] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,949] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,972] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,984] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,990] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:31,995] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,021] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,029] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,040] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,044] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,055] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,070] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,076] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,079] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,099] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,106] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,111] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,131] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,148] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,150] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,151] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,174] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,186] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,188] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,191] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,210] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,218] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,227] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,245] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,256] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,264] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,281] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,281] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,290] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,308] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,316] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,331] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,340] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,362] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,370] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,373] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,391] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,400] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,412] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,426] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,445] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,459] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,461] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,469] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,486] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,491] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,491] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,526] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,527] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,528] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,555] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,562] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,563] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,584] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,592] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,597] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,615] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,621] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,639] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,646] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,654] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,670] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,676] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,690] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,713] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,714] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,715] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,748] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,755] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,755] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,775] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,791] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,792] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,799] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,809] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,822] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,849] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,853] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,854] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,874] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,878] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,880] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,898] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,903] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,914] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,937] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,937] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,963] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,976] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,980] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:32,998] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,008] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,018] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,033] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,048] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,061] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,065] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,086] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,099] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,110] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,118] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,134] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,149] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,152] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,164] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,185] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,187] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,196] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,215] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,222] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,229] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,245] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,253] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,265] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,275] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,285] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,304] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,309] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,310] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,340] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,344] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,349] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,371] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,372] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,386] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,402] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,407] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,410] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,442] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,444] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,446] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,468] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,471] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,474] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,494] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,506] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,508] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,530] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,543] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,543] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,567] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,569] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,573] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,591] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,593] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,611] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,618] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,618] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,639] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,644] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,660] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,670] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,673] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,695] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,699] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,704] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,727] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,737] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,744] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,749] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,762] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,766] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,787] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,794] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,806] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,820] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,825] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,842] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,846] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,855] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,873] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,881] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,890] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,896] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,916] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,921] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,926] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,946] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,952] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,955] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,971] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,987] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,991] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:33,999] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,014] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,027] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,037] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,041] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,058] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,060] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,078] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,081] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,096] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,107] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,113] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,131] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,132] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,148] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,158] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,169] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,185] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,187] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,205] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,211] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,220] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,241] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,241] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,248] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,273] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,274] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,279] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,296] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,299] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,302] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,318] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,343] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,346] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,352] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,367] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,375] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,388] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,388] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,412] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,416] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,421] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,434] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,440] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,449] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,466] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,467] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,477] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,492] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,493] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,515] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,521] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,521] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,548] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,551] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,562] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,574] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,579] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,591] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,602] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,612] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,612] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,633] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,643] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,651] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,658] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,681] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,683] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,693] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,704] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,717] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,721] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,729] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,745] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,757] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,761] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,789] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,795] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,799] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,807] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,826] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,850] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,853] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,858] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,883] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,887] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,888] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,905] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,924] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,929] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,943] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,953] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,956] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,970] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,986] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,990] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:34,997] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,015] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,022] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,028] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,050] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,053] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,060] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,088] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,091] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,096] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,115] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,120] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,125] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,135] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,152] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,154] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,164] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,170] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,187] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,194] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,199] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,218] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,225] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,245] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,249] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,262] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,276] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,283] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,292] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,308] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,311] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,317] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,341] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,342] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,343] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,354] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,367] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,369] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,380] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,395] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,401] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,407] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,428] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,439] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,442] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,448] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,461] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,469] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,479] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,486] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,493] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,511] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,512] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,515] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,530] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,538] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,548] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,559] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,567] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,574] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,583] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,594] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,603] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,620] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,621] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,637] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,641] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,644] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,654] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,668] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,672] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  91%|████████████████████████████████████████    | 74592/81819 [00:48<00:17, 416.11 examples/s]
[2026-04-15 15:01:35,682] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,694] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,700] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,701] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,719] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,734] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,746] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,747] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,761] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,769] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,776] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,789] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,802] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,805] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,810] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,842] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,846] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,848] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,869] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,879] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,884] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,897] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,905] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,913] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,930] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,937] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,951] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,960] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,967] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,976] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:35,991] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,005] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,010] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,021] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,038] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,045] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,050] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,071] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,077] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,077] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,101] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,108] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,109] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,136] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,145] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,148] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,163] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,181] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,182] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,189] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,212] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,218] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,221] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,235] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,248] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,252] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,265] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,287] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,293] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,306] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,316] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,330] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,339] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,345] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,363] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,367] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,371] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,390] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,400] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,411] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,414] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,438] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,447] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,448] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,467] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,475] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,480] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,504] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,507] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,508] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,550] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,556] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,556] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,575] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,582] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,582] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,598] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,605] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,618] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,639] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,641] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,656] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,666] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,673] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,693] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,694] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,698] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,719] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,723] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,725] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,744] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,753] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,772] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,775] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,795] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,804] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,805] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,819] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,838] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,850] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,864] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,879] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,895] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,905] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,923] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,928] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,934] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,955] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,956] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,962] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,985] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:36,995] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,002] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,012] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,031] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,043] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,046] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,046] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,072] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,073] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,074] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,090] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,109] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,115] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,115] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,137] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,142] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,142] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,164] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,170] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,181] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,187] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,192] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,195] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,208] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,220] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,234] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,237] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,246] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,255] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,270] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,277] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,287] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,295] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,305] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,324] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,325] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,346] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,351] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,352] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,370] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,387] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,389] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,413] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,419] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,421] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,447] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,453] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,467] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,481] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,483] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,501] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,513] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,517] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,542] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,545] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,546] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,568] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,573] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,588] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,596] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,605] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,627] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,628] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,643] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,656] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,670] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,673] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,686] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,698] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,712] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,714] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,731] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,747] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,748] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,760] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,767] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,776] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,791] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,793] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,812] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,824] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,826] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,841] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,854] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,861] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,868] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,893] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,894] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,904] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,919] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,920] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,946] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,948] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,950] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,968] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,972] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,984] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:37,999] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,000] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,016] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,043] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,051] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,053] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,079] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,084] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,091] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,111] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,117] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,128] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,137] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,139] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,151] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,153] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,178] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,187] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,196] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,207] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,212] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,228] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,246] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,252] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,255] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,273] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,279] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,286] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,300] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,304] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,311] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,337] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,338] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,350] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,365] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,366] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,373] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,391] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,395] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,414] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,417] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,423] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,442] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,443] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,456] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,460] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,472] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,482] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,489] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,508] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,511] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,524] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,536] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,547] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,552] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,568] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,579] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,580] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,601] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,607] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,613] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,640] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,653] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,657] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,676] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,679] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,683] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,703] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,710] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,711] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,738] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,742] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,754] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,757] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,772] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,778] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,784] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,796] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,803] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,804] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,824] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,830] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,841] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,854] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,859] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,864] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,881] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,886] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,887] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,908] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,912] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,917] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,932] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,933] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,936] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,957] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,963] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,967] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,979] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,987] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:38,998] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,003] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,010] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,039] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,050] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,056] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,063] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,073] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,083] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,094] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,112] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,112] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,132] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,141] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,153] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,167] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,182] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,192] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,207] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,226] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,236] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,247] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,268] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,275] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,291] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,294] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,314] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,318] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,341] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,343] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,365] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,370] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,402] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,414] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,444] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,445] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,479] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,480] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,505] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,512] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,542] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,570] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,581] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  92%|████████████████████████████████████████▋   | 75592/81819 [00:52<00:50, 123.79 examples/s]
[2026-04-15 15:01:39,599] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,610] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,641] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,645] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,673] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,679] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,704] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,705] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,726] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,743] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,754] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,768] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,771] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,784] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,793] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,797] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,813] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,822] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,828] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,850] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,852] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,865] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,886] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,896] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,907] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,916] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,921] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,951] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,953] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,957] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,979] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:39,989] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,005] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,006] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,037] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,042] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,062] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,074] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,077] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,096] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,099] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,118] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,122] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,141] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,148] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,153] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,174] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,175] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,181] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,203] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,207] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,210] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,257] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,258] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,280] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,288] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,294] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,312] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,313] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,329] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,339] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,343] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,361] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,374] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,376] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,391] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,412] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,413] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,423] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,446] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,449] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,461] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,473] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,486] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,488] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,498] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,519] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,521] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,529] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,552] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,552] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,559] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,574] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,584] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,585] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,601] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,611] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,627] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,642] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,644] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,663] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,666] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,669] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,695] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,697] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,705] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,713] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,726] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,740] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,755] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,757] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,764] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,778] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,788] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,793] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,807] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,818] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,825] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,834] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,843] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,853] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,859] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,874] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,877] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,881] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,900] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,910] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,916] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,937] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,941] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,949] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,966] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,974] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,982] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:40,997] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,004] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,018] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,019] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,041] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,041] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,056] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,067] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,073] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,091] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,094] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,095] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,120] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,132] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,141] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,147] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,159] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,170] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,176] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,185] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,190] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,210] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,217] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,221] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,237] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,242] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,254] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,262] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,266] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,279] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,290] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,301] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,304] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,320] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,326] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,333] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,338] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,357] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,358] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,358] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,374] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,382] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,385] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,399] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,407] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,409] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,426] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,443] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,448] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,453] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,470] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,478] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,489] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,493] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,503] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,516] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,520] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,540] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,544] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,548] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,566] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,570] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,586] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,593] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,600] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,621] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,625] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,646] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,652] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,654] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,680] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,681] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,689] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,697] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,714] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,716] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,724] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,738] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,742] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,750] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,758] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,769] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,773] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,773] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,787] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,791] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,801] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,816] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,821] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,846] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,847] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,852] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,871] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,874] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,891] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,894] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,906] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,916] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,920] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,941] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,942] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,942] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,964] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,967] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,971] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,992] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:41,995] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,003] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,017] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,019] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,038] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,044] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,053] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,058] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,074] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,082] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,085] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,091] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,106] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,109] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,118] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,137] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,140] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,148] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,156] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,161] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,174] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,178] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,184] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,193] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,208] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,209] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,227] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,238] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,240] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,257] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,258] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,260] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,265] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,283] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,292] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,294] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,309] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,314] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,328] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,339] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,340] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,350] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,370] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,374] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,376] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,394] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,395] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,400] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,418] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,421] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,436] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,441] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,445] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,462] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,463] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,469] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,483] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,490] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,496] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,507] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,518] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,520] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,533] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,547] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,557] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,559] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,578] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,579] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,584] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,602] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,608] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,610] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,640] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,644] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,651] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,660] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,673] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,688] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,688] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,704] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,716] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,720] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,738] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,740] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,752] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,761] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,764] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,788] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,794] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,800] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,821] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,824] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,838] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,842] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,844] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,862] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,864] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,872] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,885] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,887] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,902] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,913] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,923] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,938] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,940] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,952] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,967] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,967] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,983] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,989] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:42,992] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,015] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,016] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,020] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,039] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,039] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,047] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,064] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,071] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,079] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,089] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,092] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,106] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,114] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,122] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,140] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,149] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,158] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,165] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,179] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,186] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,201] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,202] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,212] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,240] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,242] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,246] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,266] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,272] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,274] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,286] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,294] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,309] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,317] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,319] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,333] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,341] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,343] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,356] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,357] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,371] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,374] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,383] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,386] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,387] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,409] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,412] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,413] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,438] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,440] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,458] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,467] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,469] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,479] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,489] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,495] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,504] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,511] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,522] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,535] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,546] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,550] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,553] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,571] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,577] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,587] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,603] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,606] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,624] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,638] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,639] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,664] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,665] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,675] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,688] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,690] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,708] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,713] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,720] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,745] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,751] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,756] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,777] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,781] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,789] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,805] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,807] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,819] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,828] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,841] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,847] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,848] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,854] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,880] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,881] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,891] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,900] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,915] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,922] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,937] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,942] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,943] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,963] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,963] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,975] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,987] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:43,996] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,008] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,008] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,033] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,046] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,047] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,058] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,070] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,078] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,087] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,097] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,105] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,115] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,124] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,144] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,145] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,156] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,167] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,184] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,187] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,190] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,210] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,224] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,226] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,236] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,254] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,258] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,260] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,290] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,292] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,295] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,319] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,319] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,321] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,344] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,355] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,365] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,371] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,389] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,391] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,398] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,400] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,421] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,423] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,438] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,450] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,457] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,459] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,479] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,488] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,498] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,502] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,516] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,531] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,540] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,545] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,556] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,567] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,578] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,587] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,591] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,609] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,611] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,612] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,626] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,635] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,645] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,646] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,669] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,673] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,674] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,688] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,700] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,701] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,708] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,751] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,753] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,763] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,781] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,795] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,795] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,797] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,824] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,831] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,836] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,844] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,858] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,867] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,874] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,882] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,893] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,900] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,905] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,928] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,934] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,944] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,944] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,960] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,962] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,974] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,980] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,998] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:44,999] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,001] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,024] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,038] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,071] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,074] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,078] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,092] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,098] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,105] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,130] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,149] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,158] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,160] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,183] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,187] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,199] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,212] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,213] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,222] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,245] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,250] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,252] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,274] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,275] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,280] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,300] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,304] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,310] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,339] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,348] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,353] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,356] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,375] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,384] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,392] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,401] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,415] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,421] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,438] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,444] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,452] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,466] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,467] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,488] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,489] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,494] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,520] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,521] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,522] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,541] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,548] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,550] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,572] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,573] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,582] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,591] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,602] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,611] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,624] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,642] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,650] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,650] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,677] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,681] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,697] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,706] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,712] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,735] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,746] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,746] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,764] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,770] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,786] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,786] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,808] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,816] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,823] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,843] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,848] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,859] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,859] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,880] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,885] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,905] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,912] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,919] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,941] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,945] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,948] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,970] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,981] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,985] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,985] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:45,999] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,003] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,011] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,044] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,049] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,052] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,073] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,075] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,076] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,098] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,102] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,109] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,122] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,137] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,147] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,160] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,161] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,178] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,183] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,189] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,209] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,216] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,217] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,240] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,247] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,254] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,267] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,278] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,291] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,303] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,308] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,315] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,346] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,348] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,354] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,372] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,377] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,387] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,399] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,415] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,415] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,437] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,439] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,440] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,449] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,471] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,473] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,475] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,485] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,499] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,500] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,503] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,545] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,548] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,550] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,562] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,569] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,590] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,592] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,593] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,615] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,621] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,621] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,638] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,645] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,647] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,649] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,666] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,668] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,695] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,696] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,706] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,716] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,717] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,736] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,736] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,740] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,759] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,768] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,772] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,774] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,796] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,804] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,806] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,821] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,840] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,841] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,849] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,865] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,866] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,874] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,885] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,896] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,897] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,908] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,908] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,916] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,931] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,946] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,953] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,962] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,970] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,975] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,980] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:46,997] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,009] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,013] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,015] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,034] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,041] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,052] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,053] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,067] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,079] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,082] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,096] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,108] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,116] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,133] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,135] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,151] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,167] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,168] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,182] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,195] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,202] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,223] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,227] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,236] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,253] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,258] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,261] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,277] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,288] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,291] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,291] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,303] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,313] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,326] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,339] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,346] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,356] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,363] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,365] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,389] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,393] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,401] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,411] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,421] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,441] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,446] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,456] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,475] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,483] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,485] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,507] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,513] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,517] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,529] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,541] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,558] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,562] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,576] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,579] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,585] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,605] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,610] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,623] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,629] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,643] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,652] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,653] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,675] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,677] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,680] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,687] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,703] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,707] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,715] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,732] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,742] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,745] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,761] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,767] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,778] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,781] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,797] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,797] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,807] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,824] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,825] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,842] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,844] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,856] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,863] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,868] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,870] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,889] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,896] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,904] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,918] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,920] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,939] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,940] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,945] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,963] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,966] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,978] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,978] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:47,987] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,010] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,017] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,019] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,039] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,048] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,060] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,070] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,078] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,097] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,102] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,105] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,136] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,141] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,145] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,166] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,167] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,185] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,187] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,198] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,208] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,213] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,225] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,240] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,243] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,256] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,261] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,268] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,283] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,288] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,292] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,295] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,315] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,319] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,320] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,346] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,347] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,349] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,370] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,376] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,393] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,397] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,403] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,405] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,422] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,445] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,446] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,458] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,467] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,473] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,483] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,491] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,508] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,513] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,527] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,547] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,548] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,554] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,575] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,584] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,586] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,610] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,611] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,611] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,637] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,643] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,644] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,665] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,666] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,673] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,681] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,686] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,696] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,705] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,714] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,720] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,738] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,745] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,763] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,763] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,771] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,779] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,795] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,801] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,813] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,813] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,830] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,836] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,851] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,855] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,860] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,877] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,883] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,883] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,888] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,894] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,908] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,912] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,915] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,938] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,943] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,946] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,958] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,965] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,969] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,975] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,985] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:48,987] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,014] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,018] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,027] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,036] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,055] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,056] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,063] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,088] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,092] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,093] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,119] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,121] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,127] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,145] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,146] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,158] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,164] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,170] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,179] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,203] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,207] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,210] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,236] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,247] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,251] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,260] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,280] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,280] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,290] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,314] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,315] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,320] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,338] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,338] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,350] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,363] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,373] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,397] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,399] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,429] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,445] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,453] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,461] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,476] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,478] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,503] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,504] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,510] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,544] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,545] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,548] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,571] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,575] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,578] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,592] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,610] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,612] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,616] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,646] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,649] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,652] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,668] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,676] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,676] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,713] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,717] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,717] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,730] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,744] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,749] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,753] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,770] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,777] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,778] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,792] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,801] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,816] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,820] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,833] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,847] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,851] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,853] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,867] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,884] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,887] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,888] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,910] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,916] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,922] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,943] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,951] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,958] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,977] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,977] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,988] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:49,999] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,017] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,026] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,035] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,051] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,055] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,067] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,088] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,089] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,101] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,106] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,127] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,136] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,146] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,158] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,172] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,183] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,186] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,202] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,212] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,220] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,233] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,245] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,248] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,260] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,265] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,272] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,286] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,294] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,299] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,320] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,340] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,348] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,351] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,370] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,373] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,390] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,405] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,412] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,441] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,441] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,482] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,483] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,507] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,511] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,536] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,540] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,565] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,567] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,588] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,592] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,618] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,623] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,645] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,682] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,703] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,746] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  94%|█████████████████████████████████████████▏  | 76592/81819 [01:03<00:47, 109.54 examples/s]
[2026-04-15 15:01:50,779] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,800] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,842] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,860] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,871] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,887] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,890] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,923] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,923] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,946] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,947] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,969] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:50,979] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,002] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,010] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,050] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,051] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,081] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,082] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  95%|█████████████████████████████████████████▋  | 77592/81819 [01:03<00:26, 159.95 examples/s]
[2026-04-15 15:01:51,102] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,113] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,140] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,156] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,167] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,167] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,178] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,195] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,200] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,201] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,221] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,237] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,247] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,255] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,258] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,272] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,276] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,277] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,290] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,306] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,307] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,323] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,332] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,341] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,359] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,366] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,374] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,390] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,393] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,402] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,410] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,428] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,434] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,461] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,472] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,475] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,484] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,488] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,507] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,518] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,522] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,539] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,550] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,550] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,562] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,580] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,583] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,590] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,605] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,611] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,617] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,643] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,646] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,652] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,670] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,679] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,689] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,699] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,705] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,706] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,732] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,739] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,746] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,756] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,761] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,771] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,781] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,790] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,797] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,806] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,811] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,819] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,841] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,846] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,846] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,856] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,867] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,870] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,884] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,893] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,897] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,913] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,922] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,928] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,931] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,939] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,955] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,957] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,965] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,972] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,990] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:51,998] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,001] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,015] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,025] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,038] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,048] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,061] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,065] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,073] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,087] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,101] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,105] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,111] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,141] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,142] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,153] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,164] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,171] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,184] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,188] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,201] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,203] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,219] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,231] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,241] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,259] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,259] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,272] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,275] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,284] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,295] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,305] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,315] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,327] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,343] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,344] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,347] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,368] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,378] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,383] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,400] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,406] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,411] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,452] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,462] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,462] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,484] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,495] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,498] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,517] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,628] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,647] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,650] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,659] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,672] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,681] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,693] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,704] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,718] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,732] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,747] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,749] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,769] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,773] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,777] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,797] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,803] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,820] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,836] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,838] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,848] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,870] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,870] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,882] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,893] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,900] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,913] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,920] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,943] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,947] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,954] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,954] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,973] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,988] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:52,997] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,002] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,011] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,020] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,040] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,045] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,045] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,066] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,071] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,078] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,092] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,105] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,112] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,128] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,142] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,146] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,159] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,166] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,177] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,185] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,194] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,204] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,218] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,231] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,237] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,252] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,258] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,277] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,278] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,278] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,295] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,302] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,315] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,333] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,337] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,340] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,357] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,366] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,379] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,385] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,389] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,407] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,410] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,416] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,438] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,448] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,456] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,466] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,481] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,494] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,497] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,510] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,523] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,540] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,549] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,552] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,574] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,578] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,590] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,599] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,600] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,619] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,647] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,647] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,647] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,677] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,679] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,681] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,708] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,714] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,715] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,735] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,741] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,749] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,767] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,775] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,794] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,804] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,810] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,824] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,836] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,840] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,849] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,863] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,875] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,891] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,892] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,903] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,909] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,917] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,943] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,943] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,958] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,963] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,982] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:53,984] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,002] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,007] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,018] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,037] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,040] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,045] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,049] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,059] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,066] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,079] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,082] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,099] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,101] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,108] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,132] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,135] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,145] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,155] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,168] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,169] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,181] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,196] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,203] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,203] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,226] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,239] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,245] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,257] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,266] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,273] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,279] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,310] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,312] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,316] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,341] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,348] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,367] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,373] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,374] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,398] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,398] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,407] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,428] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,429] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,455] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,457] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,464] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,480] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,496] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,499] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,511] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,529] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,531] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,554] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,554] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,564] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,583] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,589] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,601] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,604] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,607] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,640] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,646] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,651] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,661] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,677] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,680] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,685] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,705] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,705] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,719] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,745] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,752] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,757] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,773] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,774] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,797] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,798] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,804] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,822] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,823] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,843] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,849] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,860] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,870] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,880] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,891] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,902] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,917] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,919] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,949] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,952] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,953] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,974] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,985] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,993] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:54,996] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,013] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,024] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,027] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,046] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,053] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,054] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,069] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,076] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,087] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,099] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,100] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,113] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,140] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,143] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,148] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,171] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,172] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,175] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,191] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,196] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,200] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,221] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,224] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,243] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,248] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,256] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,262] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,277] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,282] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,289] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,309] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,316] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,321] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,335] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,337] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,352] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,359] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,368] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,387] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,389] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,393] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,419] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,419] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,427] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,453] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,454] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,462] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,471] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,481] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,505] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,505] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,507] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,526] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,546] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,549] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,551] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,580] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,581] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,594] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,606] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,621] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,630] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,652] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,653] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,654] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,674] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,678] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,685] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,696] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,705] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,716] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,721] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,739] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,743] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,746] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,759] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,767] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,774] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,783] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,787] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,802] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,803] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,806] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,833] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,835] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,841] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,856] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,866] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,868] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,881] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,894] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,896] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,905] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,917] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,917] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,933] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,941] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,946] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,963] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,969] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,982] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:55,986] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,009] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,016] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,017] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,038] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,044] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,050] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,055] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,065] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,078] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,083] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,085] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,091] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,108] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,111] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,124] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,146] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,149] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,150] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,170] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,175] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,180] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,188] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,192] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,216] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,223] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,225] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,241] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,256] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,258] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,268] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,288] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,291] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,302] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,322] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,332] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,346] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,347] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,375] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,384] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,387] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,412] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,416] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,428] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,437] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,446] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,464] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,464] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,478] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,482] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,496] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,509] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,518] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,521] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,547] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,550] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,563] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,572] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,576] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,592] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,604] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,607] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,621] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,639] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,649] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,655] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,662] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,683] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,688] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,697] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,718] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,723] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,731] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,742] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,745] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,751] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,775] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,779] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,780] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,808] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,810] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,810] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,838] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,839] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,843] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,868] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,868] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,881] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,894] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,902] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,918] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,922] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,934] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,943] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,950] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,966] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,973] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,979] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,982] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:56,995] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,001] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,010] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,024] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,055] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,059] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,059] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,077] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,080] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,099] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,107] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,112] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,133] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,146] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,150] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,162] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,173] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,181] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,188] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,202] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,218] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,221] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,243] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,246] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,254] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,255] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,282] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,289] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,296] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,309] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,314] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,338] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,343] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,344] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,377] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,379] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,379] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,399] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,410] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,419] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,438] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,450] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,451] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,467] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,478] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,485] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,495] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,504] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,506] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,526] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,539] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,548] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,550] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,558] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,571] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,577] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,577] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,592] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,600] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,613] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,622] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,644] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,645] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,651] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,668] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,669] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,688] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,694] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,694] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,712] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,725] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,730] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,740] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,746] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,764] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,767] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,770] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,795] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,797] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,803] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,803] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,818] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,838] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,839] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,854] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,861] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,867] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,880] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,886] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,894] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,913] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,916] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,922] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,932] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,948] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,951] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,970] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,972] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:57,980] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,004] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,007] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,011] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,040] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,046] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,047] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,062] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,070] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,074] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,092] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,106] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,111] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,117] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,141] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,141] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,150] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,157] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,170] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,176] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,179] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,182] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,207] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,208] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,221] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,239] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,240] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,249] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,263] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,265] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,277] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,293] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,299] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,313] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,324] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,339] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,344] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,354] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,370] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,380] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,387] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,392] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,394] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,415] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,417] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,423] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,425] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,445] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,450] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,466] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,477] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,480] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,490] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,491] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,503] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,515] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,524] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,537] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,539] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,555] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,570] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,571] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,578] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,603] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,605] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,608] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,637] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,641] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,651] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,670] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,672] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,682] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,703] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,706] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,706] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,741] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,749] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,755] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,762] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,782] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,788] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,790] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,808] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,811] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,821] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,842] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,843] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,845] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,880] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,881] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,881] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,909] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,912] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,923] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,932] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,936] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,957] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,957] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,967] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,981] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,985] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:58,996] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,007] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,011] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,022] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,041] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,044] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,050] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,069] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,075] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,075] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,092] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,104] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,110] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,116] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,139] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,141] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,148] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,163] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,170] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,186] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,196] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,210] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,213] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,226] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,240] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,243] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,259] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,270] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,282] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,287] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,298] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,306] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,313] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,324] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,345] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,355] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,367] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,372] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,385] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,396] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,398] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,419] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,422] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,445] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,451] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,452] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,468] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,488] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,491] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,495] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,517] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,518] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,521] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,538] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,546] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,552] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,560] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,572] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,578] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,592] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,599] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,610] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,613] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,639] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,655] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,667] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,675] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,691] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,695] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,710] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,721] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,728] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,748] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,753] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,757] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,786] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,789] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,792] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,823] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,849] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,857] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,858] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,877] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,888] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,893] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,911] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,916] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,919] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,944] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,951] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,956] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,979] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,981] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:01:59,995] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,010] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,017] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,024] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,058] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,062] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,073] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,100] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,102] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,105] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,145] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,156] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,165] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,176] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,199] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,205] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,216] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,250] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,251] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,258] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,278] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,293] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,306] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,312] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,334] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,344] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,354] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,374] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,375] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,392] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,399] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,404] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,441] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,441] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,446] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,472] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,475] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,479] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,496] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,507] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,507] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,542] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,548] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,549] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,577] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,582] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,594] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,610] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,613] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,626] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,651] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,660] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,665] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,689] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,689] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,702] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,715] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,719] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,741] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,764] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,765] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,784] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,792] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,812] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,822] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,833] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,848] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,858] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,864] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,875] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,893] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,911] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,917] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,918] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,946] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,954] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,962] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,983] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,989] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:00,993] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,015] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,033] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,035] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,048] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,070] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,073] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,081] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,098] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,104] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,111] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,147] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,159] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,164] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,189] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,192] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,202] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,217] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,223] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,243] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,248] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,260] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,268] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,282] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,289] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,297] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,327] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,329] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,330] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,356] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,363] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,365] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,389] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,400] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,408] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,423] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,449] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,461] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,474] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,485] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,501] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,504] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,515] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,553] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,554] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,564] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,579] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,585] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,593] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,612] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,614] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,618] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,641] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,645] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,652] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,665] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,667] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,691] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,695] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,699] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,717] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,722] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,743] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,747] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,769] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,770] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,780] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,795] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,806] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,809] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,829] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,839] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,848] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,861] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,872] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,872] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,898] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,905] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,915] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,940] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,951] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,957] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,971] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,983] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,985] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:01,994] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,012] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,013] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,017] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,050] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,065] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,078] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,078] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,100] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,101] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,113] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,141] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,149] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,155] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,164] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,180] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,192] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,195] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,212] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,216] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,231] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,244] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,248] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,265] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,269] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,277] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,289] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,305] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,309] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,315] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,340] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,342] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,342] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,370] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,372] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,384] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,404] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,405] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,414] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,441] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,442] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,442] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,458] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,466] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,473] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,488] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,501] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,507] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,522] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,545] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,552] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,556] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,557] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,585] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,589] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,593] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,606] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,614] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,625] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,638] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,644] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,655] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,670] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,674] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,676] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,697] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,700] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,711] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,728] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,747] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,747] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,755] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,761] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,782] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,787] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,799] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,817] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,822] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,838] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,854] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,856] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,871] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,887] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,888] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,895] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,919] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,921] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,924] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,948] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,955] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,961] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,971] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,988] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,996] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:02,998] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,017] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,024] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,031] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,048] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,048] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,071] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,079] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,089] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,107] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,107] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,131] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,136] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,140] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,163] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,165] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,176] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,195] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,197] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,211] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,231] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,237] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,244] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,258] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,268] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,274] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,283] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,297] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,305] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,309] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,319] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,339] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,356] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,358] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,365] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,394] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,404] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,413] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,430] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,442] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,449] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,461] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,470] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,474] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,494] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,496] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,510] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,525] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,531] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,546] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,551] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,564] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,571] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,578] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,602] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,603] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,611] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,642] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,644] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,650] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,668] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,680] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,692] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,699] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,712] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,725] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,736] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,745] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,762] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,769] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,772] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,785] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,800] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,811] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,811] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,839] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,841] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,844] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,867] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,867] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,891] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,896] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,899] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,914] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,925] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,935] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,939] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,960] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,969] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,970] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,972] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:03,988] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,005] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,005] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,010] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,029] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,049] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,053] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,053] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,075] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,083] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,083] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,109] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,114] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,117] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,147] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,150] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,152] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,177] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,178] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,184] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,211] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,211] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,220] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,241] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,246] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,247] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,264] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,274] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,284] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,300] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,316] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,321] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,330] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,356] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,361] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,362] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,382] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,385] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,392] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,411] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,424] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,424] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,444] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,451] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,460] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,480] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,484] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,501] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,502] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,512] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,540] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,548] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,554] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,575] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,577] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,579] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,599] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,611] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,621] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,635] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,645] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,653] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,654] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,676] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,678] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,693] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,701] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,714] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,729] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,736] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,751] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,758] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,758] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,776] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,778] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,793] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,797] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,806] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,819] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,826] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,850] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,863] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,873] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,878] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,892] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,895] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,908] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,915] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,923] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,938] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,938] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,956] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,962] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,966] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,984] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:04,998] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,002] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,007] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,024] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,034] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,044] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,056] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,057] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,078] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,088] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,089] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,107] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,115] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,121] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,148] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,149] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,154] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,186] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,189] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,211] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,214] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,215] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,246] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,248] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,250] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,271] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,282] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,283] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,310] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,313] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,319] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,343] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,347] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,354] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,374] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,384] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,395] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,403] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,407] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,435] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,436] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,464] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,469] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,472] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,493] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,496] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,505] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,519] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,527] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,541] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,549] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,551] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,557] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,570] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,579] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,592] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,598] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,607] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,622] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,628] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,642] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,655] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,656] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,672] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,676] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  95%|█████████████████████████████████████████▋  | 77592/81819 [01:18<00:26, 159.95 examples/s]
[2026-04-15 15:02:05,682] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,704] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,707] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,709] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,744] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,746] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,746] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,772] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,773] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,773] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,794] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,810] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,816] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,824] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,844] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,849] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,855] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,876] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,878] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,899] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,910] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,911] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,931] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,942] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,943] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,959] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,970] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,985] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,986] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:05,999] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,017] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,020] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,041] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,046] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,049] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,064] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,080] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,087] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,091] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,109] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,116] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,118] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,141] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,143] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,146] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,164] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,170] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,175] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,193] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,205] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,208] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,224] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,247] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,249] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,256] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,280] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,285] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,286] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,308] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,309] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,317] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,337] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,357] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,360] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,361] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,384] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,391] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,399] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,413] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,424] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,445] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,449] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,457] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,479] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,480] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,489] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,505] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,506] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,522] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,542] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,542] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,567] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,571] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,573] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,593] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,595] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,608] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,628] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,634] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,652] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,654] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,666] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,666] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,684] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,691] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,700] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,708] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,726] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,747] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,749] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,756] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,775] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,778] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,790] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,800] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,814] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,818] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,835] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,838] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,843] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,849] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,861] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,866] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,879] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,888] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,893] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,905] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,908] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,920] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,947] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,950] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,953] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,971] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,972] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,987] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:06,995] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,001] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,018] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,019] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,039] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,044] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,050] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,061] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,065] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,082] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,090] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,105] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,124] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,126] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,152] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,152] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,155] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,179] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,181] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,186] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,194] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,203] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,216] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,218] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,242] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,251] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,256] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,266] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,287] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,296] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,296] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,315] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,326] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,331] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,341] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,351] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,362] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,375] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,379] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,392] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,401] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,404] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,417] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,447] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,449] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,451] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,486] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,486] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,497] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,514] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,516] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,517] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,546] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,555] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,557] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,561] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,579] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,585] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,587] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,599] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,612] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,613] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,641] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,642] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,647] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,661] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,670] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,675] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,681] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,697] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,697] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,707] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,722] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,723] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,743] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,745] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,751] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,763] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,771] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,777] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,793] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,794] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,807] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,824] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,827] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,836] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,845] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,852] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,867] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,875] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,890] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,893] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,896] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,901] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,924] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,929] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,957] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,962] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,965] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,986] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,990] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:07,999] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,026] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,028] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,047] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,053] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,053] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,072] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,079] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,084] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,099] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,106] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,110] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,136] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,137] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,145] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,164] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,173] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,177] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,185] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,187] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,205] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,211] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,215] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,237] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,241] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,243] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,256] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,267] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,278] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,284] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,297] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,317] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,319] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,326] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,348] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,349] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,372] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,377] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,378] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,398] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,412] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,415] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,443] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,444] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,452] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,472] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,472] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,488] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,498] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,511] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,513] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,530] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,541] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,547] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,564] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,572] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,574] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,583] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,602] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,611] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,614] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,639] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,641] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,645] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,665] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,672] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,674] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,693] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,702] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,709] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,722] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,742] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,743] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,763] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,779] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,786] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,793] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,819] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,826] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,835] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,852] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,855] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,861] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,881] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,890] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,903] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,905] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,932] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,939] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,943] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,963] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,965] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,983] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,994] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:08,996] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,012] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,024] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,031] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,052] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,064] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,068] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,090] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,090] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,092] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,117] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,120] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,124] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,143] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,146] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,155] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,159] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,183] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,183] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,186] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,200] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,210] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,221] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,239] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,243] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,257] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,259] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,282] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,287] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,288] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,306] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,316] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,330] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,339] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,347] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,354] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,361] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,378] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,384] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,392] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,402] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,421] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,423] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,448] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,450] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,458] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,474] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,490] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,491] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,497] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,510] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,513] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,523] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,542] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,544] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,556] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,573] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,585] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,585] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,606] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,626] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,630] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,644] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,661] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,665] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,668] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,688] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,698] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,704] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,715] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,731] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,739] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,752] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,767] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,770] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,789] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,794] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,802] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,821] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,822] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,850] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,855] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,866] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,884] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,888] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,902] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,912] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,921] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,943] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,944] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,962] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,968] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,980] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,988] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:09,999] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,016] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,020] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,044] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,053] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,058] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,078] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,080] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,090] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,106] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,116] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,117] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,151] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,151] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,153] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,177] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,188] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,202] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,205] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,217] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,244] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,252] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,264] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,273] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,291] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,294] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,311] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,317] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,336] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,348] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,350] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,372] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,376] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,398] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,399] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,405] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,453] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,454] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,455] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,480] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,490] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,492] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,503] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,520] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,528] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,546] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,548] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,558] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,572] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,584] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,591] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,609] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,622] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,623] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,644] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,651] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,667] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,673] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,682] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,690] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,711] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,720] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,721] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,736] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,750] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,751] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,769] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,781] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,782] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,792] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,806] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,812] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,820] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,837] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,857] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,864] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,866] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,886] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,894] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,903] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,914] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,936] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,947] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,954] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,960] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,981] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,985] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:10,992] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,018] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,022] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,027] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,044] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,053] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,066] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,067] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,083] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,105] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,109] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,115] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,137] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,154] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,163] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,194] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,202] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,236] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,249] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,275] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,286] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,299] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,324] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,340] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,354] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,371] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,385] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,410] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,411] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,436] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,445] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,463] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,468] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,495] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,501] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,538] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,547] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,583] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,585] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  96%|███████████████████████████████████████████▏ | 78592/81819 [01:24<00:34, 92.64 examples/s]
[2026-04-15 15:02:11,622] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,632] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,653] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,664] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,684] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,689] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,696] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,710] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,725] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,736] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,744] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,760] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,763] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,777] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,791] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,802] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,813] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,836] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,846] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,848] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,870] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,885] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,886] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,913] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,924] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,929] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,940] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,954] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,978] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,980] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:11,985] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,014] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,016] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,021] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,029] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,050] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,060] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,063] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,086] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,089] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,102] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,123] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,133] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,149] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,153] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,170] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,180] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,192] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,211] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,211] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,222] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,245] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,246] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,257] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,280] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,284] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,293] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,317] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,321] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,331] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,340] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,352] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,358] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,363] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,379] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,383] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,398] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,410] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,421] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,425] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,449] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,455] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,459] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,478] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,481] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,491] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,507] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,520] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,523] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,527] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,554] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,556] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,564] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,582] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,586] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,594] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,602] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,622] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,625] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,640] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,653] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,656] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,672] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,676] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,693] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,702] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,704] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,726] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,744] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,746] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,753] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,772] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,778] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,782] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,797] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,808] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,810] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,829] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,841] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,842] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,858] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,873] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,875] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,890] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,898] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,901] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,917] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,930] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,952] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,954] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,956] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,986] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,986] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:12,989] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,018] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,022] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,025] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,046] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,051] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,056] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,070] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,085] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,087] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,092] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,111] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,116] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,116] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,137] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,140] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,147] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,160] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,169] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,184] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,184] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,206] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,209] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,214] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,242] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,243] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,251] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,272] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,279] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,283] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,290] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,311] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,313] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,315] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,335] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,351] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,355] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,364] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,373] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,386] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,402] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,409] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,423] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,434] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,445] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,451] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,461] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,471] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,482] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,488] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,504] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,506] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,527] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,539] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,547] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,558] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,575] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,576] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,591] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,597] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,607] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,625] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,626] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,648] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,652] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,662] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,677] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,678] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,700] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,703] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,712] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,740] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,752] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,753] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,773] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,783] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,793] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,798] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,823] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,825] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,827] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,845] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,863] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,869] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,870] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,893] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,897] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,908] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,921] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,939] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,943] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,962] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,975] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,975] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:13,992] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,002] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,007] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,025] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,043] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,051] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,056] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,077] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,085] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,087] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,110] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,118] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,121] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,130] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,151] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,154] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,163] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,183] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,186] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,186] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,207] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,208] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,221] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,246] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,251] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,264] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,277] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,292] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,293] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,310] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,318] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,320] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,348] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,355] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,356] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,382] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,384] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,390] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,408] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,412] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,429] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,441] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,447] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,464] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,469] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,477] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,494] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,498] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,519] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,520] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,538] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,567] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,570] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,585] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,592] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,599] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,625] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,630] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,649] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,651] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,670] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,685] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,691] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,711] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,721] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,724] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,744] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,754] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,760] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,777] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,794] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,806] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,807] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,826] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,838] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,844] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,864] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,865] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,874] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,894] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,897] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,903] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,925] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,934] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,953] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,964] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,969] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,982] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:14,994] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,004] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,017] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,018] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,053] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,057] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,064] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,090] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,095] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,097] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,124] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,127] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,133] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,153] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,156] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,164] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,193] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,199] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,201] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,224] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,246] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,247] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,261] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,284] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,285] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,286] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,314] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,315] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,322] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,345] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,349] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,358] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,375] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,379] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,388] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,390] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,412] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,413] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,433] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,436] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,443] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,448] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,454] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,476] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,477] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,483] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,493] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,509] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,509] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,512] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,539] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,546] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,547] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,561] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,585] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,585] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,585] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,608] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,609] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,617] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,644] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,644] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,661] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,667] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,669] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,695] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,699] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,705] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,720] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,736] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,742] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,756] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,756] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,761] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,782] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,792] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,798] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,809] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,823] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,831] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,833] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,848] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,856] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,857] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,874] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,887] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,897] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,913] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,916] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,922] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,937] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,948] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,952] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,968] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,975] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,978] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:15,996] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,009] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,017] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,021] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,044] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,047] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,054] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,081] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,083] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,088] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,114] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,116] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,126] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,143] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,148] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,156] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,168] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,175] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,183] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,198] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,212] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,218] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,235] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,246] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,248] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,266] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,276] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,293] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,295] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,317] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,317] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,324] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,349] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,358] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,362] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,379] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,388] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,393] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,398] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,426] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,428] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,435] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,446] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,463] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,466] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,472] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,490] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,494] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,500] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,513] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,529] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,538] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,540] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,558] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,563] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,575] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,576] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,595] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,598] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,601] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,621] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,636] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,641] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,644] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,669] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,670] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,673] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,694] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,706] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,714] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,722] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,739] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,748] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,755] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,764] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,782] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,787] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,787] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,806] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,813] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,817] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,829] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,836] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,849] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,857] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,858] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,861] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,879] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,886] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,895] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,899] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,912] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,927] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,937] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,944] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,957] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,958] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,960] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,982] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:16,990] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,009] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,011] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,017] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,035] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,042] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,056] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,067] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,069] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,079] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,091] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,104] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,109] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,115] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,136] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,143] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,150] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,157] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,174] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,177] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,179] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,200] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,206] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,213] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,241] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,243] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,246] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,259] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,268] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,273] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,291] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,294] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,301] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,318] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,334] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,341] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,342] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,358] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,365] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,371] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,380] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,388] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,389] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,404] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,407] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,414] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,441] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,449] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,453] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,468] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,482] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,490] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,492] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,522] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,524] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,525] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,552] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,553] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,553] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,577] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,580] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,581] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,598] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,609] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,613] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,639] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,639] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,642] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,659] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,670] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,674] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,676] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,696] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,697] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,703] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,721] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,726] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,736] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,741] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,753] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,758] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,762] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,790] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,792] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,795] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,830] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,834] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,844] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,855] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,858] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,873] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,878] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,887] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,904] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,911] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,921] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,940] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,942] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,953] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,966] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,968] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,981] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:17,998] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,001] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,009] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,033] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,037] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,042] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,060] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,068] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,072] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,086] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,093] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,102] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,120] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,120] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,144] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,150] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,155] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,168] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,176] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,181] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,186] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,194] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,204] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,219] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,221] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,236] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,246] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,253] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,263] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,272] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,282] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,290] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,293] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,308] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,317] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,321] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,341] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,358] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2852] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,359] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,360] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,378] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,379] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,391] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,410] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,413] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,431] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,439] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,440] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,453] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,472] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,488] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,497] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,511] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,528] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,546] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,549] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,572] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,574] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,597] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,612] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,623] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,647] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,650] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,670] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,682] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,694] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,702] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,715] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,740] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,748] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,778] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,787] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,806] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,816] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,851] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,861] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,879] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,891] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,912] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,934] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,958] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,974] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:18,992] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,019] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,026] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,045] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,053] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,090] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,095] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,112] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,136] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,140] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,169] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,169] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,194] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,195] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,220] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,227] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,244] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,250] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,275] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,283] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,301] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,309] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,344] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,351] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,377] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,381] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,409] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,409] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,438] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,443] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,460] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,472] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,482] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,500] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,515] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,551] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,559] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,583] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,597] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,615] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,634] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,648] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  97%|██████████████████████████████████████████▊ | 79592/81819 [01:32<00:22, 100.63 examples/s]
[2026-04-15 15:02:19,673] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,686] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,713] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,724] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,748] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,760] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,774] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,799] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,805] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,846] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,852] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,880] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,882] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,910] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,912] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,944] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,946] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,977] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:19,978] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,005] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,010] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,035] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,058] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,074] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,099] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,108] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,147] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,161] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,173] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,189] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,200] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,215] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,242] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,242] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,269] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,271] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,297] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,297] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,322] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,327] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,358] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,367] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,389] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,403] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,416] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,439] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,451] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,468] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,485] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,505] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,521] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,546] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,554] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,577] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,585] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,605] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,620] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,638] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,651] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,666] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,677] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,695] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,712] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,732] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,745] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,757] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,773] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,784] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,807] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,816] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,821] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,839] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,847] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,864] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,874] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,890] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,900] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,917] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,942] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,952] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,971] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,987] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:20,996] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,014] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,027] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,045] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,052] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,072] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,084] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,110] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,124] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,147] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,164] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,182] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,200] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,207] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,246] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,252] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,281] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,290] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,312] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,326] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,354] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,363] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,392] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,395] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,418] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,425] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,450] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,461] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,473] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,497] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,505] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,532] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,539] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,565] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,573] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,599] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,601] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  98%|███████████████████████████████████████████ | 80001/81819 [01:34<00:16, 109.39 examples/s]
[2026-04-15 15:02:21,647] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,653] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,675] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,694] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,700] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,730] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,755] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,765] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,777] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,786] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,801] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,818] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,838] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,860] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,860] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,886] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,889] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,906] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:21,922] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:22,008] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:22,030] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:22,042] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:22,068] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:22,077] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:22,096] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:22,123] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:22,136] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:22,167] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:22,170] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:22,191] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:22,217] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:22,224] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:22,253] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:22,263] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:22,281] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:22,310] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:22,319] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:22,344] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:22,355] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:22,392] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:22,418] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:22,454] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:22,485] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:22,511] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:22,544] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:22,576] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:22,619] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:22,652] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:22,693] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:22,722] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:22,751] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:22,785] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:22,817] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24):  99%|███████████████████████████████████████████▌| 81001/81819 [01:35<00:05, 157.14 examples/s]
[2026-04-15 15:02:22,844] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:22,884] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:22,895] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:22,926] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:22,931] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:22,965] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:22,969] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:22,996] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,009] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,031] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,040] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,064] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,072] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,091] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,100] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,121] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,143] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,144] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,166] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,184] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,203] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,219] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,236] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,255] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,266] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,280] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,294] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,307] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,328] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,339] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,358] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,373] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,391] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,405] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,417] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,444] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,452] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,476] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,494] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,506] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,523] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,541] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,561] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,568] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,590] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,596] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,616] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,624] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,640] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,656] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,675] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,683] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,715] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,715] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,738] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,740] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,770] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,774] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,797] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,805] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,826] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,846] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,859] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,878] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,895] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,913] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,933] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,939] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,965] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:23,969] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:24,003] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:24,003] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:24,035] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:24,044] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:24,065] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:24,083] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2850] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:24,096] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:24,128] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:24,156] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:24,191] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:24,224] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:24,265] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24): 100%|███████████████████████████████████████████▊| 81410/81819 [01:36<00:02, 169.95 examples/s]
[2026-04-15 15:02:24,304] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:24,342] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:24,379] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:24,416] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:24,454] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:24,484] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:24,528] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:24,562] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:24,589] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:24,625] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:24,663] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:24,695] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:24,727] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:24,754] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:24,804] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:24,846] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:24,880] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:24,920] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:24,954] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:24,993] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:25,033] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:25,063] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:25,095] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:25,131] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:25,182] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:25,238] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:25,263] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:25,296] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:25,325] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:25,356] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:25,395] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:25,436] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:25,460] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:25,489] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:25,519] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:25,560] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:25,586] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:25,626] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:25,671] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:25,712] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:25,754] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:25,780] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:25,822] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:25,853] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:25,872] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:25,899] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:25,933] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:25,956] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:25,976] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:25,995] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:26,017] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:26,049] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:26,080] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:26,113] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:26,140] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:26,180] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:26,216] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:26,239] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:26,272] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:26,295] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:26,328] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:26,360] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:26,390] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:26,419] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:26,445] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:26,464] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:26,499] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:26,556] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:26,580] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:26,606] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:26,643] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:26,669] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:26,697] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:26,732] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:26,768] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:26,793] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:26,823] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:26,851] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:26,882] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:26,913] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:26,949] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:26,980] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:27,009] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:27,037] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:27,074] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:27,098] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:27,140] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:27,180] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:27,215] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:27,256] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:27,290] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:27,331] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:27,357] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:27,397] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:27,432] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:27,466] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:27,498] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:27,530] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:27,560] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:27,591] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:27,621] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:27,657] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:27,693] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:27,723] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:27,753] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:27,783] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:27,818] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:27,852] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:27,893] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:27,929] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:27,963] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:27,996] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:28,019] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:28,055] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:28,091] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:28,118] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:28,144] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:28,170] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:28,188] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:28,214] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:28,252] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:28,277] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:28,307] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:28,352] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:28,386] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:28,410] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:28,443] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:28,471] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:28,497] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:28,519] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:28,551] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:28,577] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:28,595] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:28,613] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:28,637] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:28,659] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:28,688] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:28,717] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:28,741] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:28,767] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:28,793] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:28,818] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:28,853] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:28,873] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:28,906] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:28,937] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:28,968] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:28,992] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:29,005] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:29,053] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:29,082] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:29,109] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:29,134] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:29,159] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:29,182] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:29,209] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:29,228] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:29,258] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:29,286] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:29,312] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:29,339] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:29,359] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:29,375] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:29,398] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:29,452] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:29,480] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:29,506] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:29,537] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:29,566] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:29,593] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:29,625] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:29,654] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:29,678] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:29,700] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:29,742] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:29,772] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:29,796] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:29,826] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:29,830] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:29,844] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:29,879] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:29,905] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:29,937] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:29,962] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:29,985] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:30,008] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:30,053] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:30,077] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:30,100] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:30,145] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:30,165] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:30,193] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:30,227] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:30,251] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:30,299] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:30,342] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:30,367] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:30,386] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:30,423] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:30,447] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:30,477] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:30,495] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:30,524] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:30,549] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:30,577] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:30,605] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:30,646] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:30,677] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:30,702] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:30,745] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:30,774] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:30,798] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:30,841] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:30,867] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:30,898] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:30,920] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:30,952] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:30,983] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:30,990] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:31,011] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:31,027] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:31,065] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:31,082] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:31,104] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:31,143] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:31,156] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:31,167] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:31,194] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:31,218] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:31,247] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:31,268] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:31,294] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:31,319] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:31,340] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:31,365] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:31,395] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:31,422] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:31,455] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:31,478] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:31,506] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:31,538] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:31,560] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:31,585] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:31,608] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:31,646] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:31,661] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:31,689] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:31,715] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:31,739] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:31,754] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:31,783] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:31,806] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:31,833] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:31,857] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:31,881] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:31,892] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:31,920] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:31,955] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:31,988] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:32,003] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:32,043] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:32,080] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:32,102] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:32,157] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:32,186] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:32,208] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:32,239] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:32,263] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:32,288] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:32,317] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:32,339] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:32,380] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:32,406] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:32,436] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:32,461] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:32,495] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:32,526] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:32,558] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:32,592] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:32,617] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:32,642] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:32,678] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:32,705] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:32,743] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:32,778] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:32,809] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:32,840] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:32,872] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:32,902] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:32,951] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:32,977] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:33,000] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:33,039] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:33,072] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:33,124] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:33,148] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:33,185] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:33,221] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:33,243] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:33,274] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:33,305] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:33,338] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:33,374] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:33,404] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:33,445] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:33,466] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:33,491] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:33,521] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:33,554] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:33,583] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:33,617] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:33,646] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:33,668] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:33,705] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:33,751] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:33,780] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:33,815] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:33,856] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:33,892] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:33,926] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:33,955] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:33,982] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:34,013] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:34,052] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:34,082] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:34,108] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:34,147] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:34,172] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:34,197] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:34,243] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:34,282] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:34,321] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:34,345] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:34,378] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:34,412] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:34,444] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:34,472] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:34,505] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:34,542] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:34,575] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:34,610] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:34,641] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:34,668] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:34,701] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:34,747] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:34,783] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:34,804] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:34,844] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:34,874] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:34,908] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:34,938] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:34,971] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:34,994] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:35,014] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:35,052] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:35,076] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:35,090] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:35,108] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:35,138] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:35,168] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:35,185] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:35,204] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:35,263] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:35,287] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:35,309] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
[2026-04-15 15:02:35,347] [WARNING] [axolotl.prompt_strategies.chat_template._tokenize_single_prompt:489] [PID:2851] Last turn is not trainable, skipping having to find the turn indices. This may cause incorrect last EOT/EOS token to be unmasked.This is likely a dataset design issue. Please ensure last turn is trainable.
Tokenizing Prompts (num_proc=24): 100%|████████████████████████████████████████████| 81819/81819 [01:48<00:00, 756.04 examples/s]
[2026-04-15 15:02:36,104] [INFO] [axolotl.utils.data.utils.handle_long_seq_in_dataset:218] [PID:2788] min_input_len: 262
[2026-04-15 15:02:36,114] [INFO] [axolotl.utils.data.utils.handle_long_seq_in_dataset:220] [PID:2788] max_input_len: 4669
Dropping Long Sequences (>2048) (num_proc=24): 100%|█████████████████████████████| 81819/81819 [00:01<00:00, 54016.24 examples/s]
[2026-04-15 15:02:37,679] [WARNING] [axolotl.utils.data.utils.handle_long_seq_in_dataset:260] [PID:2788] Dropped 1351 samples from dataset
Saving the dataset (24/24 shards): 100%|████████████████████████████████████████| 80468/80468 [00:00<00:00, 166494.14 examples/s]
[2026-04-15 15:02:38,765] [DEBUG] [axolotl.utils.trainer.calculate_total_num_steps:404] [PID:2788] total_num_tokens: 63_958_727
[2026-04-15 15:02:39,351] [DEBUG] [axolotl.utils.trainer.calculate_total_num_steps:422] [PID:2788] `total_supervised_tokens: 8_437_411`
[2026-04-15 15:02:39,351] [DEBUG] [axolotl.utils.trainer.calculate_total_num_steps:520] [PID:2788] total_num_steps: 20117
[2026-04-15 15:02:39,352] [INFO] [axolotl.utils.data.sft._prepare_standard_dataset:121] [PID:2788] Maximum number of steps set at 20117
[2026-04-15 15:02:39,378] [DEBUG] [axolotl.train.setup_model_and_tokenizer:65] [PID:2788] Loading tokenizer... Qwen/Qwen2.5-Coder-3B-Instruct
[2026-04-15 15:02:39,709] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:278] [PID:2788] EOS: 151645 / <|im_end|>
[2026-04-15 15:02:39,713] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:279] [PID:2788] BOS: None / None
[2026-04-15 15:02:39,714] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:280] [PID:2788] PAD: 151643 / <|endoftext|>
[2026-04-15 15:02:39,714] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:281] [PID:2788] UNK: None / None
[2026-04-15 15:02:39,714] [DEBUG] [axolotl.train.setup_model_and_tokenizer:74] [PID:2788] Loading model
[2026-04-15 15:02:39,766] [DEBUG] [axolotl.monkeypatch.transformers.trainer_loss_calc.patch_evaluation_loop:87] [PID:2788] Patched Trainer.evaluation_loop with nanmean loss calculation
[2026-04-15 15:02:39,767] [DEBUG] [axolotl.monkeypatch.transformers.trainer_loss_calc.patch_maybe_log_save_evaluate:138] [PID:2788] Patched Trainer._maybe_log_save_evaluate with nanmean loss calculation
model.safetensors.index.json: 35.6kB [00:00, 170MB/s]
model-00001-of-00002.safetensors: 100%|██████████████████████████████████████████████████████| 4.96G/4.96G [00:13<00:00, 365MB/s]
model-00002-of-00002.safetensors: 100%|██████████████████████████████████████████████████████| 1.21G/1.21G [00:06<00:00, 193MB/s]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████| 2/2 [00:08<00:00,  4.22s/it]
generation_config.json: 100%|███████████████████████████████████████████████████████████████████| 243/243 [00:00<00:00, 3.13MB/s]
[2026-04-15 15:03:09,250] [INFO] [axolotl.loaders.model._prepare_model_for_quantization:863] [PID:2788] converting PEFT model w/ prepare_model_for_kbit_training
[2026-04-15 15:03:09,253] [INFO] [axolotl.loaders.model._configure_embedding_dtypes:345] [PID:2788] Converting modules to torch.bfloat16
[2026-04-15 15:03:09,255] [DEBUG] [axolotl.loaders.model.log_gpu_memory_usage:127] [PID:2788] Memory usage after model load 4.423GB (+4.423GB allocated, +4.574GB reserved)
trainable params: 29,933,568 || all params: 3,115,872,256 || trainable%: 0.9607
[2026-04-15 15:03:09,834] [DEBUG] [axolotl.loaders.model.log_gpu_memory_usage:127] [PID:2788] after adapters 3.348GB (+3.348GB allocated, +4.686GB reserved)
[2026-04-15 15:03:14,328] [INFO] [axolotl.train.save_initial_configs:398] [PID:2788] Pre-saving adapter config to ./outputs/Qwen2.5-Coder-3B-Instruct-coding-agent...
[2026-04-15 15:03:14,332] [INFO] [axolotl.train.save_initial_configs:402] [PID:2788] Pre-saving tokenizer to ./outputs/Qwen2.5-Coder-3B-Instruct-coding-agent...
[2026-04-15 15:03:14,571] [INFO] [axolotl.train.save_initial_configs:407] [PID:2788] Pre-saving model config to ./outputs/Qwen2.5-Coder-3B-Instruct-coding-agent...
[2026-04-15 15:03:14,576] [INFO] [axolotl.train.execute_training:196] [PID:2788] Starting trainer...
  0%|                                                                                                  | 0/20117 [00:00<?, ?it/s]
[2026-04-15 15:03:15,564] [WARNING] [py.warnings._showwarnmsg:110] [PID:2788] /root/miniconda3/envs/py3.11/lib/python3.11/site-packages/bitsandbytes/autograd/_functions.py:186: UserWarning: MatMul8bitLt: inputs will be cast from torch.bfloat16 to float16 during quantization
  warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")

  0%|                                                                                      | 10/20117 [00:23<12:17:36,  2.20s/it]
{'loss': 0.604, 'grad_norm': 1.0385397672653198, 'learning_rate': 1.8e-05, 'memory/max_active (GiB)': 20.62, 'memory/max_allocated (GiB)': 20.62, 'memory/device_reserved (GiB)': 21.62, 'tokens_per_second_per_gpu': 347.46, 'epoch': 0.0}
  0%|                                                                                      | 20/20117 [00:45<12:14:31,  2.19s/it]
{'loss': 0.4244, 'grad_norm': 0.556696891784668, 'learning_rate': 3.8e-05, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 21.62, 'tokens_per_second_per_gpu': 318.15, 'epoch': 0.0}
  0%|▏                                                                                     | 30/20117 [01:07<12:10:48,  2.18s/it]
{'loss': 0.3883, 'grad_norm': 0.24665255844593048, 'learning_rate': 5.8e-05, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 21.62, 'tokens_per_second_per_gpu': 330.79, 'epoch': 0.0}
  0%|▏                                                                                     | 40/20117 [01:29<12:21:24,  2.22s/it]
{'loss': 0.4163, 'grad_norm': 0.3350813090801239, 'learning_rate': 7.800000000000001e-05, 'memory/max_active (GiB)': 21.37, 'memory/max_allocated (GiB)': 21.37, 'memory/device_reserved (GiB)': 22.38, 'tokens_per_second_per_gpu': 396.14, 'epoch': 0.0}
  0%|▏                                                                                     | 50/20117 [01:51<12:16:25,  2.20s/it]
{'loss': 0.3811, 'grad_norm': 0.42506587505340576, 'learning_rate': 9.8e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.42, 'tokens_per_second_per_gpu': 402.54, 'epoch': 0.0}
  0%|▎                                                                                     | 60/20117 [02:13<12:15:24,  2.20s/it]
{'loss': 0.418, 'grad_norm': 0.5153183937072754, 'learning_rate': 0.000118, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 330.44, 'epoch': 0.01}
  0%|▎                                                                                     | 70/20117 [02:35<12:17:39,  2.21s/it]
{'loss': 0.3671, 'grad_norm': 0.3010534644126892, 'learning_rate': 0.000138, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 344.47, 'epoch': 0.01}
  0%|▎                                                                                     | 80/20117 [02:58<12:36:33,  2.27s/it]
{'loss': 0.3387, 'grad_norm': 0.46113327145576477, 'learning_rate': 0.00015800000000000002, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 372.27, 'epoch': 0.01}
  0%|▍                                                                                     | 90/20117 [03:20<12:07:20,  2.18s/it]
{'loss': 0.2999, 'grad_norm': 0.4268002212047577, 'learning_rate': 0.00017800000000000002, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 344.49, 'epoch': 0.01}
  0%|▍                                                                                    | 100/20117 [03:42<12:03:09,  2.17s/it]
{'loss': 0.3356, 'grad_norm': 0.5650917291641235, 'learning_rate': 0.00019800000000000002, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 335.57, 'epoch': 0.01}
  1%|▍                                                                                    | 110/20117 [04:04<12:22:41,  2.23s/it]
{'loss': 0.3025, 'grad_norm': 0.2521424889564514, 'learning_rate': 0.00019999990023993625, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 307.78, 'epoch': 0.01}
  1%|▌                                                                                    | 120/20117 [04:26<12:11:35,  2.20s/it]
{'loss': 0.351, 'grad_norm': 0.34742406010627747, 'learning_rate': 0.00019999955539058868, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 355.93, 'epoch': 0.01}
  1%|▌                                                                                    | 120/20117 [04:26<12:11:35,  2.20s/it]  1%|▌                                                                                    | 121/20117 [04:28<12:13:32,  2.20s/it]  1%|▌                                                                                    | 122/20117 [04:30<12:11:45,  2.20s/it]  1%|▌                                                                                    | 123/20117 [04:33<12:11:57,  2.20s/it]  1%|▌                                                                                    | 124/20117 [04:35<12:15:55,  2.21s/it]  1%|▌                                                                                    | 125/20117 [04:37<12:08:57,  2.19s/it]  1%|▌                                                                                    | 126/20117 [04:39<12:10:11,  2.19s/it]  1%|▌                                                                                    | 127/20117 [04:41<12:11:33,  2.20s/it]  1%|▌                                                                                    | 128/20117 [04:43<12:06:00,  2.18s/it]  1%|▌                                                                                    | 129/20117 [04:46<12:02:23,  2.17s/it]  1%|▌                                                                                    | 130/20117 [04:48<12:01:57,  2.17s/it]                                                                                                                                 {'loss': 0.4031, 'grad_norm': 0.2816642224788666, 'learning_rate': 0.00019999896422120075, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 392.82, 'epoch': 0.01}
  1%|▌                                                                                    | 130/20117 [04:48<12:01:57,  2.17s/it]  1%|▌                                                                                    | 131/20117 [04:50<12:00:42,  2.16s/it]  1%|▌                                                                                    | 132/20117 [04:52<12:11:57,  2.20s/it]  1%|▌                                                                                    | 133/20117 [04:54<12:07:02,  2.18s/it]  1%|▌                                                                                    | 134/20117 [04:57<12:12:50,  2.20s/it]  1%|▌                                                                                    | 135/20117 [04:59<12:13:43,  2.20s/it]  1%|▌                                                                                    | 136/20117 [05:01<12:07:57,  2.19s/it]  1%|▌                                                                                    | 137/20117 [05:03<12:08:42,  2.19s/it]  1%|▌                                                                                    | 138/20117 [05:05<12:06:45,  2.18s/it]  1%|▌                                                                                    | 139/20117 [05:08<12:12:11,  2.20s/it]  1%|▌                                                                                    | 140/20117 [05:10<12:16:06,  2.21s/it]                                                                                                                                 {'loss': 0.3481, 'grad_norm': 0.41705670952796936, 'learning_rate': 0.0001999981267332287, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 404.67, 'epoch': 0.01}
  1%|▌                                                                                    | 140/20117 [05:10<12:16:06,  2.21s/it]  1%|▌                                                                                    | 141/20117 [05:12<12:19:32,  2.22s/it]  1%|▌                                                                                    | 142/20117 [05:14<12:16:40,  2.21s/it]  1%|▌                                                                                    | 143/20117 [05:16<12:16:44,  2.21s/it]  1%|▌                                                                                    | 144/20117 [05:19<12:11:59,  2.20s/it]  1%|▌                                                                                    | 145/20117 [05:21<12:15:08,  2.21s/it]  1%|▌                                                                                    | 146/20117 [05:23<12:05:45,  2.18s/it]  1%|▌                                                                                    | 147/20117 [05:25<12:16:44,  2.21s/it]  1%|▋                                                                                    | 148/20117 [05:27<12:10:24,  2.19s/it]  1%|▋                                                                                    | 149/20117 [05:30<12:20:25,  2.22s/it]  1%|▋                                                                                    | 150/20117 [05:32<12:20:48,  2.23s/it]                                                                                                                                 {'loss': 0.3784, 'grad_norm': 0.5290879011154175, 'learning_rate': 0.00019999704292873545, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 409.59, 'epoch': 0.01}
  1%|▋                                                                                    | 150/20117 [05:32<12:20:48,  2.23s/it]  1%|▋                                                                                    | 151/20117 [05:34<12:15:10,  2.21s/it]  1%|▋                                                                                    | 152/20117 [05:36<12:14:33,  2.21s/it]  1%|▋                                                                                    | 153/20117 [05:38<12:14:19,  2.21s/it]  1%|▋                                                                                    | 154/20117 [05:41<12:06:30,  2.18s/it]  1%|▋                                                                                    | 155/20117 [05:43<12:12:07,  2.20s/it]  1%|▋                                                                                    | 156/20117 [05:45<12:10:43,  2.20s/it]  1%|▋                                                                                    | 157/20117 [05:47<12:09:33,  2.19s/it]  1%|▋                                                                                    | 158/20117 [05:50<12:34:27,  2.27s/it]  1%|▋                                                                                    | 159/20117 [05:52<12:24:59,  2.24s/it]  1%|▋                                                                                    | 160/20117 [05:54<12:24:57,  2.24s/it]                                                                                                                                 {'loss': 0.2029, 'grad_norm': 0.2704632878303528, 'learning_rate': 0.0001999957128103906, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 309.25, 'epoch': 0.02}
  1%|▋                                                                                    | 160/20117 [05:54<12:24:57,  2.24s/it]  1%|▋                                                                                    | 161/20117 [05:56<12:20:41,  2.23s/it]  1%|▋                                                                                    | 162/20117 [05:58<12:19:14,  2.22s/it]  1%|▋                                                                                    | 163/20117 [06:01<12:13:28,  2.21s/it]  1%|▋                                                                                    | 164/20117 [06:03<12:11:53,  2.20s/it]  1%|▋                                                                                    | 165/20117 [06:05<12:11:44,  2.20s/it]  1%|▋                                                                                    | 166/20117 [06:07<12:17:01,  2.22s/it]  1%|▋                                                                                    | 167/20117 [06:09<12:11:54,  2.20s/it]  1%|▋                                                                                    | 168/20117 [06:12<12:14:28,  2.21s/it]  1%|▋                                                                                    | 169/20117 [06:14<12:15:15,  2.21s/it]  1%|▋                                                                                    | 170/20117 [06:16<12:14:14,  2.21s/it]                                                                                                                                 {'loss': 0.3084, 'grad_norm': 0.3863286077976227, 'learning_rate': 0.00019999413638147049, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 395.0, 'epoch': 0.02}
  1%|▋                                                                                    | 170/20117 [06:16<12:14:14,  2.21s/it]  1%|▋                                                                                    | 171/20117 [06:18<12:15:10,  2.21s/it]  1%|▋                                                                                    | 172/20117 [06:20<12:10:42,  2.20s/it]  1%|▋                                                                                    | 173/20117 [06:23<12:05:02,  2.18s/it]  1%|▋                                                                                    | 174/20117 [06:25<12:14:58,  2.21s/it]  1%|▋                                                                                    | 175/20117 [06:27<12:09:08,  2.19s/it]  1%|▋                                                                                    | 176/20117 [06:29<12:07:01,  2.19s/it]  1%|▋                                                                                    | 177/20117 [06:31<12:03:28,  2.18s/it]  1%|▊                                                                                    | 178/20117 [06:34<12:04:10,  2.18s/it]  1%|▊                                                                                    | 179/20117 [06:36<12:10:22,  2.20s/it]  1%|▊                                                                                    | 180/20117 [06:38<12:09:49,  2.20s/it]                                                                                                                                 {'loss': 0.2713, 'grad_norm': 0.32178717851638794, 'learning_rate': 0.00019999231364585827, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 343.31, 'epoch': 0.02}
  1%|▊                                                                                    | 180/20117 [06:38<12:09:49,  2.20s/it]  1%|▊                                                                                    | 181/20117 [06:40<12:07:47,  2.19s/it]  1%|▊                                                                                    | 182/20117 [06:42<12:09:58,  2.20s/it]  1%|▊                                                                                    | 183/20117 [06:45<12:07:59,  2.19s/it]  1%|▊                                                                                    | 184/20117 [06:47<12:09:14,  2.20s/it]  1%|▊                                                                                    | 185/20117 [06:49<12:06:31,  2.19s/it]  1%|▊                                                                                    | 186/20117 [06:51<12:13:44,  2.21s/it]  1%|▊                                                                                    | 187/20117 [06:53<12:11:55,  2.20s/it]  1%|▊                                                                                    | 188/20117 [06:56<12:09:46,  2.20s/it]  1%|▊                                                                                    | 189/20117 [06:58<12:11:24,  2.20s/it]  1%|▊                                                                                    | 190/20117 [07:00<12:19:33,  2.23s/it]                                                                                                                                 {'loss': 0.353, 'grad_norm': 0.3010699450969696, 'learning_rate': 0.00019999024460804366, 'memory/max_active (GiB)': 19.8, 'memory/max_allocated (GiB)': 19.8, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 353.97, 'epoch': 0.02}
  1%|▊                                                                                    | 190/20117 [07:00<12:19:33,  2.23s/it]  1%|▊                                                                                    | 191/20117 [07:02<12:11:21,  2.20s/it]  1%|▊                                                                                    | 192/20117 [07:04<12:17:34,  2.22s/it]  1%|▊                                                                                    | 193/20117 [07:07<12:10:00,  2.20s/it]  1%|▊                                                                                    | 194/20117 [07:09<12:15:07,  2.21s/it]  1%|▊                                                                                    | 195/20117 [07:11<12:07:56,  2.19s/it]  1%|▊                                                                                    | 196/20117 [07:13<12:08:19,  2.19s/it]  1%|▊                                                                                    | 197/20117 [07:15<12:08:19,  2.19s/it]  1%|▊                                                                                    | 198/20117 [07:18<12:04:29,  2.18s/it]  1%|▊                                                                                    | 199/20117 [07:20<12:05:04,  2.18s/it]  1%|▊                                                                                    | 200/20117 [07:22<12:11:09,  2.20s/it]                                                                                                                                 {'loss': 0.1904, 'grad_norm': 0.2678498327732086, 'learning_rate': 0.00019998792927312315, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 352.72, 'epoch': 0.02}
  1%|▊                                                                                    | 200/20117 [07:22<12:11:09,  2.20s/it]  1%|▊                                                                                    | 201/20117 [07:24<12:13:59,  2.21s/it]  1%|▊                                                                                    | 202/20117 [07:26<12:14:55,  2.21s/it]  1%|▊                                                                                    | 203/20117 [07:29<12:18:51,  2.23s/it]  1%|▊                                                                                    | 204/20117 [07:31<12:09:36,  2.20s/it]  1%|▊                                                                                    | 205/20117 [07:33<12:14:33,  2.21s/it]  1%|▊                                                                                    | 206/20117 [07:35<12:05:47,  2.19s/it]  1%|▊                                                                                    | 207/20117 [07:37<12:09:49,  2.20s/it]  1%|▉                                                                                    | 208/20117 [07:40<12:12:06,  2.21s/it]  1%|▉                                                                                    | 209/20117 [07:42<12:05:04,  2.19s/it]  1%|▉                                                                                    | 210/20117 [07:44<12:34:19,  2.27s/it]                                                                                                                                 {'loss': 0.2397, 'grad_norm': 0.25298821926116943, 'learning_rate': 0.00019998536764679993, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 336.24, 'epoch': 0.02}
  1%|▉                                                                                    | 210/20117 [07:44<12:34:19,  2.27s/it]  1%|▉                                                                                    | 211/20117 [07:47<12:29:11,  2.26s/it]  1%|▉                                                                                    | 212/20117 [07:49<12:28:08,  2.26s/it]  1%|▉                                                                                    | 213/20117 [07:51<12:25:12,  2.25s/it]  1%|▉                                                                                    | 214/20117 [07:53<12:16:00,  2.22s/it]  1%|▉                                                                                    | 215/20117 [07:55<12:15:00,  2.22s/it]  1%|▉                                                                                    | 216/20117 [07:58<12:10:46,  2.20s/it]  1%|▉                                                                                    | 217/20117 [08:00<12:05:55,  2.19s/it]  1%|▉                                                                                    | 218/20117 [08:02<12:01:41,  2.18s/it]  1%|▉                                                                                    | 219/20117 [08:04<12:03:35,  2.18s/it]  1%|▉                                                                                    | 220/20117 [08:06<12:02:14,  2.18s/it]                                                                                                                                 {'loss': 0.3111, 'grad_norm': 0.4027327001094818, 'learning_rate': 0.0001999825597353838, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 388.05, 'epoch': 0.02}
  1%|▉                                                                                    | 220/20117 [08:06<12:02:14,  2.18s/it]  1%|▉                                                                                    | 221/20117 [08:08<12:00:24,  2.17s/it]  1%|▉                                                                                    | 222/20117 [08:10<11:57:44,  2.16s/it]  1%|▉                                                                                    | 223/20117 [08:13<11:58:13,  2.17s/it]  1%|▉                                                                                    | 224/20117 [08:15<12:02:56,  2.18s/it]  1%|▉                                                                                    | 225/20117 [08:17<12:08:05,  2.20s/it]  1%|▉                                                                                    | 226/20117 [08:19<12:03:51,  2.18s/it]  1%|▉                                                                                    | 227/20117 [08:21<12:04:45,  2.19s/it]  1%|▉                                                                                    | 228/20117 [08:24<12:02:18,  2.18s/it]  1%|▉                                                                                    | 229/20117 [08:26<12:01:27,  2.18s/it]  1%|▉                                                                                    | 230/20117 [08:28<12:00:52,  2.17s/it]                                                                                                                                 {'loss': 0.2578, 'grad_norm': 0.4060591757297516, 'learning_rate': 0.00019997950554579124, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 354.57, 'epoch': 0.02}
  1%|▉                                                                                    | 230/20117 [08:28<12:00:52,  2.17s/it]  1%|▉                                                                                    | 231/20117 [08:30<12:18:34,  2.23s/it]  1%|▉                                                                                    | 232/20117 [08:33<12:45:27,  2.31s/it]  1%|▉                                                                                    | 233/20117 [08:35<12:49:35,  2.32s/it]  1%|▉                                                                                    | 234/20117 [08:37<12:41:24,  2.30s/it]  1%|▉                                                                                    | 235/20117 [08:40<12:33:38,  2.27s/it]  1%|▉                                                                                    | 236/20117 [08:42<12:26:48,  2.25s/it]  1%|█                                                                                    | 237/20117 [08:44<12:24:48,  2.25s/it]  1%|█                                                                                    | 238/20117 [08:46<12:16:51,  2.22s/it]  1%|█                                                                                    | 239/20117 [08:48<12:11:25,  2.21s/it]  1%|█                                                                                    | 240/20117 [08:51<12:14:11,  2.22s/it]                                                                                                                                 {'loss': 0.2952, 'grad_norm': 0.29408156871795654, 'learning_rate': 0.00019997620508554537, 'memory/max_active (GiB)': 20.45, 'memory/max_allocated (GiB)': 20.45, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 334.85, 'epoch': 0.02}
  1%|█                                                                                    | 240/20117 [08:51<12:14:11,  2.22s/it]  1%|█                                                                                    | 241/20117 [08:53<12:14:44,  2.22s/it]  1%|█                                                                                    | 242/20117 [08:55<12:13:30,  2.21s/it]  1%|█                                                                                    | 243/20117 [08:57<12:13:22,  2.21s/it]  1%|█                                                                                    | 244/20117 [08:59<12:09:47,  2.20s/it]  1%|█                                                                                    | 245/20117 [09:02<12:10:14,  2.20s/it]  1%|█                                                                                    | 246/20117 [09:04<12:14:21,  2.22s/it]  1%|█                                                                                    | 247/20117 [09:06<12:14:33,  2.22s/it]  1%|█                                                                                    | 248/20117 [09:08<12:17:19,  2.23s/it]  1%|█                                                                                    | 249/20117 [09:11<12:13:29,  2.22s/it]  1%|█                                                                                    | 250/20117 [09:13<12:10:14,  2.21s/it]                                                                                                                                 {'loss': 0.2397, 'grad_norm': 0.381528377532959, 'learning_rate': 0.00019997265836277595, 'memory/max_active (GiB)': 19.2, 'memory/max_allocated (GiB)': 19.2, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 402.73, 'epoch': 0.02}
  1%|█                                                                                    | 250/20117 [09:13<12:10:14,  2.21s/it]  1%|█                                                                                    | 251/20117 [09:15<12:09:32,  2.20s/it]  1%|█                                                                                    | 252/20117 [09:17<12:06:47,  2.20s/it]  1%|█                                                                                    | 253/20117 [09:19<12:04:42,  2.19s/it]  1%|█                                                                                    | 254/20117 [09:21<12:01:48,  2.18s/it]  1%|█                                                                                    | 255/20117 [09:24<11:59:18,  2.17s/it]  1%|█                                                                                    | 256/20117 [09:26<12:01:21,  2.18s/it]  1%|█                                                                                    | 257/20117 [09:28<12:02:29,  2.18s/it]  1%|█                                                                                    | 258/20117 [09:30<12:07:03,  2.20s/it]  1%|█                                                                                    | 259/20117 [09:32<12:07:18,  2.20s/it]  1%|█                                                                                    | 260/20117 [09:35<12:06:59,  2.20s/it]                                                                                                                                 {'loss': 0.4017, 'grad_norm': 0.30223792791366577, 'learning_rate': 0.00019996886538621925, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 430.9, 'epoch': 0.03}
  1%|█                                                                                    | 260/20117 [09:35<12:06:59,  2.20s/it]  1%|█                                                                                    | 261/20117 [09:37<12:32:28,  2.27s/it]  1%|█                                                                                    | 262/20117 [09:39<12:27:47,  2.26s/it]  1%|█                                                                                    | 263/20117 [09:42<12:22:18,  2.24s/it]  1%|█                                                                                    | 264/20117 [09:44<12:18:49,  2.23s/it]  1%|█                                                                                    | 265/20117 [09:46<12:10:59,  2.21s/it]  1%|█                                                                                    | 266/20117 [09:48<12:05:58,  2.19s/it]  1%|█▏                                                                                   | 267/20117 [09:50<12:04:51,  2.19s/it]  1%|█▏                                                                                   | 268/20117 [09:52<12:00:34,  2.18s/it]  1%|█▏                                                                                   | 269/20117 [09:55<11:59:23,  2.17s/it]  1%|█▏                                                                                   | 270/20117 [09:57<11:58:33,  2.17s/it]                                                                                                                                 {'loss': 0.2051, 'grad_norm': 0.3889918327331543, 'learning_rate': 0.0001999648261652182, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 279.64, 'epoch': 0.03}
  1%|█▏                                                                                   | 270/20117 [09:57<11:58:33,  2.17s/it]  1%|█▏                                                                                   | 271/20117 [09:59<11:56:13,  2.17s/it]  1%|█▏                                                                                   | 272/20117 [10:01<11:56:39,  2.17s/it]  1%|█▏                                                                                   | 273/20117 [10:03<12:05:23,  2.19s/it]  1%|█▏                                                                                   | 274/20117 [10:05<12:06:49,  2.20s/it]  1%|█▏                                                                                   | 275/20117 [10:08<12:05:52,  2.19s/it]  1%|█▏                                                                                   | 276/20117 [10:10<12:09:45,  2.21s/it]  1%|█▏                                                                                   | 277/20117 [10:12<12:04:54,  2.19s/it]  1%|█▏                                                                                   | 278/20117 [10:14<12:06:28,  2.20s/it]  1%|█▏                                                                                   | 279/20117 [10:16<12:01:34,  2.18s/it]  1%|█▏                                                                                   | 280/20117 [10:19<11:58:44,  2.17s/it]                                                                                                                                 {'loss': 0.3332, 'grad_norm': 0.4357030391693115, 'learning_rate': 0.00019996054070972225, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 362.33, 'epoch': 0.03}
  1%|█▏                                                                                   | 280/20117 [10:19<11:58:44,  2.17s/it]  1%|█▏                                                                                   | 281/20117 [10:21<12:02:44,  2.19s/it]  1%|█▏                                                                                   | 282/20117 [10:23<12:06:26,  2.20s/it]  1%|█▏                                                                                   | 283/20117 [10:25<12:03:07,  2.19s/it]  1%|█▏                                                                                   | 284/20117 [10:27<12:00:21,  2.18s/it]  1%|█▏                                                                                   | 285/20117 [10:30<12:01:07,  2.18s/it]  1%|█▏                                                                                   | 286/20117 [10:32<12:07:16,  2.20s/it]  1%|█▏                                                                                   | 287/20117 [10:34<12:06:19,  2.20s/it]  1%|█▏                                                                                   | 288/20117 [10:36<12:06:19,  2.20s/it]  1%|█▏                                                                                   | 289/20117 [10:38<12:10:17,  2.21s/it]  1%|█▏                                                                                   | 290/20117 [10:41<12:05:59,  2.20s/it]                                                                                                                                 {'loss': 0.3052, 'grad_norm': 0.3736005425453186, 'learning_rate': 0.00019995600903028742, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 362.89, 'epoch': 0.03}
  1%|█▏                                                                                   | 290/20117 [10:41<12:05:59,  2.20s/it]  1%|█▏                                                                                   | 291/20117 [10:43<12:02:42,  2.19s/it]  1%|█▏                                                                                   | 292/20117 [10:45<11:58:49,  2.18s/it]  1%|█▏                                                                                   | 293/20117 [10:47<12:07:07,  2.20s/it]  1%|█▏                                                                                   | 294/20117 [10:49<12:17:02,  2.23s/it]  1%|█▏                                                                                   | 295/20117 [10:52<12:11:33,  2.21s/it]  1%|█▎                                                                                   | 296/20117 [10:54<12:08:54,  2.21s/it]  1%|█▎                                                                                   | 297/20117 [10:56<12:06:37,  2.20s/it]  1%|█▎                                                                                   | 298/20117 [10:58<12:02:36,  2.19s/it]  1%|█▎                                                                                   | 299/20117 [11:00<12:06:27,  2.20s/it]  1%|█▎                                                                                   | 300/20117 [11:03<12:07:52,  2.20s/it]                                                                                                                                 {'loss': 0.361, 'grad_norm': 0.39748865365982056, 'learning_rate': 0.00019995123113807615, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 387.24, 'epoch': 0.03}
  1%|█▎                                                                                   | 300/20117 [11:03<12:07:52,  2.20s/it]  1%|█▎                                                                                   | 301/20117 [11:05<12:05:14,  2.20s/it]  2%|█▎                                                                                   | 302/20117 [11:07<12:01:23,  2.18s/it]  2%|█▎                                                                                   | 303/20117 [11:09<11:59:57,  2.18s/it]  2%|█▎                                                                                   | 304/20117 [11:11<12:03:38,  2.19s/it]  2%|█▎                                                                                   | 305/20117 [11:14<12:04:31,  2.19s/it]  2%|█▎                                                                                   | 306/20117 [11:16<11:59:44,  2.18s/it]  2%|█▎                                                                                   | 307/20117 [11:18<12:03:36,  2.19s/it]  2%|█▎                                                                                   | 308/20117 [11:20<12:05:00,  2.20s/it]  2%|█▎                                                                                   | 309/20117 [11:22<12:05:04,  2.20s/it]  2%|█▎                                                                                   | 310/20117 [11:25<12:13:22,  2.22s/it]                                                                                                                                 {'loss': 0.2449, 'grad_norm': 0.18977899849414825, 'learning_rate': 0.00019994620704485741, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 339.16, 'epoch': 0.03}
  2%|█▎                                                                                   | 310/20117 [11:25<12:13:22,  2.22s/it]  2%|█▎                                                                                   | 311/20117 [11:27<12:13:53,  2.22s/it]  2%|█▎                                                                                   | 312/20117 [11:29<12:13:36,  2.22s/it]  2%|█▎                                                                                   | 313/20117 [11:31<12:15:37,  2.23s/it]  2%|█▎                                                                                   | 314/20117 [11:34<12:19:56,  2.24s/it]  2%|█▎                                                                                   | 315/20117 [11:36<12:39:44,  2.30s/it]  2%|█▎                                                                                   | 316/20117 [11:38<12:32:43,  2.28s/it]  2%|█▎                                                                                   | 317/20117 [11:40<12:23:47,  2.25s/it]  2%|█▎                                                                                   | 318/20117 [11:43<12:14:56,  2.23s/it]  2%|█▎                                                                                   | 319/20117 [11:45<12:06:39,  2.20s/it]  2%|█▎                                                                                   | 320/20117 [11:47<12:01:08,  2.19s/it]                                                                                                                                 {'loss': 0.266, 'grad_norm': 0.3898354172706604, 'learning_rate': 0.00019994093676300662, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 360.62, 'epoch': 0.03}
  2%|█▎                                                                                   | 320/20117 [11:47<12:01:08,  2.19s/it]  2%|█▎                                                                                   | 321/20117 [11:49<12:05:00,  2.20s/it]  2%|█▎                                                                                   | 322/20117 [11:51<12:07:32,  2.21s/it]  2%|█▎                                                                                   | 323/20117 [11:53<12:02:06,  2.19s/it]  2%|█▎                                                                                   | 324/20117 [11:56<12:02:22,  2.19s/it]  2%|█▎                                                                                   | 325/20117 [11:58<12:14:55,  2.23s/it]  2%|█▍                                                                                   | 326/20117 [12:00<12:17:49,  2.24s/it]  2%|█▍                                                                                   | 327/20117 [12:02<12:10:21,  2.21s/it]  2%|█▍                                                                                   | 328/20117 [12:05<12:03:42,  2.19s/it]  2%|█▍                                                                                   | 329/20117 [12:07<11:58:59,  2.18s/it]  2%|█▍                                                                                   | 330/20117 [12:09<12:07:17,  2.21s/it]                                                                                                                                 {'loss': 0.2886, 'grad_norm': 0.3335312008857727, 'learning_rate': 0.00019993542030550553, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 386.95, 'epoch': 0.03}
  2%|█▍                                                                                   | 330/20117 [12:09<12:07:17,  2.21s/it]  2%|█▍                                                                                   | 331/20117 [12:11<12:10:11,  2.21s/it]  2%|█▍                                                                                   | 332/20117 [12:13<12:12:51,  2.22s/it]  2%|█▍                                                                                   | 333/20117 [12:16<12:12:15,  2.22s/it]  2%|█▍                                                                                   | 334/20117 [12:18<12:12:01,  2.22s/it]  2%|█▍                                                                                   | 335/20117 [12:20<12:08:54,  2.21s/it]  2%|█▍                                                                                   | 336/20117 [12:22<12:12:11,  2.22s/it]  2%|█▍                                                                                   | 337/20117 [12:24<12:09:57,  2.21s/it]  2%|█▍                                                                                   | 338/20117 [12:27<12:03:18,  2.19s/it]  2%|█▍                                                                                   | 339/20117 [12:29<12:01:33,  2.19s/it]  2%|█▍                                                                                   | 340/20117 [12:31<11:57:05,  2.18s/it]                                                                                                                                 {'loss': 0.2542, 'grad_norm': 0.3043772280216217, 'learning_rate': 0.00019992965768594244, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 400.18, 'epoch': 0.03}
  2%|█▍                                                                                   | 340/20117 [12:31<11:57:05,  2.18s/it]  2%|█▍                                                                                   | 341/20117 [12:33<12:01:06,  2.19s/it]  2%|█▍                                                                                   | 342/20117 [12:35<12:00:27,  2.19s/it]  2%|█▍                                                                                   | 343/20117 [12:37<11:56:22,  2.17s/it]  2%|█▍                                                                                   | 344/20117 [12:40<11:57:31,  2.18s/it]  2%|█▍                                                                                   | 345/20117 [12:42<12:06:47,  2.21s/it]  2%|█▍                                                                                   | 346/20117 [12:44<12:01:16,  2.19s/it]  2%|█▍                                                                                   | 347/20117 [12:46<11:57:12,  2.18s/it]  2%|█▍                                                                                   | 348/20117 [12:48<11:58:14,  2.18s/it]  2%|█▍                                                                                   | 349/20117 [12:51<11:53:02,  2.16s/it]  2%|█▍                                                                                   | 350/20117 [12:53<11:53:25,  2.17s/it]                                                                                                                                 {'loss': 0.2748, 'grad_norm': 0.35784608125686646, 'learning_rate': 0.00019992364891851185, 'memory/max_active (GiB)': 19.66, 'memory/max_allocated (GiB)': 19.66, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 333.64, 'epoch': 0.03}
  2%|█▍                                                                                   | 350/20117 [12:53<11:53:25,  2.17s/it]  2%|█▍                                                                                   | 351/20117 [12:55<11:55:22,  2.17s/it]  2%|█▍                                                                                   | 352/20117 [12:57<11:59:57,  2.19s/it]  2%|█▍                                                                                   | 353/20117 [12:59<11:56:59,  2.18s/it]  2%|█▍                                                                                   | 354/20117 [13:01<11:56:18,  2.17s/it]  2%|█▍                                                                                   | 355/20117 [13:04<11:55:11,  2.17s/it]  2%|█▌                                                                                   | 356/20117 [13:06<11:59:42,  2.19s/it]  2%|█▌                                                                                   | 357/20117 [13:08<12:09:57,  2.22s/it]  2%|█▌                                                                                   | 358/20117 [13:10<12:11:35,  2.22s/it]  2%|█▌                                                                                   | 359/20117 [13:12<12:03:01,  2.20s/it]  2%|█▌                                                                                   | 360/20117 [13:15<11:57:23,  2.18s/it]                                                                                                                                 {'loss': 0.2705, 'grad_norm': 0.5068204998970032, 'learning_rate': 0.00019991739401801464, 'memory/max_active (GiB)': 18.84, 'memory/max_allocated (GiB)': 18.84, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 367.07, 'epoch': 0.04}
  2%|█▌                                                                                   | 360/20117 [13:15<11:57:23,  2.18s/it]  2%|█▌                                                                                   | 361/20117 [13:17<11:56:06,  2.17s/it]  2%|█▌                                                                                   | 362/20117 [13:19<11:53:51,  2.17s/it]  2%|█▌                                                                                   | 363/20117 [13:21<11:58:23,  2.18s/it]  2%|█▌                                                                                   | 364/20117 [13:23<11:53:49,  2.17s/it]  2%|█▌                                                                                   | 365/20117 [13:25<11:51:18,  2.16s/it]  2%|█▌                                                                                   | 366/20117 [13:28<12:21:56,  2.25s/it]  2%|█▌                                                                                   | 367/20117 [13:30<12:27:36,  2.27s/it]  2%|█▌                                                                                   | 368/20117 [13:32<12:13:16,  2.23s/it]  2%|█▌                                                                                   | 369/20117 [13:35<12:11:45,  2.22s/it]  2%|█▌                                                                                   | 370/20117 [13:37<12:02:28,  2.20s/it]                                                                                                                                 {'loss': 0.2403, 'grad_norm': 0.44382113218307495, 'learning_rate': 0.00019991089299985793, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 365.0, 'epoch': 0.04}
  2%|█▌                                                                                   | 370/20117 [13:37<12:02:28,  2.20s/it]  2%|█▌                                                                                   | 371/20117 [13:39<11:59:33,  2.19s/it]  2%|█▌                                                                                   | 372/20117 [13:41<11:57:11,  2.18s/it]  2%|█▌                                                                                   | 373/20117 [13:43<11:56:12,  2.18s/it]  2%|█▌                                                                                   | 374/20117 [13:45<11:55:54,  2.18s/it]  2%|█▌                                                                                   | 375/20117 [13:48<11:57:03,  2.18s/it]  2%|█▌                                                                                   | 376/20117 [13:50<12:02:09,  2.19s/it]  2%|█▌                                                                                   | 377/20117 [13:52<12:02:32,  2.20s/it]  2%|█▌                                                                                   | 378/20117 [13:54<12:01:25,  2.19s/it]  2%|█▌                                                                                   | 379/20117 [13:56<12:00:41,  2.19s/it]  2%|█▌                                                                                   | 380/20117 [13:59<11:59:16,  2.19s/it]                                                                                                                                 {'loss': 0.3065, 'grad_norm': 0.27181145548820496, 'learning_rate': 0.0001999041458800551, 'memory/max_active (GiB)': 18.18, 'memory/max_allocated (GiB)': 18.18, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 372.54, 'epoch': 0.04}
  2%|█▌                                                                                   | 380/20117 [13:59<11:59:16,  2.19s/it]  2%|█▌                                                                                   | 381/20117 [14:01<11:54:22,  2.17s/it]  2%|█▌                                                                                   | 382/20117 [14:03<11:55:00,  2.17s/it]  2%|█▌                                                                                   | 383/20117 [14:05<12:02:10,  2.20s/it]  2%|█▌                                                                                   | 384/20117 [14:07<11:57:00,  2.18s/it]  2%|█▋                                                                                   | 385/20117 [14:09<12:02:19,  2.20s/it]  2%|█▋                                                                                   | 386/20117 [14:12<12:01:57,  2.20s/it]  2%|█▋                                                                                   | 387/20117 [14:14<12:00:34,  2.19s/it]  2%|█▋                                                                                   | 388/20117 [14:16<11:55:44,  2.18s/it]  2%|█▋                                                                                   | 389/20117 [14:18<11:51:08,  2.16s/it]  2%|█▋                                                                                   | 390/20117 [14:20<11:52:32,  2.17s/it]                                                                                                                                 {'loss': 0.2894, 'grad_norm': 0.28408923745155334, 'learning_rate': 0.00019989715267522575, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 340.31, 'epoch': 0.04}
  2%|█▋                                                                                   | 390/20117 [14:20<11:52:32,  2.17s/it]  2%|█▋                                                                                   | 391/20117 [14:23<11:57:39,  2.18s/it]  2%|█▋                                                                                   | 392/20117 [14:25<11:57:07,  2.18s/it]  2%|█▋                                                                                   | 393/20117 [14:27<11:55:27,  2.18s/it]  2%|█▋                                                                                   | 394/20117 [14:29<11:59:48,  2.19s/it]  2%|█▋                                                                                   | 395/20117 [14:31<11:59:31,  2.19s/it]  2%|█▋                                                                                   | 396/20117 [14:33<12:02:17,  2.20s/it]  2%|█▋                                                                                   | 397/20117 [14:36<12:03:48,  2.20s/it]  2%|█▋                                                                                   | 398/20117 [14:38<12:04:34,  2.20s/it]  2%|█▋                                                                                   | 399/20117 [14:40<12:10:20,  2.22s/it]  2%|█▋                                                                                   | 400/20117 [14:42<12:08:50,  2.22s/it]                                                                                                                                 {'loss': 0.4061, 'grad_norm': 0.4882698357105255, 'learning_rate': 0.00019988991340259563, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 358.94, 'epoch': 0.04}
  2%|█▋                                                                                   | 400/20117 [14:42<12:08:50,  2.22s/it]  2%|█▋                                                                                   | 401/20117 [14:45<12:10:48,  2.22s/it]  2%|█▋                                                                                   | 402/20117 [14:47<12:08:55,  2.22s/it]  2%|█▋                                                                                   | 403/20117 [14:49<12:12:20,  2.23s/it]  2%|█▋                                                                                   | 404/20117 [14:51<12:10:40,  2.22s/it]  2%|█▋                                                                                   | 405/20117 [14:53<12:04:35,  2.21s/it]  2%|█▋                                                                                   | 406/20117 [14:56<12:06:53,  2.21s/it]  2%|█▋                                                                                   | 407/20117 [14:58<12:08:26,  2.22s/it]  2%|█▋                                                                                   | 408/20117 [15:00<12:08:59,  2.22s/it]  2%|█▋                                                                                   | 409/20117 [15:02<12:04:05,  2.20s/it]  2%|█▋                                                                                   | 410/20117 [15:04<11:56:37,  2.18s/it]                                                                                                                                 {'loss': 0.3141, 'grad_norm': 0.2663392722606659, 'learning_rate': 0.0001998824280799966, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 368.76, 'epoch': 0.04}
  2%|█▋                                                                                   | 410/20117 [15:04<11:56:37,  2.18s/it]  2%|█▋                                                                                   | 411/20117 [15:07<12:01:52,  2.20s/it]  2%|█▋                                                                                   | 412/20117 [15:09<11:56:46,  2.18s/it]  2%|█▋                                                                                   | 413/20117 [15:11<11:55:51,  2.18s/it]  2%|█▋                                                                                   | 414/20117 [15:13<11:51:58,  2.17s/it]  2%|█▊                                                                                   | 415/20117 [15:15<11:59:22,  2.19s/it]  2%|█▊                                                                                   | 416/20117 [15:18<12:05:21,  2.21s/it]  2%|█▊                                                                                   | 417/20117 [15:20<12:08:22,  2.22s/it]  2%|█▊                                                                                   | 418/20117 [15:22<12:09:28,  2.22s/it]  2%|█▊                                                                                   | 419/20117 [15:25<12:36:59,  2.31s/it]  2%|█▊                                                                                   | 420/20117 [15:27<12:25:17,  2.27s/it]                                                                                                                                 {'loss': 0.3374, 'grad_norm': 0.25356051325798035, 'learning_rate': 0.00019987469672586654, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 402.99, 'epoch': 0.04}
  2%|█▊                                                                                   | 420/20117 [15:27<12:25:17,  2.27s/it]  2%|█▊                                                                                   | 421/20117 [15:29<12:14:25,  2.24s/it]  2%|█▊                                                                                   | 422/20117 [15:31<12:13:00,  2.23s/it]  2%|█▊                                                                                   | 423/20117 [15:33<12:06:36,  2.21s/it]  2%|█▊                                                                                   | 424/20117 [15:35<11:58:31,  2.19s/it]  2%|█▊                                                                                   | 425/20117 [15:38<12:00:34,  2.20s/it]  2%|█▊                                                                                   | 426/20117 [15:40<12:03:16,  2.20s/it]  2%|█▊                                                                                   | 427/20117 [15:42<12:04:48,  2.21s/it]  2%|█▊                                                                                   | 428/20117 [15:44<11:58:36,  2.19s/it]  2%|█▊                                                                                   | 429/20117 [15:46<11:58:10,  2.19s/it]  2%|█▊                                                                                   | 430/20117 [15:49<11:59:10,  2.19s/it]                                                                                                                                 {'loss': 0.2929, 'grad_norm': 0.4773045778274536, 'learning_rate': 0.00019986671935924946, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 360.32, 'epoch': 0.04}
  2%|█▊                                                                                   | 430/20117 [15:49<11:59:10,  2.19s/it]  2%|█▊                                                                                   | 431/20117 [15:51<11:58:12,  2.19s/it]  2%|█▊                                                                                   | 432/20117 [15:53<11:58:43,  2.19s/it]  2%|█▊                                                                                   | 433/20117 [15:55<11:57:04,  2.19s/it]  2%|█▊                                                                                   | 434/20117 [15:57<11:57:45,  2.19s/it]  2%|█▊                                                                                   | 435/20117 [16:00<12:01:41,  2.20s/it]  2%|█▊                                                                                   | 436/20117 [16:02<11:59:21,  2.19s/it]  2%|█▊                                                                                   | 437/20117 [16:04<11:58:19,  2.19s/it]  2%|█▊                                                                                   | 438/20117 [16:06<12:00:55,  2.20s/it]  2%|█▊                                                                                   | 439/20117 [16:08<11:59:22,  2.19s/it]  2%|█▊                                                                                   | 440/20117 [16:11<11:55:02,  2.18s/it]                                                                                                                                 {'loss': 0.3106, 'grad_norm': 0.37164929509162903, 'learning_rate': 0.0001998584959997953, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 415.21, 'epoch': 0.04}
  2%|█▊                                                                                   | 440/20117 [16:11<11:55:02,  2.18s/it]  2%|█▊                                                                                   | 441/20117 [16:13<11:56:44,  2.19s/it]  2%|█▊                                                                                   | 442/20117 [16:15<11:59:44,  2.19s/it]  2%|█▊                                                                                   | 443/20117 [16:17<11:57:34,  2.19s/it]  2%|█▉                                                                                   | 444/20117 [16:19<11:53:21,  2.18s/it]  2%|█▉                                                                                   | 445/20117 [16:22<11:59:58,  2.20s/it]  2%|█▉                                                                                   | 446/20117 [16:24<11:58:58,  2.19s/it]  2%|█▉                                                                                   | 447/20117 [16:26<12:01:07,  2.20s/it]  2%|█▉                                                                                   | 448/20117 [16:28<11:57:15,  2.19s/it]  2%|█▉                                                                                   | 449/20117 [16:30<12:01:16,  2.20s/it]  2%|█▉                                                                                   | 450/20117 [16:33<12:02:09,  2.20s/it]                                                                                                                                 {'loss': 0.2676, 'grad_norm': 0.3310747742652893, 'learning_rate': 0.00019985002666775986, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 363.95, 'epoch': 0.04}
  2%|█▉                                                                                   | 450/20117 [16:33<12:02:09,  2.20s/it]  2%|█▉                                                                                   | 451/20117 [16:35<11:58:06,  2.19s/it]  2%|█▉                                                                                   | 452/20117 [16:37<12:00:40,  2.20s/it]  2%|█▉                                                                                   | 453/20117 [16:39<11:54:37,  2.18s/it]  2%|█▉                                                                                   | 454/20117 [16:41<11:58:48,  2.19s/it]  2%|█▉                                                                                   | 455/20117 [16:43<11:53:08,  2.18s/it]  2%|█▉                                                                                   | 456/20117 [16:46<11:53:30,  2.18s/it]  2%|█▉                                                                                   | 457/20117 [16:48<11:56:46,  2.19s/it]  2%|█▉                                                                                   | 458/20117 [16:50<12:02:58,  2.21s/it]  2%|█▉                                                                                   | 459/20117 [16:52<11:56:36,  2.19s/it]  2%|█▉                                                                                   | 460/20117 [16:54<12:03:16,  2.21s/it]                                                                                                                                 {'loss': 0.3139, 'grad_norm': 0.32523512840270996, 'learning_rate': 0.000199841311384005, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 361.92, 'epoch': 0.05}
  2%|█▉                                                                                   | 460/20117 [16:54<12:03:16,  2.21s/it]  2%|█▉                                                                                   | 461/20117 [16:57<12:00:17,  2.20s/it]  2%|█▉                                                                                   | 462/20117 [16:59<11:55:45,  2.18s/it]  2%|█▉                                                                                   | 463/20117 [17:01<11:52:12,  2.17s/it]  2%|█▉                                                                                   | 464/20117 [17:03<11:53:16,  2.18s/it]  2%|█▉                                                                                   | 465/20117 [17:05<11:49:13,  2.17s/it]  2%|█▉                                                                                   | 466/20117 [17:07<11:48:30,  2.16s/it]  2%|█▉                                                                                   | 467/20117 [17:10<11:49:40,  2.17s/it]  2%|█▉                                                                                   | 468/20117 [17:12<11:52:51,  2.18s/it]  2%|█▉                                                                                   | 469/20117 [17:14<11:56:57,  2.19s/it]  2%|█▉                                                                                   | 470/20117 [17:16<11:57:50,  2.19s/it]                                                                                                                                 {'loss': 0.323, 'grad_norm': 0.40525123476982117, 'learning_rate': 0.00019983235016999827, 'memory/max_active (GiB)': 18.84, 'memory/max_allocated (GiB)': 18.84, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 413.28, 'epoch': 0.05}
  2%|█▉                                                                                   | 470/20117 [17:16<11:57:50,  2.19s/it]  2%|█▉                                                                                   | 471/20117 [17:18<11:54:50,  2.18s/it]  2%|█▉                                                                                   | 472/20117 [17:21<11:55:25,  2.19s/it]  2%|█▉                                                                                   | 473/20117 [17:23<12:24:38,  2.27s/it]  2%|██                                                                                   | 474/20117 [17:25<12:10:28,  2.23s/it]  2%|██                                                                                   | 475/20117 [17:27<12:06:37,  2.22s/it]  2%|██                                                                                   | 476/20117 [17:30<12:07:53,  2.22s/it]  2%|██                                                                                   | 477/20117 [17:32<12:00:23,  2.20s/it]  2%|██                                                                                   | 478/20117 [17:34<12:00:02,  2.20s/it]  2%|██                                                                                   | 479/20117 [17:36<11:57:14,  2.19s/it]  2%|██                                                                                   | 480/20117 [17:38<11:53:02,  2.18s/it]                                                                                                                                 {'loss': 0.2941, 'grad_norm': 0.4233141541481018, 'learning_rate': 0.000199823143047813, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 383.95, 'epoch': 0.05}
  2%|██                                                                                   | 480/20117 [17:38<11:53:02,  2.18s/it]  2%|██                                                                                   | 481/20117 [17:40<11:50:20,  2.17s/it]  2%|██                                                                                   | 482/20117 [17:43<11:54:08,  2.18s/it]  2%|██                                                                                   | 483/20117 [17:45<11:59:08,  2.20s/it]  2%|██                                                                                   | 484/20117 [17:47<11:56:22,  2.19s/it]  2%|██                                                                                   | 485/20117 [17:49<11:56:37,  2.19s/it]  2%|██                                                                                   | 486/20117 [17:51<11:57:14,  2.19s/it]  2%|██                                                                                   | 487/20117 [17:54<11:55:17,  2.19s/it]  2%|██                                                                                   | 488/20117 [17:56<11:56:57,  2.19s/it]  2%|██                                                                                   | 489/20117 [17:58<11:58:58,  2.20s/it]  2%|██                                                                                   | 490/20117 [18:00<11:58:23,  2.20s/it]                                                                                                                                 {'loss': 0.2835, 'grad_norm': 0.21106044948101044, 'learning_rate': 0.0001998136900401283, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 368.99, 'epoch': 0.05}
  2%|██                                                                                   | 490/20117 [18:00<11:58:23,  2.20s/it]  2%|██                                                                                   | 491/20117 [18:02<12:07:32,  2.22s/it]  2%|██                                                                                   | 492/20117 [18:05<12:02:52,  2.21s/it]  2%|██                                                                                   | 493/20117 [18:07<12:01:44,  2.21s/it]  2%|██                                                                                   | 494/20117 [18:09<12:03:49,  2.21s/it]  2%|██                                                                                   | 495/20117 [18:11<12:01:14,  2.21s/it]  2%|██                                                                                   | 496/20117 [18:13<12:01:16,  2.21s/it]  2%|██                                                                                   | 497/20117 [18:16<11:57:59,  2.20s/it]  2%|██                                                                                   | 498/20117 [18:18<11:59:02,  2.20s/it]  2%|██                                                                                   | 499/20117 [18:20<11:53:31,  2.18s/it]  2%|██                                                                                   | 500/20117 [18:22<11:51:53,  2.18s/it]                                                                                                                                 {'loss': 0.3895, 'grad_norm': 0.34198832511901855, 'learning_rate': 0.00019980399117022895, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 387.18, 'epoch': 0.05}
  2%|██                                                                                   | 500/20117 [18:22<11:51:53,  2.18s/it]  2%|██                                                                                   | 501/20117 [18:24<11:49:47,  2.17s/it]  2%|██                                                                                   | 502/20117 [18:26<11:45:44,  2.16s/it]  3%|██▏                                                                                  | 503/20117 [18:29<11:47:40,  2.16s/it]  3%|██▏                                                                                  | 504/20117 [18:31<11:46:58,  2.16s/it]  3%|██▏                                                                                  | 505/20117 [18:33<11:50:33,  2.17s/it]  3%|██▏                                                                                  | 506/20117 [18:35<11:47:13,  2.16s/it]  3%|██▏                                                                                  | 507/20117 [18:37<11:51:05,  2.18s/it]  3%|██▏                                                                                  | 508/20117 [18:39<11:45:51,  2.16s/it]  3%|██▏                                                                                  | 509/20117 [18:42<11:45:14,  2.16s/it]  3%|██▏                                                                                  | 510/20117 [18:44<11:41:05,  2.15s/it]                                                                                                                                 {'loss': 0.2854, 'grad_norm': 0.44045203924179077, 'learning_rate': 0.00019979404646200527, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 336.09, 'epoch': 0.05}
  3%|██▏                                                                                  | 510/20117 [18:44<11:41:05,  2.15s/it]  3%|██▏                                                                                  | 511/20117 [18:46<11:39:09,  2.14s/it]  3%|██▏                                                                                  | 512/20117 [18:48<11:37:01,  2.13s/it]  3%|██▏                                                                                  | 513/20117 [18:50<11:41:08,  2.15s/it]  3%|██▏                                                                                  | 514/20117 [18:52<11:40:18,  2.14s/it]  3%|██▏                                                                                  | 515/20117 [18:54<11:37:54,  2.14s/it]  3%|██▏                                                                                  | 516/20117 [18:57<11:40:52,  2.15s/it]  3%|██▏                                                                                  | 517/20117 [18:59<11:39:50,  2.14s/it]  3%|██▏                                                                                  | 518/20117 [19:01<11:47:03,  2.16s/it]  3%|██▏                                                                                  | 519/20117 [19:03<11:55:34,  2.19s/it]  3%|██▏                                                                                  | 520/20117 [19:05<11:58:28,  2.20s/it]                                                                                                                                 {'loss': 0.3218, 'grad_norm': 0.33906373381614685, 'learning_rate': 0.0001997838559399532, 'memory/max_active (GiB)': 18.83, 'memory/max_allocated (GiB)': 18.83, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 406.12, 'epoch': 0.05}
  3%|██▏                                                                                  | 520/20117 [19:05<11:58:28,  2.20s/it]  3%|██▏                                                                                  | 521/20117 [19:08<11:58:32,  2.20s/it]  3%|██▏                                                                                  | 522/20117 [19:10<12:03:54,  2.22s/it]  3%|██▏                                                                                  | 523/20117 [19:12<12:01:07,  2.21s/it]  3%|██▏                                                                                  | 524/20117 [19:15<12:35:45,  2.31s/it]  3%|██▏                                                                                  | 525/20117 [19:17<12:31:47,  2.30s/it]  3%|██▏                                                                                  | 526/20117 [19:19<12:22:06,  2.27s/it]  3%|██▏                                                                                  | 527/20117 [19:21<12:17:59,  2.26s/it]  3%|██▏                                                                                  | 528/20117 [19:23<12:07:39,  2.23s/it]  3%|██▏                                                                                  | 529/20117 [19:26<12:02:05,  2.21s/it]  3%|██▏                                                                                  | 530/20117 [19:28<11:54:42,  2.19s/it]                                                                                                                                 {'loss': 0.2803, 'grad_norm': 0.32613444328308105, 'learning_rate': 0.00019977341962917414, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 407.45, 'epoch': 0.05}
  3%|██▏                                                                                  | 530/20117 [19:28<11:54:42,  2.19s/it]  3%|██▏                                                                                  | 531/20117 [19:30<11:52:16,  2.18s/it]  3%|██▏                                                                                  | 532/20117 [19:32<11:50:16,  2.18s/it]  3%|██▎                                                                                  | 533/20117 [19:34<11:45:49,  2.16s/it]  3%|██▎                                                                                  | 534/20117 [19:36<11:54:23,  2.19s/it]  3%|██▎                                                                                  | 535/20117 [19:39<11:55:45,  2.19s/it]  3%|██▎                                                                                  | 536/20117 [19:41<11:50:27,  2.18s/it]  3%|██▎                                                                                  | 537/20117 [19:43<11:52:33,  2.18s/it]  3%|██▎                                                                                  | 538/20117 [19:45<11:54:53,  2.19s/it]  3%|██▎                                                                                  | 539/20117 [19:47<11:55:38,  2.19s/it]  3%|██▎                                                                                  | 540/20117 [19:50<11:54:49,  2.19s/it]                                                                                                                                 {'loss': 0.3143, 'grad_norm': 0.3789099454879761, 'learning_rate': 0.00019976273755537499, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 377.61, 'epoch': 0.05}
  3%|██▎                                                                                  | 540/20117 [19:50<11:54:49,  2.19s/it]  3%|██▎                                                                                  | 541/20117 [19:52<11:58:25,  2.20s/it]  3%|██▎                                                                                  | 542/20117 [19:54<11:58:45,  2.20s/it]  3%|██▎                                                                                  | 543/20117 [19:56<11:53:36,  2.19s/it]  3%|██▎                                                                                  | 544/20117 [19:58<11:48:53,  2.17s/it]  3%|██▎                                                                                  | 545/20117 [20:00<11:48:44,  2.17s/it]  3%|██▎                                                                                  | 546/20117 [20:03<11:46:10,  2.16s/it]  3%|██▎                                                                                  | 547/20117 [20:05<11:44:27,  2.16s/it]  3%|██▎                                                                                  | 548/20117 [20:07<11:46:04,  2.16s/it]  3%|██▎                                                                                  | 549/20117 [20:09<11:49:22,  2.18s/it]  3%|██▎                                                                                  | 550/20117 [20:11<11:53:46,  2.19s/it]                                                                                                                                 {'loss': 0.2434, 'grad_norm': 0.4602185785770416, 'learning_rate': 0.00019975180974486786, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 349.16, 'epoch': 0.05}
  3%|██▎                                                                                  | 550/20117 [20:11<11:53:46,  2.19s/it]  3%|██▎                                                                                  | 551/20117 [20:14<11:51:29,  2.18s/it]  3%|██▎                                                                                  | 552/20117 [20:16<11:45:01,  2.16s/it]  3%|██▎                                                                                  | 553/20117 [20:18<11:47:26,  2.17s/it]  3%|██▎                                                                                  | 554/20117 [20:20<11:51:54,  2.18s/it]  3%|██▎                                                                                  | 555/20117 [20:22<11:53:02,  2.19s/it]  3%|██▎                                                                                  | 556/20117 [20:24<11:55:55,  2.20s/it]  3%|██▎                                                                                  | 557/20117 [20:27<11:57:09,  2.20s/it]  3%|██▎                                                                                  | 558/20117 [20:29<11:59:55,  2.21s/it]  3%|██▎                                                                                  | 559/20117 [20:31<11:59:39,  2.21s/it]  3%|██▎                                                                                  | 560/20117 [20:33<12:02:14,  2.22s/it]                                                                                                                                 {'loss': 0.3238, 'grad_norm': 0.4232983887195587, 'learning_rate': 0.00019974063622457032, 'memory/max_active (GiB)': 18.84, 'memory/max_allocated (GiB)': 18.84, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 348.84, 'epoch': 0.06}
  3%|██▎                                                                                  | 560/20117 [20:33<12:02:14,  2.22s/it]  3%|██▎                                                                                  | 561/20117 [20:36<12:08:38,  2.24s/it]  3%|██▎                                                                                  | 562/20117 [20:38<12:05:01,  2.22s/it]  3%|██▍                                                                                  | 563/20117 [20:40<12:06:02,  2.23s/it]  3%|██▍                                                                                  | 564/20117 [20:42<12:05:35,  2.23s/it]  3%|██▍                                                                                  | 565/20117 [20:45<12:07:02,  2.23s/it]  3%|██▍                                                                                  | 566/20117 [20:47<12:08:03,  2.23s/it]  3%|██▍                                                                                  | 567/20117 [20:49<12:08:48,  2.24s/it]  3%|██▍                                                                                  | 568/20117 [20:51<12:06:29,  2.23s/it]  3%|██▍                                                                                  | 569/20117 [20:53<12:07:58,  2.23s/it]  3%|██▍                                                                                  | 570/20117 [20:56<12:10:45,  2.24s/it]                                                                                                                                 {'loss': 0.2722, 'grad_norm': 0.16223137080669403, 'learning_rate': 0.0001997292170220051, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 340.57, 'epoch': 0.06}
  3%|██▍                                                                                  | 570/20117 [20:56<12:10:45,  2.24s/it]  3%|██▍                                                                                  | 571/20117 [20:58<12:15:45,  2.26s/it]  3%|██▍                                                                                  | 572/20117 [21:00<12:15:44,  2.26s/it]  3%|██▍                                                                                  | 573/20117 [21:03<12:16:57,  2.26s/it]  3%|██▍                                                                                  | 574/20117 [21:05<12:23:40,  2.28s/it]  3%|██▍                                                                                  | 575/20117 [21:07<12:23:03,  2.28s/it]  3%|██▍                                                                                  | 576/20117 [21:10<13:00:38,  2.40s/it]  3%|██▍                                                                                  | 577/20117 [21:12<12:55:42,  2.38s/it]  3%|██▍                                                                                  | 578/20117 [21:15<12:48:59,  2.36s/it]  3%|██▍                                                                                  | 579/20117 [21:17<12:47:14,  2.36s/it]  3%|██▍                                                                                  | 580/20117 [21:19<12:33:41,  2.31s/it]                                                                                                                                 {'loss': 0.2801, 'grad_norm': 0.4484419822692871, 'learning_rate': 0.00019971755216530008, 'memory/max_active (GiB)': 20.53, 'memory/max_allocated (GiB)': 20.53, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 353.53, 'epoch': 0.06}
  3%|██▍                                                                                  | 580/20117 [21:19<12:33:41,  2.31s/it]  3%|██▍                                                                                  | 581/20117 [21:21<12:24:33,  2.29s/it]  3%|██▍                                                                                  | 582/20117 [21:24<12:20:56,  2.28s/it]  3%|██▍                                                                                  | 583/20117 [21:26<12:21:05,  2.28s/it]  3%|██▍                                                                                  | 584/20117 [21:28<12:20:57,  2.28s/it]  3%|██▍                                                                                  | 585/20117 [21:30<12:21:43,  2.28s/it]  3%|██▍                                                                                  | 586/20117 [21:33<12:24:16,  2.29s/it]  3%|██▍                                                                                  | 587/20117 [21:35<12:32:57,  2.31s/it]  3%|██▍                                                                                  | 588/20117 [21:37<12:32:56,  2.31s/it]  3%|██▍                                                                                  | 589/20117 [21:40<12:23:43,  2.29s/it]  3%|██▍                                                                                  | 590/20117 [21:42<12:29:44,  2.30s/it]                                                                                                                                 {'loss': 0.3015, 'grad_norm': 0.23834413290023804, 'learning_rate': 0.0001997056416831883, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 365.09, 'epoch': 0.06}
  3%|██▍                                                                                  | 590/20117 [21:42<12:29:44,  2.30s/it]  3%|██▍                                                                                  | 591/20117 [21:44<12:21:20,  2.28s/it]  3%|██▌                                                                                  | 592/20117 [21:46<12:22:33,  2.28s/it]  3%|██▌                                                                                  | 593/20117 [21:49<12:23:36,  2.29s/it]  3%|██▌                                                                                  | 594/20117 [21:51<12:16:28,  2.26s/it]  3%|██▌                                                                                  | 595/20117 [21:53<12:09:36,  2.24s/it]  3%|██▌                                                                                  | 596/20117 [21:55<12:06:54,  2.23s/it]  3%|██▌                                                                                  | 597/20117 [21:58<12:02:57,  2.22s/it]  3%|██▌                                                                                  | 598/20117 [22:00<12:03:24,  2.22s/it]  3%|██▌                                                                                  | 599/20117 [22:02<11:59:40,  2.21s/it]  3%|██▌                                                                                  | 600/20117 [22:04<12:00:49,  2.22s/it]                                                                                                                                 {'loss': 0.2959, 'grad_norm': 0.4154009521007538, 'learning_rate': 0.0001996934856050078, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 365.21, 'epoch': 0.06}
  3%|██▌                                                                                  | 600/20117 [22:04<12:00:49,  2.22s/it]  3%|██▌                                                                                  | 601/20117 [22:06<12:02:24,  2.22s/it]  3%|██▌                                                                                  | 602/20117 [22:09<12:03:52,  2.23s/it]  3%|██▌                                                                                  | 603/20117 [22:11<12:05:39,  2.23s/it]  3%|██▌                                                                                  | 604/20117 [22:13<12:04:34,  2.23s/it]  3%|██▌                                                                                  | 605/20117 [22:15<12:05:19,  2.23s/it]  3%|██▌                                                                                  | 606/20117 [22:18<12:02:56,  2.22s/it]  3%|██▌                                                                                  | 607/20117 [22:20<12:12:16,  2.25s/it]  3%|██▌                                                                                  | 608/20117 [22:22<12:09:40,  2.24s/it]  3%|██▌                                                                                  | 609/20117 [22:24<12:05:40,  2.23s/it]  3%|██▌                                                                                  | 610/20117 [22:27<12:08:49,  2.24s/it]                                                                                                                                 {'loss': 0.2563, 'grad_norm': 0.23120558261871338, 'learning_rate': 0.00019968108396070157, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 397.04, 'epoch': 0.06}
  3%|██▌                                                                                  | 610/20117 [22:27<12:08:49,  2.24s/it]  3%|██▌                                                                                  | 611/20117 [22:29<12:08:07,  2.24s/it]  3%|██▌                                                                                  | 612/20117 [22:31<12:08:24,  2.24s/it]  3%|██▌                                                                                  | 613/20117 [22:33<12:09:41,  2.24s/it]  3%|██▌                                                                                  | 614/20117 [22:35<12:03:16,  2.23s/it]  3%|██▌                                                                                  | 615/20117 [22:38<12:01:49,  2.22s/it]  3%|██▌                                                                                  | 616/20117 [22:40<11:59:02,  2.21s/it]  3%|██▌                                                                                  | 617/20117 [22:42<12:05:36,  2.23s/it]  3%|██▌                                                                                  | 618/20117 [22:44<12:00:13,  2.22s/it]  3%|██▌                                                                                  | 619/20117 [22:47<11:57:30,  2.21s/it]  3%|██▌                                                                                  | 620/20117 [22:49<11:58:35,  2.21s/it]                                                                                                                                 {'loss': 0.3025, 'grad_norm': 0.4453487694263458, 'learning_rate': 0.00019966843678081745, 'memory/max_active (GiB)': 19.2, 'memory/max_allocated (GiB)': 19.2, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 380.75, 'epoch': 0.06}
  3%|██▌                                                                                  | 620/20117 [22:49<11:58:35,  2.21s/it]  3%|██▌                                                                                  | 621/20117 [22:51<11:59:03,  2.21s/it]  3%|██▋                                                                                  | 622/20117 [22:53<11:56:57,  2.21s/it]  3%|██▋                                                                                  | 623/20117 [22:55<11:54:53,  2.20s/it]  3%|██▋                                                                                  | 624/20117 [22:58<11:53:54,  2.20s/it]  3%|██▋                                                                                  | 625/20117 [23:00<11:59:45,  2.22s/it]  3%|██▋                                                                                  | 626/20117 [23:02<12:01:19,  2.22s/it]  3%|██▋                                                                                  | 627/20117 [23:04<11:59:16,  2.21s/it]  3%|██▋                                                                                  | 628/20117 [23:07<12:31:03,  2.31s/it]  3%|██▋                                                                                  | 629/20117 [23:09<12:18:50,  2.27s/it]  3%|██▋                                                                                  | 630/20117 [23:11<12:11:57,  2.25s/it]                                                                                                                                 {'loss': 0.2248, 'grad_norm': 0.47098028659820557, 'learning_rate': 0.0001996555440965081, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 367.41, 'epoch': 0.06}
  3%|██▋                                                                                  | 630/20117 [23:11<12:11:57,  2.25s/it]  3%|██▋                                                                                  | 631/20117 [23:13<12:04:37,  2.23s/it]  3%|██▋                                                                                  | 632/20117 [23:16<12:02:21,  2.22s/it]  3%|██▋                                                                                  | 633/20117 [23:18<12:04:56,  2.23s/it]  3%|██▋                                                                                  | 634/20117 [23:20<12:02:28,  2.22s/it]  3%|██▋                                                                                  | 635/20117 [23:22<11:57:24,  2.21s/it]  3%|██▋                                                                                  | 636/20117 [23:24<12:03:07,  2.23s/it]  3%|██▋                                                                                  | 637/20117 [23:27<12:00:56,  2.22s/it]  3%|██▋                                                                                  | 638/20117 [23:29<11:59:46,  2.22s/it]  3%|██▋                                                                                  | 639/20117 [23:31<11:56:23,  2.21s/it]  3%|██▋                                                                                  | 640/20117 [23:33<12:02:10,  2.22s/it]                                                                                                                                 {'loss': 0.2597, 'grad_norm': 0.2540164887905121, 'learning_rate': 0.000199642405939531, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 361.69, 'epoch': 0.06}
  3%|██▋                                                                                  | 640/20117 [23:33<12:02:10,  2.22s/it]  3%|██▋                                                                                  | 641/20117 [23:36<12:02:27,  2.23s/it]  3%|██▋                                                                                  | 642/20117 [23:38<12:08:18,  2.24s/it]  3%|██▋                                                                                  | 643/20117 [23:40<12:08:34,  2.24s/it]  3%|██▋                                                                                  | 644/20117 [23:42<12:00:31,  2.22s/it]  3%|██▋                                                                                  | 645/20117 [23:44<11:57:28,  2.21s/it]  3%|██▋                                                                                  | 646/20117 [23:47<11:52:43,  2.20s/it]  3%|██▋                                                                                  | 647/20117 [23:49<12:01:51,  2.22s/it]  3%|██▋                                                                                  | 648/20117 [23:51<11:58:01,  2.21s/it]  3%|██▋                                                                                  | 649/20117 [23:53<11:55:48,  2.21s/it]  3%|██▋                                                                                  | 650/20117 [23:55<11:55:26,  2.21s/it]                                                                                                                                 {'loss': 0.2623, 'grad_norm': 0.30327877402305603, 'learning_rate': 0.00019962902234224816, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 316.43, 'epoch': 0.06}
  3%|██▋                                                                                  | 650/20117 [23:55<11:55:26,  2.21s/it]  3%|██▊                                                                                  | 651/20117 [23:58<11:53:54,  2.20s/it]  3%|██▊                                                                                  | 652/20117 [24:00<11:55:39,  2.21s/it]  3%|██▊                                                                                  | 653/20117 [24:02<11:59:23,  2.22s/it]  3%|██▊                                                                                  | 654/20117 [24:04<11:58:14,  2.21s/it]  3%|██▊                                                                                  | 655/20117 [24:07<11:56:50,  2.21s/it]  3%|██▊                                                                                  | 656/20117 [24:09<11:59:26,  2.22s/it]  3%|██▊                                                                                  | 657/20117 [24:11<12:00:15,  2.22s/it]  3%|██▊                                                                                  | 658/20117 [24:13<12:06:58,  2.24s/it]  3%|██▊                                                                                  | 659/20117 [24:16<12:07:18,  2.24s/it]  3%|██▊                                                                                  | 660/20117 [24:18<12:07:36,  2.24s/it]                                                                                                                                 {'loss': 0.2571, 'grad_norm': 0.3211521804332733, 'learning_rate': 0.00019961539333762622, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 352.04, 'epoch': 0.07}
  3%|██▊                                                                                  | 660/20117 [24:18<12:07:36,  2.24s/it]  3%|██▊                                                                                  | 661/20117 [24:20<12:13:05,  2.26s/it]  3%|██▊                                                                                  | 662/20117 [24:22<12:17:05,  2.27s/it]  3%|██▊                                                                                  | 663/20117 [24:25<12:19:54,  2.28s/it]  3%|██▊                                                                                  | 664/20117 [24:27<12:17:50,  2.28s/it]  3%|██▊                                                                                  | 665/20117 [24:29<12:12:59,  2.26s/it]  3%|██▊                                                                                  | 666/20117 [24:31<12:09:53,  2.25s/it]  3%|██▊                                                                                  | 667/20117 [24:34<12:10:07,  2.25s/it]  3%|██▊                                                                                  | 668/20117 [24:36<12:07:05,  2.24s/it]  3%|██▊                                                                                  | 669/20117 [24:38<12:07:09,  2.24s/it]  3%|██▊                                                                                  | 670/20117 [24:40<12:08:42,  2.25s/it]                                                                                                                                 {'loss': 0.2531, 'grad_norm': 0.19880682229995728, 'learning_rate': 0.00019960151895923628, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 419.33, 'epoch': 0.07}
  3%|██▊                                                                                  | 670/20117 [24:40<12:08:42,  2.25s/it]  3%|██▊                                                                                  | 671/20117 [24:43<12:12:03,  2.26s/it]  3%|██▊                                                                                  | 672/20117 [24:45<12:09:49,  2.25s/it]  3%|██▊                                                                                  | 673/20117 [24:47<12:17:29,  2.28s/it]  3%|██▊                                                                                  | 674/20117 [24:49<12:06:36,  2.24s/it]  3%|██▊                                                                                  | 675/20117 [24:52<12:08:00,  2.25s/it]  3%|██▊                                                                                  | 676/20117 [24:54<12:12:46,  2.26s/it]  3%|██▊                                                                                  | 677/20117 [24:56<12:12:12,  2.26s/it]  3%|██▊                                                                                  | 678/20117 [24:58<12:12:11,  2.26s/it]  3%|██▊                                                                                  | 679/20117 [25:01<12:46:03,  2.36s/it]  3%|██▊                                                                                  | 680/20117 [25:03<12:36:25,  2.33s/it]                                                                                                                                 {'loss': 0.3275, 'grad_norm': 0.3732224702835083, 'learning_rate': 0.0001995873992412539, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 428.82, 'epoch': 0.07}
  3%|██▊                                                                                  | 680/20117 [25:03<12:36:25,  2.33s/it]  3%|██▉                                                                                  | 681/20117 [25:06<12:32:23,  2.32s/it]  3%|██▉                                                                                  | 682/20117 [25:08<12:23:55,  2.30s/it]  3%|██▉                                                                                  | 683/20117 [25:10<12:20:16,  2.29s/it]  3%|██▉                                                                                  | 684/20117 [25:12<12:20:52,  2.29s/it]  3%|██▉                                                                                  | 685/20117 [25:15<12:21:56,  2.29s/it]  3%|██▉                                                                                  | 686/20117 [25:17<12:18:52,  2.28s/it]  3%|██▉                                                                                  | 687/20117 [25:19<12:10:38,  2.26s/it]  3%|██▉                                                                                  | 688/20117 [25:21<12:15:03,  2.27s/it]  3%|██▉                                                                                  | 689/20117 [25:24<12:09:25,  2.25s/it]  3%|██▉                                                                                  | 690/20117 [25:26<12:03:39,  2.24s/it]                                                                                                                                 {'loss': 0.2884, 'grad_norm': 0.2961219847202301, 'learning_rate': 0.00019957303421845889, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 394.73, 'epoch': 0.07}
  3%|██▉                                                                                  | 690/20117 [25:26<12:03:39,  2.24s/it]  3%|██▉                                                                                  | 691/20117 [25:28<12:06:36,  2.24s/it]  3%|██▉                                                                                  | 692/20117 [25:30<12:08:30,  2.25s/it]  3%|██▉                                                                                  | 693/20117 [25:33<12:13:18,  2.27s/it]  3%|██▉                                                                                  | 694/20117 [25:35<12:13:25,  2.27s/it]  3%|██▉                                                                                  | 695/20117 [25:37<12:21:19,  2.29s/it]  3%|██▉                                                                                  | 696/20117 [25:40<12:14:44,  2.27s/it]  3%|██▉                                                                                  | 697/20117 [25:42<12:09:55,  2.26s/it]  3%|██▉                                                                                  | 698/20117 [25:44<12:11:18,  2.26s/it]  3%|██▉                                                                                  | 699/20117 [25:46<12:09:35,  2.25s/it]  3%|██▉                                                                                  | 700/20117 [25:49<12:06:46,  2.25s/it]                                                                                                                                 {'loss': 0.25, 'grad_norm': 0.4014001488685608, 'learning_rate': 0.00019955842392623539, 'memory/max_active (GiB)': 19.77, 'memory/max_allocated (GiB)': 19.77, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 402.5, 'epoch': 0.07}
  3%|██▉                                                                                  | 700/20117 [25:49<12:06:46,  2.25s/it]  3%|██▉                                                                                  | 701/20117 [25:51<12:06:50,  2.25s/it]  3%|██▉                                                                                  | 702/20117 [25:53<12:11:45,  2.26s/it]  3%|██▉                                                                                  | 703/20117 [25:55<12:12:51,  2.26s/it]  3%|██▉                                                                                  | 704/20117 [25:58<12:14:09,  2.27s/it]  4%|██▉                                                                                  | 705/20117 [26:00<12:15:13,  2.27s/it]  4%|██▉                                                                                  | 706/20117 [26:02<12:19:09,  2.28s/it]  4%|██▉                                                                                  | 707/20117 [26:04<12:12:16,  2.26s/it]  4%|██▉                                                                                  | 708/20117 [26:07<12:12:09,  2.26s/it]  4%|██▉                                                                                  | 709/20117 [26:09<12:18:36,  2.28s/it]  4%|██▉                                                                                  | 710/20117 [26:11<12:16:30,  2.28s/it]                                                                                                                                 {'loss': 0.3208, 'grad_norm': 0.3465085029602051, 'learning_rate': 0.0001995435684005716, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 418.41, 'epoch': 0.07}
  4%|██▉                                                                                  | 710/20117 [26:11<12:16:30,  2.28s/it]  4%|███                                                                                  | 711/20117 [26:13<12:12:39,  2.27s/it]  4%|███                                                                                  | 712/20117 [26:16<12:06:28,  2.25s/it]  4%|███                                                                                  | 713/20117 [26:18<12:12:03,  2.26s/it]  4%|███                                                                                  | 714/20117 [26:20<12:05:22,  2.24s/it]  4%|███                                                                                  | 715/20117 [26:22<12:04:03,  2.24s/it]  4%|███                                                                                  | 716/20117 [26:25<11:57:56,  2.22s/it]  4%|███                                                                                  | 717/20117 [26:27<12:00:33,  2.23s/it]  4%|███                                                                                  | 718/20117 [26:29<12:02:01,  2.23s/it]  4%|███                                                                                  | 719/20117 [26:31<12:02:06,  2.23s/it]  4%|███                                                                                  | 720/20117 [26:34<11:59:29,  2.23s/it]                                                                                                                                 {'loss': 0.2538, 'grad_norm': 0.302223265171051, 'learning_rate': 0.0001995284676780598, 'memory/max_active (GiB)': 19.2, 'memory/max_allocated (GiB)': 19.2, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 303.27, 'epoch': 0.07}
  4%|███                                                                                  | 720/20117 [26:34<11:59:29,  2.23s/it]  4%|███                                                                                  | 721/20117 [26:36<12:02:03,  2.23s/it]  4%|███                                                                                  | 722/20117 [26:38<12:12:16,  2.27s/it]  4%|███                                                                                  | 723/20117 [26:40<12:07:43,  2.25s/it]  4%|███                                                                                  | 724/20117 [26:43<12:07:03,  2.25s/it]  4%|███                                                                                  | 725/20117 [26:45<12:11:01,  2.26s/it]  4%|███                                                                                  | 726/20117 [26:47<12:08:33,  2.25s/it]  4%|███                                                                                  | 727/20117 [26:49<12:06:08,  2.25s/it]  4%|███                                                                                  | 728/20117 [26:52<12:08:00,  2.25s/it]  4%|███                                                                                  | 729/20117 [26:54<12:13:33,  2.27s/it]  4%|███                                                                                  | 730/20117 [26:56<12:14:55,  2.27s/it]                                                                                                                                 {'loss': 0.2559, 'grad_norm': 0.27174416184425354, 'learning_rate': 0.00019951312179589632, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 373.05, 'epoch': 0.07}
  4%|███                                                                                  | 730/20117 [26:56<12:14:55,  2.27s/it]  4%|███                                                                                  | 731/20117 [26:58<12:11:47,  2.26s/it]  4%|███                                                                                  | 732/20117 [27:01<12:40:56,  2.36s/it]  4%|███                                                                                  | 733/20117 [27:03<12:26:03,  2.31s/it]  4%|███                                                                                  | 734/20117 [27:05<12:13:54,  2.27s/it]  4%|███                                                                                  | 735/20117 [27:08<12:05:11,  2.24s/it]  4%|███                                                                                  | 736/20117 [27:10<12:01:28,  2.23s/it]  4%|███                                                                                  | 737/20117 [27:12<11:54:26,  2.21s/it]  4%|███                                                                                  | 738/20117 [27:14<11:55:18,  2.21s/it]  4%|███                                                                                  | 739/20117 [27:16<11:56:04,  2.22s/it]  4%|███▏                                                                                 | 740/20117 [27:19<11:56:35,  2.22s/it]                                                                                                                                 {'loss': 0.2655, 'grad_norm': 0.23477095365524292, 'learning_rate': 0.00019949753079188124, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 334.0, 'epoch': 0.07}
  4%|███▏                                                                                 | 740/20117 [27:19<11:56:35,  2.22s/it]  4%|███▏                                                                                 | 741/20117 [27:21<12:03:37,  2.24s/it]  4%|███▏                                                                                 | 742/20117 [27:23<12:04:16,  2.24s/it]  4%|███▏                                                                                 | 743/20117 [27:25<12:07:45,  2.25s/it]  4%|███▏                                                                                 | 744/20117 [27:28<12:05:17,  2.25s/it]  4%|███▏                                                                                 | 745/20117 [27:30<12:07:21,  2.25s/it]  4%|███▏                                                                                 | 746/20117 [27:32<12:02:13,  2.24s/it]  4%|███▏                                                                                 | 747/20117 [27:34<12:04:18,  2.24s/it]  4%|███▏                                                                                 | 748/20117 [27:37<12:08:49,  2.26s/it]  4%|███▏                                                                                 | 749/20117 [27:39<12:07:33,  2.25s/it]  4%|███▏                                                                                 | 750/20117 [27:41<11:56:10,  2.22s/it]                                                                                                                                 {'loss': 0.2869, 'grad_norm': 0.35739773511886597, 'learning_rate': 0.00019948169470441855, 'memory/max_active (GiB)': 20.58, 'memory/max_allocated (GiB)': 20.58, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 398.33, 'epoch': 0.07}
  4%|███▏                                                                                 | 750/20117 [27:41<11:56:10,  2.22s/it]  4%|███▏                                                                                 | 751/20117 [27:43<11:48:49,  2.20s/it]  4%|███▏                                                                                 | 752/20117 [27:45<11:45:26,  2.19s/it]  4%|███▏                                                                                 | 753/20117 [27:48<11:43:36,  2.18s/it]  4%|███▏                                                                                 | 754/20117 [27:50<11:39:19,  2.17s/it]  4%|███▏                                                                                 | 755/20117 [27:52<11:40:01,  2.17s/it]  4%|███▏                                                                                 | 756/20117 [27:54<11:48:15,  2.19s/it]  4%|███▏                                                                                 | 757/20117 [27:56<11:44:59,  2.18s/it]  4%|███▏                                                                                 | 758/20117 [27:58<11:43:52,  2.18s/it]  4%|███▏                                                                                 | 759/20117 [28:01<11:46:45,  2.19s/it]  4%|███▏                                                                                 | 760/20117 [28:03<12:00:56,  2.23s/it]                                                                                                                                 {'loss': 0.254, 'grad_norm': 0.42109552025794983, 'learning_rate': 0.0001994656135725159, 'memory/max_active (GiB)': 19.69, 'memory/max_allocated (GiB)': 19.69, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 303.95, 'epoch': 0.08}
  4%|███▏                                                                                 | 760/20117 [28:03<12:00:56,  2.23s/it]  4%|███▏                                                                                 | 761/20117 [28:05<12:01:50,  2.24s/it]  4%|███▏                                                                                 | 762/20117 [28:07<12:00:49,  2.23s/it]  4%|███▏                                                                                 | 763/20117 [28:10<12:05:36,  2.25s/it]  4%|███▏                                                                                 | 764/20117 [28:12<12:10:27,  2.26s/it]  4%|███▏                                                                                 | 765/20117 [28:14<12:11:03,  2.27s/it]  4%|███▏                                                                                 | 766/20117 [28:17<12:08:11,  2.26s/it]  4%|███▏                                                                                 | 767/20117 [28:19<12:00:58,  2.24s/it]  4%|███▏                                                                                 | 768/20117 [28:21<12:00:32,  2.23s/it]  4%|███▏                                                                                 | 769/20117 [28:23<11:51:38,  2.21s/it]  4%|███▎                                                                                 | 770/20117 [28:25<11:51:06,  2.21s/it]                                                                                                                                 {'loss': 0.2718, 'grad_norm': 0.5730820298194885, 'learning_rate': 0.00019944928743578446, 'memory/max_active (GiB)': 19.09, 'memory/max_allocated (GiB)': 19.09, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 379.86, 'epoch': 0.08}
  4%|███▎                                                                                 | 770/20117 [28:25<11:51:06,  2.21s/it]  4%|███▎                                                                                 | 771/20117 [28:28<12:04:30,  2.25s/it]  4%|███▎                                                                                 | 772/20117 [28:30<12:12:38,  2.27s/it]  4%|███▎                                                                                 | 773/20117 [28:32<12:05:40,  2.25s/it]  4%|███▎                                                                                 | 774/20117 [28:34<11:55:13,  2.22s/it]  4%|███▎                                                                                 | 775/20117 [28:37<11:49:51,  2.20s/it]  4%|███▎                                                                                 | 776/20117 [28:39<11:46:30,  2.19s/it]  4%|███▎                                                                                 | 777/20117 [28:41<11:51:06,  2.21s/it]  4%|███▎                                                                                 | 778/20117 [28:43<11:58:45,  2.23s/it]  4%|███▎                                                                                 | 779/20117 [28:45<12:01:19,  2.24s/it]  4%|███▎                                                                                 | 780/20117 [28:48<11:59:46,  2.23s/it]                                                                                                                                 {'loss': 0.3039, 'grad_norm': 0.3591574430465698, 'learning_rate': 0.000199432716334439, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 387.31, 'epoch': 0.08}
  4%|███▎                                                                                 | 780/20117 [28:48<11:59:46,  2.23s/it]  4%|███▎                                                                                 | 781/20117 [28:50<12:04:45,  2.25s/it]  4%|███▎                                                                                 | 782/20117 [28:52<12:01:01,  2.24s/it]  4%|███▎                                                                                 | 783/20117 [28:54<12:07:51,  2.26s/it]  4%|███▎                                                                                 | 784/20117 [28:57<12:04:21,  2.25s/it]  4%|███▎                                                                                 | 785/20117 [28:59<12:24:44,  2.31s/it]  4%|███▎                                                                                 | 786/20117 [29:01<12:18:57,  2.29s/it]  4%|███▎                                                                                 | 787/20117 [29:04<12:47:34,  2.38s/it]  4%|███▎                                                                                 | 788/20117 [29:06<12:35:54,  2.35s/it]  4%|███▎                                                                                 | 789/20117 [29:09<12:57:21,  2.41s/it]  4%|███▎                                                                                 | 790/20117 [29:11<12:52:52,  2.40s/it]                                                                                                                                 {'loss': 0.3044, 'grad_norm': 0.2447095662355423, 'learning_rate': 0.0001994159003092976, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 332.59, 'epoch': 0.08}
  4%|███▎                                                                                 | 790/20117 [29:11<12:52:52,  2.40s/it]  4%|███▎                                                                                 | 791/20117 [29:13<12:38:45,  2.36s/it]  4%|███▎                                                                                 | 792/20117 [29:16<12:29:14,  2.33s/it]  4%|███▎                                                                                 | 793/20117 [29:18<12:29:06,  2.33s/it]  4%|███▎                                                                                 | 794/20117 [29:20<12:29:19,  2.33s/it]  4%|███▎                                                                                 | 795/20117 [29:23<12:27:14,  2.32s/it]  4%|███▎                                                                                 | 796/20117 [29:25<12:23:36,  2.31s/it]  4%|███▎                                                                                 | 797/20117 [29:27<12:14:56,  2.28s/it]  4%|███▎                                                                                 | 798/20117 [29:29<12:15:50,  2.29s/it]  4%|███▍                                                                                 | 799/20117 [29:32<12:20:32,  2.30s/it]  4%|███▍                                                                                 | 800/20117 [29:34<12:17:13,  2.29s/it]                                                                                                                                 {'loss': 0.21, 'grad_norm': 1.134704351425171, 'learning_rate': 0.0001993988394017817, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 376.02, 'epoch': 0.08}
  4%|███▍                                                                                 | 800/20117 [29:34<12:17:13,  2.29s/it]  4%|███▍                                                                                 | 801/20117 [29:36<12:18:31,  2.29s/it]  4%|███▍                                                                                 | 802/20117 [29:39<12:24:00,  2.31s/it]  4%|███▍                                                                                 | 803/20117 [29:41<12:16:15,  2.29s/it]  4%|███▍                                                                                 | 804/20117 [29:43<12:20:54,  2.30s/it]  4%|███▍                                                                                 | 805/20117 [29:46<12:14:57,  2.28s/it]  4%|███▍                                                                                 | 806/20117 [29:48<12:08:39,  2.26s/it]  4%|███▍                                                                                 | 807/20117 [29:50<12:09:45,  2.27s/it]  4%|███▍                                                                                 | 808/20117 [29:52<12:11:47,  2.27s/it]  4%|███▍                                                                                 | 809/20117 [29:55<12:16:51,  2.29s/it]  4%|███▍                                                                                 | 810/20117 [29:57<12:10:09,  2.27s/it]                                                                                                                                 {'loss': 0.3189, 'grad_norm': 0.4009522795677185, 'learning_rate': 0.00019938153365391595, 'memory/max_active (GiB)': 21.38, 'memory/max_allocated (GiB)': 21.38, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 409.87, 'epoch': 0.08}
  4%|███▍                                                                                 | 810/20117 [29:57<12:10:09,  2.27s/it]  4%|███▍                                                                                 | 811/20117 [29:59<12:20:55,  2.30s/it]  4%|███▍                                                                                 | 812/20117 [30:02<12:17:02,  2.29s/it]  4%|███▍                                                                                 | 813/20117 [30:04<12:21:02,  2.30s/it]  4%|███▍                                                                                 | 814/20117 [30:06<12:12:19,  2.28s/it]  4%|███▍                                                                                 | 815/20117 [30:08<12:07:15,  2.26s/it]  4%|███▍                                                                                 | 816/20117 [30:11<12:05:29,  2.26s/it]  4%|███▍                                                                                 | 817/20117 [30:13<12:12:32,  2.28s/it]  4%|███▍                                                                                 | 818/20117 [30:15<12:05:23,  2.26s/it]  4%|███▍                                                                                 | 819/20117 [30:17<12:05:17,  2.26s/it]  4%|███▍                                                                                 | 820/20117 [30:20<12:01:19,  2.24s/it]                                                                                                                                 {'loss': 0.3242, 'grad_norm': 0.32234618067741394, 'learning_rate': 0.00019936398310832802, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 429.4, 'epoch': 0.08}
  4%|███▍                                                                                 | 820/20117 [30:20<12:01:19,  2.24s/it]  4%|███▍                                                                                 | 821/20117 [30:22<12:01:30,  2.24s/it]  4%|███▍                                                                                 | 822/20117 [30:24<12:08:41,  2.27s/it]  4%|███▍                                                                                 | 823/20117 [30:26<12:15:40,  2.29s/it]  4%|███▍                                                                                 | 824/20117 [30:29<12:07:10,  2.26s/it]  4%|███▍                                                                                 | 825/20117 [30:31<11:57:47,  2.23s/it]  4%|███▍                                                                                 | 826/20117 [30:33<12:07:02,  2.26s/it]  4%|███▍                                                                                 | 827/20117 [30:35<12:15:31,  2.29s/it]  4%|███▍                                                                                 | 828/20117 [30:38<12:07:17,  2.26s/it]  4%|███▌                                                                                 | 829/20117 [30:40<12:07:48,  2.26s/it]  4%|███▌                                                                                 | 830/20117 [30:42<12:01:02,  2.24s/it]                                                                                                                                 {'loss': 0.2646, 'grad_norm': 0.42835402488708496, 'learning_rate': 0.00019934618780824865, 'memory/max_active (GiB)': 21.4, 'memory/max_allocated (GiB)': 21.4, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 334.03, 'epoch': 0.08}
  4%|███▌                                                                                 | 830/20117 [30:42<12:01:02,  2.24s/it]  4%|███▌                                                                                 | 831/20117 [30:44<11:56:30,  2.23s/it]  4%|███▌                                                                                 | 832/20117 [30:47<12:09:39,  2.27s/it]  4%|███▌                                                                                 | 833/20117 [30:49<12:02:57,  2.25s/it]  4%|███▌                                                                                 | 834/20117 [30:51<12:01:13,  2.24s/it]  4%|███▌                                                                                 | 835/20117 [30:53<11:58:16,  2.24s/it]  4%|███▌                                                                                 | 836/20117 [30:56<11:58:37,  2.24s/it]  4%|███▌                                                                                 | 837/20117 [30:58<11:58:30,  2.24s/it]  4%|███▌                                                                                 | 838/20117 [31:00<12:28:51,  2.33s/it]  4%|███▌                                                                                 | 839/20117 [31:03<12:22:11,  2.31s/it]  4%|███▌                                                                                 | 840/20117 [31:05<12:15:12,  2.29s/it]                                                                                                                                 {'loss': 0.2891, 'grad_norm': 0.39109423756599426, 'learning_rate': 0.00019932814779751143, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 386.65, 'epoch': 0.08}
  4%|███▌                                                                                 | 840/20117 [31:05<12:15:12,  2.29s/it]  4%|███▌                                                                                 | 841/20117 [31:07<12:07:55,  2.27s/it]  4%|███▌                                                                                 | 842/20117 [31:09<12:08:10,  2.27s/it]  4%|███▌                                                                                 | 843/20117 [31:12<12:08:28,  2.27s/it]  4%|███▌                                                                                 | 844/20117 [31:14<12:03:10,  2.25s/it]  4%|███▌                                                                                 | 845/20117 [31:16<12:04:05,  2.25s/it]  4%|███▌                                                                                 | 846/20117 [31:18<11:58:33,  2.24s/it]  4%|███▌                                                                                 | 847/20117 [31:21<12:01:07,  2.25s/it]  4%|███▌                                                                                 | 848/20117 [31:23<11:56:49,  2.23s/it]  4%|███▌                                                                                 | 849/20117 [31:25<11:59:00,  2.24s/it]  4%|███▌                                                                                 | 850/20117 [31:27<12:06:14,  2.26s/it]                                                                                                                                 {'loss': 0.2478, 'grad_norm': 0.30428260564804077, 'learning_rate': 0.00019930986312055268, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 344.59, 'epoch': 0.08}
  4%|███▌                                                                                 | 850/20117 [31:27<12:06:14,  2.26s/it]  4%|███▌                                                                                 | 851/20117 [31:30<12:00:01,  2.24s/it]  4%|███▌                                                                                 | 852/20117 [31:32<12:05:14,  2.26s/it]  4%|███▌                                                                                 | 853/20117 [31:34<11:59:37,  2.24s/it]  4%|███▌                                                                                 | 854/20117 [31:36<12:00:03,  2.24s/it]  4%|███▌                                                                                 | 855/20117 [31:38<11:58:03,  2.24s/it]  4%|███▌                                                                                 | 856/20117 [31:41<11:58:46,  2.24s/it]  4%|███▌                                                                                 | 857/20117 [31:43<12:00:44,  2.25s/it]  4%|███▋                                                                                 | 858/20117 [31:45<11:57:40,  2.24s/it]  4%|███▋                                                                                 | 859/20117 [31:47<11:57:29,  2.24s/it]  4%|███▋                                                                                 | 860/20117 [31:50<11:57:10,  2.23s/it]                                                                                                                                 {'loss': 0.2942, 'grad_norm': 0.41264912486076355, 'learning_rate': 0.00019929133382241146, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 315.94, 'epoch': 0.09}
  4%|███▋                                                                                 | 860/20117 [31:50<11:57:10,  2.23s/it]  4%|███▋                                                                                 | 861/20117 [31:52<11:50:50,  2.21s/it]  4%|███▋                                                                                 | 862/20117 [31:54<11:53:33,  2.22s/it]  4%|███▋                                                                                 | 863/20117 [31:56<11:56:15,  2.23s/it]  4%|███▋                                                                                 | 864/20117 [31:59<11:56:27,  2.23s/it]  4%|███▋                                                                                 | 865/20117 [32:01<11:57:50,  2.24s/it]  4%|███▋                                                                                 | 866/20117 [32:03<11:57:00,  2.23s/it]  4%|███▋                                                                                 | 867/20117 [32:05<11:50:56,  2.22s/it]  4%|███▋                                                                                 | 868/20117 [32:08<11:58:45,  2.24s/it]  4%|███▋                                                                                 | 869/20117 [32:10<11:56:41,  2.23s/it]  4%|███▋                                                                                 | 870/20117 [32:12<11:53:11,  2.22s/it]                                                                                                                                 {'loss': 0.2403, 'grad_norm': 0.3414939045906067, 'learning_rate': 0.00019927255994872932, 'memory/max_active (GiB)': 20.62, 'memory/max_allocated (GiB)': 20.62, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 373.66, 'epoch': 0.09}
  4%|███▋                                                                                 | 870/20117 [32:12<11:53:11,  2.22s/it]  4%|███▋                                                                                 | 871/20117 [32:14<11:53:26,  2.22s/it]  4%|███▋                                                                                 | 872/20117 [32:16<11:48:18,  2.21s/it]  4%|███▋                                                                                 | 873/20117 [32:19<11:53:21,  2.22s/it]  4%|███▋                                                                                 | 874/20117 [32:21<11:55:07,  2.23s/it]  4%|███▋                                                                                 | 875/20117 [32:23<11:53:36,  2.23s/it]  4%|███▋                                                                                 | 876/20117 [32:25<11:50:19,  2.22s/it]  4%|███▋                                                                                 | 877/20117 [32:28<11:56:26,  2.23s/it]  4%|███▋                                                                                 | 878/20117 [32:30<12:03:43,  2.26s/it]  4%|███▋                                                                                 | 879/20117 [32:32<12:08:59,  2.27s/it]  4%|███▋                                                                                 | 880/20117 [32:34<12:15:08,  2.29s/it]                                                                                                                                 {'loss': 0.2024, 'grad_norm': 0.3265244662761688, 'learning_rate': 0.00019925354154575028, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 304.2, 'epoch': 0.09}
  4%|███▋                                                                                 | 880/20117 [32:34<12:15:08,  2.29s/it]  4%|███▋                                                                                 | 881/20117 [32:37<12:13:05,  2.29s/it]  4%|███▋                                                                                 | 882/20117 [32:39<12:13:41,  2.29s/it]  4%|███▋                                                                                 | 883/20117 [32:41<12:26:20,  2.33s/it]  4%|███▋                                                                                 | 884/20117 [32:44<12:22:47,  2.32s/it]  4%|███▋                                                                                 | 885/20117 [32:46<12:16:25,  2.30s/it]  4%|███▋                                                                                 | 886/20117 [32:48<12:11:45,  2.28s/it]  4%|███▋                                                                                 | 887/20117 [32:51<12:09:22,  2.28s/it]  4%|███▊                                                                                 | 888/20117 [32:53<12:00:24,  2.25s/it]  4%|███▊                                                                                 | 889/20117 [32:55<12:01:32,  2.25s/it]  4%|███▊                                                                                 | 890/20117 [32:57<12:07:07,  2.27s/it]                                                                                                                                 {'loss': 0.2319, 'grad_norm': 0.45628246665000916, 'learning_rate': 0.00019923427866032074, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 361.11, 'epoch': 0.09}
  4%|███▊                                                                                 | 890/20117 [32:57<12:07:07,  2.27s/it]  4%|███▊                                                                                 | 891/20117 [33:00<12:41:40,  2.38s/it]  4%|███▊                                                                                 | 892/20117 [33:02<12:31:29,  2.35s/it]  4%|███▊                                                                                 | 893/20117 [33:04<12:19:16,  2.31s/it]  4%|███▊                                                                                 | 894/20117 [33:07<12:07:46,  2.27s/it]  4%|███▊                                                                                 | 895/20117 [33:09<12:03:46,  2.26s/it]  4%|███▊                                                                                 | 896/20117 [33:11<12:03:37,  2.26s/it]  4%|███▊                                                                                 | 897/20117 [33:13<12:02:45,  2.26s/it]  4%|███▊                                                                                 | 898/20117 [33:16<11:56:25,  2.24s/it]  4%|███▊                                                                                 | 899/20117 [33:18<11:51:14,  2.22s/it]  4%|███▊                                                                                 | 900/20117 [33:20<11:55:11,  2.23s/it]                                                                                                                                 {'loss': 0.2944, 'grad_norm': 0.15768177807331085, 'learning_rate': 0.00019921477133988917, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 342.86, 'epoch': 0.09}
  4%|███▊                                                                                 | 900/20117 [33:20<11:55:11,  2.23s/it]  4%|███▊                                                                                 | 901/20117 [33:22<11:52:58,  2.23s/it]  4%|███▊                                                                                 | 902/20117 [33:24<11:53:14,  2.23s/it]  4%|███▊                                                                                 | 903/20117 [33:27<11:55:43,  2.24s/it]  4%|███▊                                                                                 | 904/20117 [33:29<11:55:31,  2.23s/it]  4%|███▊                                                                                 | 905/20117 [33:31<11:52:11,  2.22s/it]  5%|███▊                                                                                 | 906/20117 [33:33<11:51:11,  2.22s/it]  5%|███▊                                                                                 | 907/20117 [33:36<12:01:55,  2.25s/it]  5%|███▊                                                                                 | 908/20117 [33:38<11:58:05,  2.24s/it]  5%|███▊                                                                                 | 909/20117 [33:40<11:57:31,  2.24s/it]  5%|███▊                                                                                 | 910/20117 [33:42<11:51:51,  2.22s/it]                                                                                                                                 {'loss': 0.2697, 'grad_norm': 0.31603825092315674, 'learning_rate': 0.0001991950196325063, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 366.81, 'epoch': 0.09}
  5%|███▊                                                                                 | 910/20117 [33:42<11:51:51,  2.22s/it]  5%|███▊                                                                                 | 911/20117 [33:44<11:51:11,  2.22s/it]  5%|███▊                                                                                 | 912/20117 [33:47<11:53:00,  2.23s/it]  5%|███▊                                                                                 | 913/20117 [33:49<11:56:05,  2.24s/it]  5%|███▊                                                                                 | 914/20117 [33:51<11:50:28,  2.22s/it]  5%|███▊                                                                                 | 915/20117 [33:53<11:53:46,  2.23s/it]  5%|███▊                                                                                 | 916/20117 [33:56<11:47:48,  2.21s/it]  5%|███▊                                                                                 | 917/20117 [33:58<11:43:14,  2.20s/it]  5%|███▉                                                                                 | 918/20117 [34:00<11:44:07,  2.20s/it]  5%|███▉                                                                                 | 919/20117 [34:02<11:47:39,  2.21s/it]  5%|███▉                                                                                 | 920/20117 [34:04<11:54:24,  2.23s/it]                                                                                                                                 {'loss': 0.2915, 'grad_norm': 0.30974528193473816, 'learning_rate': 0.00019917502358682474, 'memory/max_active (GiB)': 19.23, 'memory/max_allocated (GiB)': 19.23, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 333.81, 'epoch': 0.09}
  5%|███▉                                                                                 | 920/20117 [34:04<11:54:24,  2.23s/it]  5%|███▉                                                                                 | 921/20117 [34:07<11:53:53,  2.23s/it]  5%|███▉                                                                                 | 922/20117 [34:09<11:47:57,  2.21s/it]  5%|███▉                                                                                 | 923/20117 [34:11<11:48:25,  2.21s/it]  5%|███▉                                                                                 | 924/20117 [34:13<11:45:42,  2.21s/it]  5%|███▉                                                                                 | 925/20117 [34:16<11:49:28,  2.22s/it]  5%|███▉                                                                                 | 926/20117 [34:18<11:48:38,  2.22s/it]  5%|███▉                                                                                 | 927/20117 [34:20<11:52:16,  2.23s/it]  5%|███▉                                                                                 | 928/20117 [34:22<11:52:17,  2.23s/it]  5%|███▉                                                                                 | 929/20117 [34:24<11:55:58,  2.24s/it]  5%|███▉                                                                                 | 930/20117 [34:27<12:00:24,  2.25s/it]                                                                                                                                 {'loss': 0.2984, 'grad_norm': 0.5116154551506042, 'learning_rate': 0.00019915478325209892, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 420.26, 'epoch': 0.09}
  5%|███▉                                                                                 | 930/20117 [34:27<12:00:24,  2.25s/it]  5%|███▉                                                                                 | 931/20117 [34:29<11:59:33,  2.25s/it]  5%|███▉                                                                                 | 932/20117 [34:31<11:52:50,  2.23s/it]  5%|███▉                                                                                 | 933/20117 [34:33<11:50:13,  2.22s/it]  5%|███▉                                                                                 | 934/20117 [34:36<11:47:42,  2.21s/it]  5%|███▉                                                                                 | 935/20117 [34:38<11:44:01,  2.20s/it]  5%|███▉                                                                                 | 936/20117 [34:40<11:44:26,  2.20s/it]  5%|███▉                                                                                 | 937/20117 [34:42<11:47:37,  2.21s/it]  5%|███▉                                                                                 | 938/20117 [34:44<11:50:39,  2.22s/it]  5%|███▉                                                                                 | 939/20117 [34:47<11:55:16,  2.24s/it]  5%|███▉                                                                                 | 940/20117 [34:49<11:48:14,  2.22s/it]                                                                                                                                 {'loss': 0.2456, 'grad_norm': 0.4105079174041748, 'learning_rate': 0.00019913429867818517, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 297.2, 'epoch': 0.09}
  5%|███▉                                                                                 | 940/20117 [34:49<11:48:14,  2.22s/it]  5%|███▉                                                                                 | 941/20117 [34:51<11:46:59,  2.21s/it]  5%|███▉                                                                                 | 942/20117 [34:53<11:47:50,  2.21s/it]  5%|███▉                                                                                 | 943/20117 [34:56<12:22:51,  2.32s/it]  5%|███▉                                                                                 | 944/20117 [34:58<12:15:02,  2.30s/it]  5%|███▉                                                                                 | 945/20117 [35:00<12:19:17,  2.31s/it]  5%|███▉                                                                                 | 946/20117 [35:03<12:08:17,  2.28s/it]  5%|████                                                                                 | 947/20117 [35:05<11:58:00,  2.25s/it]  5%|████                                                                                 | 948/20117 [35:07<11:51:01,  2.23s/it]  5%|████                                                                                 | 949/20117 [35:09<11:44:40,  2.21s/it]  5%|████                                                                                 | 950/20117 [35:11<11:48:08,  2.22s/it]                                                                                                                                 {'loss': 0.2974, 'grad_norm': 0.32649049162864685, 'learning_rate': 0.00019911356991554122, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 395.2, 'epoch': 0.09}
  5%|████                                                                                 | 950/20117 [35:11<11:48:08,  2.22s/it]  5%|████                                                                                 | 951/20117 [35:14<11:46:50,  2.21s/it]  5%|████                                                                                 | 952/20117 [35:16<11:40:47,  2.19s/it]  5%|████                                                                                 | 953/20117 [35:18<11:37:33,  2.18s/it]  5%|████                                                                                 | 954/20117 [35:20<11:47:39,  2.22s/it]  5%|████                                                                                 | 955/20117 [35:23<11:53:14,  2.23s/it]  5%|████                                                                                 | 956/20117 [35:25<11:50:07,  2.22s/it]  5%|████                                                                                 | 957/20117 [35:27<11:51:50,  2.23s/it]  5%|████                                                                                 | 958/20117 [35:29<11:45:37,  2.21s/it]  5%|████                                                                                 | 959/20117 [35:31<11:53:16,  2.23s/it]  5%|████                                                                                 | 960/20117 [35:34<11:56:05,  2.24s/it]                                                                                                                                 {'loss': 0.2627, 'grad_norm': 0.3354904353618622, 'learning_rate': 0.00019909259701522645, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 324.97, 'epoch': 0.1}
  5%|████                                                                                 | 960/20117 [35:34<11:56:05,  2.24s/it]  5%|████                                                                                 | 961/20117 [35:36<12:00:08,  2.26s/it]  5%|████                                                                                 | 962/20117 [35:38<11:54:09,  2.24s/it]  5%|████                                                                                 | 963/20117 [35:40<11:51:40,  2.23s/it]  5%|████                                                                                 | 964/20117 [35:43<11:45:43,  2.21s/it]  5%|████                                                                                 | 965/20117 [35:45<11:39:43,  2.19s/it]  5%|████                                                                                 | 966/20117 [35:47<11:41:20,  2.20s/it]  5%|████                                                                                 | 967/20117 [35:49<11:43:43,  2.20s/it]  5%|████                                                                                 | 968/20117 [35:51<11:46:11,  2.21s/it]  5%|████                                                                                 | 969/20117 [35:54<11:54:04,  2.24s/it]  5%|████                                                                                 | 970/20117 [35:56<12:00:51,  2.26s/it]                                                                                                                                 {'loss': 0.2386, 'grad_norm': 0.33404502272605896, 'learning_rate': 0.00019907138002890154, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 317.18, 'epoch': 0.1}
  5%|████                                                                                 | 970/20117 [35:56<12:00:51,  2.26s/it]  5%|████                                                                                 | 971/20117 [35:58<12:05:05,  2.27s/it]  5%|████                                                                                 | 972/20117 [36:00<12:01:46,  2.26s/it]  5%|████                                                                                 | 973/20117 [36:03<11:59:28,  2.25s/it]  5%|████                                                                                 | 974/20117 [36:05<12:04:51,  2.27s/it]  5%|████                                                                                 | 975/20117 [36:07<12:09:01,  2.29s/it]  5%|████                                                                                 | 976/20117 [36:10<11:57:04,  2.25s/it]  5%|████▏                                                                                | 977/20117 [36:12<11:54:36,  2.24s/it]  5%|████▏                                                                                | 978/20117 [36:14<11:59:50,  2.26s/it]  5%|████▏                                                                                | 979/20117 [36:16<12:01:27,  2.26s/it]  5%|████▏                                                                                | 980/20117 [36:19<11:55:07,  2.24s/it]                                                                                                                                 {'loss': 0.2362, 'grad_norm': 0.3161865472793579, 'learning_rate': 0.0001990499190088284, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 310.78, 'epoch': 0.1}
  5%|████▏                                                                                | 980/20117 [36:19<11:55:07,  2.24s/it]  5%|████▏                                                                                | 981/20117 [36:21<11:54:56,  2.24s/it]  5%|████▏                                                                                | 982/20117 [36:23<11:54:07,  2.24s/it]  5%|████▏                                                                                | 983/20117 [36:25<11:52:58,  2.24s/it]  5%|████▏                                                                                | 984/20117 [36:27<11:53:49,  2.24s/it]  5%|████▏                                                                                | 985/20117 [36:30<11:54:39,  2.24s/it]  5%|████▏                                                                                | 986/20117 [36:32<11:55:15,  2.24s/it]  5%|████▏                                                                                | 987/20117 [36:34<11:54:00,  2.24s/it]  5%|████▏                                                                                | 988/20117 [36:36<11:55:45,  2.25s/it]  5%|████▏                                                                                | 989/20117 [36:39<11:49:06,  2.22s/it]  5%|████▏                                                                                | 990/20117 [36:41<11:51:01,  2.23s/it]                                                                                                                                 {'loss': 0.2792, 'grad_norm': 0.33621177077293396, 'learning_rate': 0.00019902821400787004, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 333.17, 'epoch': 0.1}
  5%|████▏                                                                                | 990/20117 [36:41<11:51:01,  2.23s/it]  5%|████▏                                                                                | 991/20117 [36:43<11:50:23,  2.23s/it]  5%|████▏                                                                                | 992/20117 [36:45<11:50:06,  2.23s/it]  5%|████▏                                                                                | 993/20117 [36:47<11:44:53,  2.21s/it]  5%|████▏                                                                                | 994/20117 [36:50<11:40:09,  2.20s/it]  5%|████▏                                                                                | 995/20117 [36:52<12:16:40,  2.31s/it]  5%|████▏                                                                                | 996/20117 [36:54<12:10:31,  2.29s/it]  5%|████▏                                                                                | 997/20117 [36:57<12:04:29,  2.27s/it]  5%|████▏                                                                                | 998/20117 [36:59<12:02:03,  2.27s/it]  5%|████▏                                                                                | 999/20117 [37:01<11:52:09,  2.24s/it]  5%|████▏                                                                               | 1000/20117 [37:03<11:49:55,  2.23s/it]                                                                                                                                 {'loss': 0.2622, 'grad_norm': 0.09930042922496796, 'learning_rate': 0.00019900626507949053, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 365.14, 'epoch': 0.1}
  5%|████▏                                                                               | 1000/20117 [37:03<11:49:55,  2.23s/it]  5%|████▏                                                                               | 1001/20117 [37:06<11:45:39,  2.21s/it]  5%|████▏                                                                               | 1002/20117 [37:08<11:45:56,  2.22s/it]  5%|████▏                                                                               | 1003/20117 [37:10<11:48:34,  2.22s/it]  5%|████▏                                                                               | 1004/20117 [37:12<11:52:26,  2.24s/it]  5%|████▏                                                                               | 1005/20117 [37:14<11:49:53,  2.23s/it]  5%|████▏                                                                               | 1006/20117 [37:17<11:49:01,  2.23s/it]  5%|████▏                                                                               | 1007/20117 [37:19<11:47:09,  2.22s/it]  5%|████▏                                                                               | 1008/20117 [37:21<11:50:47,  2.23s/it]  5%|████▏                                                                               | 1009/20117 [37:23<11:47:59,  2.22s/it]  5%|████▏                                                                               | 1010/20117 [37:26<11:46:44,  2.22s/it]                                                                                                                                 {'loss': 0.2214, 'grad_norm': 0.3367346525192261, 'learning_rate': 0.00019898407227775464, 'memory/max_active (GiB)': 20.62, 'memory/max_allocated (GiB)': 20.62, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 342.03, 'epoch': 0.1}
  5%|████▏                                                                               | 1010/20117 [37:26<11:46:44,  2.22s/it]  5%|████▏                                                                               | 1011/20117 [37:28<11:43:31,  2.21s/it]  5%|████▏                                                                               | 1012/20117 [37:30<11:40:48,  2.20s/it]  5%|████▏                                                                               | 1013/20117 [37:32<11:39:44,  2.20s/it]  5%|████▏                                                                               | 1014/20117 [37:34<11:34:55,  2.18s/it]  5%|████▏                                                                               | 1015/20117 [37:36<11:31:29,  2.17s/it]  5%|████▏                                                                               | 1016/20117 [37:39<11:42:47,  2.21s/it]  5%|████▏                                                                               | 1017/20117 [37:41<11:45:09,  2.22s/it]  5%|████▎                                                                               | 1018/20117 [37:43<11:41:26,  2.20s/it]  5%|████▎                                                                               | 1019/20117 [37:45<11:43:01,  2.21s/it]  5%|████▎                                                                               | 1020/20117 [37:48<11:41:26,  2.20s/it]                                                                                                                                 {'loss': 0.3446, 'grad_norm': 0.35485416650772095, 'learning_rate': 0.00019896163565732798, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 378.53, 'epoch': 0.1}
  5%|████▎                                                                               | 1020/20117 [37:48<11:41:26,  2.20s/it]  5%|████▎                                                                               | 1021/20117 [37:50<11:51:42,  2.24s/it]  5%|████▎                                                                               | 1022/20117 [37:52<11:43:08,  2.21s/it]  5%|████▎                                                                               | 1023/20117 [37:54<11:40:38,  2.20s/it]  5%|████▎                                                                               | 1024/20117 [37:56<11:38:13,  2.19s/it]  5%|████▎                                                                               | 1025/20117 [37:59<11:41:03,  2.20s/it]  5%|████▎                                                                               | 1026/20117 [38:01<11:37:30,  2.19s/it]  5%|████▎                                                                               | 1027/20117 [38:03<11:36:30,  2.19s/it]  5%|████▎                                                                               | 1028/20117 [38:05<11:43:53,  2.21s/it]  5%|████▎                                                                               | 1029/20117 [38:07<11:44:19,  2.21s/it]  5%|████▎                                                                               | 1030/20117 [38:10<11:39:15,  2.20s/it]                                                                                                                                 {'loss': 0.2519, 'grad_norm': 0.31839531660079956, 'learning_rate': 0.0001989389552734767, 'memory/max_active (GiB)': 19.8, 'memory/max_allocated (GiB)': 19.8, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 390.74, 'epoch': 0.1}
  5%|████▎                                                                               | 1030/20117 [38:10<11:39:15,  2.20s/it]  5%|████▎                                                                               | 1031/20117 [38:12<11:37:04,  2.19s/it]  5%|████▎                                                                               | 1032/20117 [38:14<11:33:31,  2.18s/it]  5%|████▎                                                                               | 1033/20117 [38:16<11:42:02,  2.21s/it]  5%|████▎                                                                               | 1034/20117 [38:18<11:36:38,  2.19s/it]  5%|████▎                                                                               | 1035/20117 [38:21<11:40:51,  2.20s/it]  5%|████▎                                                                               | 1036/20117 [38:23<11:38:48,  2.20s/it]  5%|████▎                                                                               | 1037/20117 [38:25<11:37:59,  2.19s/it]  5%|████▎                                                                               | 1038/20117 [38:27<11:45:46,  2.22s/it]  5%|████▎                                                                               | 1039/20117 [38:30<12:00:54,  2.27s/it]  5%|████▎                                                                               | 1040/20117 [38:32<12:00:38,  2.27s/it]                                                                                                                                 {'loss': 0.3, 'grad_norm': 0.36318239569664, 'learning_rate': 0.0001989160311820673, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 404.8, 'epoch': 0.1}
  5%|████▎                                                                               | 1040/20117 [38:32<12:00:38,  2.27s/it]  5%|████▎                                                                               | 1041/20117 [38:34<11:55:15,  2.25s/it]  5%|████▎                                                                               | 1042/20117 [38:36<11:53:08,  2.24s/it]  5%|████▎                                                                               | 1043/20117 [38:38<11:46:44,  2.22s/it]  5%|████▎                                                                               | 1044/20117 [38:41<11:49:30,  2.23s/it]  5%|████▎                                                                               | 1045/20117 [38:43<11:49:56,  2.23s/it]  5%|████▎                                                                               | 1046/20117 [38:45<11:46:25,  2.22s/it]  5%|████▎                                                                               | 1047/20117 [38:47<11:44:39,  2.22s/it]  5%|████▍                                                                               | 1048/20117 [38:50<12:15:33,  2.31s/it]  5%|████▍                                                                               | 1049/20117 [38:52<12:06:16,  2.29s/it]  5%|████▍                                                                               | 1050/20117 [38:54<11:57:59,  2.26s/it]                                                                                                                                 {'loss': 0.2531, 'grad_norm': 0.20552317798137665, 'learning_rate': 0.00019889286343956677, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 349.7, 'epoch': 0.1}
  5%|████▍                                                                               | 1050/20117 [38:54<11:57:59,  2.26s/it]  5%|████▍                                                                               | 1051/20117 [38:56<11:51:20,  2.24s/it]  5%|████▍                                                                               | 1052/20117 [38:59<11:47:35,  2.23s/it]  5%|████▍                                                                               | 1053/20117 [39:01<11:52:46,  2.24s/it]  5%|████▍                                                                               | 1054/20117 [39:03<11:48:57,  2.23s/it]  5%|████▍                                                                               | 1055/20117 [39:05<11:46:43,  2.22s/it]  5%|████▍                                                                               | 1056/20117 [39:08<11:46:20,  2.22s/it]  5%|████▍                                                                               | 1057/20117 [39:10<11:44:49,  2.22s/it]  5%|████▍                                                                               | 1058/20117 [39:12<11:42:35,  2.21s/it]  5%|████▍                                                                               | 1059/20117 [39:14<11:38:13,  2.20s/it]  5%|████▍                                                                               | 1060/20117 [39:16<11:34:54,  2.19s/it]                                                                                                                                 {'loss': 0.3196, 'grad_norm': 0.4065081477165222, 'learning_rate': 0.00019886945210304208, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 356.46, 'epoch': 0.11}
  5%|████▍                                                                               | 1060/20117 [39:16<11:34:54,  2.19s/it]  5%|████▍                                                                               | 1061/20117 [39:18<11:31:30,  2.18s/it]  5%|████▍                                                                               | 1062/20117 [39:21<11:35:34,  2.19s/it]  5%|████▍                                                                               | 1063/20117 [39:23<11:42:04,  2.21s/it]  5%|████▍                                                                               | 1064/20117 [39:25<11:46:35,  2.23s/it]  5%|████▍                                                                               | 1065/20117 [39:27<11:45:47,  2.22s/it]  5%|████▍                                                                               | 1066/20117 [39:30<11:49:45,  2.24s/it]  5%|████▍                                                                               | 1067/20117 [39:32<11:42:06,  2.21s/it]  5%|████▍                                                                               | 1068/20117 [39:34<12:00:35,  2.27s/it]  5%|████▍                                                                               | 1069/20117 [39:37<12:08:41,  2.30s/it]  5%|████▍                                                                               | 1070/20117 [39:39<12:19:09,  2.33s/it]                                                                                                                                 {'loss': 0.2585, 'grad_norm': 0.3974571228027344, 'learning_rate': 0.00019884579723016037, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 361.16, 'epoch': 0.11}
  5%|████▍                                                                               | 1070/20117 [39:39<12:19:09,  2.33s/it]  5%|████▍                                                                               | 1071/20117 [39:41<12:29:19,  2.36s/it]  5%|████▍                                                                               | 1072/20117 [39:44<12:28:33,  2.36s/it]  5%|████▍                                                                               | 1073/20117 [39:46<12:32:43,  2.37s/it]  5%|████▍                                                                               | 1074/20117 [39:48<12:19:45,  2.33s/it]  5%|████▍                                                                               | 1075/20117 [39:51<12:18:06,  2.33s/it]  5%|████▍                                                                               | 1076/20117 [39:53<12:20:17,  2.33s/it]  5%|████▍                                                                               | 1077/20117 [39:55<12:20:34,  2.33s/it]  5%|████▌                                                                               | 1078/20117 [39:58<12:05:36,  2.29s/it]  5%|████▌                                                                               | 1079/20117 [40:00<11:54:27,  2.25s/it]  5%|████▌                                                                               | 1080/20117 [40:02<11:52:27,  2.25s/it]                                                                                                                                 {'loss': 0.2807, 'grad_norm': 0.39827367663383484, 'learning_rate': 0.0001988218988791885, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 399.5, 'epoch': 0.11}
  5%|████▌                                                                               | 1080/20117 [40:02<11:52:27,  2.25s/it]  5%|████▌                                                                               | 1081/20117 [40:04<11:43:42,  2.22s/it]  5%|████▌                                                                               | 1082/20117 [40:06<11:46:25,  2.23s/it]  5%|████▌                                                                               | 1083/20117 [40:09<11:42:38,  2.21s/it]  5%|████▌                                                                               | 1084/20117 [40:11<11:48:53,  2.23s/it]  5%|████▌                                                                               | 1085/20117 [40:13<11:46:31,  2.23s/it]  5%|████▌                                                                               | 1086/20117 [40:15<11:44:00,  2.22s/it]  5%|████▌                                                                               | 1087/20117 [40:18<11:41:52,  2.21s/it]  5%|████▌                                                                               | 1088/20117 [40:20<11:37:55,  2.20s/it]  5%|████▌                                                                               | 1089/20117 [40:22<11:39:42,  2.21s/it]  5%|████▌                                                                               | 1090/20117 [40:24<11:38:08,  2.20s/it]                                                                                                                                 {'loss': 0.262, 'grad_norm': 0.38661155104637146, 'learning_rate': 0.00019879775710899322, 'memory/max_active (GiB)': 19.81, 'memory/max_allocated (GiB)': 19.81, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 311.89, 'epoch': 0.11}
  5%|████▌                                                                               | 1090/20117 [40:24<11:38:08,  2.20s/it]  5%|████▌                                                                               | 1091/20117 [40:26<11:47:02,  2.23s/it]  5%|████▌                                                                               | 1092/20117 [40:29<11:46:09,  2.23s/it]  5%|████▌                                                                               | 1093/20117 [40:31<11:42:52,  2.22s/it]  5%|████▌                                                                               | 1094/20117 [40:33<11:40:46,  2.21s/it]  5%|████▌                                                                               | 1095/20117 [40:35<11:46:06,  2.23s/it]  5%|████▌                                                                               | 1096/20117 [40:37<11:42:25,  2.22s/it]  5%|████▌                                                                               | 1097/20117 [40:40<11:36:17,  2.20s/it]  5%|████▌                                                                               | 1098/20117 [40:42<11:37:55,  2.20s/it]  5%|████▌                                                                               | 1099/20117 [40:44<11:33:02,  2.19s/it]  5%|████▌                                                                               | 1100/20117 [40:46<11:30:18,  2.18s/it]                                                                                                                                 {'loss': 0.1925, 'grad_norm': 0.272942453622818, 'learning_rate': 0.0001987733719790408, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 399.72, 'epoch': 0.11}
  5%|████▌                                                                               | 1100/20117 [40:46<11:30:18,  2.18s/it]  5%|████▌                                                                               | 1101/20117 [40:48<11:30:11,  2.18s/it]  5%|████▌                                                                               | 1102/20117 [40:51<11:58:56,  2.27s/it]  5%|████▌                                                                               | 1103/20117 [40:53<11:51:21,  2.24s/it]  5%|████▌                                                                               | 1104/20117 [40:55<11:49:29,  2.24s/it]  5%|████▌                                                                               | 1105/20117 [40:57<11:51:27,  2.25s/it]  5%|████▌                                                                               | 1106/20117 [41:00<11:48:45,  2.24s/it]  6%|████▌                                                                               | 1107/20117 [41:02<11:54:12,  2.25s/it]  6%|████▋                                                                               | 1108/20117 [41:04<11:56:53,  2.26s/it]  6%|████▋                                                                               | 1109/20117 [41:07<11:56:21,  2.26s/it]  6%|████▋                                                                               | 1110/20117 [41:09<11:56:48,  2.26s/it]                                                                                                                                 {'loss': 0.1643, 'grad_norm': 0.32272958755493164, 'learning_rate': 0.00019874874354939697, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 315.4, 'epoch': 0.11}
  6%|████▋                                                                               | 1110/20117 [41:09<11:56:48,  2.26s/it]  6%|████▋                                                                               | 1111/20117 [41:11<12:04:17,  2.29s/it]  6%|████▋                                                                               | 1112/20117 [41:13<11:55:27,  2.26s/it]  6%|████▋                                                                               | 1113/20117 [41:16<11:49:26,  2.24s/it]  6%|████▋                                                                               | 1114/20117 [41:18<11:48:37,  2.24s/it]  6%|████▋                                                                               | 1115/20117 [41:20<11:41:48,  2.22s/it]  6%|████▋                                                                               | 1116/20117 [41:22<11:44:18,  2.22s/it]  6%|████▋                                                                               | 1117/20117 [41:24<11:42:53,  2.22s/it]  6%|████▋                                                                               | 1118/20117 [41:27<11:49:41,  2.24s/it]  6%|████▋                                                                               | 1119/20117 [41:29<11:50:45,  2.24s/it]  6%|████▋                                                                               | 1120/20117 [41:31<11:54:57,  2.26s/it]                                                                                                                                 {'loss': 0.2834, 'grad_norm': 0.3583936095237732, 'learning_rate': 0.00019872387188072673, 'memory/max_active (GiB)': 20.63, 'memory/max_allocated (GiB)': 20.63, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 439.26, 'epoch': 0.11}
  6%|████▋                                                                               | 1120/20117 [41:31<11:54:57,  2.26s/it]  6%|████▋                                                                               | 1121/20117 [41:33<11:51:00,  2.25s/it]  6%|████▋                                                                               | 1122/20117 [41:36<11:50:10,  2.24s/it]  6%|████▋                                                                               | 1123/20117 [41:38<11:56:59,  2.26s/it]  6%|████▋                                                                               | 1124/20117 [41:40<11:53:27,  2.25s/it]  6%|████▋                                                                               | 1125/20117 [41:42<11:49:08,  2.24s/it]  6%|████▋                                                                               | 1126/20117 [41:45<11:46:40,  2.23s/it]  6%|████▋                                                                               | 1127/20117 [41:47<11:43:37,  2.22s/it]  6%|████▋                                                                               | 1128/20117 [41:49<11:52:14,  2.25s/it]  6%|████▋                                                                               | 1129/20117 [41:51<12:01:14,  2.28s/it]  6%|████▋                                                                               | 1130/20117 [41:54<11:58:03,  2.27s/it]                                                                                                                                 {'loss': 0.2157, 'grad_norm': 0.3295114040374756, 'learning_rate': 0.00019869875703429433, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 290.71, 'epoch': 0.11}
  6%|████▋                                                                               | 1130/20117 [41:54<11:58:03,  2.27s/it]  6%|████▋                                                                               | 1131/20117 [41:56<12:01:06,  2.28s/it]  6%|████▋                                                                               | 1132/20117 [41:58<11:50:43,  2.25s/it]  6%|████▋                                                                               | 1133/20117 [42:00<11:49:20,  2.24s/it]  6%|████▋                                                                               | 1134/20117 [42:03<11:43:00,  2.22s/it]  6%|████▋                                                                               | 1135/20117 [42:05<11:38:02,  2.21s/it]  6%|████▋                                                                               | 1136/20117 [42:07<11:34:08,  2.19s/it]  6%|████▋                                                                               | 1137/20117 [42:09<11:40:31,  2.21s/it]  6%|████▊                                                                               | 1138/20117 [42:11<11:39:47,  2.21s/it]  6%|████▊                                                                               | 1139/20117 [42:14<11:37:28,  2.21s/it]  6%|████▊                                                                               | 1140/20117 [42:16<11:30:27,  2.18s/it]                                                                                                                                 {'loss': 0.2848, 'grad_norm': 0.21794943511486053, 'learning_rate': 0.00019867339907196283, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 382.48, 'epoch': 0.11}
  6%|████▊                                                                               | 1140/20117 [42:16<11:30:27,  2.18s/it]  6%|████▊                                                                               | 1141/20117 [42:18<11:31:15,  2.19s/it]  6%|████▊                                                                               | 1142/20117 [42:20<11:34:09,  2.19s/it]  6%|████▊                                                                               | 1143/20117 [42:22<11:35:40,  2.20s/it]  6%|████▊                                                                               | 1144/20117 [42:25<11:44:34,  2.23s/it]  6%|████▊                                                                               | 1145/20117 [42:27<11:40:05,  2.21s/it]  6%|████▊                                                                               | 1146/20117 [42:29<11:38:18,  2.21s/it]  6%|████▊                                                                               | 1147/20117 [42:31<11:35:08,  2.20s/it]  6%|████▊                                                                               | 1148/20117 [42:33<11:35:54,  2.20s/it]  6%|████▊                                                                               | 1149/20117 [42:36<11:31:32,  2.19s/it]  6%|████▊                                                                               | 1150/20117 [42:38<11:36:54,  2.20s/it]                                                                                                                                 {'loss': 0.2497, 'grad_norm': 0.47928282618522644, 'learning_rate': 0.00019864779805619435, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 351.65, 'epoch': 0.11}
  6%|████▊                                                                               | 1150/20117 [42:38<11:36:54,  2.20s/it]  6%|████▊                                                                               | 1151/20117 [42:40<11:38:26,  2.21s/it]  6%|████▊                                                                               | 1152/20117 [42:42<11:30:49,  2.19s/it]  6%|████▊                                                                               | 1153/20117 [42:44<11:35:42,  2.20s/it]  6%|████▊                                                                               | 1154/20117 [42:47<11:56:22,  2.27s/it]  6%|████▊                                                                               | 1155/20117 [42:49<11:43:46,  2.23s/it]  6%|████▊                                                                               | 1156/20117 [42:51<11:45:43,  2.23s/it]  6%|████▊                                                                               | 1157/20117 [42:54<11:55:22,  2.26s/it]  6%|████▊                                                                               | 1158/20117 [42:56<11:53:47,  2.26s/it]  6%|████▊                                                                               | 1159/20117 [42:58<11:50:27,  2.25s/it]  6%|████▊                                                                               | 1160/20117 [43:00<11:45:41,  2.23s/it]                                                                                                                                 {'loss': 0.2885, 'grad_norm': 0.4386768341064453, 'learning_rate': 0.0001986219540500496, 'memory/max_active (GiB)': 19.8, 'memory/max_allocated (GiB)': 19.8, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 351.08, 'epoch': 0.12}
  6%|████▊                                                                               | 1160/20117 [43:00<11:45:41,  2.23s/it]  6%|████▊                                                                               | 1161/20117 [43:02<11:47:01,  2.24s/it]  6%|████▊                                                                               | 1162/20117 [43:05<11:49:04,  2.24s/it]  6%|████▊                                                                               | 1163/20117 [43:07<11:44:08,  2.23s/it]  6%|████▊                                                                               | 1164/20117 [43:09<11:39:46,  2.22s/it]  6%|████▊                                                                               | 1165/20117 [43:12<12:07:01,  2.30s/it]  6%|████▊                                                                               | 1166/20117 [43:14<12:01:35,  2.28s/it]  6%|████▊                                                                               | 1167/20117 [43:16<11:51:07,  2.25s/it]  6%|████▉                                                                               | 1168/20117 [43:18<11:46:13,  2.24s/it]  6%|████▉                                                                               | 1169/20117 [43:20<11:42:00,  2.22s/it]  6%|████▉                                                                               | 1170/20117 [43:23<11:36:57,  2.21s/it]                                                                                                                                 {'loss': 0.325, 'grad_norm': 0.46047040820121765, 'learning_rate': 0.00019859586711718776, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 375.58, 'epoch': 0.12}
  6%|████▉                                                                               | 1170/20117 [43:23<11:36:57,  2.21s/it]  6%|████▉                                                                               | 1171/20117 [43:25<11:34:46,  2.20s/it]  6%|████▉                                                                               | 1172/20117 [43:27<11:36:30,  2.21s/it]  6%|████▉                                                                               | 1173/20117 [43:29<11:40:37,  2.22s/it]  6%|████▉                                                                               | 1174/20117 [43:31<11:41:29,  2.22s/it]  6%|████▉                                                                               | 1175/20117 [43:34<11:40:13,  2.22s/it]  6%|████▉                                                                               | 1176/20117 [43:36<11:39:55,  2.22s/it]  6%|████▉                                                                               | 1177/20117 [43:38<11:41:21,  2.22s/it]  6%|████▉                                                                               | 1178/20117 [43:40<11:50:28,  2.25s/it]  6%|████▉                                                                               | 1179/20117 [43:43<11:41:26,  2.22s/it]  6%|████▉                                                                               | 1180/20117 [43:45<11:40:34,  2.22s/it]                                                                                                                                 {'loss': 0.2923, 'grad_norm': 0.4252080023288727, 'learning_rate': 0.00019856953732186653, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 349.36, 'epoch': 0.12}
  6%|████▉                                                                               | 1180/20117 [43:45<11:40:34,  2.22s/it]  6%|████▉                                                                               | 1181/20117 [43:47<11:40:26,  2.22s/it]  6%|████▉                                                                               | 1182/20117 [43:49<11:46:52,  2.24s/it]  6%|████▉                                                                               | 1183/20117 [43:52<11:45:59,  2.24s/it]  6%|████▉                                                                               | 1184/20117 [43:54<11:37:12,  2.21s/it]  6%|████▉                                                                               | 1185/20117 [43:56<11:34:31,  2.20s/it]  6%|████▉                                                                               | 1186/20117 [43:58<11:35:53,  2.21s/it]  6%|████▉                                                                               | 1187/20117 [44:00<11:38:40,  2.21s/it]  6%|████▉                                                                               | 1188/20117 [44:03<11:40:07,  2.22s/it]  6%|████▉                                                                               | 1189/20117 [44:05<11:41:04,  2.22s/it]  6%|████▉                                                                               | 1190/20117 [44:07<11:36:41,  2.21s/it]                                                                                                                                 {'loss': 0.2315, 'grad_norm': 0.3643350899219513, 'learning_rate': 0.00019854296472894168, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 390.28, 'epoch': 0.12}
  6%|████▉                                                                               | 1190/20117 [44:07<11:36:41,  2.21s/it]  6%|████▉                                                                               | 1191/20117 [44:09<11:33:59,  2.20s/it]  6%|████▉                                                                               | 1192/20117 [44:11<11:32:47,  2.20s/it]  6%|████▉                                                                               | 1193/20117 [44:13<11:32:20,  2.20s/it]  6%|████▉                                                                               | 1194/20117 [44:16<11:33:51,  2.20s/it]  6%|████▉                                                                               | 1195/20117 [44:18<11:31:46,  2.19s/it]  6%|████▉                                                                               | 1196/20117 [44:20<11:31:41,  2.19s/it]  6%|████▉                                                                               | 1197/20117 [44:22<11:31:03,  2.19s/it]  6%|█████                                                                               | 1198/20117 [44:25<11:38:07,  2.21s/it]  6%|█████                                                                               | 1199/20117 [44:27<11:40:39,  2.22s/it]  6%|█████                                                                               | 1200/20117 [44:29<11:39:34,  2.22s/it]                                                                                                                                 {'loss': 0.3214, 'grad_norm': 0.4747346341609955, 'learning_rate': 0.00019851614940386722, 'memory/max_active (GiB)': 19.19, 'memory/max_allocated (GiB)': 19.19, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 414.56, 'epoch': 0.12}
  6%|█████                                                                               | 1210/20117 [44:52<12:05:22,  2.30s/it]
{'loss': 0.2606, 'grad_norm': 0.32468438148498535, 'learning_rate': 0.0001984890914126949, 'memory/max_active (GiB)': 19.8, 'memory/max_allocated (GiB)': 19.8, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 393.59, 'epoch': 0.12}
  6%|█████                                                                               | 1220/20117 [45:15<11:46:06,  2.24s/it]
{'loss': 0.2457, 'grad_norm': 0.27018123865127563, 'learning_rate': 0.00019846179082207429, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 357.71, 'epoch': 0.12}
  6%|█████▏                                                                              | 1230/20117 [45:37<11:45:43,  2.24s/it]
{'loss': 0.2618, 'grad_norm': 0.39820268750190735, 'learning_rate': 0.00019843424769925248, 'memory/max_active (GiB)': 21.4, 'memory/max_allocated (GiB)': 21.4, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 377.03, 'epoch': 0.12}
  6%|█████▏                                                                              | 1240/20117 [46:00<11:51:03,  2.26s/it]
{'loss': 0.2864, 'grad_norm': 0.4186467230319977, 'learning_rate': 0.00019840646211207407, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 415.58, 'epoch': 0.12}
  6%|█████▏                                                                              | 1250/20117 [46:22<11:43:15,  2.24s/it]
{'loss': 0.1777, 'grad_norm': 0.3047218918800354, 'learning_rate': 0.00019837843412898081, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 376.07, 'epoch': 0.12}
  6%|█████▎                                                                              | 1260/20117 [46:45<12:13:38,  2.33s/it]
{'loss': 0.2906, 'grad_norm': 0.3663698136806488, 'learning_rate': 0.0001983501638190115, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 341.64, 'epoch': 0.13}
  6%|█████▎                                                                              | 1270/20117 [47:07<11:50:25,  2.26s/it]
{'loss': 0.2498, 'grad_norm': 0.5897945761680603, 'learning_rate': 0.00019832165125180194, 'memory/max_active (GiB)': 21.38, 'memory/max_allocated (GiB)': 21.38, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 380.54, 'epoch': 0.13}
  6%|█████▎                                                                              | 1280/20117 [47:30<11:33:11,  2.21s/it]
{'loss': 0.2722, 'grad_norm': 0.40836209058761597, 'learning_rate': 0.0001982928964975846, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 387.55, 'epoch': 0.13}
  6%|█████▍                                                                              | 1290/20117 [47:52<11:41:48,  2.24s/it]
{'loss': 0.3202, 'grad_norm': 0.33597612380981445, 'learning_rate': 0.00019826389962718848, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 313.03, 'epoch': 0.13}
  6%|█████▍                                                                              | 1300/20117 [48:14<11:41:23,  2.24s/it]
{'loss': 0.2949, 'grad_norm': 0.44784456491470337, 'learning_rate': 0.00019823466071203902, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 390.03, 'epoch': 0.13}
  7%|█████▍                                                                              | 1310/20117 [48:37<11:45:56,  2.25s/it]
{'loss': 0.2323, 'grad_norm': 0.3199595510959625, 'learning_rate': 0.0001982051798241579, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 415.8, 'epoch': 0.13}
  7%|█████▌                                                                              | 1320/20117 [48:59<11:38:06,  2.23s/it]
{'loss': 0.291, 'grad_norm': 0.4944785535335541, 'learning_rate': 0.0001981754570361627, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 347.08, 'epoch': 0.13}
  7%|█████▌                                                                              | 1330/20117 [49:21<11:25:24,  2.19s/it]
{'loss': 0.2631, 'grad_norm': 0.379162073135376, 'learning_rate': 0.00019814549242126698, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 405.3, 'epoch': 0.13}
  7%|█████▌                                                                              | 1340/20117 [49:44<11:37:46,  2.23s/it]
{'loss': 0.2099, 'grad_norm': 0.20690025389194489, 'learning_rate': 0.00019811528605327992, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 377.71, 'epoch': 0.13}
  7%|█████▋                                                                              | 1350/20117 [50:06<11:35:21,  2.22s/it]
{'loss': 0.2486, 'grad_norm': 0.39738351106643677, 'learning_rate': 0.00019808483800660612, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 333.2, 'epoch': 0.13}
  7%|█████▋                                                                              | 1360/20117 [50:28<11:38:21,  2.23s/it]
{'loss': 0.2407, 'grad_norm': 0.5237305164337158, 'learning_rate': 0.00019805414835624566, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 353.94, 'epoch': 0.14}
  7%|█████▋                                                                              | 1370/20117 [50:51<12:00:25,  2.31s/it]
{'loss': 0.3119, 'grad_norm': 0.2773837447166443, 'learning_rate': 0.00019802321717779354, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 391.05, 'epoch': 0.14}
  7%|█████▊                                                                              | 1380/20117 [51:13<11:43:56,  2.25s/it]
{'loss': 0.2812, 'grad_norm': 0.2825298011302948, 'learning_rate': 0.00019799204454743987, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 348.62, 'epoch': 0.14}
  7%|█████▊                                                                              | 1380/20117 [51:13<11:43:56,  2.25s/it]  7%|█████▊                                                                              | 1381/20117 [51:15<11:40:43,  2.24s/it]  7%|█████▊                                                                              | 1382/20117 [51:18<11:50:08,  2.27s/it]  7%|█████▊                                                                              | 1383/20117 [51:20<11:49:27,  2.27s/it]  7%|█████▊                                                                              | 1384/20117 [51:22<11:44:39,  2.26s/it]  7%|█████▊                                                                              | 1385/20117 [51:24<11:39:59,  2.24s/it]  7%|█████▊                                                                              | 1386/20117 [51:27<11:35:58,  2.23s/it]  7%|█████▊                                                                              | 1387/20117 [51:29<11:42:22,  2.25s/it]  7%|█████▊                                                                              | 1388/20117 [51:31<11:38:42,  2.24s/it]  7%|█████▊                                                                              | 1389/20117 [51:33<11:34:01,  2.22s/it]  7%|█████▊                                                                              | 1390/20117 [51:36<11:33:03,  2.22s/it]                                                                                                                                 {'loss': 0.2506, 'grad_norm': 0.3622908592224121, 'learning_rate': 0.00019796063054196937, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 380.11, 'epoch': 0.14}
  7%|█████▊                                                                              | 1390/20117 [51:36<11:33:03,  2.22s/it]  7%|█████▊                                                                              | 1391/20117 [51:38<11:30:33,  2.21s/it]  7%|█████▊                                                                              | 1392/20117 [51:40<11:28:44,  2.21s/it]  7%|█████▊                                                                              | 1393/20117 [51:42<11:28:06,  2.21s/it]  7%|█████▊                                                                              | 1394/20117 [51:44<11:28:15,  2.21s/it]  7%|█████▊                                                                              | 1395/20117 [51:47<11:29:25,  2.21s/it]  7%|█████▊                                                                              | 1396/20117 [51:49<11:30:05,  2.21s/it]  7%|█████▊                                                                              | 1397/20117 [51:51<11:29:07,  2.21s/it]  7%|█████▊                                                                              | 1398/20117 [51:53<11:27:29,  2.20s/it]  7%|█████▊                                                                              | 1399/20117 [51:55<11:34:19,  2.23s/it]  7%|█████▊                                                                              | 1400/20117 [51:58<11:42:23,  2.25s/it]                                                                                                                                 {'loss': 0.2132, 'grad_norm': 0.3992385268211365, 'learning_rate': 0.0001979289752387614, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 301.04, 'epoch': 0.14}
  7%|█████▊                                                                              | 1400/20117 [51:58<11:42:23,  2.25s/it]  7%|█████▊                                                                              | 1401/20117 [52:00<11:46:46,  2.27s/it]  7%|█████▊                                                                              | 1402/20117 [52:02<11:53:16,  2.29s/it]  7%|█████▊                                                                              | 1403/20117 [52:05<11:56:43,  2.30s/it]  7%|█████▊                                                                              | 1404/20117 [52:07<11:54:53,  2.29s/it]  7%|█████▊                                                                              | 1405/20117 [52:09<11:53:46,  2.29s/it]  7%|█████▊                                                                              | 1406/20117 [52:12<11:50:39,  2.28s/it]  7%|█████▉                                                                              | 1407/20117 [52:14<11:48:35,  2.27s/it]  7%|█████▉                                                                              | 1408/20117 [52:16<11:53:18,  2.29s/it]  7%|█████▉                                                                              | 1409/20117 [52:18<11:53:14,  2.29s/it]  7%|█████▉                                                                              | 1410/20117 [52:21<11:53:26,  2.29s/it]                                                                                                                                 {'loss': 0.1813, 'grad_norm': 0.4148050546646118, 'learning_rate': 0.00019789707871578966, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 296.18, 'epoch': 0.14}
  7%|█████▉                                                                              | 1410/20117 [52:21<11:53:26,  2.29s/it]  7%|█████▉                                                                              | 1411/20117 [52:23<11:50:16,  2.28s/it]  7%|█████▉                                                                              | 1412/20117 [52:25<11:46:20,  2.27s/it]  7%|█████▉                                                                              | 1413/20117 [52:27<11:42:33,  2.25s/it]  7%|█████▉                                                                              | 1414/20117 [52:30<11:43:06,  2.26s/it]  7%|█████▉                                                                              | 1415/20117 [52:32<11:41:54,  2.25s/it]  7%|█████▉                                                                              | 1416/20117 [52:34<11:42:04,  2.25s/it]  7%|█████▉                                                                              | 1417/20117 [52:36<11:37:31,  2.24s/it]  7%|█████▉                                                                              | 1418/20117 [52:39<11:36:59,  2.24s/it]  7%|█████▉                                                                              | 1419/20117 [52:41<11:29:45,  2.21s/it]  7%|█████▉                                                                              | 1420/20117 [52:43<11:30:40,  2.22s/it]                                                                                                                                 {'loss': 0.2607, 'grad_norm': 0.36811864376068115, 'learning_rate': 0.000197864941051622, 'memory/max_active (GiB)': 19.22, 'memory/max_allocated (GiB)': 19.22, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 376.48, 'epoch': 0.14}
  7%|█████▉                                                                              | 1420/20117 [52:43<11:30:40,  2.22s/it]  7%|█████▉                                                                              | 1421/20117 [52:45<11:28:31,  2.21s/it]  7%|█████▉                                                                              | 1422/20117 [52:48<11:49:00,  2.28s/it]  7%|█████▉                                                                              | 1423/20117 [52:50<11:43:13,  2.26s/it]  7%|█████▉                                                                              | 1424/20117 [52:52<11:41:24,  2.25s/it]  7%|█████▉                                                                              | 1425/20117 [52:54<11:41:23,  2.25s/it]  7%|█████▉                                                                              | 1426/20117 [52:57<11:37:35,  2.24s/it]  7%|█████▉                                                                              | 1427/20117 [52:59<11:42:24,  2.25s/it]  7%|█████▉                                                                              | 1428/20117 [53:01<11:42:02,  2.25s/it]  7%|█████▉                                                                              | 1429/20117 [53:03<11:38:17,  2.24s/it]  7%|█████▉                                                                              | 1430/20117 [53:06<11:40:10,  2.25s/it]                                                                                                                                 {'loss': 0.2694, 'grad_norm': 0.33353865146636963, 'learning_rate': 0.00019783256232542033, 'memory/max_active (GiB)': 20.72, 'memory/max_allocated (GiB)': 20.72, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 352.36, 'epoch': 0.14}
  7%|█████▉                                                                              | 1430/20117 [53:06<11:40:10,  2.25s/it]  7%|█████▉                                                                              | 1431/20117 [53:08<11:39:58,  2.25s/it]  7%|█████▉                                                                              | 1432/20117 [53:10<11:39:58,  2.25s/it]  7%|█████▉                                                                              | 1433/20117 [53:12<11:44:55,  2.26s/it]  7%|█████▉                                                                              | 1434/20117 [53:15<11:47:44,  2.27s/it]  7%|█████▉                                                                              | 1435/20117 [53:17<11:36:45,  2.24s/it]  7%|█████▉                                                                              | 1436/20117 [53:19<11:38:01,  2.24s/it]  7%|██████                                                                              | 1437/20117 [53:21<11:33:32,  2.23s/it]  7%|██████                                                                              | 1438/20117 [53:23<11:32:22,  2.22s/it]  7%|██████                                                                              | 1439/20117 [53:26<11:33:50,  2.23s/it]  7%|██████                                                                              | 1440/20117 [53:28<11:33:39,  2.23s/it]                                                                                                                                 {'loss': 0.2851, 'grad_norm': 0.4390527606010437, 'learning_rate': 0.00019779994261694025, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 332.85, 'epoch': 0.14}
  7%|██████                                                                              | 1440/20117 [53:28<11:33:39,  2.23s/it]  7%|██████                                                                              | 1441/20117 [53:30<11:38:12,  2.24s/it]  7%|██████                                                                              | 1442/20117 [53:33<11:43:03,  2.26s/it]  7%|██████                                                                              | 1443/20117 [53:35<11:31:51,  2.22s/it]  7%|██████                                                                              | 1444/20117 [53:37<11:35:37,  2.24s/it]  7%|██████                                                                              | 1445/20117 [53:39<11:31:28,  2.22s/it]  7%|██████                                                                              | 1446/20117 [53:41<11:33:01,  2.23s/it]  7%|██████                                                                              | 1447/20117 [53:44<11:30:51,  2.22s/it]  7%|██████                                                                              | 1448/20117 [53:46<11:27:16,  2.21s/it]  7%|██████                                                                              | 1449/20117 [53:48<11:27:32,  2.21s/it]  7%|██████                                                                              | 1450/20117 [53:50<11:26:19,  2.21s/it]                                                                                                                                 {'loss': 0.3301, 'grad_norm': 0.4553990066051483, 'learning_rate': 0.00019776708200653102, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 400.73, 'epoch': 0.14}
  7%|██████                                                                              | 1450/20117 [53:50<11:26:19,  2.21s/it]  7%|██████                                                                              | 1451/20117 [53:52<11:29:00,  2.21s/it]  7%|██████                                                                              | 1452/20117 [53:55<11:28:19,  2.21s/it]  7%|██████                                                                              | 1453/20117 [53:57<11:28:32,  2.21s/it]  7%|██████                                                                              | 1454/20117 [53:59<11:28:01,  2.21s/it]  7%|██████                                                                              | 1455/20117 [54:01<11:43:00,  2.26s/it]  7%|██████                                                                              | 1456/20117 [54:04<11:41:29,  2.26s/it]  7%|██████                                                                              | 1457/20117 [54:06<11:34:35,  2.23s/it]  7%|██████                                                                              | 1458/20117 [54:08<11:28:21,  2.21s/it]  7%|██████                                                                              | 1459/20117 [54:10<11:31:35,  2.22s/it]  7%|██████                                                                              | 1460/20117 [54:12<11:29:44,  2.22s/it]                                                                                                                                 {'loss': 0.2276, 'grad_norm': 0.3526112139225006, 'learning_rate': 0.00019773398057513526, 'memory/max_active (GiB)': 21.53, 'memory/max_allocated (GiB)': 21.53, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 342.73, 'epoch': 0.15}
  7%|██████                                                                              | 1460/20117 [54:12<11:29:44,  2.22s/it]  7%|██████                                                                              | 1461/20117 [54:15<11:27:17,  2.21s/it]  7%|██████                                                                              | 1462/20117 [54:17<11:29:35,  2.22s/it]  7%|██████                                                                              | 1463/20117 [54:19<11:25:53,  2.21s/it]  7%|██████                                                                              | 1464/20117 [54:21<11:29:08,  2.22s/it]  7%|██████                                                                              | 1465/20117 [54:23<11:23:25,  2.20s/it]  7%|██████                                                                              | 1466/20117 [54:26<11:19:13,  2.19s/it]  7%|██████▏                                                                             | 1467/20117 [54:28<11:30:59,  2.22s/it]  7%|██████▏                                                                             | 1468/20117 [54:30<11:32:20,  2.23s/it]  7%|██████▏                                                                             | 1469/20117 [54:32<11:33:48,  2.23s/it]  7%|██████▏                                                                             | 1470/20117 [54:35<11:31:18,  2.22s/it]                                                                                                                                 {'loss': 0.2185, 'grad_norm': 0.4758242070674896, 'learning_rate': 0.0001977006384042888, 'memory/max_active (GiB)': 20.57, 'memory/max_allocated (GiB)': 20.57, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 343.6, 'epoch': 0.15}
  7%|██████▏                                                                             | 1470/20117 [54:35<11:31:18,  2.22s/it]  7%|██████▏                                                                             | 1471/20117 [54:37<11:26:01,  2.21s/it]  7%|██████▏                                                                             | 1472/20117 [54:39<11:31:45,  2.23s/it]  7%|██████▏                                                                             | 1473/20117 [54:41<11:28:51,  2.22s/it]  7%|██████▏                                                                             | 1474/20117 [54:43<11:30:03,  2.22s/it]  7%|██████▏                                                                             | 1475/20117 [54:46<11:31:05,  2.22s/it]  7%|██████▏                                                                             | 1476/20117 [54:48<12:08:31,  2.34s/it]  7%|██████▏                                                                             | 1477/20117 [54:51<11:57:18,  2.31s/it]  7%|██████▏                                                                             | 1478/20117 [54:53<11:51:51,  2.29s/it]  7%|██████▏                                                                             | 1479/20117 [54:55<11:39:42,  2.25s/it]  7%|██████▏                                                                             | 1480/20117 [54:57<11:40:37,  2.26s/it]                                                                                                                                 {'loss': 0.2598, 'grad_norm': 0.4020686447620392, 'learning_rate': 0.00019766705557612045, 'memory/max_active (GiB)': 21.53, 'memory/max_allocated (GiB)': 21.53, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 288.04, 'epoch': 0.15}
  7%|██████▏                                                                             | 1480/20117 [54:57<11:40:37,  2.26s/it]  7%|██████▏                                                                             | 1481/20117 [54:59<11:39:55,  2.25s/it]  7%|██████▏                                                                             | 1482/20117 [55:02<11:29:57,  2.22s/it]  7%|██████▏                                                                             | 1483/20117 [55:04<11:37:40,  2.25s/it]  7%|██████▏                                                                             | 1484/20117 [55:06<11:34:20,  2.24s/it]  7%|██████▏                                                                             | 1485/20117 [55:08<11:24:54,  2.21s/it]  7%|██████▏                                                                             | 1486/20117 [55:11<11:30:50,  2.22s/it]  7%|██████▏                                                                             | 1487/20117 [55:13<11:31:11,  2.23s/it]  7%|██████▏                                                                             | 1488/20117 [55:15<11:26:15,  2.21s/it]  7%|██████▏                                                                             | 1489/20117 [55:17<11:33:18,  2.23s/it]  7%|██████▏                                                                             | 1490/20117 [55:19<11:28:13,  2.22s/it]                                                                                                                                 {'loss': 0.3394, 'grad_norm': 0.44152265787124634, 'learning_rate': 0.00019763323217335182, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 408.81, 'epoch': 0.15}
  7%|██████▏                                                                             | 1490/20117 [55:19<11:28:13,  2.22s/it]  7%|██████▏                                                                             | 1491/20117 [55:22<11:24:11,  2.20s/it]  7%|██████▏                                                                             | 1492/20117 [55:24<11:25:55,  2.21s/it]  7%|██████▏                                                                             | 1493/20117 [55:26<11:23:26,  2.20s/it]  7%|██████▏                                                                             | 1494/20117 [55:28<11:21:03,  2.19s/it]  7%|██████▏                                                                             | 1495/20117 [55:30<11:16:51,  2.18s/it]  7%|██████▏                                                                             | 1496/20117 [55:32<11:15:57,  2.18s/it]  7%|██████▎                                                                             | 1497/20117 [55:35<11:20:16,  2.19s/it]  7%|██████▎                                                                             | 1498/20117 [55:37<11:21:11,  2.20s/it]  7%|██████▎                                                                             | 1499/20117 [55:39<11:26:41,  2.21s/it]  7%|██████▎                                                                             | 1500/20117 [55:41<11:33:10,  2.23s/it]                                                                                                                                 {'loss': 0.2692, 'grad_norm': 0.31458431482315063, 'learning_rate': 0.00019759916827929706, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 359.64, 'epoch': 0.15}
  7%|██████▎                                                                             | 1500/20117 [55:41<11:33:10,  2.23s/it]  7%|██████▎                                                                             | 1501/20117 [55:44<11:34:51,  2.24s/it]  7%|██████▎                                                                             | 1502/20117 [55:46<11:31:57,  2.23s/it]  7%|██████▎                                                                             | 1503/20117 [55:48<11:28:50,  2.22s/it]  7%|██████▎                                                                             | 1504/20117 [55:50<11:21:41,  2.20s/it]  7%|██████▎                                                                             | 1505/20117 [55:52<11:23:54,  2.20s/it]  7%|██████▎                                                                             | 1506/20117 [55:55<11:22:16,  2.20s/it]  7%|██████▎                                                                             | 1507/20117 [55:57<11:24:19,  2.21s/it]  7%|██████▎                                                                             | 1508/20117 [55:59<11:22:02,  2.20s/it]  8%|██████▎                                                                             | 1509/20117 [56:01<11:20:27,  2.19s/it]  8%|██████▎                                                                             | 1510/20117 [56:03<11:19:09,  2.19s/it]                                                                                                                                 {'loss': 0.3545, 'grad_norm': 0.48072609305381775, 'learning_rate': 0.0001975648639778628, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 374.78, 'epoch': 0.15}
  8%|██████▎                                                                             | 1510/20117 [56:03<11:19:09,  2.19s/it]  8%|██████▎                                                                             | 1511/20117 [56:06<11:23:44,  2.20s/it]  8%|██████▎                                                                             | 1512/20117 [56:08<11:25:53,  2.21s/it]  8%|██████▎                                                                             | 1513/20117 [56:10<11:27:19,  2.22s/it]  8%|██████▎                                                                             | 1514/20117 [56:12<11:28:25,  2.22s/it]  8%|██████▎                                                                             | 1515/20117 [56:14<11:20:05,  2.19s/it]  8%|██████▎                                                                             | 1516/20117 [56:17<11:21:03,  2.20s/it]  8%|██████▎                                                                             | 1517/20117 [56:19<11:29:08,  2.22s/it]  8%|██████▎                                                                             | 1518/20117 [56:21<11:25:01,  2.21s/it]  8%|██████▎                                                                             | 1519/20117 [56:23<11:18:11,  2.19s/it]  8%|██████▎                                                                             | 1520/20117 [56:25<11:17:13,  2.18s/it]                                                                                                                                 {'loss': 0.2109, 'grad_norm': 0.30275699496269226, 'learning_rate': 0.00019753031935354777, 'memory/max_active (GiB)': 20.72, 'memory/max_allocated (GiB)': 20.72, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 371.06, 'epoch': 0.15}
  8%|██████▎                                                                             | 1520/20117 [56:25<11:17:13,  2.18s/it]  8%|██████▎                                                                             | 1521/20117 [56:28<11:29:59,  2.23s/it]  8%|██████▎                                                                             | 1522/20117 [56:30<11:46:42,  2.28s/it]  8%|██████▎                                                                             | 1523/20117 [56:33<12:08:21,  2.35s/it]  8%|██████▎                                                                             | 1524/20117 [56:35<12:07:22,  2.35s/it]  8%|██████▎                                                                             | 1525/20117 [56:37<11:53:04,  2.30s/it]  8%|██████▎                                                                             | 1526/20117 [56:39<11:47:14,  2.28s/it]  8%|██████▍                                                                             | 1527/20117 [56:42<12:07:09,  2.35s/it]  8%|██████▍                                                                             | 1528/20117 [56:44<11:58:12,  2.32s/it]  8%|██████▍                                                                             | 1529/20117 [56:46<11:44:27,  2.27s/it]  8%|██████▍                                                                             | 1530/20117 [56:49<11:38:44,  2.26s/it]                                                                                                                                 {'loss': 0.2435, 'grad_norm': 0.5390923619270325, 'learning_rate': 0.00019749553449144267, 'memory/max_active (GiB)': 18.84, 'memory/max_allocated (GiB)': 18.84, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 368.76, 'epoch': 0.15}
  8%|██████▍                                                                             | 1530/20117 [56:49<11:38:44,  2.26s/it]  8%|██████▍                                                                             | 1531/20117 [56:51<11:32:04,  2.23s/it]  8%|██████▍                                                                             | 1532/20117 [56:53<11:32:26,  2.24s/it]  8%|██████▍                                                                             | 1533/20117 [56:55<11:43:14,  2.27s/it]  8%|██████▍                                                                             | 1534/20117 [56:58<11:42:47,  2.27s/it]  8%|██████▍                                                                             | 1535/20117 [57:00<11:41:12,  2.26s/it]  8%|██████▍                                                                             | 1536/20117 [57:02<11:37:03,  2.25s/it]  8%|██████▍                                                                             | 1537/20117 [57:04<11:36:50,  2.25s/it]  8%|██████▍                                                                             | 1538/20117 [57:07<11:38:33,  2.26s/it]  8%|██████▍                                                                             | 1539/20117 [57:09<11:42:02,  2.27s/it]  8%|██████▍                                                                             | 1540/20117 [57:11<11:36:24,  2.25s/it]                                                                                                                                 {'loss': 0.2105, 'grad_norm': 0.28221625089645386, 'learning_rate': 0.00019746050947722993, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 322.88, 'epoch': 0.15}
  8%|██████▍                                                                             | 1540/20117 [57:11<11:36:24,  2.25s/it]  8%|██████▍                                                                             | 1541/20117 [57:13<11:35:19,  2.25s/it]  8%|██████▍                                                                             | 1542/20117 [57:16<11:30:56,  2.23s/it]  8%|██████▍                                                                             | 1543/20117 [57:18<11:25:49,  2.22s/it]  8%|██████▍                                                                             | 1544/20117 [57:20<11:29:06,  2.23s/it]  8%|██████▍                                                                             | 1545/20117 [57:22<11:23:49,  2.21s/it]  8%|██████▍                                                                             | 1546/20117 [57:24<11:30:50,  2.23s/it]  8%|██████▍                                                                             | 1547/20117 [57:27<11:31:58,  2.24s/it]  8%|██████▍                                                                             | 1548/20117 [57:29<11:29:48,  2.23s/it]  8%|██████▍                                                                             | 1549/20117 [57:31<11:31:51,  2.24s/it]  8%|██████▍                                                                             | 1550/20117 [57:33<11:35:10,  2.25s/it]                                                                                                                                 {'loss': 0.2761, 'grad_norm': 0.3471927046775818, 'learning_rate': 0.00019742524439718363, 'memory/max_active (GiB)': 17.4, 'memory/max_allocated (GiB)': 17.4, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 331.45, 'epoch': 0.15}
  8%|██████▍                                                                             | 1550/20117 [57:33<11:35:10,  2.25s/it]  8%|██████▍                                                                             | 1551/20117 [57:36<11:31:20,  2.23s/it]  8%|██████▍                                                                             | 1552/20117 [57:38<11:27:32,  2.22s/it]  8%|██████▍                                                                             | 1553/20117 [57:40<11:28:16,  2.22s/it]  8%|██████▍                                                                             | 1554/20117 [57:42<11:30:40,  2.23s/it]  8%|██████▍                                                                             | 1555/20117 [57:45<11:31:04,  2.23s/it]  8%|██████▍                                                                             | 1556/20117 [57:47<11:40:20,  2.26s/it]  8%|██████▌                                                                             | 1557/20117 [57:49<11:33:59,  2.24s/it]  8%|██████▌                                                                             | 1558/20117 [57:51<11:37:36,  2.26s/it]  8%|██████▌                                                                             | 1559/20117 [57:54<11:34:21,  2.24s/it]  8%|██████▌                                                                             | 1560/20117 [57:56<11:36:25,  2.25s/it]                                                                                                                                 {'loss': 0.2419, 'grad_norm': 0.34601831436157227, 'learning_rate': 0.0001973897393381691, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 330.91, 'epoch': 0.16}
  8%|██████▌                                                                             | 1570/20117 [58:18<11:22:02,  2.21s/it]
{'loss': 0.2948, 'grad_norm': 0.4680122435092926, 'learning_rate': 0.00019735399438764275, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 369.5, 'epoch': 0.16}
  8%|██████▌                                                                             | 1580/20117 [58:41<12:22:00,  2.40s/it]
{'loss': 0.2865, 'grad_norm': 0.35631850361824036, 'learning_rate': 0.000197318009633652, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 331.56, 'epoch': 0.16}
  8%|██████▋                                                                             | 1590/20117 [59:04<11:35:19,  2.25s/it]
{'loss': 0.2912, 'grad_norm': 0.43517372012138367, 'learning_rate': 0.0001972817851648349, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 334.38, 'epoch': 0.16}
  8%|██████▋                                                                             | 1600/20117 [59:26<11:21:15,  2.21s/it]
{'loss': 0.2182, 'grad_norm': 0.3614802360534668, 'learning_rate': 0.00019724532107041995, 'memory/max_active (GiB)': 20.45, 'memory/max_allocated (GiB)': 20.45, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 330.71, 'epoch': 0.16}
  8%|██████▋                                                                             | 1610/20117 [59:49<11:28:44,  2.23s/it]
{'loss': 0.1887, 'grad_norm': 0.3124898672103882, 'learning_rate': 0.00019720861744022594, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 419.4, 'epoch': 0.16}
  8%|██████▌                                                                           | 1620/20117 [1:00:11<11:48:36,  2.30s/it]
{'loss': 0.3199, 'grad_norm': 0.37882000207901, 'learning_rate': 0.00019717167436466166, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 352.77, 'epoch': 0.16}
  8%|██████▋                                                                           | 1630/20117 [1:00:34<11:28:55,  2.24s/it]
{'loss': 0.2644, 'grad_norm': 0.47813880443573, 'learning_rate': 0.00019713449193472572, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 328.31, 'epoch': 0.16}
  8%|██████▋                                                                           | 1640/20117 [1:00:56<11:23:57,  2.22s/it]
{'loss': 0.2157, 'grad_norm': 0.4390937387943268, 'learning_rate': 0.00019709707024200633, 'memory/max_active (GiB)': 18.83, 'memory/max_allocated (GiB)': 18.83, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 337.97, 'epoch': 0.16}
  8%|██████▋                                                                           | 1650/20117 [1:01:19<11:25:09,  2.23s/it]
{'loss': 0.2301, 'grad_norm': 0.28492602705955505, 'learning_rate': 0.00019705940937868096, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 377.43, 'epoch': 0.16}
  8%|██████▊                                                                           | 1660/20117 [1:01:41<11:25:05,  2.23s/it]
{'loss': 0.2755, 'grad_norm': 0.41057339310646057, 'learning_rate': 0.00019702150943751636, 'memory/max_active (GiB)': 20.72, 'memory/max_allocated (GiB)': 20.72, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 364.78, 'epoch': 0.17}
  8%|██████▊                                                                           | 1670/20117 [1:02:04<11:43:29,  2.29s/it]
{'loss': 0.2254, 'grad_norm': 0.36004287004470825, 'learning_rate': 0.00019698337051186803, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 348.57, 'epoch': 0.17}
  8%|██████▊                                                                           | 1680/20117 [1:02:26<11:34:02,  2.26s/it]
{'loss': 0.2556, 'grad_norm': 0.36488133668899536, 'learning_rate': 0.00019694499269568022, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 338.09, 'epoch': 0.17}
  8%|██████▉                                                                           | 1690/20117 [1:02:49<11:50:44,  2.31s/it]
{'loss': 0.2765, 'grad_norm': 0.1386333703994751, 'learning_rate': 0.00019690637608348562, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 279.22, 'epoch': 0.17}
  8%|██████▉                                                                           | 1700/20117 [1:03:12<11:24:43,  2.23s/it]
{'loss': 0.2745, 'grad_norm': 0.30839794874191284, 'learning_rate': 0.00019686752077040505, 'memory/max_active (GiB)': 20.44, 'memory/max_allocated (GiB)': 20.44, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 363.03, 'epoch': 0.17}
  9%|██████▉                                                                           | 1710/20117 [1:03:34<11:30:03,  2.25s/it]
{'loss': 0.2415, 'grad_norm': 0.48986828327178955, 'learning_rate': 0.00019682842685214745, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 321.69, 'epoch': 0.17}
  9%|███████                                                                           | 1720/20117 [1:03:57<11:32:35,  2.26s/it]
{'loss': 0.2618, 'grad_norm': 0.25523653626441956, 'learning_rate': 0.00019678909442500937, 'memory/max_active (GiB)': 18.18, 'memory/max_allocated (GiB)': 18.18, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 312.41, 'epoch': 0.17}
  9%|███████                                                                           | 1730/20117 [1:04:19<11:33:38,  2.26s/it]
{'loss': 0.2569, 'grad_norm': 0.1990797519683838, 'learning_rate': 0.00019674952358587488, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 338.48, 'epoch': 0.17}
  9%|███████                                                                           | 1740/20117 [1:04:43<11:51:37,  2.32s/it]
{'loss': 0.2789, 'grad_norm': 1.2966935634613037, 'learning_rate': 0.00019670971443221528, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 377.96, 'epoch': 0.17}
  9%|███████                                                                           | 1740/20117 [1:04:43<11:51:37,  2.32s/it]  9%|███████                                                                           | 1741/20117 [1:04:45<11:47:14,  2.31s/it]  9%|███████                                                                           | 1742/20117 [1:04:47<11:38:26,  2.28s/it]  9%|███████                                                                           | 1743/20117 [1:04:50<11:39:24,  2.28s/it]  9%|███████                                                                           | 1744/20117 [1:04:52<11:39:58,  2.29s/it]  9%|███████                                                                           | 1745/20117 [1:04:54<11:44:40,  2.30s/it]  9%|███████                                                                           | 1746/20117 [1:04:56<11:47:29,  2.31s/it]  9%|███████                                                                           | 1747/20117 [1:04:59<11:39:22,  2.28s/it]  9%|███████▏                                                                          | 1748/20117 [1:05:01<11:35:49,  2.27s/it]  9%|███████▏                                                                          | 1749/20117 [1:05:03<11:31:34,  2.26s/it]  9%|███████▏                                                                          | 1750/20117 [1:05:05<11:26:08,  2.24s/it]                                                                                                                                 {'loss': 0.2537, 'grad_norm': 0.4156191647052765, 'learning_rate': 0.00019666966706208898, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 423.21, 'epoch': 0.17}
  9%|███████▏                                                                          | 1750/20117 [1:05:05<11:26:08,  2.24s/it]  9%|███████▏                                                                          | 1751/20117 [1:05:08<11:21:25,  2.23s/it]  9%|███████▏                                                                          | 1752/20117 [1:05:10<11:24:04,  2.23s/it]  9%|███████▏                                                                          | 1753/20117 [1:05:12<11:24:27,  2.24s/it]  9%|███████▏                                                                          | 1754/20117 [1:05:14<11:30:20,  2.26s/it]  9%|███████▏                                                                          | 1755/20117 [1:05:17<11:34:28,  2.27s/it]  9%|███████▏                                                                          | 1756/20117 [1:05:19<11:32:33,  2.26s/it]  9%|███████▏                                                                          | 1757/20117 [1:05:21<11:27:48,  2.25s/it]  9%|███████▏                                                                          | 1758/20117 [1:05:23<11:26:19,  2.24s/it]  9%|███████▏                                                                          | 1759/20117 [1:05:26<11:22:52,  2.23s/it]  9%|███████▏                                                                          | 1760/20117 [1:05:28<11:27:59,  2.25s/it]                                                                                                                                 {'loss': 0.3316, 'grad_norm': 0.43119025230407715, 'learning_rate': 0.00019662938157414113, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 385.14, 'epoch': 0.17}
  9%|███████▏                                                                          | 1760/20117 [1:05:28<11:27:59,  2.25s/it]  9%|███████▏                                                                          | 1761/20117 [1:05:30<11:28:32,  2.25s/it]  9%|███████▏                                                                          | 1762/20117 [1:05:32<11:28:37,  2.25s/it]  9%|███████▏                                                                          | 1763/20117 [1:05:35<11:24:03,  2.24s/it]  9%|███████▏                                                                          | 1764/20117 [1:05:37<11:27:35,  2.25s/it]  9%|███████▏                                                                          | 1765/20117 [1:05:39<11:49:39,  2.32s/it]  9%|███████▏                                                                          | 1766/20117 [1:05:42<11:44:26,  2.30s/it]  9%|███████▏                                                                          | 1767/20117 [1:05:44<11:35:15,  2.27s/it]  9%|███████▏                                                                          | 1768/20117 [1:05:46<11:30:24,  2.26s/it]  9%|███████▏                                                                          | 1769/20117 [1:05:48<11:24:06,  2.24s/it]  9%|███████▏                                                                          | 1770/20117 [1:05:50<11:16:10,  2.21s/it]                                                                                                                                 {'loss': 0.2969, 'grad_norm': 0.4492398798465729, 'learning_rate': 0.00019658885806760336, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 364.99, 'epoch': 0.18}
  9%|███████▏                                                                          | 1770/20117 [1:05:50<11:16:10,  2.21s/it]  9%|███████▏                                                                          | 1771/20117 [1:05:53<11:20:24,  2.23s/it]  9%|███████▏                                                                          | 1772/20117 [1:05:55<11:15:06,  2.21s/it]  9%|███████▏                                                                          | 1773/20117 [1:05:57<11:11:47,  2.20s/it]  9%|███████▏                                                                          | 1774/20117 [1:05:59<11:19:41,  2.22s/it]  9%|███████▏                                                                          | 1775/20117 [1:06:01<11:15:38,  2.21s/it]  9%|███████▏                                                                          | 1776/20117 [1:06:04<11:14:59,  2.21s/it]  9%|███████▏                                                                          | 1777/20117 [1:06:06<11:18:33,  2.22s/it]  9%|███████▏                                                                          | 1778/20117 [1:06:08<11:18:50,  2.22s/it]  9%|███████▎                                                                          | 1779/20117 [1:06:10<11:29:33,  2.26s/it]  9%|███████▎                                                                          | 1780/20117 [1:06:13<11:32:47,  2.27s/it]                                                                                                                                 {'loss': 0.291, 'grad_norm': 0.4664623737335205, 'learning_rate': 0.00019654809664229364, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 295.57, 'epoch': 0.18}
  9%|███████▎                                                                          | 1780/20117 [1:06:13<11:32:47,  2.27s/it]  9%|███████▎                                                                          | 1781/20117 [1:06:15<11:28:37,  2.25s/it]  9%|███████▎                                                                          | 1782/20117 [1:06:17<11:29:54,  2.26s/it]  9%|███████▎                                                                          | 1783/20117 [1:06:19<11:30:33,  2.26s/it]  9%|███████▎                                                                          | 1784/20117 [1:06:22<11:32:06,  2.27s/it]  9%|███████▎                                                                          | 1785/20117 [1:06:24<11:27:34,  2.25s/it]  9%|███████▎                                                                          | 1786/20117 [1:06:26<11:28:58,  2.26s/it]  9%|███████▎                                                                          | 1787/20117 [1:06:29<11:44:16,  2.31s/it]  9%|███████▎                                                                          | 1788/20117 [1:06:31<11:45:07,  2.31s/it]  9%|███████▎                                                                          | 1789/20117 [1:06:33<11:38:09,  2.29s/it]  9%|███████▎                                                                          | 1790/20117 [1:06:36<12:09:46,  2.39s/it]                                                                                                                                 {'loss': 0.295, 'grad_norm': 0.46608006954193115, 'learning_rate': 0.000196507097398616, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 284.68, 'epoch': 0.18}
  9%|███████▎                                                                          | 1790/20117 [1:06:36<12:09:46,  2.39s/it]  9%|███████▎                                                                          | 1791/20117 [1:06:38<11:50:26,  2.33s/it]  9%|███████▎                                                                          | 1792/20117 [1:06:40<11:42:12,  2.30s/it]  9%|███████▎                                                                          | 1793/20117 [1:06:42<11:34:49,  2.28s/it]  9%|███████▎                                                                          | 1794/20117 [1:06:45<11:38:33,  2.29s/it]  9%|███████▎                                                                          | 1795/20117 [1:06:47<11:57:16,  2.35s/it]  9%|███████▎                                                                          | 1796/20117 [1:06:50<11:51:45,  2.33s/it]  9%|███████▎                                                                          | 1797/20117 [1:06:52<11:45:43,  2.31s/it]  9%|███████▎                                                                          | 1798/20117 [1:06:54<11:44:23,  2.31s/it]  9%|███████▎                                                                          | 1799/20117 [1:06:56<11:45:13,  2.31s/it]  9%|███████▎                                                                          | 1800/20117 [1:06:59<11:34:52,  2.28s/it]                                                                                                                                 {'loss': 0.2396, 'grad_norm': 0.378513365983963, 'learning_rate': 0.00019646586043756023, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 325.71, 'epoch': 0.18}
  9%|███████▎                                                                          | 1800/20117 [1:06:59<11:34:52,  2.28s/it]  9%|███████▎                                                                          | 1801/20117 [1:07:01<11:30:19,  2.26s/it]  9%|███████▎                                                                          | 1802/20117 [1:07:03<11:19:22,  2.23s/it]  9%|███████▎                                                                          | 1803/20117 [1:07:05<11:20:34,  2.23s/it]  9%|███████▎                                                                          | 1804/20117 [1:07:08<11:23:40,  2.24s/it]  9%|███████▎                                                                          | 1805/20117 [1:07:10<11:29:20,  2.26s/it]  9%|███████▎                                                                          | 1806/20117 [1:07:12<11:29:05,  2.26s/it]  9%|███████▎                                                                          | 1807/20117 [1:07:14<11:31:11,  2.26s/it]  9%|███████▎                                                                          | 1808/20117 [1:07:17<11:30:11,  2.26s/it]  9%|███████▎                                                                          | 1809/20117 [1:07:19<11:29:19,  2.26s/it]  9%|███████▍                                                                          | 1810/20117 [1:07:21<11:24:04,  2.24s/it]                                                                                                                                 {'loss': 0.2364, 'grad_norm': 0.36179885268211365, 'learning_rate': 0.00019642438586070168, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 349.38, 'epoch': 0.18}
  9%|███████▍                                                                          | 1810/20117 [1:07:21<11:24:04,  2.24s/it]  9%|███████▍                                                                          | 1811/20117 [1:07:23<11:19:49,  2.23s/it]  9%|███████▍                                                                          | 1812/20117 [1:07:26<11:22:46,  2.24s/it]  9%|███████▍                                                                          | 1813/20117 [1:07:28<11:17:22,  2.22s/it]  9%|███████▍                                                                          | 1814/20117 [1:07:30<11:22:26,  2.24s/it]  9%|███████▍                                                                          | 1815/20117 [1:07:32<11:17:32,  2.22s/it]  9%|███████▍                                                                          | 1816/20117 [1:07:34<11:24:27,  2.24s/it]  9%|███████▍                                                                          | 1817/20117 [1:07:37<11:24:51,  2.25s/it]  9%|███████▍                                                                          | 1818/20117 [1:07:39<11:21:10,  2.23s/it]  9%|███████▍                                                                          | 1819/20117 [1:07:41<11:22:22,  2.24s/it]  9%|███████▍                                                                          | 1820/20117 [1:07:43<11:22:01,  2.24s/it]                                                                                                                                 {'loss': 0.223, 'grad_norm': 0.31644207239151, 'learning_rate': 0.000196382673770201, 'memory/max_active (GiB)': 20.57, 'memory/max_allocated (GiB)': 20.57, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 320.59, 'epoch': 0.18}
  9%|███████▍                                                                          | 1820/20117 [1:07:43<11:22:01,  2.24s/it]  9%|███████▍                                                                          | 1821/20117 [1:07:46<11:19:25,  2.23s/it]  9%|███████▍                                                                          | 1822/20117 [1:07:48<11:18:27,  2.23s/it]  9%|███████▍                                                                          | 1823/20117 [1:07:50<11:19:01,  2.23s/it]  9%|███████▍                                                                          | 1824/20117 [1:07:52<11:21:01,  2.23s/it]  9%|███████▍                                                                          | 1825/20117 [1:07:55<11:23:57,  2.24s/it]  9%|███████▍                                                                          | 1826/20117 [1:07:57<11:27:03,  2.25s/it]  9%|███████▍                                                                          | 1827/20117 [1:07:59<11:27:45,  2.26s/it]  9%|███████▍                                                                          | 1828/20117 [1:08:01<11:31:53,  2.27s/it]  9%|███████▍                                                                          | 1829/20117 [1:08:04<11:31:45,  2.27s/it]  9%|███████▍                                                                          | 1830/20117 [1:08:06<11:24:50,  2.25s/it]                                                                                                                                 {'loss': 0.2641, 'grad_norm': 0.27562472224235535, 'learning_rate': 0.00019634072426880382, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 386.93, 'epoch': 0.18}
  9%|███████▍                                                                          | 1830/20117 [1:08:06<11:24:50,  2.25s/it]  9%|███████▍                                                                          | 1831/20117 [1:08:08<11:21:05,  2.23s/it]  9%|███████▍                                                                          | 1832/20117 [1:08:10<11:18:33,  2.23s/it]  9%|███████▍                                                                          | 1833/20117 [1:08:13<11:21:40,  2.24s/it]  9%|███████▍                                                                          | 1834/20117 [1:08:15<11:16:51,  2.22s/it]  9%|███████▍                                                                          | 1835/20117 [1:08:17<11:16:17,  2.22s/it]  9%|███████▍                                                                          | 1836/20117 [1:08:19<11:17:00,  2.22s/it]  9%|███████▍                                                                          | 1837/20117 [1:08:21<11:13:31,  2.21s/it]  9%|███████▍                                                                          | 1838/20117 [1:08:24<11:17:22,  2.22s/it]  9%|███████▍                                                                          | 1839/20117 [1:08:26<11:20:49,  2.23s/it]  9%|███████▌                                                                          | 1840/20117 [1:08:28<11:16:40,  2.22s/it]                                                                                                                                 {'loss': 0.2167, 'grad_norm': 0.4514225423336029, 'learning_rate': 0.00019629853745984076, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 350.66, 'epoch': 0.18}
  9%|███████▌                                                                          | 1840/20117 [1:08:28<11:16:40,  2.22s/it]  9%|███████▌                                                                          | 1841/20117 [1:08:30<11:18:14,  2.23s/it]  9%|███████▌                                                                          | 1842/20117 [1:08:33<11:52:12,  2.34s/it]  9%|███████▌                                                                          | 1843/20117 [1:08:35<11:43:00,  2.31s/it]  9%|███████▌                                                                          | 1844/20117 [1:08:37<11:40:44,  2.30s/it]  9%|███████▌                                                                          | 1845/20117 [1:08:40<11:30:06,  2.27s/it]  9%|███████▌                                                                          | 1846/20117 [1:08:42<11:25:57,  2.25s/it]  9%|███████▌                                                                          | 1847/20117 [1:08:44<11:26:41,  2.26s/it]  9%|███████▌                                                                          | 1848/20117 [1:08:46<11:31:56,  2.27s/it]  9%|███████▌                                                                          | 1849/20117 [1:08:49<11:24:28,  2.25s/it]  9%|███████▌                                                                          | 1850/20117 [1:08:51<11:21:20,  2.24s/it]                                                                                                                                 {'loss': 0.2429, 'grad_norm': 0.43807438015937805, 'learning_rate': 0.00019625611344722675, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 390.61, 'epoch': 0.18}
  9%|███████▌                                                                          | 1850/20117 [1:08:51<11:21:20,  2.24s/it]  9%|███████▌                                                                          | 1851/20117 [1:08:53<11:23:38,  2.25s/it]  9%|███████▌                                                                          | 1852/20117 [1:08:55<11:26:16,  2.25s/it]  9%|███████▌                                                                          | 1853/20117 [1:08:58<11:23:29,  2.25s/it]  9%|███████▌                                                                          | 1854/20117 [1:09:00<11:21:30,  2.24s/it]  9%|███████▌                                                                          | 1855/20117 [1:09:02<11:23:05,  2.24s/it]  9%|███████▌                                                                          | 1856/20117 [1:09:04<11:22:58,  2.24s/it]  9%|███████▌                                                                          | 1857/20117 [1:09:07<11:26:51,  2.26s/it]  9%|███████▌                                                                          | 1858/20117 [1:09:09<11:24:10,  2.25s/it]  9%|███████▌                                                                          | 1859/20117 [1:09:11<11:30:09,  2.27s/it]  9%|███████▌                                                                          | 1860/20117 [1:09:13<11:29:48,  2.27s/it]                                                                                                                                 {'loss': 0.2565, 'grad_norm': 0.3817460536956787, 'learning_rate': 0.00019621345233546115, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 366.89, 'epoch': 0.18}
  9%|███████▌                                                                          | 1860/20117 [1:09:13<11:29:48,  2.27s/it]  9%|███████▌                                                                          | 1861/20117 [1:09:16<11:28:14,  2.26s/it]  9%|███████▌                                                                          | 1862/20117 [1:09:18<11:22:37,  2.24s/it]  9%|███████▌                                                                          | 1863/20117 [1:09:20<11:24:07,  2.25s/it]  9%|███████▌                                                                          | 1864/20117 [1:09:22<11:32:48,  2.28s/it]  9%|███████▌                                                                          | 1865/20117 [1:09:25<11:29:46,  2.27s/it]  9%|███████▌                                                                          | 1866/20117 [1:09:27<11:25:11,  2.25s/it]  9%|███████▌                                                                          | 1867/20117 [1:09:29<11:20:41,  2.24s/it]  9%|███████▌                                                                          | 1868/20117 [1:09:31<11:22:43,  2.24s/it]  9%|███████▌                                                                          | 1869/20117 [1:09:34<11:34:58,  2.29s/it]  9%|███████▌                                                                          | 1870/20117 [1:09:36<11:33:48,  2.28s/it]                                                                                                                                 {'loss': 0.2815, 'grad_norm': 0.51050865650177, 'learning_rate': 0.0001961705542296272, 'memory/max_active (GiB)': 20.63, 'memory/max_allocated (GiB)': 20.63, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 314.07, 'epoch': 0.19}
  9%|███████▌                                                                          | 1870/20117 [1:09:36<11:33:48,  2.28s/it]  9%|███████▋                                                                          | 1871/20117 [1:09:38<11:31:14,  2.27s/it]  9%|███████▋                                                                          | 1872/20117 [1:09:41<11:45:16,  2.32s/it]  9%|███████▋                                                                          | 1873/20117 [1:09:43<11:55:14,  2.35s/it]  9%|███████▋                                                                          | 1874/20117 [1:09:46<12:05:35,  2.39s/it]  9%|███████▋                                                                          | 1875/20117 [1:09:48<12:04:41,  2.38s/it]  9%|███████▋                                                                          | 1876/20117 [1:09:50<12:13:28,  2.41s/it]  9%|███████▋                                                                          | 1877/20117 [1:09:53<12:16:25,  2.42s/it]  9%|███████▋                                                                          | 1878/20117 [1:09:55<12:20:27,  2.44s/it]  9%|███████▋                                                                          | 1879/20117 [1:09:58<12:06:49,  2.39s/it]  9%|███████▋                                                                          | 1880/20117 [1:10:00<12:02:15,  2.38s/it]                                                                                                                                 {'loss': 0.2353, 'grad_norm': 0.34085533022880554, 'learning_rate': 0.00019612741923539218, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 366.93, 'epoch': 0.19}
  9%|███████▋                                                                          | 1880/20117 [1:10:00<12:02:15,  2.38s/it]  9%|███████▋                                                                          | 1881/20117 [1:10:02<11:51:58,  2.34s/it]  9%|███████▋                                                                          | 1882/20117 [1:10:05<11:43:09,  2.31s/it]  9%|███████▋                                                                          | 1883/20117 [1:10:07<11:34:46,  2.29s/it]  9%|███████▋                                                                          | 1884/20117 [1:10:09<11:23:44,  2.25s/it]  9%|███████▋                                                                          | 1885/20117 [1:10:11<11:20:48,  2.24s/it]  9%|███████▋                                                                          | 1886/20117 [1:10:13<11:14:10,  2.22s/it]  9%|███████▋                                                                          | 1887/20117 [1:10:15<11:07:53,  2.20s/it]  9%|███████▋                                                                          | 1888/20117 [1:10:18<11:08:01,  2.20s/it]  9%|███████▋                                                                          | 1889/20117 [1:10:20<11:21:01,  2.24s/it]  9%|███████▋                                                                          | 1890/20117 [1:10:22<11:27:07,  2.26s/it]                                                                                                                                 {'loss': 0.2679, 'grad_norm': 0.38903746008872986, 'learning_rate': 0.00019608404745900652, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 407.24, 'epoch': 0.19}
  9%|███████▋                                                                          | 1890/20117 [1:10:22<11:27:07,  2.26s/it]  9%|███████▋                                                                          | 1891/20117 [1:10:25<11:32:52,  2.28s/it]  9%|███████▋                                                                          | 1892/20117 [1:10:27<11:36:34,  2.29s/it]  9%|███████▋                                                                          | 1893/20117 [1:10:29<11:35:58,  2.29s/it]  9%|███████▋                                                                          | 1894/20117 [1:10:31<11:34:27,  2.29s/it]  9%|███████▋                                                                          | 1895/20117 [1:10:34<11:36:20,  2.29s/it]  9%|███████▋                                                                          | 1896/20117 [1:10:36<11:38:44,  2.30s/it]  9%|███████▋                                                                          | 1897/20117 [1:10:39<12:07:12,  2.39s/it]  9%|███████▋                                                                          | 1898/20117 [1:10:41<11:55:02,  2.35s/it]  9%|███████▋                                                                          | 1899/20117 [1:10:43<11:38:55,  2.30s/it]  9%|███████▋                                                                          | 1900/20117 [1:10:45<11:32:05,  2.28s/it]                                                                                                                                 {'loss': 0.299, 'grad_norm': 0.3198868930339813, 'learning_rate': 0.00019604043900730414, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 420.81, 'epoch': 0.19}
  9%|███████▋                                                                          | 1900/20117 [1:10:45<11:32:05,  2.28s/it]  9%|███████▋                                                                          | 1901/20117 [1:10:48<11:38:02,  2.30s/it]  9%|███████▊                                                                          | 1902/20117 [1:10:50<11:39:34,  2.30s/it]  9%|███████▊                                                                          | 1903/20117 [1:10:52<11:36:15,  2.29s/it]  9%|███████▊                                                                          | 1904/20117 [1:10:55<11:34:20,  2.29s/it]  9%|███████▊                                                                          | 1905/20117 [1:10:57<11:34:42,  2.29s/it]  9%|███████▊                                                                          | 1906/20117 [1:10:59<11:34:25,  2.29s/it]  9%|███████▊                                                                          | 1907/20117 [1:11:01<11:28:18,  2.27s/it]  9%|███████▊                                                                          | 1908/20117 [1:11:04<11:24:52,  2.26s/it]  9%|███████▊                                                                          | 1909/20117 [1:11:06<11:21:11,  2.24s/it]  9%|███████▊                                                                          | 1910/20117 [1:11:08<11:14:20,  2.22s/it]                                                                                                                                 {'loss': 0.2957, 'grad_norm': 0.43067699670791626, 'learning_rate': 0.0001959965939877019, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 467.81, 'epoch': 0.19}
  9%|███████▊                                                                          | 1910/20117 [1:11:08<11:14:20,  2.22s/it]  9%|███████▊                                                                          | 1911/20117 [1:11:10<11:14:03,  2.22s/it] 10%|███████▊                                                                          | 1912/20117 [1:11:12<11:09:36,  2.21s/it] 10%|███████▊                                                                          | 1913/20117 [1:11:15<11:07:36,  2.20s/it] 10%|███████▊                                                                          | 1914/20117 [1:11:17<11:05:58,  2.20s/it] 10%|███████▊                                                                          | 1915/20117 [1:11:19<11:05:26,  2.19s/it] 10%|███████▊                                                                          | 1916/20117 [1:11:21<11:04:41,  2.19s/it] 10%|███████▊                                                                          | 1917/20117 [1:11:23<11:06:34,  2.20s/it] 10%|███████▊                                                                          | 1918/20117 [1:11:26<11:06:38,  2.20s/it] 10%|███████▊                                                                          | 1919/20117 [1:11:28<11:04:58,  2.19s/it] 10%|███████▊                                                                          | 1920/20117 [1:11:30<11:00:52,  2.18s/it]                                                                                                                                 {'loss': 0.2512, 'grad_norm': 0.3319333493709564, 'learning_rate': 0.00019595251250819932, 'memory/max_active (GiB)': 17.12, 'memory/max_allocated (GiB)': 17.12, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 283.54, 'epoch': 0.19}
 10%|███████▊                                                                          | 1920/20117 [1:11:30<11:00:52,  2.18s/it] 10%|███████▊                                                                          | 1921/20117 [1:11:32<11:04:23,  2.19s/it] 10%|███████▊                                                                          | 1922/20117 [1:11:34<11:08:04,  2.20s/it] 10%|███████▊                                                                          | 1923/20117 [1:11:37<11:09:49,  2.21s/it] 10%|███████▊                                                                          | 1924/20117 [1:11:39<11:08:32,  2.20s/it] 10%|███████▊                                                                          | 1925/20117 [1:11:41<11:09:51,  2.21s/it] 10%|███████▊                                                                          | 1926/20117 [1:11:43<11:09:12,  2.21s/it] 10%|███████▊                                                                          | 1927/20117 [1:11:45<11:09:18,  2.21s/it] 10%|███████▊                                                                          | 1928/20117 [1:11:48<11:06:16,  2.20s/it] 10%|███████▊                                                                          | 1929/20117 [1:11:50<11:09:28,  2.21s/it] 10%|███████▊                                                                          | 1930/20117 [1:11:52<11:23:30,  2.25s/it]                                                                                                                                 {'loss': 0.2627, 'grad_norm': 0.399617999792099, 'learning_rate': 0.00019590819467737837, 'memory/max_active (GiB)': 20.53, 'memory/max_allocated (GiB)': 20.53, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 314.92, 'epoch': 0.19}
 10%|███████▊                                                                          | 1930/20117 [1:11:52<11:23:30,  2.25s/it] 10%|███████▊                                                                          | 1931/20117 [1:11:54<11:29:40,  2.28s/it] 10%|███████▉                                                                          | 1932/20117 [1:11:57<11:34:37,  2.29s/it] 10%|███████▉                                                                          | 1933/20117 [1:11:59<11:34:56,  2.29s/it] 10%|███████▉                                                                          | 1934/20117 [1:12:01<11:33:23,  2.29s/it] 10%|███████▉                                                                          | 1935/20117 [1:12:04<11:24:59,  2.26s/it] 10%|███████▉                                                                          | 1936/20117 [1:12:06<11:24:53,  2.26s/it] 10%|███████▉                                                                          | 1937/20117 [1:12:08<11:21:49,  2.25s/it] 10%|███████▉                                                                          | 1938/20117 [1:12:10<11:17:24,  2.24s/it] 10%|███████▉                                                                          | 1939/20117 [1:12:12<11:10:59,  2.21s/it] 10%|███████▉                                                                          | 1940/20117 [1:12:15<11:15:21,  2.23s/it]                                                                                                                                 {'loss': 0.2705, 'grad_norm': 0.31971487402915955, 'learning_rate': 0.00019586364060440332, 'memory/max_active (GiB)': 19.77, 'memory/max_allocated (GiB)': 19.77, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 371.9, 'epoch': 0.19}
 10%|███████▉                                                                          | 1940/20117 [1:12:15<11:15:21,  2.23s/it] 10%|███████▉                                                                          | 1941/20117 [1:12:17<11:15:50,  2.23s/it] 10%|███████▉                                                                          | 1942/20117 [1:12:19<11:10:45,  2.21s/it] 10%|███████▉                                                                          | 1943/20117 [1:12:21<11:09:48,  2.21s/it] 10%|███████▉                                                                          | 1944/20117 [1:12:24<11:23:43,  2.26s/it] 10%|███████▉                                                                          | 1945/20117 [1:12:26<11:18:53,  2.24s/it] 10%|███████▉                                                                          | 1946/20117 [1:12:28<11:18:32,  2.24s/it] 10%|███████▉                                                                          | 1947/20117 [1:12:30<11:18:57,  2.24s/it] 10%|███████▉                                                                          | 1948/20117 [1:12:33<11:15:26,  2.23s/it] 10%|███████▉                                                                          | 1949/20117 [1:12:35<11:13:09,  2.22s/it] 10%|███████▉                                                                          | 1950/20117 [1:12:37<11:07:19,  2.20s/it]                                                                                                                                 {'loss': 0.2479, 'grad_norm': 0.22907792031764984, 'learning_rate': 0.0001958188503990202, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 329.97, 'epoch': 0.19}
 10%|███████▉                                                                          | 1950/20117 [1:12:37<11:07:19,  2.20s/it] 10%|███████▉                                                                          | 1951/20117 [1:12:40<11:39:50,  2.31s/it] 10%|███████▉                                                                          | 1952/20117 [1:12:42<11:30:24,  2.28s/it] 10%|███████▉                                                                          | 1953/20117 [1:12:44<11:32:42,  2.29s/it] 10%|███████▉                                                                          | 1954/20117 [1:12:46<11:22:29,  2.25s/it] 10%|███████▉                                                                          | 1955/20117 [1:12:48<11:17:27,  2.24s/it] 10%|███████▉                                                                          | 1956/20117 [1:12:51<11:14:09,  2.23s/it] 10%|███████▉                                                                          | 1957/20117 [1:12:53<11:13:19,  2.22s/it] 10%|███████▉                                                                          | 1958/20117 [1:12:55<11:14:12,  2.23s/it] 10%|███████▉                                                                          | 1959/20117 [1:12:57<11:14:03,  2.23s/it] 10%|███████▉                                                                          | 1960/20117 [1:12:59<11:11:07,  2.22s/it]                                                                                                                                 {'loss': 0.309, 'grad_norm': 0.3624865710735321, 'learning_rate': 0.00019577382417155676, 'memory/max_active (GiB)': 20.53, 'memory/max_allocated (GiB)': 20.53, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 466.41, 'epoch': 0.19}
 10%|███████▉                                                                          | 1960/20117 [1:12:59<11:11:07,  2.22s/it] 10%|███████▉                                                                          | 1961/20117 [1:13:02<11:14:26,  2.23s/it] 10%|███████▉                                                                          | 1962/20117 [1:13:04<11:08:53,  2.21s/it] 10%|████████                                                                          | 1963/20117 [1:13:06<11:13:22,  2.23s/it] 10%|████████                                                                          | 1964/20117 [1:13:08<11:08:46,  2.21s/it] 10%|████████                                                                          | 1965/20117 [1:13:11<11:08:34,  2.21s/it] 10%|████████                                                                          | 1966/20117 [1:13:13<11:10:24,  2.22s/it] 10%|████████                                                                          | 1967/20117 [1:13:15<11:10:02,  2.22s/it] 10%|████████                                                                          | 1968/20117 [1:13:17<11:11:41,  2.22s/it] 10%|████████                                                                          | 1969/20117 [1:13:19<11:08:14,  2.21s/it] 10%|████████                                                                          | 1970/20117 [1:13:22<11:11:08,  2.22s/it]                                                                                                                                 {'loss': 0.2188, 'grad_norm': 0.3285991847515106, 'learning_rate': 0.00019572856203292215, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 336.34, 'epoch': 0.2}
 10%|████████                                                                          | 1970/20117 [1:13:22<11:11:08,  2.22s/it] 10%|████████                                                                          | 1971/20117 [1:13:24<11:11:35,  2.22s/it] 10%|████████                                                                          | 1972/20117 [1:13:26<11:16:09,  2.24s/it] 10%|████████                                                                          | 1973/20117 [1:13:28<11:19:14,  2.25s/it] 10%|████████                                                                          | 1974/20117 [1:13:31<11:14:30,  2.23s/it] 10%|████████                                                                          | 1975/20117 [1:13:33<11:07:21,  2.21s/it] 10%|████████                                                                          | 1976/20117 [1:13:35<11:27:45,  2.27s/it] 10%|████████                                                                          | 1977/20117 [1:13:37<11:21:29,  2.25s/it] 10%|████████                                                                          | 1978/20117 [1:13:40<11:20:09,  2.25s/it] 10%|████████                                                                          | 1979/20117 [1:13:42<11:14:39,  2.23s/it] 10%|████████                                                                          | 1980/20117 [1:13:44<11:10:40,  2.22s/it]                                                                                                                                 {'loss': 0.2277, 'grad_norm': 0.22952090203762054, 'learning_rate': 0.00019568306409460654, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 377.15, 'epoch': 0.2}
 10%|████████                                                                          | 1980/20117 [1:13:44<11:10:40,  2.22s/it] 10%|████████                                                                          | 1981/20117 [1:13:46<11:08:22,  2.21s/it] 10%|████████                                                                          | 1982/20117 [1:13:48<11:10:32,  2.22s/it] 10%|████████                                                                          | 1983/20117 [1:13:51<11:11:28,  2.22s/it] 10%|████████                                                                          | 1984/20117 [1:13:53<11:06:25,  2.21s/it] 10%|████████                                                                          | 1985/20117 [1:13:55<11:03:08,  2.19s/it] 10%|████████                                                                          | 1986/20117 [1:13:57<11:03:49,  2.20s/it] 10%|████████                                                                          | 1987/20117 [1:13:59<11:02:19,  2.19s/it] 10%|████████                                                                          | 1988/20117 [1:14:02<11:02:00,  2.19s/it] 10%|████████                                                                          | 1989/20117 [1:14:04<11:05:21,  2.20s/it] 10%|████████                                                                          | 1990/20117 [1:14:06<11:04:52,  2.20s/it]                                                                                                                                 {'loss': 0.299, 'grad_norm': 0.38868236541748047, 'learning_rate': 0.000195637330468681, 'memory/max_active (GiB)': 20.62, 'memory/max_allocated (GiB)': 20.62, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 371.96, 'epoch': 0.2}
 10%|████████                                                                          | 1990/20117 [1:14:06<11:04:52,  2.20s/it] 10%|████████                                                                          | 1991/20117 [1:14:08<11:05:58,  2.20s/it] 10%|████████                                                                          | 1992/20117 [1:14:10<11:06:46,  2.21s/it] 10%|████████                                                                          | 1993/20117 [1:14:13<11:16:35,  2.24s/it] 10%|████████▏                                                                         | 1994/20117 [1:14:15<11:11:34,  2.22s/it] 10%|████████▏                                                                         | 1995/20117 [1:14:17<11:11:17,  2.22s/it] 10%|████████▏                                                                         | 1996/20117 [1:14:19<11:14:23,  2.23s/it] 10%|████████▏                                                                         | 1997/20117 [1:14:22<11:12:13,  2.23s/it] 10%|████████▏                                                                         | 1998/20117 [1:14:24<11:07:51,  2.21s/it] 10%|████████▏                                                                         | 1999/20117 [1:14:26<11:12:37,  2.23s/it] 10%|████████▏                                                                         | 2000/20117 [1:14:28<11:10:17,  2.22s/it]                                                                                                                                 {'loss': 0.2993, 'grad_norm': 0.30058735609054565, 'learning_rate': 0.0001955913612677971, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 418.93, 'epoch': 0.2}
 10%|████████▏                                                                         | 2000/20117 [1:14:28<11:10:17,  2.22s/it] 10%|████████▏                                                                         | 2001/20117 [1:14:30<11:10:16,  2.22s/it] 10%|████████▏                                                                         | 2002/20117 [1:14:33<11:42:13,  2.33s/it] 10%|████████▏                                                                         | 2003/20117 [1:14:35<11:34:24,  2.30s/it] 10%|████████▏                                                                         | 2004/20117 [1:14:38<11:26:35,  2.27s/it] 10%|████████▏                                                                         | 2005/20117 [1:14:40<11:21:38,  2.26s/it] 10%|████████▏                                                                         | 2006/20117 [1:14:42<11:32:56,  2.30s/it] 10%|████████▏                                                                         | 2007/20117 [1:14:44<11:39:01,  2.32s/it] 10%|████████▏                                                                         | 2008/20117 [1:14:47<11:45:57,  2.34s/it] 10%|████████▏                                                                         | 2009/20117 [1:14:49<11:45:55,  2.34s/it] 10%|████████▏                                                                         | 2010/20117 [1:14:52<11:44:49,  2.34s/it]                                                                                                                                 {'loss': 0.2894, 'grad_norm': 0.25231263041496277, 'learning_rate': 0.00019554515660518668, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 372.68, 'epoch': 0.2}
 10%|████████▏                                                                         | 2010/20117 [1:14:52<11:44:49,  2.34s/it] 10%|████████▏                                                                         | 2011/20117 [1:14:54<11:37:30,  2.31s/it] 10%|████████▏                                                                         | 2012/20117 [1:14:56<11:30:28,  2.29s/it] 10%|████████▏                                                                         | 2013/20117 [1:14:58<11:27:18,  2.28s/it] 10%|████████▏                                                                         | 2014/20117 [1:15:00<11:17:53,  2.25s/it] 10%|████████▏                                                                         | 2015/20117 [1:15:03<11:21:25,  2.26s/it] 10%|████████▏                                                                         | 2016/20117 [1:15:05<11:22:52,  2.26s/it] 10%|████████▏                                                                         | 2017/20117 [1:15:07<11:16:21,  2.24s/it] 10%|████████▏                                                                         | 2018/20117 [1:15:09<11:16:02,  2.24s/it] 10%|████████▏                                                                         | 2019/20117 [1:15:12<11:15:14,  2.24s/it] 10%|████████▏                                                                         | 2020/20117 [1:15:14<11:12:36,  2.23s/it]                                                                                                                                 {'loss': 0.2152, 'grad_norm': 0.5621690154075623, 'learning_rate': 0.00019549871659466165, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 358.38, 'epoch': 0.2}
 10%|████████▏                                                                         | 2020/20117 [1:15:14<11:12:36,  2.23s/it] 10%|████████▏                                                                         | 2021/20117 [1:15:16<11:11:03,  2.23s/it] 10%|████████▏                                                                         | 2022/20117 [1:15:18<11:03:31,  2.20s/it] 10%|████████▏                                                                         | 2023/20117 [1:15:21<11:08:50,  2.22s/it] 10%|████████▎                                                                         | 2024/20117 [1:15:23<11:09:24,  2.22s/it] 10%|████████▎                                                                         | 2025/20117 [1:15:25<11:03:31,  2.20s/it] 10%|████████▎                                                                         | 2026/20117 [1:15:27<11:04:52,  2.21s/it] 10%|████████▎                                                                         | 2027/20117 [1:15:29<11:02:28,  2.20s/it] 10%|████████▎                                                                         | 2028/20117 [1:15:31<11:02:21,  2.20s/it] 10%|████████▎                                                                         | 2029/20117 [1:15:34<11:02:50,  2.20s/it] 10%|████████▎                                                                         | 2030/20117 [1:15:36<11:04:02,  2.20s/it]                                                                                                                                 {'loss': 0.2398, 'grad_norm': 0.3443728983402252, 'learning_rate': 0.0001954520413506135, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 306.09, 'epoch': 0.2}
 10%|████████▎                                                                         | 2030/20117 [1:15:36<11:04:02,  2.20s/it] 10%|████████▎                                                                         | 2031/20117 [1:15:38<11:11:03,  2.23s/it] 10%|████████▎                                                                         | 2032/20117 [1:15:40<11:08:59,  2.22s/it] 10%|████████▎                                                                         | 2033/20117 [1:15:43<11:08:17,  2.22s/it] 10%|████████▎                                                                         | 2034/20117 [1:15:45<11:08:31,  2.22s/it] 10%|████████▎                                                                         | 2035/20117 [1:15:47<11:12:38,  2.23s/it] 10%|████████▎                                                                         | 2036/20117 [1:15:49<11:08:31,  2.22s/it] 10%|████████▎                                                                         | 2037/20117 [1:15:52<11:13:19,  2.23s/it] 10%|████████▎                                                                         | 2038/20117 [1:15:54<11:18:13,  2.25s/it] 10%|████████▎                                                                         | 2039/20117 [1:15:56<11:13:52,  2.24s/it] 10%|████████▎                                                                         | 2040/20117 [1:15:58<11:10:12,  2.22s/it]                                                                                                                                 {'loss': 0.2259, 'grad_norm': 0.40964439511299133, 'learning_rate': 0.0001954051309880133, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 371.31, 'epoch': 0.2}
 10%|████████▎                                                                         | 2040/20117 [1:15:58<11:10:12,  2.22s/it] 10%|████████▎                                                                         | 2041/20117 [1:16:00<11:06:58,  2.21s/it] 10%|████████▎                                                                         | 2042/20117 [1:16:03<11:11:04,  2.23s/it] 10%|████████▎                                                                         | 2043/20117 [1:16:05<11:28:39,  2.29s/it] 10%|████████▎                                                                         | 2044/20117 [1:16:07<11:29:59,  2.29s/it] 10%|████████▎                                                                         | 2045/20117 [1:16:10<11:26:12,  2.28s/it] 10%|████████▎                                                                         | 2046/20117 [1:16:12<11:20:51,  2.26s/it] 10%|████████▎                                                                         | 2047/20117 [1:16:14<11:19:50,  2.26s/it] 10%|████████▎                                                                         | 2048/20117 [1:16:16<11:20:30,  2.26s/it] 10%|████████▎                                                                         | 2049/20117 [1:16:19<11:21:29,  2.26s/it] 10%|████████▎                                                                         | 2050/20117 [1:16:21<11:18:13,  2.25s/it]                                                                                                                                 {'loss': 0.2681, 'grad_norm': 0.49535176157951355, 'learning_rate': 0.0001953579856224111, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 368.85, 'epoch': 0.2}
 10%|████████▎                                                                         | 2050/20117 [1:16:21<11:18:13,  2.25s/it] 10%|████████▎                                                                         | 2051/20117 [1:16:23<11:18:53,  2.25s/it] 10%|████████▎                                                                         | 2052/20117 [1:16:25<11:13:38,  2.24s/it] 10%|████████▎                                                                         | 2053/20117 [1:16:28<11:11:13,  2.23s/it] 10%|████████▎                                                                         | 2054/20117 [1:16:30<11:49:56,  2.36s/it] 10%|████████▍                                                                         | 2055/20117 [1:16:32<11:36:55,  2.32s/it] 10%|████████▍                                                                         | 2056/20117 [1:16:35<11:35:48,  2.31s/it] 10%|████████▍                                                                         | 2057/20117 [1:16:37<11:35:13,  2.31s/it] 10%|████████▍                                                                         | 2058/20117 [1:16:39<11:31:52,  2.30s/it] 10%|████████▍                                                                         | 2059/20117 [1:16:42<11:33:33,  2.30s/it] 10%|████████▍                                                                         | 2060/20117 [1:16:44<11:30:58,  2.30s/it]                                                                                                                                 {'loss': 0.2309, 'grad_norm': 0.4733733832836151, 'learning_rate': 0.00019531060536993598, 'memory/max_active (GiB)': 21.47, 'memory/max_allocated (GiB)': 21.47, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 378.99, 'epoch': 0.2}
 10%|████████▍                                                                         | 2060/20117 [1:16:44<11:30:58,  2.30s/it] 10%|████████▍                                                                         | 2061/20117 [1:16:46<11:26:28,  2.28s/it] 10%|████████▍                                                                         | 2062/20117 [1:16:48<11:25:34,  2.28s/it] 10%|████████▍                                                                         | 2063/20117 [1:16:51<11:27:44,  2.29s/it] 10%|████████▍                                                                         | 2064/20117 [1:16:53<11:43:03,  2.34s/it] 10%|████████▍                                                                         | 2065/20117 [1:16:56<11:53:18,  2.37s/it] 10%|████████▍                                                                         | 2066/20117 [1:16:58<11:50:55,  2.36s/it] 10%|████████▍                                                                         | 2067/20117 [1:17:00<11:46:26,  2.35s/it] 10%|████████▍                                                                         | 2068/20117 [1:17:02<11:32:51,  2.30s/it] 10%|████████▍                                                                         | 2069/20117 [1:17:05<11:25:03,  2.28s/it] 10%|████████▍                                                                         | 2070/20117 [1:17:07<11:15:39,  2.25s/it]                                                                                                                                 {'loss': 0.2717, 'grad_norm': 0.4018077552318573, 'learning_rate': 0.00019526299034729544, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 373.89, 'epoch': 0.21}
 10%|████████▍                                                                         | 2070/20117 [1:17:07<11:15:39,  2.25s/it] 10%|████████▍                                                                         | 2071/20117 [1:17:09<11:13:15,  2.24s/it] 10%|████████▍                                                                         | 2072/20117 [1:17:11<11:09:36,  2.23s/it] 10%|████████▍                                                                         | 2073/20117 [1:17:14<11:09:42,  2.23s/it] 10%|████████▍                                                                         | 2074/20117 [1:17:16<11:13:04,  2.24s/it] 10%|████████▍                                                                         | 2075/20117 [1:17:18<11:25:57,  2.28s/it] 10%|████████▍                                                                         | 2076/20117 [1:17:20<11:29:42,  2.29s/it] 10%|████████▍                                                                         | 2077/20117 [1:17:23<11:25:00,  2.28s/it] 10%|████████▍                                                                         | 2078/20117 [1:17:25<11:25:25,  2.28s/it] 10%|████████▍                                                                         | 2079/20117 [1:17:27<11:19:16,  2.26s/it] 10%|████████▍                                                                         | 2080/20117 [1:17:29<11:17:43,  2.25s/it]                                                                                                                                 {'loss': 0.2604, 'grad_norm': 0.18390242755413055, 'learning_rate': 0.0001952151406717754, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 339.07, 'epoch': 0.21}
 10%|████████▍                                                                         | 2080/20117 [1:17:29<11:17:43,  2.25s/it] 10%|████████▍                                                                         | 2081/20117 [1:17:32<11:22:41,  2.27s/it] 10%|████████▍                                                                         | 2082/20117 [1:17:34<11:11:03,  2.23s/it] 10%|████████▍                                                                         | 2083/20117 [1:17:36<11:21:41,  2.27s/it] 10%|████████▍                                                                         | 2084/20117 [1:17:38<11:12:23,  2.24s/it] 10%|████████▍                                                                         | 2085/20117 [1:17:41<11:10:15,  2.23s/it] 10%|████████▌                                                                         | 2086/20117 [1:17:43<11:29:05,  2.29s/it] 10%|████████▌                                                                         | 2087/20117 [1:17:45<11:33:15,  2.31s/it] 10%|████████▌                                                                         | 2088/20117 [1:17:48<11:39:09,  2.33s/it] 10%|████████▌                                                                         | 2089/20117 [1:17:50<11:33:21,  2.31s/it] 10%|████████▌                                                                         | 2090/20117 [1:17:52<11:27:27,  2.29s/it]                                                                                                                                 {'loss': 0.2528, 'grad_norm': 0.31284740567207336, 'learning_rate': 0.0001951670564612397, 'memory/max_active (GiB)': 18.17, 'memory/max_allocated (GiB)': 18.17, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 358.01, 'epoch': 0.21}
 10%|████████▌                                                                         | 2090/20117 [1:17:52<11:27:27,  2.29s/it] 10%|████████▌                                                                         | 2091/20117 [1:17:55<11:27:08,  2.29s/it] 10%|████████▌                                                                         | 2092/20117 [1:17:57<11:22:05,  2.27s/it] 10%|████████▌                                                                         | 2093/20117 [1:17:59<11:18:44,  2.26s/it] 10%|████████▌                                                                         | 2094/20117 [1:18:01<11:22:53,  2.27s/it] 10%|████████▌                                                                         | 2095/20117 [1:18:04<11:21:12,  2.27s/it] 10%|████████▌                                                                         | 2096/20117 [1:18:06<11:18:56,  2.26s/it] 10%|████████▌                                                                         | 2097/20117 [1:18:08<11:21:44,  2.27s/it] 10%|████████▌                                                                         | 2098/20117 [1:18:10<11:20:05,  2.26s/it] 10%|████████▌                                                                         | 2099/20117 [1:18:13<11:24:20,  2.28s/it] 10%|████████▌                                                                         | 2100/20117 [1:18:15<11:30:57,  2.30s/it]                                                                                                                                 {'loss': 0.2753, 'grad_norm': 0.1479184776544571, 'learning_rate': 0.0001951187378341299, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 373.8, 'epoch': 0.21}
 10%|████████▌                                                                         | 2100/20117 [1:18:15<11:30:57,  2.30s/it] 10%|████████▌                                                                         | 2101/20117 [1:18:17<11:34:14,  2.31s/it] 10%|████████▌                                                                         | 2102/20117 [1:18:20<11:27:14,  2.29s/it] 10%|████████▌                                                                         | 2103/20117 [1:18:22<11:29:09,  2.30s/it] 10%|████████▌                                                                         | 2104/20117 [1:18:24<11:31:04,  2.30s/it] 10%|████████▌                                                                         | 2105/20117 [1:18:27<11:31:12,  2.30s/it] 10%|████████▌                                                                         | 2106/20117 [1:18:29<12:00:34,  2.40s/it] 10%|████████▌                                                                         | 2107/20117 [1:18:32<12:00:22,  2.40s/it] 10%|████████▌                                                                         | 2108/20117 [1:18:34<11:44:35,  2.35s/it] 10%|████████▌                                                                         | 2109/20117 [1:18:36<11:39:53,  2.33s/it] 10%|████████▌                                                                         | 2110/20117 [1:18:38<11:29:47,  2.30s/it]                                                                                                                                 {'loss': 0.2799, 'grad_norm': 0.35812532901763916, 'learning_rate': 0.00019507018490946503, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 394.97, 'epoch': 0.21}
 10%|████████▌                                                                         | 2110/20117 [1:18:38<11:29:47,  2.30s/it] 10%|████████▌                                                                         | 2111/20117 [1:18:41<11:30:00,  2.30s/it] 10%|████████▌                                                                         | 2112/20117 [1:18:43<11:26:29,  2.29s/it] 11%|████████▌                                                                         | 2113/20117 [1:18:45<11:15:19,  2.25s/it] 11%|████████▌                                                                         | 2114/20117 [1:18:47<11:10:27,  2.23s/it] 11%|████████▌                                                                         | 2115/20117 [1:18:50<11:10:56,  2.24s/it] 11%|████████▋                                                                         | 2116/20117 [1:18:52<11:09:33,  2.23s/it] 11%|████████▋                                                                         | 2117/20117 [1:18:54<11:07:24,  2.22s/it] 11%|████████▋                                                                         | 2118/20117 [1:18:56<11:05:37,  2.22s/it] 11%|████████▋                                                                         | 2119/20117 [1:18:58<11:05:59,  2.22s/it] 11%|████████▋                                                                         | 2120/20117 [1:19:01<11:08:18,  2.23s/it]                                                                                                                                 {'loss': 0.2785, 'grad_norm': 0.39899322390556335, 'learning_rate': 0.00019502139780684118, 'memory/max_active (GiB)': 19.2, 'memory/max_allocated (GiB)': 19.2, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 371.01, 'epoch': 0.21}
 11%|████████▋                                                                         | 2120/20117 [1:19:01<11:08:18,  2.23s/it] 11%|████████▋                                                                         | 2121/20117 [1:19:03<11:07:11,  2.22s/it] 11%|████████▋                                                                         | 2122/20117 [1:19:05<11:06:05,  2.22s/it] 11%|████████▋                                                                         | 2123/20117 [1:19:07<11:05:22,  2.22s/it] 11%|████████▋                                                                         | 2124/20117 [1:19:09<11:05:06,  2.22s/it] 11%|████████▋                                                                         | 2125/20117 [1:19:12<11:07:11,  2.22s/it] 11%|████████▋                                                                         | 2126/20117 [1:19:14<11:05:52,  2.22s/it] 11%|████████▋                                                                         | 2127/20117 [1:19:16<11:04:40,  2.22s/it] 11%|████████▋                                                                         | 2128/20117 [1:19:18<11:07:04,  2.22s/it] 11%|████████▋                                                                         | 2129/20117 [1:19:21<11:04:45,  2.22s/it] 11%|████████▋                                                                         | 2130/20117 [1:19:23<11:04:21,  2.22s/it]                                                                                                                                 {'loss': 0.2985, 'grad_norm': 0.30708786845207214, 'learning_rate': 0.00019497237664643132, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 393.88, 'epoch': 0.21}
 11%|████████▋                                                                         | 2130/20117 [1:19:23<11:04:21,  2.22s/it] 11%|████████▋                                                                         | 2131/20117 [1:19:25<11:16:56,  2.26s/it] 11%|████████▋                                                                         | 2132/20117 [1:19:27<11:18:23,  2.26s/it] 11%|████████▋                                                                         | 2133/20117 [1:19:30<11:17:24,  2.26s/it] 11%|████████▋                                                                         | 2134/20117 [1:19:32<11:18:35,  2.26s/it] 11%|████████▋                                                                         | 2135/20117 [1:19:34<11:14:06,  2.25s/it] 11%|████████▋                                                                         | 2136/20117 [1:19:36<11:12:28,  2.24s/it] 11%|████████▋                                                                         | 2137/20117 [1:19:39<11:10:06,  2.24s/it] 11%|████████▋                                                                         | 2138/20117 [1:19:41<11:08:29,  2.23s/it] 11%|████████▋                                                                         | 2139/20117 [1:19:43<11:13:37,  2.25s/it] 11%|████████▋                                                                         | 2140/20117 [1:19:45<11:10:35,  2.24s/it]                                                                                                                                 {'loss': 0.2661, 'grad_norm': 0.280734658241272, 'learning_rate': 0.00019492312154898488, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 401.57, 'epoch': 0.21}
 11%|████████▋                                                                         | 2140/20117 [1:19:45<11:10:35,  2.24s/it] 11%|████████▋                                                                         | 2141/20117 [1:19:48<11:08:01,  2.23s/it] 11%|████████▋                                                                         | 2142/20117 [1:19:50<11:11:11,  2.24s/it] 11%|████████▋                                                                         | 2143/20117 [1:19:52<11:15:28,  2.25s/it] 11%|████████▋                                                                         | 2144/20117 [1:19:54<11:16:57,  2.26s/it] 11%|████████▋                                                                         | 2145/20117 [1:19:57<11:19:52,  2.27s/it] 11%|████████▋                                                                         | 2146/20117 [1:19:59<11:25:10,  2.29s/it] 11%|████████▊                                                                         | 2147/20117 [1:20:01<11:25:40,  2.29s/it] 11%|████████▊                                                                         | 2148/20117 [1:20:04<11:20:09,  2.27s/it] 11%|████████▊                                                                         | 2149/20117 [1:20:06<11:16:55,  2.26s/it] 11%|████████▊                                                                         | 2150/20117 [1:20:08<11:24:18,  2.29s/it]                                                                                                                                 {'loss': 0.2197, 'grad_norm': 0.19114673137664795, 'learning_rate': 0.00019487363263582765, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 346.18, 'epoch': 0.21}
 11%|████████▊                                                                         | 2150/20117 [1:20:08<11:24:18,  2.29s/it] 11%|████████▊                                                                         | 2151/20117 [1:20:10<11:21:17,  2.28s/it] 11%|████████▊                                                                         | 2152/20117 [1:20:13<11:19:44,  2.27s/it] 11%|████████▊                                                                         | 2153/20117 [1:20:15<11:13:15,  2.25s/it] 11%|████████▊                                                                         | 2154/20117 [1:20:17<11:08:37,  2.23s/it] 11%|████████▊                                                                         | 2155/20117 [1:20:19<11:07:15,  2.23s/it] 11%|████████▊                                                                         | 2156/20117 [1:20:21<11:05:32,  2.22s/it] 11%|████████▊                                                                         | 2157/20117 [1:20:24<11:34:06,  2.32s/it] 11%|████████▊                                                                         | 2158/20117 [1:20:26<11:27:21,  2.30s/it] 11%|████████▊                                                                         | 2159/20117 [1:20:28<11:24:29,  2.29s/it] 11%|████████▊                                                                         | 2160/20117 [1:20:31<11:22:06,  2.28s/it]                                                                                                                                 {'loss': 0.2724, 'grad_norm': 0.5506372451782227, 'learning_rate': 0.00019482391002886122, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 281.19, 'epoch': 0.21}
 11%|████████▊                                                                         | 2160/20117 [1:20:31<11:22:06,  2.28s/it] 11%|████████▊                                                                         | 2161/20117 [1:20:33<11:20:01,  2.27s/it] 11%|████████▊                                                                         | 2162/20117 [1:20:35<11:11:55,  2.25s/it] 11%|████████▊                                                                         | 2163/20117 [1:20:37<11:14:47,  2.26s/it] 11%|████████▊                                                                         | 2164/20117 [1:20:40<11:12:43,  2.25s/it] 11%|████████▊                                                                         | 2165/20117 [1:20:42<11:07:23,  2.23s/it] 11%|████████▊                                                                         | 2166/20117 [1:20:44<11:19:53,  2.27s/it] 11%|████████▊                                                                         | 2167/20117 [1:20:47<11:21:52,  2.28s/it] 11%|████████▊                                                                         | 2168/20117 [1:20:49<11:15:28,  2.26s/it] 11%|████████▊                                                                         | 2169/20117 [1:20:51<11:21:56,  2.28s/it] 11%|████████▊                                                                         | 2170/20117 [1:20:53<11:19:04,  2.27s/it]                                                                                                                                 {'loss': 0.2758, 'grad_norm': 0.3187256157398224, 'learning_rate': 0.0001947739538505629, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 405.51, 'epoch': 0.22}
 11%|████████▊                                                                         | 2170/20117 [1:20:53<11:19:04,  2.27s/it] 11%|████████▊                                                                         | 2171/20117 [1:20:56<11:17:55,  2.27s/it] 11%|████████▊                                                                         | 2172/20117 [1:20:58<11:23:04,  2.28s/it] 11%|████████▊                                                                         | 2173/20117 [1:21:00<11:20:13,  2.27s/it] 11%|████████▊                                                                         | 2174/20117 [1:21:02<11:13:34,  2.25s/it] 11%|████████▊                                                                         | 2175/20117 [1:21:05<11:12:15,  2.25s/it] 11%|████████▊                                                                         | 2176/20117 [1:21:07<11:11:44,  2.25s/it] 11%|████████▊                                                                         | 2177/20117 [1:21:09<11:08:58,  2.24s/it] 11%|████████▉                                                                         | 2178/20117 [1:21:11<11:04:48,  2.22s/it] 11%|████████▉                                                                         | 2179/20117 [1:21:14<11:17:58,  2.27s/it] 11%|████████▉                                                                         | 2180/20117 [1:21:16<11:18:52,  2.27s/it]                                                                                                                                 {'loss': 0.2792, 'grad_norm': 0.4344545602798462, 'learning_rate': 0.00019472376422398528, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 313.61, 'epoch': 0.22}
 11%|████████▉                                                                         | 2180/20117 [1:21:16<11:18:52,  2.27s/it] 11%|████████▉                                                                         | 2181/20117 [1:21:18<11:11:29,  2.25s/it] 11%|████████▉                                                                         | 2182/20117 [1:21:20<11:22:23,  2.28s/it] 11%|████████▉                                                                         | 2183/20117 [1:21:23<11:14:17,  2.26s/it] 11%|████████▉                                                                         | 2184/20117 [1:21:25<11:19:43,  2.27s/it] 11%|████████▉                                                                         | 2185/20117 [1:21:27<11:17:26,  2.27s/it] 11%|████████▉                                                                         | 2186/20117 [1:21:29<11:11:42,  2.25s/it] 11%|████████▉                                                                         | 2187/20117 [1:21:32<11:09:15,  2.24s/it] 11%|████████▉                                                                         | 2188/20117 [1:21:34<11:07:50,  2.23s/it] 11%|████████▉                                                                         | 2189/20117 [1:21:36<11:06:04,  2.23s/it] 11%|████████▉                                                                         | 2190/20117 [1:21:38<11:02:54,  2.22s/it]                                                                                                                                 {'loss': 0.2474, 'grad_norm': 0.4564478099346161, 'learning_rate': 0.00019467334127275606, 'memory/max_active (GiB)': 20.43, 'memory/max_allocated (GiB)': 20.43, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 328.19, 'epoch': 0.22}
 11%|████████▉                                                                         | 2190/20117 [1:21:38<11:02:54,  2.22s/it] 11%|████████▉                                                                         | 2191/20117 [1:21:40<11:00:41,  2.21s/it] 11%|████████▉                                                                         | 2192/20117 [1:21:43<11:03:12,  2.22s/it] 11%|████████▉                                                                         | 2193/20117 [1:21:45<11:05:08,  2.23s/it] 11%|████████▉                                                                         | 2194/20117 [1:21:47<11:15:26,  2.26s/it] 11%|████████▉                                                                         | 2195/20117 [1:21:49<11:10:07,  2.24s/it] 11%|████████▉                                                                         | 2196/20117 [1:21:52<11:08:45,  2.24s/it] 11%|████████▉                                                                         | 2197/20117 [1:21:54<11:05:29,  2.23s/it] 11%|████████▉                                                                         | 2198/20117 [1:21:56<11:10:17,  2.24s/it] 11%|████████▉                                                                         | 2199/20117 [1:21:58<11:08:52,  2.24s/it] 11%|████████▉                                                                         | 2200/20117 [1:22:01<11:18:27,  2.27s/it]                                                                                                                                 {'loss': 0.2877, 'grad_norm': 0.25979048013687134, 'learning_rate': 0.00019462268512107766, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 360.33, 'epoch': 0.22}
 11%|████████▉                                                                         | 2200/20117 [1:22:01<11:18:27,  2.27s/it] 11%|████████▉                                                                         | 2201/20117 [1:22:03<11:18:26,  2.27s/it] 11%|████████▉                                                                         | 2202/20117 [1:22:05<11:16:09,  2.26s/it] 11%|████████▉                                                                         | 2203/20117 [1:22:08<11:11:32,  2.25s/it] 11%|████████▉                                                                         | 2204/20117 [1:22:10<11:10:29,  2.25s/it] 11%|████████▉                                                                         | 2205/20117 [1:22:12<11:09:10,  2.24s/it] 11%|████████▉                                                                         | 2206/20117 [1:22:14<11:10:18,  2.25s/it] 11%|████████▉                                                                         | 2207/20117 [1:22:17<11:12:39,  2.25s/it] 11%|█████████                                                                         | 2208/20117 [1:22:19<11:14:34,  2.26s/it] 11%|█████████                                                                         | 2209/20117 [1:22:21<11:24:41,  2.29s/it] 11%|█████████                                                                         | 2210/20117 [1:22:23<11:19:32,  2.28s/it]                                                                                                                                 {'loss': 0.3336, 'grad_norm': 0.3676232397556305, 'learning_rate': 0.00019457179589372684, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 368.21, 'epoch': 0.22}
 11%|█████████                                                                         | 2210/20117 [1:22:23<11:19:32,  2.28s/it] 11%|█████████                                                                         | 2211/20117 [1:22:26<11:51:59,  2.39s/it] 11%|█████████                                                                         | 2212/20117 [1:22:28<11:45:39,  2.36s/it] 11%|█████████                                                                         | 2213/20117 [1:22:31<11:41:57,  2.35s/it] 11%|█████████                                                                         | 2214/20117 [1:22:33<11:35:45,  2.33s/it] 11%|█████████                                                                         | 2215/20117 [1:22:35<11:29:11,  2.31s/it] 11%|█████████                                                                         | 2216/20117 [1:22:38<11:28:06,  2.31s/it] 11%|█████████                                                                         | 2217/20117 [1:22:40<11:24:04,  2.29s/it] 11%|█████████                                                                         | 2218/20117 [1:22:42<11:20:45,  2.28s/it] 11%|█████████                                                                         | 2219/20117 [1:22:44<11:16:25,  2.27s/it] 11%|█████████                                                                         | 2220/20117 [1:22:47<11:21:15,  2.28s/it]                                                                                                                                 {'loss': 0.253, 'grad_norm': 0.7130278944969177, 'learning_rate': 0.0001945206737160545, 'memory/max_active (GiB)': 19.2, 'memory/max_allocated (GiB)': 19.2, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 305.12, 'epoch': 0.22}
 11%|█████████                                                                         | 2220/20117 [1:22:47<11:21:15,  2.28s/it] 11%|█████████                                                                         | 2221/20117 [1:22:49<11:18:05,  2.27s/it] 11%|█████████                                                                         | 2222/20117 [1:22:51<11:16:00,  2.27s/it] 11%|█████████                                                                         | 2223/20117 [1:22:53<11:18:39,  2.28s/it] 11%|█████████                                                                         | 2224/20117 [1:22:56<11:15:11,  2.26s/it] 11%|█████████                                                                         | 2225/20117 [1:22:58<11:19:42,  2.28s/it] 11%|█████████                                                                         | 2226/20117 [1:23:00<11:17:14,  2.27s/it] 11%|█████████                                                                         | 2227/20117 [1:23:02<11:16:40,  2.27s/it] 11%|█████████                                                                         | 2228/20117 [1:23:05<11:23:36,  2.29s/it] 11%|█████████                                                                         | 2229/20117 [1:23:07<11:15:18,  2.27s/it] 11%|█████████                                                                         | 2230/20117 [1:23:09<11:17:32,  2.27s/it]                                                                                                                                 {'loss': 0.3069, 'grad_norm': 2.660079002380371, 'learning_rate': 0.0001944693187139854, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 337.93, 'epoch': 0.22}
 11%|█████████                                                                         | 2230/20117 [1:23:09<11:17:32,  2.27s/it] 11%|█████████                                                                         | 2231/20117 [1:23:12<11:20:45,  2.28s/it] 11%|█████████                                                                         | 2232/20117 [1:23:14<11:17:44,  2.27s/it] 11%|█████████                                                                         | 2233/20117 [1:23:16<11:18:14,  2.28s/it] 11%|█████████                                                                         | 2234/20117 [1:23:18<11:15:15,  2.27s/it] 11%|█████████                                                                         | 2235/20117 [1:23:21<11:13:57,  2.26s/it] 11%|█████████                                                                         | 2236/20117 [1:23:23<11:10:20,  2.25s/it] 11%|█████████                                                                         | 2237/20117 [1:23:25<11:04:14,  2.23s/it] 11%|█████████                                                                         | 2238/20117 [1:23:27<11:02:32,  2.22s/it] 11%|█████████▏                                                                        | 2239/20117 [1:23:30<11:09:21,  2.25s/it] 11%|█████████▏                                                                        | 2240/20117 [1:23:32<11:14:26,  2.26s/it]                                                                                                                                 {'loss': 0.2744, 'grad_norm': 0.2822214663028717, 'learning_rate': 0.00019441773101401777, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 349.76, 'epoch': 0.22}
 11%|█████████▏                                                                        | 2240/20117 [1:23:32<11:14:26,  2.26s/it] 11%|█████████▏                                                                        | 2241/20117 [1:23:34<11:10:32,  2.25s/it] 11%|█████████▏                                                                        | 2242/20117 [1:23:36<11:15:29,  2.27s/it] 11%|█████████▏                                                                        | 2243/20117 [1:23:39<11:18:47,  2.28s/it] 11%|█████████▏                                                                        | 2244/20117 [1:23:41<11:17:39,  2.27s/it] 11%|█████████▏                                                                        | 2245/20117 [1:23:43<11:13:47,  2.26s/it] 11%|█████████▏                                                                        | 2246/20117 [1:23:45<11:10:44,  2.25s/it] 11%|█████████▏                                                                        | 2247/20117 [1:23:48<11:11:02,  2.25s/it] 11%|█████████▏                                                                        | 2248/20117 [1:23:50<11:08:54,  2.25s/it] 11%|█████████▏                                                                        | 2249/20117 [1:23:52<11:10:33,  2.25s/it] 11%|█████████▏                                                                        | 2250/20117 [1:23:54<11:06:01,  2.24s/it]                                                                                                                                 {'loss': 0.245, 'grad_norm': 0.45128756761550903, 'learning_rate': 0.00019436591074322302, 'memory/max_active (GiB)': 20.44, 'memory/max_allocated (GiB)': 20.44, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 390.72, 'epoch': 0.22}
 11%|█████████▏                                                                        | 2250/20117 [1:23:54<11:06:01,  2.24s/it] 11%|█████████▏                                                                        | 2251/20117 [1:23:57<11:00:55,  2.22s/it] 11%|█████████▏                                                                        | 2252/20117 [1:23:59<10:59:48,  2.22s/it] 11%|█████████▏                                                                        | 2253/20117 [1:24:01<11:05:59,  2.24s/it] 11%|█████████▏                                                                        | 2254/20117 [1:24:03<10:57:20,  2.21s/it] 11%|█████████▏                                                                        | 2255/20117 [1:24:05<10:54:38,  2.20s/it] 11%|█████████▏                                                                        | 2256/20117 [1:24:07<10:49:35,  2.18s/it] 11%|█████████▏                                                                        | 2257/20117 [1:24:10<10:49:13,  2.18s/it] 11%|█████████▏                                                                        | 2258/20117 [1:24:12<10:54:37,  2.20s/it] 11%|█████████▏                                                                        | 2259/20117 [1:24:14<11:11:20,  2.26s/it] 11%|█████████▏                                                                        | 2260/20117 [1:24:17<11:39:36,  2.35s/it]                                                                                                                                 {'loss': 0.2625, 'grad_norm': 0.2821448743343353, 'learning_rate': 0.00019431385802924539, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 321.11, 'epoch': 0.22}
 11%|█████████▎| 2270/20117 [1:24:40<11:10:38,  2.25s/it] {'loss': 0.2116, 'grad_norm': 0.3787819445133209, 'learning_rate': 0.00019426157300030176, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 345.13, 'epoch': 0.23}
 11%|█████████▎| 2280/20117 [1:25:02<11:11:58,  2.26s/it] {'loss': 0.2618, 'grad_norm': 0.36587730050086975, 'learning_rate': 0.0001942090557851812, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 288.26, 'epoch': 0.23}
 11%|█████████▎| 2290/20117 [1:25:25<11:03:45,  2.23s/it] {'loss': 0.307, 'grad_norm': 0.5559653639793396, 'learning_rate': 0.0001941563065132447, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 402.27, 'epoch': 0.23}
 11%|█████████▍| 2300/20117 [1:25:47<11:04:46,  2.24s/it] {'loss': 0.2583, 'grad_norm': 0.26463228464126587, 'learning_rate': 0.0001941033253144249, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 388.95, 'epoch': 0.23}
 11%|█████████▍| 2310/20117 [1:26:10<10:57:49,  2.22s/it] {'loss': 0.2671, 'grad_norm': 0.330695778131485, 'learning_rate': 0.0001940501123192256, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 353.27, 'epoch': 0.23}
 12%|█████████▍| 2320/20117 [1:26:32<11:08:55,  2.26s/it] {'loss': 0.2023, 'grad_norm': 0.3159751892089844, 'learning_rate': 0.00019399666765872176, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 347.05, 'epoch': 0.23}
 12%|█████████▍| 2330/20117 [1:26:54<10:47:44,  2.19s/it] {'loss': 0.2633, 'grad_norm': 0.4762006103992462, 'learning_rate': 0.0001939429914645588, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 292.93, 'epoch': 0.23}
 12%|█████████▌| 2340/20117 [1:27:16<11:01:57,  2.23s/it] {'loss': 0.2381, 'grad_norm': 0.47802531719207764, 'learning_rate': 0.00019388908386895254, 'memory/max_active (GiB)': 20.72, 'memory/max_allocated (GiB)': 20.72, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 366.87, 'epoch': 0.23}
 12%|█████████▌| 2350/20117 [1:27:38<10:51:43,  2.20s/it] {'loss': 0.3052, 'grad_norm': 0.4501783847808838, 'learning_rate': 0.00019383494500468883, 'memory/max_active (GiB)': 19.2, 'memory/max_allocated (GiB)': 19.2, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 412.56, 'epoch': 0.23}
 12%|█████████▌| 2360/20117 [1:28:00<10:53:17,  2.21s/it] {'loss': 0.2495, 'grad_norm': 0.3937671184539795, 'learning_rate': 0.0001937805750051231, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 374.59, 'epoch': 0.23}
 12%|█████████▋| 2370/20117 [1:28:23<11:36:57,  2.36s/it] {'loss': 0.1679, 'grad_norm': 0.2139206975698471, 'learning_rate': 0.00019372597400418019, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 329.45, 'epoch': 0.24}
 12%|█████████▋| 2380/20117 [1:28:46<11:04:55,  2.25s/it] {'loss': 0.2242, 'grad_norm': 0.4658704102039337, 'learning_rate': 0.00019367114213635382, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 302.96, 'epoch': 0.24}
 12%|█████████▋| 2390/20117 [1:29:08<10:56:44,  2.22s/it] {'loss': 0.2632, 'grad_norm': 0.34023401141166687, 'learning_rate': 0.00019361607953670654, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 387.14, 'epoch': 0.24}
 12%|█████████▊| 2400/20117 [1:29:30<11:05:43,  2.25s/it] {'loss': 0.251, 'grad_norm': 0.4804230034351349, 'learning_rate': 0.00019356078634086914, 'memory/max_active (GiB)': 20.57, 'memory/max_allocated (GiB)': 20.57, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 410.6, 'epoch': 0.24}
 12%|█████████▊| 2410/20117 [1:29:53<11:01:03,  2.24s/it] {'loss': 0.176, 'grad_norm': 0.32980307936668396, 'learning_rate': 0.00019350526268504048, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 372.52, 'epoch': 0.24}
 12%|█████████▊| 2420/20117 [1:30:15<10:56:09,  2.22s/it] {'loss': 0.2976, 'grad_norm': 0.25920480489730835, 'learning_rate': 0.00019344950870598703, 'memory/max_active (GiB)': 19.83, 'memory/max_allocated (GiB)': 19.83, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 330.47, 'epoch': 0.24}
 12%|█████████▉| 2430/20117 [1:30:38<11:07:45,  2.27s/it] {'loss': 0.2976, 'grad_norm': 0.455616295337677, 'learning_rate': 0.00019339352454104264, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 340.47, 'epoch': 0.24}
 12%|█████████▉| 2440/20117 [1:31:00<10:47:32,  2.20s/it] {'loss': 0.2732, 'grad_norm': 0.3018989562988281, 'learning_rate': 0.00019333731032810812, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 462.35, 'epoch': 0.24}
 12%|█████████▉| 2450/20117 [1:31:23<11:05:46,  2.26s/it] {'loss': 0.2886, 'grad_norm': 0.4818798303604126, 'learning_rate': 0.00019328086620565095, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 323.25, 'epoch': 0.24}
 12%|██████████| 2460/20117 [1:31:45<11:09:08,  2.27s/it] {'loss': 0.3325, 'grad_norm': 0.4339105188846588, 'learning_rate': 0.000193224192312705, 'memory/max_active (GiB)': 21.38, 'memory/max_allocated (GiB)': 21.38, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 401.06, 'epoch': 0.24}
 12%|██████████| 2470/20117 [1:32:08<11:01:53,  2.25s/it] {'loss': 0.2369, 'grad_norm': 0.2657606303691864, 'learning_rate': 0.00019316728878887, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 344.52, 'epoch': 0.25}
 12%|██████████| 2480/20117 [1:32:31<11:15:20,  2.30s/it] {'loss': 0.2481, 'grad_norm': 0.2487681657075882, 'learning_rate': 0.0001931101557743113, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 338.75, 'epoch': 0.25}
 12%|██████████                                                                        | 2480/20117 [1:32:31<11:15:20,  2.30s/it] 12%|██████████                                                                        | 2481/20117 [1:32:33<11:11:45,  2.29s/it] 12%|██████████                                                                        | 2482/20117 [1:32:35<11:02:27,  2.25s/it] 12%|██████████                                                                        | 2483/20117 [1:32:37<11:01:41,  2.25s/it] 12%|██████████▏                                                                       | 2484/20117 [1:32:40<10:59:50,  2.25s/it] 12%|██████████▏                                                                       | 2485/20117 [1:32:42<10:57:33,  2.24s/it] 12%|██████████▏                                                                       | 2486/20117 [1:32:44<11:04:02,  2.26s/it] 12%|██████████▏                                                                       | 2487/20117 [1:32:46<10:59:37,  2.24s/it] 12%|██████████▏                                                                       | 2488/20117 [1:32:49<11:01:15,  2.25s/it] 12%|██████████▏                                                                       | 2489/20117 [1:32:51<11:03:47,  2.26s/it] 12%|██████████▏                                                                       | 2490/20117 [1:32:53<11:01:00,  2.25s/it]                                                                                                                                 {'loss': 0.2467, 'grad_norm': 0.4219423532485962, 'learning_rate': 0.0001930527934097597, 'memory/max_active (GiB)': 19.23, 'memory/max_allocated (GiB)': 19.23, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 357.65, 'epoch': 0.25}
 12%|██████████▏                                                                       | 2490/20117 [1:32:53<11:01:00,  2.25s/it] 12%|██████████▏                                                                       | 2491/20117 [1:32:55<11:06:44,  2.27s/it] 12%|██████████▏                                                                       | 2492/20117 [1:32:58<11:04:42,  2.26s/it] 12%|██████████▏                                                                       | 2493/20117 [1:33:00<10:54:14,  2.23s/it] 12%|██████████▏                                                                       | 2494/20117 [1:33:02<10:52:15,  2.22s/it] 12%|██████████▏                                                                       | 2495/20117 [1:33:04<10:51:46,  2.22s/it] 12%|██████████▏                                                                       | 2496/20117 [1:33:06<10:50:26,  2.21s/it] 12%|██████████▏                                                                       | 2497/20117 [1:33:09<10:56:27,  2.24s/it] 12%|██████████▏                                                                       | 2498/20117 [1:33:11<10:52:18,  2.22s/it] 12%|██████████▏                                                                       | 2499/20117 [1:33:13<10:59:12,  2.25s/it] 12%|██████████▏                                                                       | 2500/20117 [1:33:16<10:58:32,  2.24s/it]                                                                                                                                 {'loss': 0.2844, 'grad_norm': 0.3535912334918976, 'learning_rate': 0.00019299520183651075, 'memory/max_active (GiB)': 20.72, 'memory/max_allocated (GiB)': 20.72, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 341.4, 'epoch': 0.25}
 12%|██████████▏                                                                       | 2500/20117 [1:33:16<10:58:32,  2.24s/it] 12%|██████████▏                                                                       | 2501/20117 [1:33:18<10:55:58,  2.23s/it] 12%|██████████▏                                                                       | 2502/20117 [1:33:20<10:53:07,  2.22s/it] 12%|██████████▏                                                                       | 2503/20117 [1:33:22<10:55:52,  2.23s/it] 12%|██████████▏                                                                       | 2504/20117 [1:33:24<10:52:12,  2.22s/it] 12%|██████████▏                                                                       | 2505/20117 [1:33:27<10:56:32,  2.24s/it] 12%|██████████▏                                                                       | 2506/20117 [1:33:29<10:54:58,  2.23s/it] 12%|██████████▏                                                                       | 2507/20117 [1:33:31<10:52:58,  2.22s/it] 12%|██████████▏                                                                       | 2508/20117 [1:33:33<10:52:49,  2.22s/it] 12%|██████████▏                                                                       | 2509/20117 [1:33:36<10:56:41,  2.24s/it] 12%|██████████▏                                                                       | 2510/20117 [1:33:38<11:06:39,  2.27s/it]                                                                                                                                 {'loss': 0.2335, 'grad_norm': 0.28509521484375, 'learning_rate': 0.0001929373811964247, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 325.61, 'epoch': 0.25}
 12%|██████████▏                                                                       | 2510/20117 [1:33:38<11:06:39,  2.27s/it] 12%|██████████▏                                                                       | 2511/20117 [1:33:40<11:12:55,  2.29s/it] 12%|██████████▏                                                                       | 2512/20117 [1:33:43<11:11:26,  2.29s/it] 12%|██████████▏                                                                       | 2513/20117 [1:33:45<11:09:47,  2.28s/it] 12%|██████████▏                                                                       | 2514/20117 [1:33:47<11:01:21,  2.25s/it] 13%|██████████▎                                                                       | 2515/20117 [1:33:49<10:59:55,  2.25s/it] 13%|██████████▎                                                                       | 2516/20117 [1:33:51<10:57:07,  2.24s/it] 13%|██████████▎                                                                       | 2517/20117 [1:33:54<10:53:08,  2.23s/it] 13%|██████████▎                                                                       | 2518/20117 [1:33:56<10:53:46,  2.23s/it] 13%|██████████▎                                                                       | 2519/20117 [1:33:58<10:57:53,  2.24s/it] 13%|██████████▎                                                                       | 2520/20117 [1:34:00<10:53:06,  2.23s/it]                                                                                                                                 {'loss': 0.314, 'grad_norm': 0.540902853012085, 'learning_rate': 0.00019287933163192602, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 390.5, 'epoch': 0.25}
 13%|██████████▎                                                                       | 2520/20117 [1:34:00<10:53:06,  2.23s/it] 13%|██████████▎                                                                       | 2521/20117 [1:34:03<10:48:06,  2.21s/it] 13%|██████████▎                                                                       | 2522/20117 [1:34:05<10:48:53,  2.21s/it] 13%|██████████▎                                                                       | 2523/20117 [1:34:07<10:47:31,  2.21s/it] 13%|██████████▎                                                                       | 2524/20117 [1:34:09<10:48:31,  2.21s/it] 13%|██████████▎                                                                       | 2525/20117 [1:34:11<10:45:48,  2.20s/it] 13%|██████████▎                                                                       | 2526/20117 [1:34:14<10:45:16,  2.20s/it] 13%|██████████▎                                                                       | 2527/20117 [1:34:16<11:24:34,  2.34s/it] 13%|██████████▎                                                                       | 2528/20117 [1:34:18<11:10:50,  2.29s/it] 13%|██████████▎                                                                       | 2529/20117 [1:34:21<11:28:54,  2.35s/it] 13%|██████████▎                                                                       | 2530/20117 [1:34:23<11:16:24,  2.31s/it]                                                                                                                                 {'loss': 0.2592, 'grad_norm': 0.2346281111240387, 'learning_rate': 0.00019282105328600303, 'memory/max_active (GiB)': 17.12, 'memory/max_allocated (GiB)': 17.12, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 335.14, 'epoch': 0.25}
 13%|██████████▎                                                                       | 2530/20117 [1:34:23<11:16:24,  2.31s/it] 13%|██████████▎                                                                       | 2531/20117 [1:34:25<11:05:50,  2.27s/it] 13%|██████████▎                                                                       | 2532/20117 [1:34:28<11:05:27,  2.27s/it] 13%|██████████▎                                                                       | 2533/20117 [1:34:30<11:09:40,  2.29s/it] 13%|██████████▎                                                                       | 2534/20117 [1:34:32<11:07:13,  2.28s/it] 13%|██████████▎                                                                       | 2535/20117 [1:34:34<11:06:33,  2.27s/it] 13%|██████████▎                                                                       | 2536/20117 [1:34:37<11:05:41,  2.27s/it] 13%|██████████▎                                                                       | 2537/20117 [1:34:39<11:13:30,  2.30s/it] 13%|██████████▎                                                                       | 2538/20117 [1:34:41<11:13:29,  2.30s/it] 13%|██████████▎                                                                       | 2539/20117 [1:34:44<11:14:44,  2.30s/it] 13%|██████████▎                                                                       | 2540/20117 [1:34:46<11:05:22,  2.27s/it]                                                                                                                                 {'loss': 0.2119, 'grad_norm': 0.4211244583129883, 'learning_rate': 0.0001927625463022076, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 360.45, 'epoch': 0.25}
 13%|██████████▎                                                                       | 2540/20117 [1:34:46<11:05:22,  2.27s/it] 13%|██████████▎                                                                       | 2541/20117 [1:34:48<11:02:28,  2.26s/it] 13%|██████████▎                                                                       | 2542/20117 [1:34:50<10:55:37,  2.24s/it] 13%|██████████▎                                                                       | 2543/20117 [1:34:52<10:57:31,  2.24s/it] 13%|██████████▎                                                                       | 2544/20117 [1:34:55<10:53:12,  2.23s/it] 13%|██████████▎                                                                       | 2545/20117 [1:34:57<10:57:38,  2.25s/it] 13%|██████████▍                                                                       | 2546/20117 [1:34:59<11:02:17,  2.26s/it] 13%|██████████▍                                                                       | 2547/20117 [1:35:01<10:59:19,  2.25s/it] 13%|██████████▍                                                                       | 2548/20117 [1:35:04<11:02:16,  2.26s/it] 13%|██████████▍                                                                       | 2549/20117 [1:35:06<11:02:24,  2.26s/it] 13%|██████████▍                                                                       | 2550/20117 [1:35:08<11:00:11,  2.25s/it]                                                                                                                                 {'loss': 0.2628, 'grad_norm': 0.5125908851623535, 'learning_rate': 0.00019270381082465483, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 316.73, 'epoch': 0.25}
 13%|██████████▍                                                                       | 2550/20117 [1:35:08<11:00:11,  2.25s/it] 13%|██████████▍                                                                       | 2551/20117 [1:35:11<11:05:47,  2.27s/it] 13%|██████████▍                                                                       | 2552/20117 [1:35:13<10:58:21,  2.25s/it] 13%|██████████▍                                                                       | 2553/20117 [1:35:15<10:53:38,  2.23s/it] 13%|██████████▍                                                                       | 2554/20117 [1:35:17<10:51:01,  2.22s/it] 13%|██████████▍                                                                       | 2555/20117 [1:35:19<10:45:34,  2.21s/it] 13%|██████████▍                                                                       | 2556/20117 [1:35:22<10:45:36,  2.21s/it] 13%|██████████▍                                                                       | 2557/20117 [1:35:24<10:45:05,  2.20s/it] 13%|██████████▍                                                                       | 2558/20117 [1:35:26<10:41:50,  2.19s/it] 13%|██████████▍                                                                       | 2559/20117 [1:35:28<10:41:58,  2.19s/it] 13%|██████████▍                                                                       | 2560/20117 [1:35:30<10:46:11,  2.21s/it]                                                                                                                                 {'loss': 0.2393, 'grad_norm': 0.20704445242881775, 'learning_rate': 0.00019264484699802262, 'memory/max_active (GiB)': 19.2, 'memory/max_allocated (GiB)': 19.2, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 352.58, 'epoch': 0.25}
 13%|██████████▍                                                                       | 2560/20117 [1:35:30<10:46:11,  2.21s/it] 13%|██████████▍                                                                       | 2561/20117 [1:35:33<10:41:38,  2.19s/it] 13%|██████████▍                                                                       | 2562/20117 [1:35:35<10:40:49,  2.19s/it] 13%|██████████▍                                                                       | 2563/20117 [1:35:37<10:47:09,  2.21s/it] 13%|██████████▍                                                                       | 2564/20117 [1:35:39<10:46:05,  2.21s/it] 13%|██████████▍                                                                       | 2565/20117 [1:35:41<10:49:36,  2.22s/it] 13%|██████████▍                                                                       | 2566/20117 [1:35:44<10:45:30,  2.21s/it] 13%|██████████▍                                                                       | 2567/20117 [1:35:46<10:47:24,  2.21s/it] 13%|██████████▍                                                                       | 2568/20117 [1:35:48<10:52:18,  2.23s/it] 13%|██████████▍                                                                       | 2569/20117 [1:35:50<10:53:24,  2.23s/it] 13%|██████████▍                                                                       | 2570/20117 [1:35:53<10:54:02,  2.24s/it]                                                                                                                                 {'loss': 0.1621, 'grad_norm': 0.2666438817977905, 'learning_rate': 0.00019258565496755128, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 346.34, 'epoch': 0.26}
 13%|██████████▍                                                                       | 2570/20117 [1:35:53<10:54:02,  2.24s/it] 13%|██████████▍                                                                       | 2571/20117 [1:35:55<10:55:06,  2.24s/it] 13%|██████████▍                                                                       | 2572/20117 [1:35:57<10:50:21,  2.22s/it] 13%|██████████▍                                                                       | 2573/20117 [1:35:59<10:50:34,  2.22s/it] 13%|██████████▍                                                                       | 2574/20117 [1:36:01<10:49:25,  2.22s/it] 13%|██████████▍                                                                       | 2575/20117 [1:36:04<10:46:33,  2.21s/it] 13%|██████████▌                                                                       | 2576/20117 [1:36:06<10:45:14,  2.21s/it] 13%|██████████▌                                                                       | 2577/20117 [1:36:08<10:41:05,  2.19s/it] 13%|██████████▌                                                                       | 2578/20117 [1:36:10<10:40:57,  2.19s/it] 13%|██████████▌                                                                       | 2579/20117 [1:36:12<10:41:11,  2.19s/it] 13%|██████████▌                                                                       | 2580/20117 [1:36:15<10:41:52,  2.20s/it]                                                                                                                                 {'loss': 0.2066, 'grad_norm': 0.4072653353214264, 'learning_rate': 0.00019252623487904335, 'memory/max_active (GiB)': 17.11, 'memory/max_allocated (GiB)': 17.11, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 360.03, 'epoch': 0.26}
 13%|██████████▌                                                                       | 2580/20117 [1:36:15<10:41:52,  2.20s/it] 13%|██████████▌                                                                       | 2581/20117 [1:36:17<11:05:22,  2.28s/it] 13%|██████████▌                                                                       | 2582/20117 [1:36:19<11:02:07,  2.27s/it] 13%|██████████▌                                                                       | 2583/20117 [1:36:21<10:53:48,  2.24s/it] 13%|██████████▌                                                                       | 2584/20117 [1:36:24<10:49:41,  2.22s/it] 13%|██████████▌                                                                       | 2585/20117 [1:36:26<10:48:59,  2.22s/it] 13%|██████████▌                                                                       | 2586/20117 [1:36:28<10:47:00,  2.21s/it] 13%|██████████▌                                                                       | 2587/20117 [1:36:30<10:45:30,  2.21s/it] 13%|██████████▌                                                                       | 2588/20117 [1:36:32<10:47:22,  2.22s/it] 13%|██████████▌                                                                       | 2589/20117 [1:36:35<10:44:59,  2.21s/it] 13%|██████████▌                                                                       | 2590/20117 [1:36:37<10:43:42,  2.20s/it]                                                                                                                                 {'loss': 0.256, 'grad_norm': 0.517437219619751, 'learning_rate': 0.00019246658687886302, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 335.1, 'epoch': 0.26}
 13%|██████████▌                                                                       | 2590/20117 [1:36:37<10:43:42,  2.20s/it] 13%|██████████▌                                                                       | 2591/20117 [1:36:39<10:48:06,  2.22s/it] 13%|██████████▌                                                                       | 2592/20117 [1:36:41<10:48:48,  2.22s/it] 13%|██████████▌                                                                       | 2593/20117 [1:36:44<10:47:48,  2.22s/it] 13%|██████████▌                                                                       | 2594/20117 [1:36:46<10:51:33,  2.23s/it] 13%|██████████▌                                                                       | 2595/20117 [1:36:48<10:51:50,  2.23s/it] 13%|██████████▌                                                                       | 2596/20117 [1:36:50<10:48:55,  2.22s/it] 13%|██████████▌                                                                       | 2597/20117 [1:36:52<10:49:02,  2.22s/it] 13%|██████████▌                                                                       | 2598/20117 [1:36:55<10:48:09,  2.22s/it] 13%|██████████▌                                                                       | 2599/20117 [1:36:57<10:46:05,  2.21s/it] 13%|██████████▌                                                                       | 2600/20117 [1:36:59<10:45:06,  2.21s/it]                                                                                                                                 {'loss': 0.2437, 'grad_norm': 1.1084229946136475, 'learning_rate': 0.00019240671111393597, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 336.03, 'epoch': 0.26}
 13%|██████████▌                                                                       | 2600/20117 [1:36:59<10:45:06,  2.21s/it] 13%|██████████▌                                                                       | 2601/20117 [1:37:01<10:46:28,  2.21s/it] 13%|██████████▌                                                                       | 2602/20117 [1:37:04<10:44:23,  2.21s/it] 13%|██████████▌                                                                       | 2603/20117 [1:37:06<10:43:14,  2.20s/it] 13%|██████████▌                                                                       | 2604/20117 [1:37:08<10:48:24,  2.22s/it] 13%|██████████▌                                                                       | 2605/20117 [1:37:10<10:51:23,  2.23s/it] 13%|██████████▌                                                                       | 2606/20117 [1:37:12<10:49:37,  2.23s/it] 13%|██████████▋                                                                       | 2607/20117 [1:37:15<11:06:40,  2.28s/it] 13%|██████████▋                                                                       | 2608/20117 [1:37:17<11:10:34,  2.30s/it] 13%|██████████▋                                                                       | 2609/20117 [1:37:20<11:20:46,  2.33s/it] 13%|██████████▋                                                                       | 2610/20117 [1:37:22<11:17:32,  2.32s/it]                                                                                                                                 {'loss': 0.2102, 'grad_norm': 0.3532284200191498, 'learning_rate': 0.00019234660773174883, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 304.29, 'epoch': 0.26}
 13%|██████████▋                                                                       | 2610/20117 [1:37:22<11:17:32,  2.32s/it] 13%|██████████▋                                                                       | 2611/20117 [1:37:24<11:22:07,  2.34s/it] 13%|██████████▋                                                                       | 2612/20117 [1:37:27<11:25:14,  2.35s/it] 13%|██████████▋                                                                       | 2613/20117 [1:37:29<11:33:56,  2.38s/it] 13%|██████████▋                                                                       | 2614/20117 [1:37:31<11:24:41,  2.35s/it] 13%|██████████▋                                                                       | 2615/20117 [1:37:34<11:19:44,  2.33s/it] 13%|██████████▋                                                                       | 2616/20117 [1:37:36<11:15:16,  2.32s/it] 13%|██████████▋                                                                       | 2617/20117 [1:37:38<11:01:12,  2.27s/it] 13%|██████████▋                                                                       | 2618/20117 [1:37:40<10:54:50,  2.25s/it] 13%|██████████▋                                                                       | 2619/20117 [1:37:42<10:51:05,  2.23s/it] 13%|██████████▋                                                                       | 2620/20117 [1:37:45<10:55:11,  2.25s/it]                                                                                                                                 {'loss': 0.3338, 'grad_norm': 0.4219340980052948, 'learning_rate': 0.00019228627688034898, 'memory/max_active (GiB)': 20.61, 'memory/max_allocated (GiB)': 20.61, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 350.23, 'epoch': 0.26}
 13%|██████████▋                                                                       | 2620/20117 [1:37:45<10:55:11,  2.25s/it] 13%|██████████▋                                                                       | 2621/20117 [1:37:47<10:52:26,  2.24s/it] 13%|██████████▋                                                                       | 2622/20117 [1:37:49<10:50:19,  2.23s/it] 13%|██████████▋                                                                       | 2623/20117 [1:37:51<10:52:03,  2.24s/it] 13%|██████████▋                                                                       | 2624/20117 [1:37:54<10:43:32,  2.21s/it] 13%|██████████▋                                                                       | 2625/20117 [1:37:56<10:45:37,  2.21s/it] 13%|██████████▋                                                                       | 2626/20117 [1:37:58<10:39:01,  2.19s/it] 13%|██████████▋                                                                       | 2627/20117 [1:38:00<10:42:50,  2.21s/it] 13%|██████████▋                                                                       | 2628/20117 [1:38:02<10:44:43,  2.21s/it] 13%|██████████▋                                                                       | 2629/20117 [1:38:05<10:42:36,  2.20s/it] 13%|██████████▋                                                                       | 2630/20117 [1:38:07<10:48:26,  2.22s/it]                                                                                                                                 {'loss': 0.2466, 'grad_norm': 0.3215112090110779, 'learning_rate': 0.000192225718708344, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 369.51, 'epoch': 0.26}
 13%|██████████▋                                                                       | 2630/20117 [1:38:07<10:48:26,  2.22s/it] 13%|██████████▋                                                                       | 2631/20117 [1:38:09<10:51:42,  2.24s/it] 13%|██████████▋                                                                       | 2632/20117 [1:38:12<11:27:36,  2.36s/it] 13%|██████████▋                                                                       | 2633/20117 [1:38:14<11:16:25,  2.32s/it] 13%|██████████▋                                                                       | 2634/20117 [1:38:16<11:06:47,  2.29s/it] 13%|██████████▋                                                                       | 2635/20117 [1:38:19<11:07:19,  2.29s/it] 13%|██████████▋                                                                       | 2636/20117 [1:38:21<11:00:14,  2.27s/it] 13%|██████████▋                                                                       | 2637/20117 [1:38:23<10:53:55,  2.24s/it] 13%|██████████▊                                                                       | 2638/20117 [1:38:25<10:45:43,  2.22s/it] 13%|██████████▊                                                                       | 2639/20117 [1:38:27<10:42:43,  2.21s/it] 13%|██████████▊                                                                       | 2640/20117 [1:38:30<10:47:10,  2.22s/it]                                                                                                                                 {'loss': 0.2839, 'grad_norm': 0.36818957328796387, 'learning_rate': 0.00019216493336490152, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 369.73, 'epoch': 0.26}
 13%|██████████▊                                                                       | 2640/20117 [1:38:30<10:47:10,  2.22s/it] 13%|██████████▊                                                                       | 2641/20117 [1:38:32<10:47:14,  2.22s/it] 13%|██████████▊                                                                       | 2642/20117 [1:38:34<10:46:44,  2.22s/it] 13%|██████████▊                                                                       | 2643/20117 [1:38:36<10:50:21,  2.23s/it] 13%|██████████▊                                                                       | 2644/20117 [1:38:38<10:50:44,  2.23s/it] 13%|██████████▊                                                                       | 2645/20117 [1:38:41<10:50:54,  2.24s/it] 13%|██████████▊                                                                       | 2646/20117 [1:38:43<10:54:44,  2.25s/it] 13%|██████████▊                                                                       | 2647/20117 [1:38:45<10:54:13,  2.25s/it] 13%|██████████▊                                                                       | 2648/20117 [1:38:47<10:47:14,  2.22s/it] 13%|██████████▊                                                                       | 2649/20117 [1:38:50<10:51:42,  2.24s/it] 13%|██████████▊                                                                       | 2650/20117 [1:38:52<10:53:10,  2.24s/it]                                                                                                                                 {'loss': 0.2231, 'grad_norm': 0.41190099716186523, 'learning_rate': 0.0001921039209997486, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 334.38, 'epoch': 0.26}
 13%|██████████▊                                                                       | 2650/20117 [1:38:52<10:53:10,  2.24s/it] 13%|██████████▊                                                                       | 2651/20117 [1:38:54<11:01:15,  2.27s/it] 13%|██████████▊                                                                       | 2652/20117 [1:38:57<11:04:28,  2.28s/it] 13%|██████████▊                                                                       | 2653/20117 [1:38:59<11:01:34,  2.27s/it] 13%|██████████▊                                                                       | 2654/20117 [1:39:01<11:02:32,  2.28s/it] 13%|██████████▊                                                                       | 2655/20117 [1:39:03<10:56:48,  2.26s/it] 13%|██████████▊                                                                       | 2656/20117 [1:39:06<10:54:20,  2.25s/it] 13%|██████████▊                                                                       | 2657/20117 [1:39:08<10:51:55,  2.24s/it] 13%|██████████▊                                                                       | 2658/20117 [1:39:10<10:51:58,  2.24s/it] 13%|██████████▊                                                                       | 2659/20117 [1:39:12<10:47:12,  2.22s/it] 13%|██████████▊                                                                       | 2660/20117 [1:39:14<10:50:43,  2.24s/it]                                                                                                                                 {'loss': 0.162, 'grad_norm': 0.3659015893936157, 'learning_rate': 0.0001920426817631717, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 336.11, 'epoch': 0.26}
 13%|██████████▊                                                                       | 2660/20117 [1:39:14<10:50:43,  2.24s/it] 13%|██████████▊                                                                       | 2661/20117 [1:39:17<10:53:10,  2.25s/it] 13%|██████████▊                                                                       | 2662/20117 [1:39:19<10:53:45,  2.25s/it] 13%|██████████▊                                                                       | 2663/20117 [1:39:21<10:48:40,  2.23s/it] 13%|██████████▊                                                                       | 2664/20117 [1:39:23<10:45:40,  2.22s/it] 13%|██████████▊                                                                       | 2665/20117 [1:39:26<10:49:11,  2.23s/it] 13%|██████████▊                                                                       | 2666/20117 [1:39:28<10:47:05,  2.22s/it] 13%|██████████▊                                                                       | 2667/20117 [1:39:30<10:49:38,  2.23s/it] 13%|██████████▉                                                                       | 2668/20117 [1:39:32<10:49:21,  2.23s/it] 13%|██████████▉                                                                       | 2669/20117 [1:39:35<10:48:17,  2.23s/it] 13%|██████████▉                                                                       | 2670/20117 [1:39:37<10:47:46,  2.23s/it]                                                                                                                                 {'loss': 0.2313, 'grad_norm': 0.13166293501853943, 'learning_rate': 0.00019198121580601596, 'memory/max_active (GiB)': 20.44, 'memory/max_allocated (GiB)': 20.44, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 341.25, 'epoch': 0.27}
 13%|██████████▉                                                                       | 2670/20117 [1:39:37<10:47:46,  2.23s/it] 13%|██████████▉                                                                       | 2671/20117 [1:39:39<10:48:22,  2.23s/it] 13%|██████████▉                                                                       | 2672/20117 [1:39:41<10:44:40,  2.22s/it] 13%|██████████▉                                                                       | 2673/20117 [1:39:43<10:47:49,  2.23s/it] 13%|██████████▉                                                                       | 2674/20117 [1:39:46<10:49:48,  2.24s/it] 13%|██████████▉                                                                       | 2675/20117 [1:39:48<10:46:29,  2.22s/it] 13%|██████████▉                                                                       | 2676/20117 [1:39:50<10:52:02,  2.24s/it] 13%|██████████▉                                                                       | 2677/20117 [1:39:52<10:51:42,  2.24s/it] 13%|██████████▉                                                                       | 2678/20117 [1:39:55<10:47:07,  2.23s/it] 13%|██████████▉                                                                       | 2679/20117 [1:39:57<10:50:19,  2.24s/it] 13%|██████████▉                                                                       | 2680/20117 [1:39:59<10:54:30,  2.25s/it]                                                                                                                                 {'loss': 0.2887, 'grad_norm': 0.3052745759487152, 'learning_rate': 0.00019191952327968497, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 390.54, 'epoch': 0.27}
 13%|██████████▉                                                                       | 2680/20117 [1:39:59<10:54:30,  2.25s/it] 13%|██████████▉                                                                       | 2681/20117 [1:40:01<10:48:34,  2.23s/it] 13%|██████████▉                                                                       | 2682/20117 [1:40:04<10:46:57,  2.23s/it] 13%|██████████▉                                                                       | 2683/20117 [1:40:06<10:39:45,  2.20s/it] 13%|██████████▉                                                                       | 2684/20117 [1:40:08<10:36:48,  2.19s/it] 13%|██████████▉                                                                       | 2685/20117 [1:40:10<11:10:08,  2.31s/it] 13%|██████████▉                                                                       | 2686/20117 [1:40:13<11:00:16,  2.27s/it] 13%|██████████▉                                                                       | 2687/20117 [1:40:15<10:56:31,  2.26s/it] 13%|██████████▉                                                                       | 2688/20117 [1:40:17<10:58:49,  2.27s/it] 13%|██████████▉                                                                       | 2689/20117 [1:40:19<10:48:34,  2.23s/it] 13%|██████████▉                                                                       | 2690/20117 [1:40:21<10:43:14,  2.21s/it]                                                                                                                                 {'loss': 0.2272, 'grad_norm': 0.3094649910926819, 'learning_rate': 0.00019185760433614054, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 371.44, 'epoch': 0.27}
 13%|██████████▉                                                                       | 2690/20117 [1:40:21<10:43:14,  2.21s/it] 13%|██████████▉                                                                       | 2691/20117 [1:40:24<10:49:23,  2.24s/it] 13%|██████████▉                                                                       | 2692/20117 [1:40:26<10:50:12,  2.24s/it] 13%|██████████▉                                                                       | 2693/20117 [1:40:28<10:55:26,  2.26s/it] 13%|██████████▉                                                                       | 2694/20117 [1:40:31<10:53:58,  2.25s/it] 13%|██████████▉                                                                       | 2695/20117 [1:40:33<10:59:45,  2.27s/it] 13%|██████████▉                                                                       | 2696/20117 [1:40:35<10:58:10,  2.27s/it] 13%|██████████▉                                                                       | 2697/20117 [1:40:37<10:57:54,  2.27s/it] 13%|██████████▉                                                                       | 2698/20117 [1:40:40<11:05:16,  2.29s/it] 13%|███████████                                                                       | 2699/20117 [1:40:42<11:02:43,  2.28s/it] 13%|███████████                                                                       | 2700/20117 [1:40:44<11:00:47,  2.28s/it]                                                                                                                                 {'loss': 0.2826, 'grad_norm': 4.753208160400391, 'learning_rate': 0.00019179545912790207, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.43, 'tokens_per_second_per_gpu': 374.86, 'epoch': 0.27}
 13%|███████████                                                                       | 2700/20117 [1:40:44<11:00:47,  2.28s/it] 13%|███████████                                                                       | 2701/20117 [1:40:47<10:59:27,  2.27s/it] 13%|███████████                                                                       | 2702/20117 [1:40:49<10:53:03,  2.25s/it] 13%|███████████                                                                       | 2703/20117 [1:40:51<10:50:58,  2.24s/it] 13%|███████████                                                                       | 2704/20117 [1:40:53<10:52:32,  2.25s/it] 13%|███████████                                                                       | 2705/20117 [1:40:55<10:53:41,  2.25s/it] 13%|███████████                                                                       | 2706/20117 [1:40:58<10:54:23,  2.26s/it] 13%|███████████                                                                       | 2707/20117 [1:41:00<10:55:06,  2.26s/it] 13%|███████████                                                                       | 2708/20117 [1:41:02<10:57:03,  2.26s/it] 13%|███████████                                                                       | 2709/20117 [1:41:04<10:51:58,  2.25s/it] 13%|███████████                                                                       | 2710/20117 [1:41:07<10:51:27,  2.25s/it]                                                                                                                                 {'loss': 0.2372, 'grad_norm': 0.3265855014324188, 'learning_rate': 0.00019173308780804637, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 320.39, 'epoch': 0.27}
 13%|███████████                                                                       | 2710/20117 [1:41:07<10:51:27,  2.25s/it] 13%|███████████                                                                       | 2711/20117 [1:41:09<10:48:34,  2.24s/it] 13%|███████████                                                                       | 2712/20117 [1:41:11<10:48:20,  2.24s/it] 13%|███████████                                                                       | 2713/20117 [1:41:13<10:42:56,  2.22s/it] 13%|███████████                                                                       | 2714/20117 [1:41:16<10:43:31,  2.22s/it] 13%|███████████                                                                       | 2715/20117 [1:41:18<10:45:14,  2.22s/it] 14%|███████████                                                                       | 2716/20117 [1:41:20<10:43:22,  2.22s/it] 14%|███████████                                                                       | 2717/20117 [1:41:22<10:44:35,  2.22s/it] 14%|███████████                                                                       | 2718/20117 [1:41:24<10:46:55,  2.23s/it] 14%|███████████                                                                       | 2719/20117 [1:41:27<10:48:19,  2.24s/it] 14%|███████████                                                                       | 2720/20117 [1:41:29<10:46:59,  2.23s/it]                                                                                                                                 {'loss': 0.247, 'grad_norm': 0.3575897514820099, 'learning_rate': 0.00019167049053020712, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 312.17, 'epoch': 0.27}
 14%|███████████                                                                       | 2720/20117 [1:41:29<10:46:59,  2.23s/it] 14%|███████████                                                                       | 2721/20117 [1:41:31<10:51:37,  2.25s/it] 14%|███████████                                                                       | 2722/20117 [1:41:33<10:50:24,  2.24s/it] 14%|███████████                                                                       | 2723/20117 [1:41:36<10:50:28,  2.24s/it] 14%|███████████                                                                       | 2724/20117 [1:41:38<10:46:24,  2.23s/it] 14%|███████████                                                                       | 2725/20117 [1:41:40<10:45:41,  2.23s/it] 14%|███████████                                                                       | 2726/20117 [1:41:42<10:50:56,  2.25s/it] 14%|███████████                                                                       | 2727/20117 [1:41:45<10:42:46,  2.22s/it] 14%|███████████                                                                       | 2728/20117 [1:41:47<10:40:03,  2.21s/it] 14%|███████████                                                                       | 2729/20117 [1:41:49<10:39:00,  2.21s/it] 14%|███████████▏                                                                      | 2730/20117 [1:41:51<10:38:44,  2.20s/it]                                                                                                                                 {'loss': 0.2732, 'grad_norm': 0.486605703830719, 'learning_rate': 0.00019160766744857476, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 388.44, 'epoch': 0.27}
 14%|███████████▏                                                                      | 2730/20117 [1:41:51<10:38:44,  2.20s/it] 14%|███████████▏                                                                      | 2731/20117 [1:41:53<10:42:40,  2.22s/it] 14%|███████████▏                                                                      | 2732/20117 [1:41:56<10:48:50,  2.24s/it] 14%|███████████▏                                                                      | 2733/20117 [1:41:58<10:48:45,  2.24s/it] 14%|███████████▏                                                                      | 2734/20117 [1:42:00<10:47:58,  2.24s/it] 14%|███████████▏                                                                      | 2735/20117 [1:42:02<10:41:47,  2.22s/it] 14%|███████████▏                                                                      | 2736/20117 [1:42:05<10:57:49,  2.27s/it] 14%|███████████▏                                                                      | 2737/20117 [1:42:07<11:18:09,  2.34s/it] 14%|███████████▏                                                                      | 2738/20117 [1:42:09<11:08:14,  2.31s/it] 14%|███████████▏                                                                      | 2739/20117 [1:42:12<11:04:37,  2.29s/it] 14%|███████████▏                                                                      | 2740/20117 [1:42:14<10:54:44,  2.26s/it]                                                                                                                                 {'loss': 0.2733, 'grad_norm': 0.3815803527832031, 'learning_rate': 0.00019154461871789572, 'memory/max_active (GiB)': 19.76, 'memory/max_allocated (GiB)': 19.76, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 338.38, 'epoch': 0.27}
 14%|███████████▏                                                                      | 2740/20117 [1:42:14<10:54:44,  2.26s/it] 14%|███████████▏                                                                      | 2741/20117 [1:42:16<10:53:01,  2.25s/it] 14%|███████████▏                                                                      | 2742/20117 [1:42:18<10:56:08,  2.27s/it] 14%|███████████▏                                                                      | 2743/20117 [1:42:21<10:47:06,  2.23s/it] 14%|███████████▏                                                                      | 2744/20117 [1:42:23<10:47:22,  2.24s/it] 14%|███████████▏                                                                      | 2745/20117 [1:42:25<10:51:21,  2.25s/it] 14%|███████████▏                                                                      | 2746/20117 [1:42:27<10:43:42,  2.22s/it] 14%|███████████▏                                                                      | 2747/20117 [1:42:30<10:45:19,  2.23s/it] 14%|███████████▏                                                                      | 2748/20117 [1:42:32<10:47:41,  2.24s/it] 14%|███████████▏                                                                      | 2749/20117 [1:42:34<10:52:01,  2.25s/it] 14%|███████████▏                                                                      | 2750/20117 [1:42:36<10:48:17,  2.24s/it]                                                                                                                                 {'loss': 0.2912, 'grad_norm': 0.3805699050426483, 'learning_rate': 0.0001914813444934724, 'memory/max_active (GiB)': 19.67, 'memory/max_allocated (GiB)': 19.67, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 385.54, 'epoch': 0.27}
 14%|███████████▏                                                                      | 2750/20117 [1:42:36<10:48:17,  2.24s/it] 14%|███████████▏                                                                      | 2751/20117 [1:42:39<10:50:01,  2.25s/it] 14%|███████████▏                                                                      | 2752/20117 [1:42:41<10:46:45,  2.23s/it] 14%|███████████▏                                                                      | 2753/20117 [1:42:43<10:50:49,  2.25s/it] 14%|███████████▏                                                                      | 2754/20117 [1:42:45<10:51:56,  2.25s/it] 14%|███████████▏                                                                      | 2755/20117 [1:42:48<10:55:11,  2.26s/it] 14%|███████████▏                                                                      | 2756/20117 [1:42:50<10:52:21,  2.25s/it] 14%|███████████▏                                                                      | 2757/20117 [1:42:52<10:48:28,  2.24s/it] 14%|███████████▏                                                                      | 2758/20117 [1:42:54<10:47:18,  2.24s/it] 14%|███████████▏                                                                      | 2759/20117 [1:42:56<10:43:24,  2.22s/it] 14%|███████████▎                                                                      | 2760/20117 [1:42:59<10:40:39,  2.21s/it]                                                                                                                                 {'loss': 0.3009, 'grad_norm': 0.5376086831092834, 'learning_rate': 0.00019141784493116254, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 392.53, 'epoch': 0.27}
 14%|███████████▎                                                                      | 2760/20117 [1:42:59<10:40:39,  2.21s/it] 14%|███████████▎                                                                      | 2761/20117 [1:43:01<10:42:01,  2.22s/it] 14%|███████████▎                                                                      | 2762/20117 [1:43:03<10:36:06,  2.20s/it] 14%|███████████▎                                                                      | 2763/20117 [1:43:05<10:39:21,  2.21s/it] 14%|███████████▎                                                                      | 2764/20117 [1:43:08<10:56:14,  2.27s/it] 14%|███████████▎                                                                      | 2765/20117 [1:43:10<10:56:26,  2.27s/it] 14%|███████████▎                                                                      | 2766/20117 [1:43:12<10:54:09,  2.26s/it] 14%|███████████▎                                                                      | 2767/20117 [1:43:14<10:51:08,  2.25s/it] 14%|███████████▎                                                                      | 2768/20117 [1:43:17<10:52:15,  2.26s/it] 14%|███████████▎                                                                      | 2769/20117 [1:43:19<10:58:41,  2.28s/it] 14%|███████████▎                                                                      | 2770/20117 [1:43:21<10:51:03,  2.25s/it]                                                                                                                                 {'loss': 0.3044, 'grad_norm': 0.2709028720855713, 'learning_rate': 0.000191354120187379, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 363.03, 'epoch': 0.28}
 14%|███████████▎                                                                      | 2770/20117 [1:43:21<10:51:03,  2.25s/it] 14%|███████████▎                                                                      | 2771/20117 [1:43:23<10:53:51,  2.26s/it] 14%|███████████▎                                                                      | 2772/20117 [1:43:26<10:58:56,  2.28s/it] 14%|███████████▎                                                                      | 2773/20117 [1:43:28<10:52:24,  2.26s/it] 14%|███████████▎                                                                      | 2774/20117 [1:43:30<10:49:28,  2.25s/it] 14%|███████████▎                                                                      | 2775/20117 [1:43:32<10:44:34,  2.23s/it] 14%|███████████▎                                                                      | 2776/20117 [1:43:35<10:41:06,  2.22s/it] 14%|███████████▎                                                                      | 2777/20117 [1:43:37<10:48:18,  2.24s/it] 14%|███████████▎                                                                      | 2778/20117 [1:43:39<10:47:39,  2.24s/it] 14%|███████████▎                                                                      | 2779/20117 [1:43:41<10:47:38,  2.24s/it] 14%|███████████▎                                                                      | 2780/20117 [1:43:44<10:51:17,  2.25s/it]                                                                                                                                 {'loss': 0.2744, 'grad_norm': 0.24123673141002655, 'learning_rate': 0.00019129017041908934, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 351.75, 'epoch': 0.28}
 14%|███████████▎                                                                      | 2780/20117 [1:43:44<10:51:17,  2.25s/it] 14%|███████████▎                                                                      | 2781/20117 [1:43:46<10:48:47,  2.25s/it] 14%|███████████▎                                                                      | 2782/20117 [1:43:48<10:45:50,  2.24s/it] 14%|███████████▎                                                                      | 2783/20117 [1:43:50<10:50:10,  2.25s/it] 14%|███████████▎                                                                      | 2784/20117 [1:43:53<10:44:48,  2.23s/it] 14%|███████████▎                                                                      | 2785/20117 [1:43:55<10:45:38,  2.24s/it] 14%|███████████▎                                                                      | 2786/20117 [1:43:57<10:40:41,  2.22s/it] 14%|███████████▎                                                                      | 2787/20117 [1:43:59<10:46:30,  2.24s/it] 14%|███████████▎                                                                      | 2788/20117 [1:44:02<10:45:37,  2.24s/it] 14%|███████████▎                                                                      | 2789/20117 [1:44:04<10:46:40,  2.24s/it] 14%|███████████▎                                                                      | 2790/20117 [1:44:06<10:44:02,  2.23s/it]                                                                                                                                 {'loss': 0.2764, 'grad_norm': 0.4202309548854828, 'learning_rate': 0.00019122599578381532, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 387.68, 'epoch': 0.28}
 14%|███████████▎                                                                      | 2790/20117 [1:44:06<10:44:02,  2.23s/it] 14%|███████████▍                                                                      | 2791/20117 [1:44:09<11:11:56,  2.33s/it] 14%|███████████▍                                                                      | 2792/20117 [1:44:11<11:08:32,  2.32s/it] 14%|███████████▍                                                                      | 2793/20117 [1:44:13<10:59:49,  2.29s/it] 14%|███████████▍                                                                      | 2794/20117 [1:44:15<10:56:41,  2.27s/it] 14%|███████████▍                                                                      | 2795/20117 [1:44:18<10:57:30,  2.28s/it] 14%|███████████▍                                                                      | 2796/20117 [1:44:20<10:55:39,  2.27s/it] 14%|███████████▍                                                                      | 2797/20117 [1:44:22<10:52:39,  2.26s/it] 14%|███████████▍                                                                      | 2798/20117 [1:44:24<10:49:25,  2.25s/it] 14%|███████████▍                                                                      | 2799/20117 [1:44:27<10:50:09,  2.25s/it] 14%|███████████▍                                                                      | 2800/20117 [1:44:29<10:46:54,  2.24s/it]                                                                                                                                 {'loss': 0.245, 'grad_norm': 0.42621132731437683, 'learning_rate': 0.00019116159643963262, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 351.56, 'epoch': 0.28}
 14%|███████████▍                                                                      | 2800/20117 [1:44:29<10:46:54,  2.24s/it] 14%|███████████▍                                                                      | 2801/20117 [1:44:31<10:47:45,  2.24s/it] 14%|███████████▍                                                                      | 2802/20117 [1:44:33<10:44:13,  2.23s/it] 14%|███████████▍                                                                      | 2803/20117 [1:44:35<10:44:26,  2.23s/it] 14%|███████████▍                                                                      | 2804/20117 [1:44:38<10:47:34,  2.24s/it] 14%|███████████▍                                                                      | 2805/20117 [1:44:40<10:57:07,  2.28s/it] 14%|███████████▍                                                                      | 2806/20117 [1:44:42<10:55:49,  2.27s/it] 14%|███████████▍                                                                      | 2807/20117 [1:44:45<10:47:52,  2.25s/it] 14%|███████████▍                                                                      | 2808/20117 [1:44:47<10:46:24,  2.24s/it] 14%|███████████▍                                                                      | 2809/20117 [1:44:49<10:42:48,  2.23s/it] 14%|███████████▍                                                                      | 2810/20117 [1:44:51<10:47:39,  2.25s/it]                                                                                                                                 {'loss': 0.2809, 'grad_norm': 0.30440858006477356, 'learning_rate': 0.00019109697254517048, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 382.5, 'epoch': 0.28}
 14%|███████████▍                                                                      | 2810/20117 [1:44:51<10:47:39,  2.25s/it] 14%|███████████▍                                                                      | 2811/20117 [1:44:53<10:41:45,  2.22s/it] 14%|███████████▍                                                                      | 2812/20117 [1:44:56<10:37:50,  2.21s/it] 14%|███████████▍                                                                      | 2813/20117 [1:44:58<10:36:21,  2.21s/it] 14%|███████████▍                                                                      | 2814/20117 [1:45:00<10:33:50,  2.20s/it] 14%|███████████▍                                                                      | 2815/20117 [1:45:02<10:34:45,  2.20s/it] 14%|███████████▍                                                                      | 2816/20117 [1:45:04<10:43:50,  2.23s/it] 14%|███████████▍                                                                      | 2817/20117 [1:45:07<10:45:59,  2.24s/it] 14%|███████████▍                                                                      | 2818/20117 [1:45:09<10:48:53,  2.25s/it] 14%|███████████▍                                                                      | 2819/20117 [1:45:11<10:51:58,  2.26s/it] 14%|███████████▍                                                                      | 2820/20117 [1:45:14<10:54:10,  2.27s/it]                                                                                                                                 {'loss': 0.319, 'grad_norm': 0.49908211827278137, 'learning_rate': 0.00019103212425961111, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 402.18, 'epoch': 0.28}
 14%|███████████▍                                                                      | 2820/20117 [1:45:14<10:54:10,  2.27s/it] 14%|███████████▍                                                                      | 2821/20117 [1:45:16<10:52:30,  2.26s/it] 14%|███████████▌                                                                      | 2822/20117 [1:45:18<10:52:40,  2.26s/it] 14%|███████████▌                                                                      | 2823/20117 [1:45:20<10:53:55,  2.27s/it] 14%|███████████▌                                                                      | 2824/20117 [1:45:23<10:50:36,  2.26s/it] 14%|███████████▌                                                                      | 2825/20117 [1:45:25<10:52:29,  2.26s/it] 14%|███████████▌                                                                      | 2826/20117 [1:45:27<10:43:31,  2.23s/it] 14%|███████████▌                                                                      | 2827/20117 [1:45:29<10:37:40,  2.21s/it] 14%|███████████▌                                                                      | 2828/20117 [1:45:32<10:42:29,  2.23s/it] 14%|███████████▌                                                                      | 2829/20117 [1:45:34<10:44:24,  2.24s/it] 14%|███████████▌                                                                      | 2830/20117 [1:45:36<10:46:39,  2.24s/it]                                                                                                                                 {'loss': 0.2392, 'grad_norm': 0.38030481338500977, 'learning_rate': 0.00019096705174268967, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 319.29, 'epoch': 0.28}
 14%|███████████▌                                                                      | 2830/20117 [1:45:36<10:46:39,  2.24s/it] 14%|███████████▌                                                                      | 2831/20117 [1:45:38<10:50:41,  2.26s/it] 14%|███████████▌                                                                      | 2832/20117 [1:45:41<10:46:58,  2.25s/it] 14%|███████████▌                                                                      | 2833/20117 [1:45:43<10:51:08,  2.26s/it] 14%|███████████▌                                                                      | 2834/20117 [1:45:45<10:43:25,  2.23s/it] 14%|███████████▌                                                                      | 2835/20117 [1:45:47<10:45:28,  2.24s/it] 14%|███████████▌                                                                      | 2836/20117 [1:45:49<10:39:27,  2.22s/it] 14%|███████████▌                                                                      | 2837/20117 [1:45:52<10:37:08,  2.21s/it] 14%|███████████▌                                                                      | 2838/20117 [1:45:54<10:38:00,  2.22s/it] 14%|███████████▌                                                                      | 2839/20117 [1:45:56<10:41:55,  2.23s/it] 14%|███████████▌                                                                      | 2840/20117 [1:45:58<10:43:21,  2.23s/it]                                                                                                                                 {'loss': 0.2962, 'grad_norm': 0.4159802198410034, 'learning_rate': 0.00019090175515469344, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 354.51, 'epoch': 0.28}
 14%|███████████▌                                                                      | 2840/20117 [1:45:58<10:43:21,  2.23s/it] 14%|███████████▌                                                                      | 2841/20117 [1:46:01<10:44:22,  2.24s/it] 14%|███████████▌                                                                      | 2842/20117 [1:46:03<10:41:23,  2.23s/it] 14%|███████████▌                                                                      | 2843/20117 [1:46:05<10:38:04,  2.22s/it] 14%|███████████▌                                                                      | 2844/20117 [1:46:07<11:02:25,  2.30s/it] 14%|███████████▌                                                                      | 2845/20117 [1:46:10<10:54:06,  2.27s/it] 14%|███████████▌                                                                      | 2846/20117 [1:46:12<10:53:44,  2.27s/it] 14%|███████████▌                                                                      | 2847/20117 [1:46:14<10:51:31,  2.26s/it] 14%|███████████▌                                                                      | 2848/20117 [1:46:16<10:47:45,  2.25s/it] 14%|███████████▌                                                                      | 2849/20117 [1:46:19<10:48:57,  2.25s/it] 14%|███████████▌                                                                      | 2850/20117 [1:46:21<10:48:01,  2.25s/it]                                                                                                                                 {'loss': 0.2523, 'grad_norm': 0.47257235646247864, 'learning_rate': 0.00019083623465646172, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 336.31, 'epoch': 0.28}
 14%|███████████▌                                                                      | 2850/20117 [1:46:21<10:48:01,  2.25s/it] 14%|███████████▌                                                                      | 2851/20117 [1:46:23<10:48:09,  2.25s/it] 14%|███████████▋                                                                      | 2852/20117 [1:46:26<10:56:19,  2.28s/it] 14%|███████████▋                                                                      | 2853/20117 [1:46:28<10:53:07,  2.27s/it] 14%|███████████▋                                                                      | 2854/20117 [1:46:30<10:54:19,  2.27s/it] 14%|███████████▋                                                                      | 2855/20117 [1:46:32<10:56:16,  2.28s/it] 14%|███████████▋                                                                      | 2856/20117 [1:46:35<10:56:17,  2.28s/it] 14%|███████████▋                                                                      | 2857/20117 [1:46:37<10:56:22,  2.28s/it] 14%|███████████▋                                                                      | 2858/20117 [1:46:39<10:51:14,  2.26s/it] 14%|███████████▋                                                                      | 2859/20117 [1:46:41<10:51:25,  2.26s/it] 14%|███████████▋                                                                      | 2860/20117 [1:46:44<10:52:56,  2.27s/it]                                                                                                                                 {'loss': 0.2605, 'grad_norm': 0.2591971158981323, 'learning_rate': 0.0001907704904093854, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 386.6, 'epoch': 0.28}
 14%|███████████▋                                                                      | 2860/20117 [1:46:44<10:52:56,  2.27s/it] 14%|███████████▋                                                                      | 2861/20117 [1:46:46<10:47:38,  2.25s/it] 14%|███████████▋                                                                      | 2862/20117 [1:46:48<10:46:26,  2.25s/it] 14%|███████████▋                                                                      | 2863/20117 [1:46:50<10:46:02,  2.25s/it] 14%|███████████▋                                                                      | 2864/20117 [1:46:53<10:45:37,  2.25s/it] 14%|███████████▋                                                                      | 2865/20117 [1:46:55<10:41:18,  2.23s/it] 14%|███████████▋                                                                      | 2866/20117 [1:46:57<10:36:46,  2.21s/it] 14%|███████████▋                                                                      | 2867/20117 [1:46:59<10:36:29,  2.21s/it] 14%|███████████▋                                                                      | 2868/20117 [1:47:01<10:40:04,  2.23s/it] 14%|███████████▋                                                                      | 2869/20117 [1:47:04<10:37:19,  2.22s/it] 14%|███████████▋                                                                      | 2870/20117 [1:47:06<10:40:37,  2.23s/it]                                                                                                                                 {'loss': 0.3024, 'grad_norm': 0.3341462314128876, 'learning_rate': 0.00019070452257540638, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 389.19, 'epoch': 0.29}
 14%|███████████▋                                                                      | 2870/20117 [1:47:06<10:40:37,  2.23s/it] 14%|███████████▋                                                                      | 2871/20117 [1:47:08<10:41:13,  2.23s/it] 14%|███████████▋                                                                      | 2872/20117 [1:47:10<10:36:13,  2.21s/it] 14%|███████████▋                                                                      | 2873/20117 [1:47:13<10:34:51,  2.21s/it] 14%|███████████▋                                                                      | 2874/20117 [1:47:15<10:37:32,  2.22s/it] 14%|███████████▋                                                                      | 2875/20117 [1:47:17<10:30:11,  2.19s/it] 14%|███████████▋                                                                      | 2876/20117 [1:47:19<10:29:43,  2.19s/it] 14%|███████████▋                                                                      | 2877/20117 [1:47:21<10:29:51,  2.19s/it] 14%|███████████▋                                                                      | 2878/20117 [1:47:23<10:24:56,  2.18s/it] 14%|███████████▋                                                                      | 2879/20117 [1:47:26<10:28:23,  2.19s/it] 14%|███████████▋                                                                      | 2880/20117 [1:47:28<10:30:26,  2.19s/it]                                                                                                                                 {'loss': 0.2648, 'grad_norm': 0.3934953808784485, 'learning_rate': 0.00019063833131701744, 'memory/max_active (GiB)': 19.77, 'memory/max_allocated (GiB)': 19.77, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 333.15, 'epoch': 0.29}
 14%|███████████▋                                                                      | 2880/20117 [1:47:28<10:30:26,  2.19s/it] 14%|███████████▋                                                                      | 2881/20117 [1:47:30<10:27:35,  2.18s/it] 14%|███████████▋                                                                      | 2882/20117 [1:47:32<10:30:06,  2.19s/it] 14%|███████████▊                                                                      | 2883/20117 [1:47:34<10:27:27,  2.18s/it] 14%|███████████▊                                                                      | 2884/20117 [1:47:37<10:27:03,  2.18s/it] 14%|███████████▊                                                                      | 2885/20117 [1:47:39<10:30:19,  2.19s/it] 14%|███████████▊                                                                      | 2886/20117 [1:47:41<10:28:38,  2.19s/it] 14%|███████████▊                                                                      | 2887/20117 [1:47:43<10:33:47,  2.21s/it] 14%|███████████▊                                                                      | 2888/20117 [1:47:46<10:43:45,  2.24s/it] 14%|███████████▊                                                                      | 2889/20117 [1:47:48<10:39:30,  2.23s/it] 14%|███████████▊                                                                      | 2890/20117 [1:47:50<10:33:36,  2.21s/it]                                                                                                                                 {'loss': 0.2232, 'grad_norm': 0.3615911304950714, 'learning_rate': 0.00019057191679726162, 'memory/max_active (GiB)': 19.77, 'memory/max_allocated (GiB)': 19.77, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 326.24, 'epoch': 0.29}
 14%|███████████▊                                                                      | 2890/20117 [1:47:50<10:33:36,  2.21s/it] 14%|███████████▊                                                                      | 2891/20117 [1:47:52<10:34:01,  2.21s/it] 14%|███████████▊                                                                      | 2892/20117 [1:47:55<10:56:52,  2.29s/it] 14%|███████████▊                                                                      | 2893/20117 [1:47:57<10:55:03,  2.28s/it] 14%|███████████▊                                                                      | 2894/20117 [1:47:59<10:48:24,  2.26s/it] 14%|███████████▊                                                                      | 2895/20117 [1:48:01<10:55:32,  2.28s/it] 14%|███████████▊                                                                      | 2896/20117 [1:48:04<11:20:11,  2.37s/it] 14%|███████████▊                                                                      | 2897/20117 [1:48:06<11:18:55,  2.37s/it] 14%|███████████▊                                                                      | 2898/20117 [1:48:09<11:21:39,  2.38s/it] 14%|███████████▊                                                                      | 2899/20117 [1:48:11<11:15:05,  2.35s/it] 14%|███████████▊                                                                      | 2900/20117 [1:48:13<11:01:52,  2.31s/it]                                                                                                                                 {'loss': 0.2467, 'grad_norm': 0.45740652084350586, 'learning_rate': 0.00019050527917973192, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 399.74, 'epoch': 0.29}
 14%|███████████▊                                                                      | 2900/20117 [1:48:13<11:01:52,  2.31s/it] 14%|███████████▊                                                                      | 2901/20117 [1:48:15<10:56:48,  2.29s/it] 14%|███████████▊                                                                      | 2902/20117 [1:48:18<10:58:31,  2.30s/it] 14%|███████████▊                                                                      | 2903/20117 [1:48:20<10:53:48,  2.28s/it] 14%|███████████▊                                                                      | 2904/20117 [1:48:22<10:52:46,  2.28s/it] 14%|███████████▊                                                                      | 2905/20117 [1:48:25<10:48:57,  2.26s/it] 14%|███████████▊                                                                      | 2906/20117 [1:48:27<10:50:06,  2.27s/it] 14%|███████████▊                                                                      | 2907/20117 [1:48:29<10:49:18,  2.26s/it] 14%|███████████▊                                                                      | 2908/20117 [1:48:31<10:46:35,  2.25s/it] 14%|███████████▊                                                                      | 2909/20117 [1:48:34<10:49:49,  2.27s/it] 14%|███████████▊                                                                      | 2910/20117 [1:48:36<10:46:55,  2.26s/it]                                                                                                                                 {'loss': 0.292, 'grad_norm': 0.5232924818992615, 'learning_rate': 0.00019043841862857088, 'memory/max_active (GiB)': 20.62, 'memory/max_allocated (GiB)': 20.62, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 385.0, 'epoch': 0.29}
 14%|███████████▊                                                                      | 2910/20117 [1:48:36<10:46:55,  2.26s/it] 14%|███████████▊                                                                      | 2911/20117 [1:48:38<10:51:58,  2.27s/it] 14%|███████████▊                                                                      | 2912/20117 [1:48:40<10:47:42,  2.26s/it] 14%|███████████▊                                                                      | 2913/20117 [1:48:43<10:47:51,  2.26s/it] 14%|███████████▉                                                                      | 2914/20117 [1:48:45<10:55:29,  2.29s/it] 14%|███████████▉                                                                      | 2915/20117 [1:48:47<10:49:42,  2.27s/it] 14%|███████████▉                                                                      | 2916/20117 [1:48:49<10:42:54,  2.24s/it] 15%|███████████▉                                                                      | 2917/20117 [1:48:52<10:43:59,  2.25s/it] 15%|███████████▉                                                                      | 2918/20117 [1:48:54<10:43:11,  2.24s/it] 15%|███████████▉                                                                      | 2919/20117 [1:48:56<10:45:58,  2.25s/it] 15%|███████████▉                                                                      | 2920/20117 [1:48:58<10:50:32,  2.27s/it]                                                                                                                                 {'loss': 0.2618, 'grad_norm': 0.3617020547389984, 'learning_rate': 0.00019037133530847014, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 380.42, 'epoch': 0.29}
 15%|███████████▉                                                                      | 2920/20117 [1:48:58<10:50:32,  2.27s/it] 15%|███████████▉                                                                      | 2921/20117 [1:49:01<10:48:23,  2.26s/it] 15%|███████████▉                                                                      | 2922/20117 [1:49:03<10:47:55,  2.26s/it] 15%|███████████▉                                                                      | 2923/20117 [1:49:05<10:44:32,  2.25s/it] 15%|███████████▉                                                                      | 2924/20117 [1:49:07<10:46:13,  2.26s/it] 15%|███████████▉                                                                      | 2925/20117 [1:49:10<10:46:51,  2.26s/it] 15%|███████████▉                                                                      | 2926/20117 [1:49:12<10:51:09,  2.27s/it] 15%|███████████▉                                                                      | 2927/20117 [1:49:14<10:58:43,  2.30s/it] 15%|███████████▉                                                                      | 2928/20117 [1:49:17<10:54:05,  2.28s/it] 15%|███████████▉                                                                      | 2929/20117 [1:49:19<10:50:10,  2.27s/it] 15%|███████████▉                                                                      | 2930/20117 [1:49:21<10:47:03,  2.26s/it]                                                                                                                                 {'loss': 0.281, 'grad_norm': 0.4762667119503021, 'learning_rate': 0.00019030402938467013, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 398.66, 'epoch': 0.29}
 15%|███████████▉                                                                      | 2930/20117 [1:49:21<10:47:03,  2.26s/it] 15%|███████████▉                                                                      | 2931/20117 [1:49:23<10:41:47,  2.24s/it] 15%|███████████▉                                                                      | 2932/20117 [1:49:25<10:36:22,  2.22s/it] 15%|███████████▉                                                                      | 2933/20117 [1:49:28<10:34:47,  2.22s/it] 15%|███████████▉                                                                      | 2934/20117 [1:49:30<10:33:46,  2.21s/it] 15%|███████████▉                                                                      | 2935/20117 [1:49:32<10:44:28,  2.25s/it] 15%|███████████▉                                                                      | 2936/20117 [1:49:34<10:41:33,  2.24s/it] 15%|███████████▉                                                                      | 2937/20117 [1:49:37<10:42:15,  2.24s/it] 15%|███████████▉                                                                      | 2938/20117 [1:49:39<10:40:30,  2.24s/it] 15%|███████████▉                                                                      | 2939/20117 [1:49:41<10:43:31,  2.25s/it] 15%|███████████▉                                                                      | 2940/20117 [1:49:43<10:44:49,  2.25s/it]                                                                                                                                 {'loss': 0.2205, 'grad_norm': 0.5239278078079224, 'learning_rate': 0.00019023650102295957, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 317.01, 'epoch': 0.29}
 15%|███████████▉                                                                      | 2940/20117 [1:49:43<10:44:49,  2.25s/it] 15%|███████████▉                                                                      | 2941/20117 [1:49:46<11:08:58,  2.34s/it] 15%|███████████▉                                                                      | 2942/20117 [1:49:48<11:06:25,  2.33s/it] 15%|███████████▉                                                                      | 2943/20117 [1:49:50<10:57:22,  2.30s/it] 15%|████████████                                                                      | 2944/20117 [1:49:53<10:47:53,  2.26s/it] 15%|████████████                                                                      | 2945/20117 [1:49:55<10:37:38,  2.23s/it] 15%|████████████                                                                      | 2946/20117 [1:49:57<10:40:40,  2.24s/it] 15%|████████████                                                                      | 2947/20117 [1:49:59<10:50:54,  2.27s/it] 15%|████████████                                                                      | 2948/20117 [1:50:02<10:40:33,  2.24s/it] 15%|████████████                                                                      | 2949/20117 [1:50:04<11:12:57,  2.35s/it] 15%|████████████                                                                      | 2950/20117 [1:50:06<11:00:56,  2.31s/it]                                                                                                                                 {'loss': 0.2618, 'grad_norm': 0.38952499628067017, 'learning_rate': 0.00019016875038967507, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 356.78, 'epoch': 0.29}
 15%|████████████                                                                      | 2950/20117 [1:50:06<11:00:56,  2.31s/it] 15%|████████████                                                                      | 2951/20117 [1:50:09<10:51:58,  2.28s/it] 15%|████████████                                                                      | 2952/20117 [1:50:11<10:51:13,  2.28s/it] 15%|████████████                                                                      | 2953/20117 [1:50:13<10:45:29,  2.26s/it] 15%|████████████                                                                      | 2954/20117 [1:50:15<10:41:29,  2.24s/it] 15%|████████████                                                                      | 2955/20117 [1:50:17<10:35:05,  2.22s/it] 15%|████████████                                                                      | 2956/20117 [1:50:20<10:27:40,  2.19s/it] 15%|████████████                                                                      | 2957/20117 [1:50:22<10:28:07,  2.20s/it] 15%|████████████                                                                      | 2958/20117 [1:50:24<10:28:50,  2.20s/it] 15%|████████████                                                                      | 2959/20117 [1:50:26<10:24:47,  2.18s/it] 15%|████████████                                                                      | 2960/20117 [1:50:28<10:29:23,  2.20s/it]                                                                                                                                 {'loss': 0.2508, 'grad_norm': 0.4657343924045563, 'learning_rate': 0.00019010077765170072, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 364.11, 'epoch': 0.29}
 15%|████████████                                                                      | 2960/20117 [1:50:28<10:29:23,  2.20s/it] 15%|████████████                                                                      | 2961/20117 [1:50:31<10:27:19,  2.19s/it] 15%|████████████                                                                      | 2962/20117 [1:50:33<10:24:00,  2.18s/it] 15%|████████████                                                                      | 2963/20117 [1:50:35<10:46:32,  2.26s/it] 15%|████████████                                                                      | 2964/20117 [1:50:38<11:02:05,  2.32s/it] 15%|████████████                                                                      | 2965/20117 [1:50:40<10:57:55,  2.30s/it] 15%|████████████                                                                      | 2966/20117 [1:50:42<11:03:50,  2.32s/it] 15%|████████████                                                                      | 2967/20117 [1:50:45<10:56:37,  2.30s/it] 15%|████████████                                                                      | 2968/20117 [1:50:47<10:55:06,  2.29s/it] 15%|████████████                                                                      | 2969/20117 [1:50:49<10:47:57,  2.27s/it] 15%|████████████                                                                      | 2970/20117 [1:50:51<10:37:08,  2.23s/it]                                                                                                                                 {'loss': 0.2148, 'grad_norm': 0.4659232497215271, 'learning_rate': 0.0001900325829764678, 'memory/max_active (GiB)': 19.69, 'memory/max_allocated (GiB)': 19.69, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 387.16, 'epoch': 0.3}
 15%|████████████                                                                      | 2970/20117 [1:50:51<10:37:08,  2.23s/it] 15%|████████████                                                                      | 2971/20117 [1:50:53<10:36:32,  2.23s/it] 15%|████████████                                                                      | 2972/20117 [1:50:56<10:32:59,  2.22s/it] 15%|████████████                                                                      | 2973/20117 [1:50:58<10:33:57,  2.22s/it] 15%|████████████                                                                      | 2974/20117 [1:51:00<10:37:47,  2.23s/it] 15%|████████████▏                                                                     | 2975/20117 [1:51:02<10:39:13,  2.24s/it] 15%|████████████▏                                                                     | 2976/20117 [1:51:05<10:35:34,  2.22s/it] 15%|████████████▏                                                                     | 2977/20117 [1:51:07<10:34:45,  2.22s/it] 15%|████████████▏                                                                     | 2978/20117 [1:51:09<10:32:13,  2.21s/it] 15%|████████████▏                                                                     | 2979/20117 [1:51:11<10:41:33,  2.25s/it] 15%|████████████▏                                                                     | 2980/20117 [1:51:13<10:36:56,  2.23s/it]                                                                                                                                 {'loss': 0.2857, 'grad_norm': 0.5370404124259949, 'learning_rate': 0.0001899641665319542, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 315.91, 'epoch': 0.3}
 15%|████████████▏                                                                     | 2980/20117 [1:51:13<10:36:56,  2.23s/it] 15%|████████████▏                                                                     | 2981/20117 [1:51:16<10:34:21,  2.22s/it] 15%|████████████▏                                                                     | 2982/20117 [1:51:18<10:35:15,  2.22s/it] 15%|████████████▏                                                                     | 2983/20117 [1:51:20<10:34:51,  2.22s/it] 15%|████████████▏                                                                     | 2984/20117 [1:51:22<10:48:45,  2.27s/it] 15%|████████████▏                                                                     | 2985/20117 [1:51:25<10:46:47,  2.27s/it] 15%|████████████▏                                                                     | 2986/20117 [1:51:27<10:39:22,  2.24s/it] 15%|████████████▏                                                                     | 2987/20117 [1:51:29<10:46:20,  2.26s/it] 15%|████████████▏                                                                     | 2988/20117 [1:51:32<10:49:38,  2.28s/it] 15%|████████████▏                                                                     | 2989/20117 [1:51:34<10:46:01,  2.26s/it] 15%|████████████▏                                                                     | 2990/20117 [1:51:36<10:43:26,  2.25s/it]                                                                                                                                 {'loss': 0.2989, 'grad_norm': 0.3220022916793823, 'learning_rate': 0.00018989552848668406, 'memory/max_active (GiB)': 20.58, 'memory/max_allocated (GiB)': 20.58, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 394.41, 'epoch': 0.3}
 15%|████████████▏                                                                     | 2990/20117 [1:51:36<10:43:26,  2.25s/it] 15%|████████████▏                                                                     | 2991/20117 [1:51:38<10:39:32,  2.24s/it] 15%|████████████▏                                                                     | 2992/20117 [1:51:41<10:45:19,  2.26s/it] 15%|████████████▏                                                                     | 2993/20117 [1:51:43<10:47:03,  2.27s/it] 15%|████████████▏                                                                     | 2994/20117 [1:51:45<10:40:23,  2.24s/it] 15%|████████████▏                                                                     | 2995/20117 [1:51:47<10:38:51,  2.24s/it] 15%|████████████▏                                                                     | 2996/20117 [1:51:49<10:34:38,  2.22s/it] 15%|████████████▏                                                                     | 2997/20117 [1:51:52<10:33:07,  2.22s/it] 15%|████████████▏                                                                     | 2998/20117 [1:51:54<10:27:13,  2.20s/it] 15%|████████████▏                                                                     | 2999/20117 [1:51:56<10:29:39,  2.21s/it] 15%|████████████▏                                                                     | 3000/20117 [1:51:58<10:30:11,  2.21s/it]                                                                                                                                 {'loss': 0.2354, 'grad_norm': 0.38055410981178284, 'learning_rate': 0.0001898266690097274, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 319.33, 'epoch': 0.3}
 15%|████████████▏                                                                     | 3000/20117 [1:51:58<10:30:11,  2.21s/it] 15%|████████████▏                                                                     | 3001/20117 [1:52:00<10:33:28,  2.22s/it] 15%|████████████▏                                                                     | 3002/20117 [1:52:03<10:56:32,  2.30s/it] 15%|████████████▏                                                                     | 3003/20117 [1:52:05<10:56:34,  2.30s/it] 15%|████████████▏                                                                     | 3004/20117 [1:52:07<10:48:34,  2.27s/it] 15%|████████████▏                                                                     | 3005/20117 [1:52:10<10:47:57,  2.27s/it] 15%|████████████▎                                                                     | 3006/20117 [1:52:12<10:46:13,  2.27s/it] 15%|████████████▎                                                                     | 3007/20117 [1:52:14<10:44:09,  2.26s/it] 15%|████████████▎                                                                     | 3008/20117 [1:52:16<10:45:38,  2.26s/it] 15%|████████████▎                                                                     | 3009/20117 [1:52:19<10:47:52,  2.27s/it] 15%|████████████▎                                                                     | 3010/20117 [1:52:21<10:44:58,  2.26s/it]                                                                                                                                 {'loss': 0.2437, 'grad_norm': 0.36663955450057983, 'learning_rate': 0.00018975758827069968, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 326.18, 'epoch': 0.3}
 15%|████████████▎                                                                     | 3010/20117 [1:52:21<10:44:58,  2.26s/it] 15%|████████████▎                                                                     | 3011/20117 [1:52:23<10:42:17,  2.25s/it] 15%|████████████▎                                                                     | 3012/20117 [1:52:26<10:45:23,  2.26s/it] 15%|████████████▎                                                                     | 3013/20117 [1:52:28<10:40:15,  2.25s/it] 15%|████████████▎                                                                     | 3014/20117 [1:52:30<10:33:25,  2.22s/it] 15%|████████████▎                                                                     | 3015/20117 [1:52:32<10:38:56,  2.24s/it] 15%|████████████▎                                                                     | 3016/20117 [1:52:34<10:44:52,  2.26s/it] 15%|████████████▎                                                                     | 3017/20117 [1:52:37<10:44:57,  2.26s/it] 15%|████████████▎                                                                     | 3018/20117 [1:52:39<10:57:05,  2.31s/it] 15%|████████████▎                                                                     | 3019/20117 [1:52:41<10:53:33,  2.29s/it] 15%|████████████▎                                                                     | 3020/20117 [1:52:44<10:54:36,  2.30s/it]                                                                                                                                 {'loss': 0.2016, 'grad_norm': 0.3347165882587433, 'learning_rate': 0.00018968828643976135, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 363.38, 'epoch': 0.3}
 15%|████████████▎                                                                     | 3020/20117 [1:52:44<10:54:36,  2.30s/it] 15%|████████████▎                                                                     | 3021/20117 [1:52:46<10:59:56,  2.32s/it] 15%|████████████▎                                                                     | 3022/20117 [1:52:48<10:55:44,  2.30s/it] 15%|████████████▎                                                                     | 3023/20117 [1:52:51<10:54:47,  2.30s/it] 15%|████████████▎                                                                     | 3024/20117 [1:52:53<10:51:57,  2.29s/it] 15%|████████████▎                                                                     | 3025/20117 [1:52:55<10:47:04,  2.27s/it] 15%|████████████▎                                                                     | 3026/20117 [1:52:57<10:43:31,  2.26s/it] 15%|████████████▎                                                                     | 3027/20117 [1:53:00<10:45:25,  2.27s/it] 15%|████████████▎                                                                     | 3028/20117 [1:53:02<10:43:48,  2.26s/it] 15%|████████████▎                                                                     | 3029/20117 [1:53:04<10:50:11,  2.28s/it] 15%|████████████▎                                                                     | 3030/20117 [1:53:06<10:42:53,  2.26s/it]                                                                                                                                 {'loss': 0.2421, 'grad_norm': 0.4647526144981384, 'learning_rate': 0.0001896187636876175, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 366.11, 'epoch': 0.3}
 15%|████████████▎                                                                     | 3030/20117 [1:53:06<10:42:53,  2.26s/it] 15%|████████████▎                                                                     | 3031/20117 [1:53:09<10:40:16,  2.25s/it] 15%|████████████▎                                                                     | 3032/20117 [1:53:11<10:46:39,  2.27s/it] 15%|████████████▎                                                                     | 3033/20117 [1:53:13<10:48:43,  2.28s/it] 15%|████████████▎                                                                     | 3034/20117 [1:53:16<10:46:12,  2.27s/it] 15%|████████████▎                                                                     | 3035/20117 [1:53:18<10:45:50,  2.27s/it] 15%|████████████▍                                                                     | 3036/20117 [1:53:20<10:37:26,  2.24s/it] 15%|████████████▍                                                                     | 3037/20117 [1:53:22<10:32:57,  2.22s/it] 15%|████████████▍                                                                     | 3038/20117 [1:53:24<10:30:09,  2.21s/it] 15%|████████████▍                                                                     | 3039/20117 [1:53:27<10:36:18,  2.24s/it] 15%|████████████▍                                                                     | 3040/20117 [1:53:29<10:32:14,  2.22s/it]                                                                                                                                 {'loss': 0.28, 'grad_norm': 0.35668620467185974, 'learning_rate': 0.00018954902018551728, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 333.01, 'epoch': 0.3}
 15%|████████████▍                                                                     | 3040/20117 [1:53:29<10:32:14,  2.22s/it] 15%|████████████▍                                                                     | 3041/20117 [1:53:31<10:37:15,  2.24s/it] 15%|████████████▍                                                                     | 3042/20117 [1:53:33<10:35:55,  2.23s/it] 15%|████████████▍                                                                     | 3043/20117 [1:53:36<10:44:27,  2.26s/it] 15%|████████████▍                                                                     | 3044/20117 [1:53:38<10:47:19,  2.27s/it] 15%|████████████▍                                                                     | 3045/20117 [1:53:40<10:48:56,  2.28s/it] 15%|████████████▍                                                                     | 3046/20117 [1:53:43<10:50:13,  2.29s/it] 15%|████████████▍                                                                     | 3047/20117 [1:53:45<10:45:20,  2.27s/it] 15%|████████████▍                                                                     | 3048/20117 [1:53:47<10:46:03,  2.27s/it] 15%|████████████▍                                                                     | 3049/20117 [1:53:49<10:45:34,  2.27s/it] 15%|████████████▍                                                                     | 3050/20117 [1:53:52<10:38:36,  2.25s/it]                                                                                                                                 {'loss': 0.3329, 'grad_norm': 0.4070911109447479, 'learning_rate': 0.00018947905610525374, 'memory/max_active (GiB)': 20.58, 'memory/max_allocated (GiB)': 20.58, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 367.0, 'epoch': 0.3}
 15%|████████████▍                                                                     | 3050/20117 [1:53:52<10:38:36,  2.25s/it] 15%|████████████▍                                                                     | 3051/20117 [1:53:54<10:36:14,  2.24s/it] 15%|████████████▍                                                                     | 3052/20117 [1:53:56<10:34:38,  2.23s/it] 15%|████████████▍                                                                     | 3053/20117 [1:53:58<10:32:02,  2.22s/it] 15%|████████████▍                                                                     | 3054/20117 [1:54:00<10:31:21,  2.22s/it] 15%|████████████▍                                                                     | 3055/20117 [1:54:03<10:55:02,  2.30s/it] 15%|████████████▍                                                                     | 3056/20117 [1:54:05<11:02:24,  2.33s/it] 15%|████████████▍                                                                     | 3057/20117 [1:54:07<10:52:25,  2.29s/it] 15%|████████████▍                                                                     | 3058/20117 [1:54:10<10:47:55,  2.28s/it] 15%|████████████▍                                                                     | 3059/20117 [1:54:12<10:51:07,  2.29s/it] 15%|████████████▍                                                                     | 3060/20117 [1:54:14<10:45:48,  2.27s/it]                                                                                                                                 {'loss': 0.2532, 'grad_norm': 0.4529660642147064, 'learning_rate': 0.00018940887161916317, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 438.51, 'epoch': 0.3}
 15%|████████████▍                                                                     | 3060/20117 [1:54:14<10:45:48,  2.27s/it] 15%|████████████▍                                                                     | 3061/20117 [1:54:17<10:46:18,  2.27s/it] 15%|████████████▍                                                                     | 3062/20117 [1:54:19<10:44:25,  2.27s/it] 15%|████████████▍                                                                     | 3063/20117 [1:54:21<10:36:21,  2.24s/it] 15%|████████████▍                                                                     | 3064/20117 [1:54:23<10:32:56,  2.23s/it] 15%|████████████▍                                                                     | 3065/20117 [1:54:25<10:31:49,  2.22s/it] 15%|████████████▍                                                                     | 3066/20117 [1:54:28<10:29:48,  2.22s/it] 15%|████████████▌                                                                     | 3067/20117 [1:54:30<10:33:02,  2.23s/it] 15%|████████████▌                                                                     | 3068/20117 [1:54:32<10:29:24,  2.22s/it] 15%|████████████▌                                                                     | 3069/20117 [1:54:34<10:27:16,  2.21s/it] 15%|████████████▌                                                                     | 3070/20117 [1:54:36<10:31:41,  2.22s/it]                                                                                                                                 {'loss': 0.2717, 'grad_norm': 0.323428213596344, 'learning_rate': 0.0001893384669001248, 'memory/max_active (GiB)': 18.83, 'memory/max_allocated (GiB)': 18.83, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 304.06, 'epoch': 0.31}
 15%|████████████▌                                                                     | 3070/20117 [1:54:36<10:31:41,  2.22s/it] 15%|████████████▌                                                                     | 3071/20117 [1:54:39<10:27:37,  2.21s/it] 15%|████████████▌                                                                     | 3072/20117 [1:54:41<10:33:31,  2.23s/it] 15%|████████████▌                                                                     | 3073/20117 [1:54:43<10:31:49,  2.22s/it] 15%|████████████▌                                                                     | 3074/20117 [1:54:45<10:26:54,  2.21s/it] 15%|████████████▌                                                                     | 3075/20117 [1:54:48<10:29:33,  2.22s/it] 15%|████████████▌                                                                     | 3076/20117 [1:54:50<10:28:36,  2.21s/it] 15%|████████████▌                                                                     | 3077/20117 [1:54:52<10:26:24,  2.21s/it] 15%|████████████▌                                                                     | 3078/20117 [1:54:54<10:24:35,  2.20s/it] 15%|████████████▌                                                                     | 3079/20117 [1:54:56<10:24:10,  2.20s/it] 15%|████████████▌                                                                     | 3080/20117 [1:54:59<10:25:07,  2.20s/it]                                                                                                                                 {'loss': 0.2575, 'grad_norm': 0.33601540327072144, 'learning_rate': 0.00018926784212156038, 'memory/max_active (GiB)': 17.29, 'memory/max_allocated (GiB)': 17.29, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 315.78, 'epoch': 0.31}
 15%|████████████▌                                                                     | 3080/20117 [1:54:59<10:25:07,  2.20s/it] 15%|████████████▌                                                                     | 3081/20117 [1:55:01<10:29:08,  2.22s/it] 15%|████████████▌                                                                     | 3082/20117 [1:55:03<10:27:30,  2.21s/it] 15%|████████████▌                                                                     | 3083/20117 [1:55:05<10:31:18,  2.22s/it] 15%|████████████▌                                                                     | 3084/20117 [1:55:07<10:31:55,  2.23s/it] 15%|████████████▌                                                                     | 3085/20117 [1:55:10<10:29:49,  2.22s/it] 15%|████████████▌                                                                     | 3086/20117 [1:55:12<10:33:25,  2.23s/it] 15%|████████████▌                                                                     | 3087/20117 [1:55:14<10:33:38,  2.23s/it] 15%|████████████▌                                                                     | 3088/20117 [1:55:16<10:31:37,  2.23s/it] 15%|████████████▌                                                                     | 3089/20117 [1:55:19<10:29:55,  2.22s/it] 15%|████████████▌                                                                     | 3090/20117 [1:55:21<10:25:19,  2.20s/it]                                                                                                                                 {'loss': 0.2606, 'grad_norm': 0.4147079586982727, 'learning_rate': 0.0001891969974574336, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 357.02, 'epoch': 0.31}
 15%|████████████▌                                                                     | 3090/20117 [1:55:21<10:25:19,  2.20s/it] 15%|████████████▌                                                                     | 3091/20117 [1:55:23<10:22:56,  2.20s/it] 15%|████████████▌                                                                     | 3092/20117 [1:55:25<10:19:44,  2.18s/it] 15%|████████████▌                                                                     | 3093/20117 [1:55:27<10:20:29,  2.19s/it] 15%|████████████▌                                                                     | 3094/20117 [1:55:29<10:21:45,  2.19s/it] 15%|████████████▌                                                                     | 3095/20117 [1:55:32<10:17:36,  2.18s/it] 15%|████████████▌                                                                     | 3096/20117 [1:55:34<10:20:27,  2.19s/it] 15%|████████████▌                                                                     | 3097/20117 [1:55:36<10:21:20,  2.19s/it] 15%|████████████▋                                                                     | 3098/20117 [1:55:38<10:17:11,  2.18s/it] 15%|████████████▋                                                                     | 3099/20117 [1:55:40<10:14:44,  2.17s/it] 15%|████████████▋                                                                     | 3100/20117 [1:55:42<10:10:46,  2.15s/it]                                                                                                                                 {'loss': 0.2824, 'grad_norm': 0.5923524498939514, 'learning_rate': 0.00018912593308224987, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 360.95, 'epoch': 0.31}
 15%|████████████▋                                                                     | 3100/20117 [1:55:42<10:10:46,  2.15s/it] 15%|████████████▋                                                                     | 3101/20117 [1:55:45<10:13:59,  2.16s/it] 15%|████████████▋                                                                     | 3102/20117 [1:55:47<10:10:07,  2.15s/it] 15%|████████████▋                                                                     | 3103/20117 [1:55:49<10:16:23,  2.17s/it] 15%|████████████▋                                                                     | 3104/20117 [1:55:51<10:17:10,  2.18s/it] 15%|████████████▋                                                                     | 3105/20117 [1:55:53<10:14:40,  2.17s/it] 15%|████████████▋                                                                     | 3106/20117 [1:55:55<10:17:02,  2.18s/it] 15%|████████████▋                                                                     | 3107/20117 [1:55:58<10:23:03,  2.20s/it] 15%|████████████▋                                                                     | 3108/20117 [1:56:00<10:43:30,  2.27s/it] 15%|████████████▋                                                                     | 3109/20117 [1:56:02<10:37:03,  2.25s/it] 15%|████████████▋                                                                     | 3110/20117 [1:56:04<10:26:27,  2.21s/it]                                                                                                                                 {'loss': 0.3001, 'grad_norm': 0.2961815297603607, 'learning_rate': 0.00018905464917105577, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 369.31, 'epoch': 0.31}
 15%|████████████▋                                                                     | 3110/20117 [1:56:04<10:26:27,  2.21s/it] 15%|████████████▋                                                                     | 3111/20117 [1:56:07<10:18:31,  2.18s/it] 15%|████████████▋                                                                     | 3112/20117 [1:56:09<10:20:12,  2.19s/it] 15%|████████████▋                                                                     | 3113/20117 [1:56:11<10:19:38,  2.19s/it] 15%|████████████▋                                                                     | 3114/20117 [1:56:13<10:20:19,  2.19s/it] 15%|████████████▋                                                                     | 3115/20117 [1:56:15<10:20:43,  2.19s/it] 15%|████████████▋                                                                     | 3116/20117 [1:56:17<10:12:42,  2.16s/it] 15%|████████████▋                                                                     | 3117/20117 [1:56:20<10:15:00,  2.17s/it] 15%|████████████▋                                                                     | 3118/20117 [1:56:22<10:13:09,  2.16s/it] 16%|████████████▋                                                                     | 3119/20117 [1:56:24<10:10:18,  2.15s/it] 16%|████████████▋                                                                     | 3120/20117 [1:56:26<10:07:17,  2.14s/it]                                                                                                                                 {'loss': 0.2441, 'grad_norm': 0.2864610254764557, 'learning_rate': 0.00018898314589943862, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 383.33, 'epoch': 0.31}
 16%|████████████▋                                                                     | 3120/20117 [1:56:26<10:07:17,  2.14s/it] 16%|████████████▋                                                                     | 3121/20117 [1:56:28<10:09:59,  2.15s/it] 16%|████████████▋                                                                     | 3122/20117 [1:56:30<10:13:22,  2.17s/it] 16%|████████████▋                                                                     | 3123/20117 [1:56:33<10:12:54,  2.16s/it] 16%|████████████▋                                                                     | 3124/20117 [1:56:35<10:09:10,  2.15s/it] 16%|████████████▋                                                                     | 3125/20117 [1:56:37<10:15:30,  2.17s/it] 16%|████████████▋                                                                     | 3126/20117 [1:56:39<10:13:48,  2.17s/it] 16%|████████████▋                                                                     | 3127/20117 [1:56:41<10:13:54,  2.17s/it] 16%|████████████▊                                                                     | 3128/20117 [1:56:43<10:15:17,  2.17s/it] 16%|████████████▊                                                                     | 3129/20117 [1:56:46<10:16:33,  2.18s/it] 16%|████████████▊                                                                     | 3130/20117 [1:56:48<10:15:13,  2.17s/it]                                                                                                                                 {'loss': 0.2402, 'grad_norm': 0.28341177105903625, 'learning_rate': 0.00018891142344352611, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 347.6, 'epoch': 0.31}
 16%|████████████▊                                                                     | 3130/20117 [1:56:48<10:15:13,  2.17s/it] 16%|████████████▊                                                                     | 3131/20117 [1:56:50<10:14:04,  2.17s/it] 16%|████████████▊                                                                     | 3132/20117 [1:56:52<10:12:53,  2.17s/it] 16%|████████████▊                                                                     | 3133/20117 [1:56:54<10:14:00,  2.17s/it] 16%|████████████▊                                                                     | 3134/20117 [1:56:56<10:16:31,  2.18s/it] 16%|████████████▊                                                                     | 3135/20117 [1:56:59<10:15:35,  2.17s/it] 16%|████████████▊                                                                     | 3136/20117 [1:57:01<10:13:17,  2.17s/it] 16%|████████████▊                                                                     | 3137/20117 [1:57:03<10:14:53,  2.17s/it] 16%|████████████▊                                                                     | 3138/20117 [1:57:05<10:14:52,  2.17s/it] 16%|████████████▊                                                                     | 3139/20117 [1:57:07<10:17:25,  2.18s/it] 16%|████████████▊                                                                     | 3140/20117 [1:57:10<10:15:12,  2.17s/it]                                                                                                                                 {'loss': 0.2318, 'grad_norm': 0.3956068158149719, 'learning_rate': 0.0001888394819799858, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 343.71, 'epoch': 0.31}
 16%|████████████▊                                                                     | 3140/20117 [1:57:10<10:15:12,  2.17s/it] 16%|████████████▊                                                                     | 3141/20117 [1:57:12<10:15:32,  2.18s/it] 16%|████████████▊                                                                     | 3142/20117 [1:57:14<10:24:10,  2.21s/it] 16%|████████████▊                                                                     | 3143/20117 [1:57:16<10:22:04,  2.20s/it] 16%|████████████▊                                                                     | 3144/20117 [1:57:18<10:20:55,  2.19s/it] 16%|████████████▊                                                                     | 3145/20117 [1:57:21<10:17:48,  2.18s/it] 16%|████████████▊                                                                     | 3146/20117 [1:57:23<10:11:17,  2.16s/it] 16%|████████████▊                                                                     | 3147/20117 [1:57:25<10:18:45,  2.19s/it] 16%|████████████▊                                                                     | 3148/20117 [1:57:27<10:14:44,  2.17s/it] 16%|████████████▊                                                                     | 3149/20117 [1:57:29<10:11:47,  2.16s/it] 16%|████████████▊                                                                     | 3150/20117 [1:57:31<10:09:00,  2.15s/it]                                                                                                                                 {'loss': 0.2047, 'grad_norm': 0.3796366751194, 'learning_rate': 0.00018876732168602472, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 353.35, 'epoch': 0.31}
 16%|████████████▊                                                                     | 3150/20117 [1:57:31<10:09:00,  2.15s/it] 16%|████████████▊                                                                     | 3151/20117 [1:57:33<10:07:52,  2.15s/it] 16%|████████████▊                                                                     | 3152/20117 [1:57:36<10:09:53,  2.16s/it] 16%|████████████▊                                                                     | 3153/20117 [1:57:38<10:08:13,  2.15s/it] 16%|████████████▊                                                                     | 3154/20117 [1:57:40<10:07:28,  2.15s/it] 16%|████████████▊                                                                     | 3155/20117 [1:57:42<10:05:31,  2.14s/it] 16%|████████████▊                                                                     | 3156/20117 [1:57:44<10:04:09,  2.14s/it] 16%|████████████▊                                                                     | 3157/20117 [1:57:46<10:05:19,  2.14s/it] 16%|████████████▊                                                                     | 3158/20117 [1:57:48<10:03:42,  2.14s/it] 16%|████████████▉                                                                     | 3159/20117 [1:57:51<10:02:39,  2.13s/it] 16%|████████████▉                                                                     | 3160/20117 [1:57:53<10:01:01,  2.13s/it]                                                                                                                                 {'loss': 0.3128, 'grad_norm': 0.34916970133781433, 'learning_rate': 0.00018869494273938893, 'memory/max_active (GiB)': 18.83, 'memory/max_allocated (GiB)': 18.83, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 316.54, 'epoch': 0.31}
 16%|████████████▉                                                                     | 3160/20117 [1:57:53<10:01:01,  2.13s/it] 16%|█████████████                                                                      | 3161/20117 [1:57:55<9:58:58,  2.12s/it] 16%|████████████▉                                                                     | 3162/20117 [1:57:57<10:22:38,  2.20s/it] 16%|████████████▉                                                                     | 3163/20117 [1:57:59<10:15:24,  2.18s/it] 16%|████████████▉                                                                     | 3164/20117 [1:58:02<10:21:28,  2.20s/it] 16%|████████████▉                                                                     | 3165/20117 [1:58:04<10:19:50,  2.19s/it] 16%|████████████▉                                                                     | 3166/20117 [1:58:06<10:13:54,  2.17s/it] 16%|████████████▉                                                                     | 3167/20117 [1:58:08<10:12:06,  2.17s/it] 16%|████████████▉                                                                     | 3168/20117 [1:58:10<10:18:45,  2.19s/it] 16%|████████████▉                                                                     | 3169/20117 [1:58:12<10:16:04,  2.18s/it] 16%|████████████▉                                                                     | 3170/20117 [1:58:15<10:12:09,  2.17s/it]                                                                                                                                 {'loss': 0.3033, 'grad_norm': 0.3254728317260742, 'learning_rate': 0.00018862234531836307, 'memory/max_active (GiB)': 18.83, 'memory/max_allocated (GiB)': 18.83, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 377.71, 'epoch': 0.32}
 16%|████████████▉                                                                     | 3170/20117 [1:58:15<10:12:09,  2.17s/it] 16%|████████████▉                                                                     | 3171/20117 [1:58:17<10:10:13,  2.16s/it] 16%|████████████▉                                                                     | 3172/20117 [1:58:19<10:08:08,  2.15s/it] 16%|████████████▉                                                                     | 3173/20117 [1:58:21<10:10:57,  2.16s/it] 16%|████████████▉                                                                     | 3174/20117 [1:58:23<10:07:13,  2.15s/it] 16%|████████████▉                                                                     | 3175/20117 [1:58:26<10:28:37,  2.23s/it] 16%|████████████▉                                                                     | 3176/20117 [1:58:28<10:25:25,  2.22s/it] 16%|████████████▉                                                                     | 3177/20117 [1:58:30<10:24:35,  2.21s/it] 16%|████████████▉                                                                     | 3178/20117 [1:58:32<10:18:02,  2.19s/it] 16%|████████████▉                                                                     | 3179/20117 [1:58:34<10:11:58,  2.17s/it] 16%|████████████▉                                                                     | 3180/20117 [1:58:36<10:06:22,  2.15s/it]                                                                                                                                 {'loss': 0.2952, 'grad_norm': 0.43219825625419617, 'learning_rate': 0.0001885495296017699, 'memory/max_active (GiB)': 19.2, 'memory/max_allocated (GiB)': 19.2, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 314.3, 'epoch': 0.32}
 16%|████████████▉                                                                     | 3180/20117 [1:58:36<10:06:22,  2.15s/it] 16%|████████████▉                                                                     | 3181/20117 [1:58:38<10:03:46,  2.14s/it] 16%|████████████▉                                                                     | 3182/20117 [1:58:41<10:13:34,  2.17s/it] 16%|████████████▉                                                                     | 3183/20117 [1:58:43<10:16:54,  2.19s/it] 16%|████████████▉                                                                     | 3184/20117 [1:58:45<10:13:47,  2.17s/it] 16%|████████████▉                                                                     | 3185/20117 [1:58:47<10:10:39,  2.16s/it] 16%|████████████▉                                                                     | 3186/20117 [1:58:49<10:10:57,  2.17s/it] 16%|████████████▉                                                                     | 3187/20117 [1:58:51<10:13:47,  2.18s/it] 16%|████████████▉                                                                     | 3188/20117 [1:58:54<10:12:40,  2.17s/it] 16%|████████████▉                                                                     | 3189/20117 [1:58:56<10:16:44,  2.19s/it] 16%|█████████████                                                                     | 3190/20117 [1:58:58<10:21:16,  2.20s/it]                                                                                                                                 {'loss': 0.2087, 'grad_norm': 0.2606908082962036, 'learning_rate': 0.00018847649576897, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 345.1, 'epoch': 0.32}
 16%|█████████████                                                                     | 3190/20117 [1:58:58<10:21:16,  2.20s/it] 16%|█████████████                                                                     | 3191/20117 [1:59:00<10:14:54,  2.18s/it] 16%|█████████████                                                                     | 3192/20117 [1:59:02<10:19:33,  2.20s/it] 16%|█████████████                                                                     | 3193/20117 [1:59:05<10:17:46,  2.19s/it] 16%|█████████████                                                                     | 3194/20117 [1:59:07<10:13:45,  2.18s/it] 16%|█████████████                                                                     | 3195/20117 [1:59:09<10:13:16,  2.17s/it] 16%|█████████████                                                                     | 3196/20117 [1:59:11<10:21:41,  2.20s/it] 16%|█████████████                                                                     | 3197/20117 [1:59:13<10:17:24,  2.19s/it] 16%|█████████████                                                                     | 3198/20117 [1:59:16<10:13:11,  2.17s/it] 16%|█████████████                                                                     | 3199/20117 [1:59:18<10:11:34,  2.17s/it] 16%|█████████████                                                                     | 3200/20117 [1:59:20<10:12:00,  2.17s/it]                                                                                                                                 {'loss': 0.2665, 'grad_norm': 0.46498534083366394, 'learning_rate': 0.00018840324399986105, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 313.16, 'epoch': 0.32}
 16%|█████████████                                                                     | 3200/20117 [1:59:20<10:12:00,  2.17s/it] 16%|█████████████                                                                     | 3201/20117 [1:59:22<10:17:14,  2.19s/it] 16%|█████████████                                                                     | 3202/20117 [1:59:24<10:14:49,  2.18s/it] 16%|█████████████                                                                     | 3203/20117 [1:59:26<10:18:30,  2.19s/it] 16%|█████████████                                                                     | 3204/20117 [1:59:29<10:19:30,  2.20s/it] 16%|█████████████                                                                     | 3205/20117 [1:59:31<10:26:53,  2.22s/it] 16%|█████████████                                                                     | 3206/20117 [1:59:33<10:21:24,  2.20s/it] 16%|█████████████                                                                     | 3207/20117 [1:59:35<10:33:40,  2.25s/it] 16%|█████████████                                                                     | 3208/20117 [1:59:38<10:29:42,  2.23s/it] 16%|█████████████                                                                     | 3209/20117 [1:59:40<10:33:20,  2.25s/it] 16%|█████████████                                                                     | 3210/20117 [1:59:42<10:35:22,  2.25s/it]                                                                                                                                 {'loss': 0.2328, 'grad_norm': 0.35087594389915466, 'learning_rate': 0.00018832977447487772, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 321.23, 'epoch': 0.32}
 16%|█████████████                                                                     | 3210/20117 [1:59:42<10:35:22,  2.25s/it] 16%|█████████████                                                                     | 3211/20117 [1:59:44<10:28:45,  2.23s/it] 16%|█████████████                                                                     | 3212/20117 [1:59:47<10:28:30,  2.23s/it] 16%|█████████████                                                                     | 3213/20117 [1:59:49<10:25:10,  2.22s/it] 16%|█████████████                                                                     | 3214/20117 [1:59:51<10:30:48,  2.24s/it] 16%|█████████████                                                                     | 3215/20117 [1:59:54<11:05:00,  2.36s/it] 16%|█████████████                                                                     | 3216/20117 [1:59:56<11:05:45,  2.36s/it] 16%|█████████████                                                                     | 3217/20117 [1:59:58<11:02:41,  2.35s/it] 16%|█████████████                                                                     | 3218/20117 [2:00:01<11:01:13,  2.35s/it] 16%|█████████████                                                                     | 3219/20117 [2:00:03<10:53:06,  2.32s/it] 16%|█████████████▏                                                                    | 3220/20117 [2:00:05<10:51:38,  2.31s/it]                                                                                                                                 {'loss': 0.2233, 'grad_norm': 0.46294090151786804, 'learning_rate': 0.00018825608737499088, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 366.88, 'epoch': 0.32}
 16%|█████████████▏                                                                    | 3220/20117 [2:00:05<10:51:38,  2.31s/it] 16%|█████████████▏                                                                    | 3221/20117 [2:00:08<10:48:33,  2.30s/it] 16%|█████████████▏                                                                    | 3222/20117 [2:00:10<10:56:02,  2.33s/it] 16%|█████████████▏                                                                    | 3223/20117 [2:00:12<10:52:47,  2.32s/it] 16%|█████████████▏                                                                    | 3224/20117 [2:00:15<10:46:42,  2.30s/it] 16%|█████████████▏                                                                    | 3225/20117 [2:00:17<10:40:04,  2.27s/it] 16%|█████████████▏                                                                    | 3226/20117 [2:00:19<10:35:06,  2.26s/it] 16%|█████████████▏                                                                    | 3227/20117 [2:00:21<10:30:09,  2.24s/it] 16%|█████████████▏                                                                    | 3228/20117 [2:00:23<10:34:40,  2.25s/it] 16%|█████████████▏                                                                    | 3229/20117 [2:00:26<10:38:37,  2.27s/it] 16%|█████████████▏                                                                    | 3230/20117 [2:00:28<10:44:56,  2.29s/it]                                                                                                                                 {'loss': 0.2653, 'grad_norm': 0.35385075211524963, 'learning_rate': 0.00018818218288170753, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 350.06, 'epoch': 0.32}
 16%|█████████████▏                                                                    | 3230/20117 [2:00:28<10:44:56,  2.29s/it] 16%|█████████████▏                                                                    | 3231/20117 [2:00:30<10:44:50,  2.29s/it] 16%|█████████████▏                                                                    | 3232/20117 [2:00:33<10:48:19,  2.30s/it] 16%|█████████████▏                                                                    | 3233/20117 [2:00:35<10:53:35,  2.32s/it] 16%|█████████████▏                                                                    | 3234/20117 [2:00:38<10:58:52,  2.34s/it] 16%|█████████████▏                                                                    | 3235/20117 [2:00:40<11:13:31,  2.39s/it] 16%|█████████████▏                                                                    | 3236/20117 [2:00:42<11:06:52,  2.37s/it] 16%|█████████████▏                                                                    | 3237/20117 [2:00:45<10:56:54,  2.33s/it] 16%|█████████████▏                                                                    | 3238/20117 [2:00:47<10:53:38,  2.32s/it] 16%|█████████████▏                                                                    | 3239/20117 [2:00:49<10:53:56,  2.32s/it] 16%|█████████████▏                                                                    | 3240/20117 [2:00:52<10:50:29,  2.31s/it]                                                                                                                                 {'loss': 0.2921, 'grad_norm': 0.4959578812122345, 'learning_rate': 0.00018810806117706998, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 415.69, 'epoch': 0.32}
 16%|█████████████▏                                                                    | 3240/20117 [2:00:52<10:50:29,  2.31s/it] 16%|█████████████▏                                                                    | 3241/20117 [2:00:54<10:44:54,  2.29s/it] 16%|█████████████▏                                                                    | 3242/20117 [2:00:56<10:47:59,  2.30s/it] 16%|█████████████▏                                                                    | 3243/20117 [2:00:58<10:46:25,  2.30s/it] 16%|█████████████▏                                                                    | 3244/20117 [2:01:01<10:48:28,  2.31s/it] 16%|█████████████▏                                                                    | 3245/20117 [2:01:03<10:47:54,  2.30s/it] 16%|█████████████▏                                                                    | 3246/20117 [2:01:05<10:50:01,  2.31s/it] 16%|█████████████▏                                                                    | 3247/20117 [2:01:08<10:50:21,  2.31s/it] 16%|█████████████▏                                                                    | 3248/20117 [2:01:10<10:48:10,  2.31s/it] 16%|█████████████▏                                                                    | 3249/20117 [2:01:12<10:48:06,  2.31s/it] 16%|█████████████▏                                                                    | 3250/20117 [2:01:15<10:53:38,  2.33s/it]                                                                                                                                 {'loss': 0.3187, 'grad_norm': 0.4978230893611908, 'learning_rate': 0.0001880337224436557, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 377.08, 'epoch': 0.32}
 16%|█████████████▏                                                                    | 3250/20117 [2:01:15<10:53:38,  2.33s/it] 16%|█████████████▎                                                                    | 3251/20117 [2:01:17<10:52:44,  2.32s/it] 16%|█████████████▎                                                                    | 3252/20117 [2:01:19<10:55:42,  2.33s/it] 16%|█████████████▎                                                                    | 3253/20117 [2:01:22<10:55:28,  2.33s/it] 16%|█████████████▎                                                                    | 3254/20117 [2:01:24<10:53:19,  2.32s/it] 16%|█████████████▎                                                                    | 3255/20117 [2:01:26<10:48:11,  2.31s/it] 16%|█████████████▎                                                                    | 3256/20117 [2:01:28<10:46:11,  2.30s/it] 16%|█████████████▎                                                                    | 3257/20117 [2:01:31<10:44:43,  2.29s/it] 16%|█████████████▎                                                                    | 3258/20117 [2:01:33<10:44:57,  2.30s/it] 16%|█████████████▎                                                                    | 3259/20117 [2:01:35<10:44:48,  2.29s/it] 16%|█████████████▎                                                                    | 3260/20117 [2:01:38<10:43:31,  2.29s/it]                                                                                                                                 {'loss': 0.3013, 'grad_norm': 0.45267730951309204, 'learning_rate': 0.00018795916686457667, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 410.24, 'epoch': 0.32}
 16%|█████████████▎                                                                    | 3260/20117 [2:01:38<10:43:31,  2.29s/it] 16%|█████████████▎                                                                    | 3261/20117 [2:01:40<10:39:08,  2.28s/it] 16%|█████████████▎                                                                    | 3262/20117 [2:01:42<10:38:11,  2.27s/it] 16%|█████████████▎                                                                    | 3263/20117 [2:01:44<10:39:36,  2.28s/it] 16%|█████████████▎                                                                    | 3264/20117 [2:01:47<10:39:05,  2.28s/it] 16%|█████████████▎                                                                    | 3265/20117 [2:01:49<10:36:48,  2.27s/it] 16%|█████████████▎                                                                    | 3266/20117 [2:01:51<10:34:50,  2.26s/it] 16%|█████████████▎                                                                    | 3267/20117 [2:01:53<10:40:48,  2.28s/it] 16%|█████████████▎                                                                    | 3268/20117 [2:01:56<10:43:59,  2.29s/it] 16%|█████████████▎                                                                    | 3269/20117 [2:01:58<11:07:53,  2.38s/it] 16%|█████████████▎                                                                    | 3270/20117 [2:02:01<11:00:02,  2.35s/it]                                                                                                                                 {'loss': 0.2876, 'grad_norm': 0.3263493478298187, 'learning_rate': 0.00018788439462347908, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 446.75, 'epoch': 0.33}
 16%|█████████████▎                                                                    | 3270/20117 [2:02:01<11:00:02,  2.35s/it] 16%|█████████████▎                                                                    | 3271/20117 [2:02:03<10:54:06,  2.33s/it] 16%|█████████████▎                                                                    | 3272/20117 [2:02:05<10:46:59,  2.30s/it] 16%|█████████████▎                                                                    | 3273/20117 [2:02:07<10:40:58,  2.28s/it] 16%|█████████████▎                                                                    | 3274/20117 [2:02:10<10:37:20,  2.27s/it] 16%|█████████████▎                                                                    | 3275/20117 [2:02:12<10:47:33,  2.31s/it] 16%|█████████████▎                                                                    | 3276/20117 [2:02:15<11:06:32,  2.37s/it] 16%|█████████████▎                                                                    | 3277/20117 [2:02:17<10:56:20,  2.34s/it] 16%|█████████████▎                                                                    | 3278/20117 [2:02:19<10:49:06,  2.31s/it] 16%|█████████████▎                                                                    | 3279/20117 [2:02:21<10:46:46,  2.30s/it] 16%|█████████████▎                                                                    | 3280/20117 [2:02:24<11:13:46,  2.40s/it]                                                                                                                                 {'loss': 0.3504, 'grad_norm': 0.4769454896450043, 'learning_rate': 0.00018780940590454277, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 301.09, 'epoch': 0.33}
 16%|█████████████▎                                                                    | 3280/20117 [2:02:24<11:13:46,  2.40s/it] 16%|█████████████▎                                                                    | 3281/20117 [2:02:26<11:02:35,  2.36s/it] 16%|█████████████▍                                                                    | 3282/20117 [2:02:29<11:00:35,  2.35s/it] 16%|█████████████▍                                                                    | 3283/20117 [2:02:31<10:52:12,  2.32s/it] 16%|█████████████▍                                                                    | 3284/20117 [2:02:33<10:48:27,  2.31s/it] 16%|█████████████▍                                                                    | 3285/20117 [2:02:35<10:39:15,  2.28s/it] 16%|█████████████▍                                                                    | 3286/20117 [2:02:38<10:42:42,  2.29s/it] 16%|█████████████▍                                                                    | 3287/20117 [2:02:40<10:36:25,  2.27s/it] 16%|█████████████▍                                                                    | 3288/20117 [2:02:42<10:40:59,  2.29s/it] 16%|█████████████▍                                                                    | 3289/20117 [2:02:45<10:40:27,  2.28s/it] 16%|█████████████▍                                                                    | 3290/20117 [2:02:47<10:33:33,  2.26s/it]                                                                                                                                 {'loss': 0.2291, 'grad_norm': 0.21503253281116486, 'learning_rate': 0.00018773420089248074, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 375.18, 'epoch': 0.33}
 16%|█████████████▍                                                                    | 3290/20117 [2:02:47<10:33:33,  2.26s/it] 16%|█████████████▍                                                                    | 3291/20117 [2:02:49<10:29:59,  2.25s/it] 16%|█████████████▍                                                                    | 3292/20117 [2:02:51<10:30:42,  2.25s/it] 16%|█████████████▍                                                                    | 3293/20117 [2:02:53<10:30:30,  2.25s/it] 16%|█████████████▍                                                                    | 3294/20117 [2:02:56<10:36:49,  2.27s/it] 16%|█████████████▍                                                                    | 3295/20117 [2:02:58<10:31:42,  2.25s/it] 16%|█████████████▍                                                                    | 3296/20117 [2:03:00<10:35:41,  2.27s/it] 16%|█████████████▍                                                                    | 3297/20117 [2:03:03<10:40:30,  2.28s/it] 16%|█████████████▍                                                                    | 3298/20117 [2:03:05<10:39:49,  2.28s/it] 16%|█████████████▍                                                                    | 3299/20117 [2:03:07<10:42:21,  2.29s/it] 16%|█████████████▍                                                                    | 3300/20117 [2:03:09<10:43:39,  2.30s/it]                                                                                                                                 {'loss': 0.2674, 'grad_norm': 0.49010154604911804, 'learning_rate': 0.00018765877977253888, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 349.74, 'epoch': 0.33}
 16%|█████████████▍                                                                    | 3300/20117 [2:03:09<10:43:39,  2.30s/it] 16%|█████████████▍                                                                    | 3301/20117 [2:03:12<10:49:58,  2.32s/it] 16%|█████████████▍                                                                    | 3302/20117 [2:03:14<10:53:55,  2.33s/it] 16%|█████████████▍                                                                    | 3303/20117 [2:03:17<10:48:54,  2.32s/it] 16%|█████████████▍                                                                    | 3304/20117 [2:03:19<10:41:39,  2.29s/it] 16%|█████████████▍                                                                    | 3305/20117 [2:03:21<10:40:13,  2.28s/it] 16%|█████████████▍                                                                    | 3306/20117 [2:03:23<10:38:39,  2.28s/it] 16%|█████████████▍                                                                    | 3307/20117 [2:03:26<10:34:19,  2.26s/it] 16%|█████████████▍                                                                    | 3308/20117 [2:03:28<10:31:07,  2.25s/it] 16%|█████████████▍                                                                    | 3309/20117 [2:03:30<10:28:13,  2.24s/it] 16%|█████████████▍                                                                    | 3310/20117 [2:03:32<10:28:19,  2.24s/it]                                                                                                                                 {'loss': 0.2975, 'grad_norm': 0.3158112168312073, 'learning_rate': 0.00018758314273049532, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 336.25, 'epoch': 0.33}
 16%|█████████████▍                                                                    | 3310/20117 [2:03:32<10:28:19,  2.24s/it] 16%|█████████████▍                                                                    | 3311/20117 [2:03:34<10:30:11,  2.25s/it] 16%|█████████████▌                                                                    | 3312/20117 [2:03:37<10:29:55,  2.25s/it] 16%|█████████████▌                                                                    | 3313/20117 [2:03:39<10:32:13,  2.26s/it] 16%|█████████████▌                                                                    | 3314/20117 [2:03:41<10:37:20,  2.28s/it] 16%|█████████████▌                                                                    | 3315/20117 [2:03:44<10:33:37,  2.26s/it] 16%|█████████████▌                                                                    | 3316/20117 [2:03:46<10:35:18,  2.27s/it] 16%|█████████████▌                                                                    | 3317/20117 [2:03:48<10:35:51,  2.27s/it] 16%|█████████████▌                                                                    | 3318/20117 [2:03:50<10:31:04,  2.25s/it] 16%|█████████████▌                                                                    | 3319/20117 [2:03:53<10:33:03,  2.26s/it] 17%|█████████████▌                                                                    | 3320/20117 [2:03:55<10:35:41,  2.27s/it]                                                                                                                                 {'loss': 0.2618, 'grad_norm': 0.428846150636673, 'learning_rate': 0.0001875072899526601, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 284.44, 'epoch': 0.33}
 17%|█████████████▌                                                                    | 3320/20117 [2:03:55<10:35:41,  2.27s/it] 17%|█████████████▌                                                                    | 3321/20117 [2:03:58<11:10:34,  2.40s/it] 17%|█████████████▌                                                                    | 3322/20117 [2:04:00<11:00:38,  2.36s/it] 17%|█████████████▌                                                                    | 3323/20117 [2:04:02<10:51:39,  2.33s/it] 17%|█████████████▌                                                                    | 3324/20117 [2:04:04<10:50:44,  2.33s/it] 17%|█████████████▌                                                                    | 3325/20117 [2:04:07<10:48:39,  2.32s/it] 17%|█████████████▌                                                                    | 3326/20117 [2:04:09<10:41:23,  2.29s/it] 17%|█████████████▌                                                                    | 3327/20117 [2:04:11<10:41:36,  2.29s/it] 17%|█████████████▌                                                                    | 3328/20117 [2:04:13<10:36:14,  2.27s/it] 17%|█████████████▌                                                                    | 3329/20117 [2:04:16<10:38:08,  2.28s/it] 17%|█████████████▌                                                                    | 3330/20117 [2:04:18<10:34:21,  2.27s/it]                                                                                                                                 {'loss': 0.2926, 'grad_norm': 0.5033756494522095, 'learning_rate': 0.00018743122162587464, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 419.59, 'epoch': 0.33}
 17%|█████████████▌                                                                    | 3330/20117 [2:04:18<10:34:21,  2.27s/it] 17%|█████████████▌                                                                    | 3331/20117 [2:04:20<10:39:43,  2.29s/it] 17%|█████████████▌                                                                    | 3332/20117 [2:04:23<10:36:42,  2.28s/it] 17%|█████████████▌                                                                    | 3333/20117 [2:04:25<10:32:14,  2.26s/it] 17%|█████████████▌                                                                    | 3334/20117 [2:04:27<10:35:14,  2.27s/it] 17%|█████████████▌                                                                    | 3335/20117 [2:04:29<10:35:15,  2.27s/it] 17%|█████████████▌                                                                    | 3336/20117 [2:04:32<10:39:18,  2.29s/it] 17%|█████████████▌                                                                    | 3337/20117 [2:04:34<10:33:27,  2.27s/it] 17%|█████████████▌                                                                    | 3338/20117 [2:04:36<10:34:30,  2.27s/it] 17%|█████████████▌                                                                    | 3339/20117 [2:04:38<10:32:51,  2.26s/it] 17%|█████████████▌                                                                    | 3340/20117 [2:04:41<10:45:09,  2.31s/it]                                                                                                                                 {'loss': 0.2464, 'grad_norm': 0.17825426161289215, 'learning_rate': 0.0001873549379375113, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 342.16, 'epoch': 0.33}
 17%|█████████████▌                                                                    | 3340/20117 [2:04:41<10:45:09,  2.31s/it] 17%|█████████████▌                                                                    | 3341/20117 [2:04:43<10:42:10,  2.30s/it] 17%|█████████████▌                                                                    | 3342/20117 [2:04:45<10:37:14,  2.28s/it] 17%|█████████████▋                                                                    | 3343/20117 [2:04:48<10:38:32,  2.28s/it] 17%|█████████████▋                                                                    | 3344/20117 [2:04:50<10:40:35,  2.29s/it] 17%|█████████████▋                                                                    | 3345/20117 [2:04:52<10:40:15,  2.29s/it] 17%|█████████████▋                                                                    | 3346/20117 [2:04:55<10:39:30,  2.29s/it] 17%|█████████████▋                                                                    | 3347/20117 [2:04:57<10:43:15,  2.30s/it] 17%|█████████████▋                                                                    | 3348/20117 [2:04:59<10:47:38,  2.32s/it] 17%|█████████████▋                                                                    | 3349/20117 [2:05:01<10:43:26,  2.30s/it] 17%|█████████████▋                                                                    | 3350/20117 [2:05:04<10:50:22,  2.33s/it]                                                                                                                                 {'loss': 0.2729, 'grad_norm': 0.34919679164886475, 'learning_rate': 0.00018727843907547293, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 390.88, 'epoch': 0.33}
 17%|█████████████▋                                                                    | 3350/20117 [2:05:04<10:50:22,  2.33s/it] 17%|█████████████▋                                                                    | 3351/20117 [2:05:06<10:45:06,  2.31s/it] 17%|█████████████▋                                                                    | 3352/20117 [2:05:08<10:47:24,  2.32s/it] 17%|█████████████▋                                                                    | 3353/20117 [2:05:11<10:46:02,  2.31s/it] 17%|█████████████▋                                                                    | 3354/20117 [2:05:13<10:41:40,  2.30s/it] 17%|█████████████▋                                                                    | 3355/20117 [2:05:15<10:41:38,  2.30s/it] 17%|█████████████▋                                                                    | 3356/20117 [2:05:18<10:34:20,  2.27s/it] 17%|█████████████▋                                                                    | 3357/20117 [2:05:20<10:46:58,  2.32s/it] 17%|█████████████▋                                                                    | 3358/20117 [2:05:22<10:39:43,  2.29s/it] 17%|█████████████▋                                                                    | 3359/20117 [2:05:24<10:31:49,  2.26s/it] 17%|█████████████▋                                                                    | 3360/20117 [2:05:27<10:36:46,  2.28s/it]                                                                                                                                 {'loss': 0.3117, 'grad_norm': 0.37125077843666077, 'learning_rate': 0.00018720172522819243, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 394.73, 'epoch': 0.33}
 17%|█████████████▋                                                                    | 3360/20117 [2:05:27<10:36:46,  2.28s/it] 17%|█████████████▋                                                                    | 3361/20117 [2:05:29<10:35:22,  2.28s/it] 17%|█████████████▋                                                                    | 3362/20117 [2:05:31<10:34:47,  2.27s/it] 17%|█████████████▋                                                                    | 3363/20117 [2:05:34<10:36:42,  2.28s/it] 17%|█████████████▋                                                                    | 3364/20117 [2:05:36<10:32:56,  2.27s/it] 17%|█████████████▋                                                                    | 3365/20117 [2:05:38<10:31:25,  2.26s/it] 17%|█████████████▋                                                                    | 3366/20117 [2:05:40<10:30:35,  2.26s/it] 17%|█████████████▋                                                                    | 3367/20117 [2:05:43<10:30:32,  2.26s/it] 17%|█████████████▋                                                                    | 3368/20117 [2:05:45<10:30:21,  2.26s/it] 17%|█████████████▋                                                                    | 3369/20117 [2:05:47<10:32:20,  2.27s/it] 17%|█████████████▋                                                                    | 3370/20117 [2:05:49<10:35:19,  2.28s/it]                                                                                                                                 {'loss': 0.2702, 'grad_norm': 0.31769683957099915, 'learning_rate': 0.00018712479658463215, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 424.16, 'epoch': 0.34}
 17%|█████████████▋                                                                    | 3370/20117 [2:05:49<10:35:19,  2.28s/it] 17%|█████████████▋                                                                    | 3371/20117 [2:05:52<10:30:58,  2.26s/it] 17%|█████████████▋                                                                    | 3372/20117 [2:05:54<10:27:35,  2.25s/it] 17%|█████████████▋                                                                    | 3373/20117 [2:05:56<10:27:20,  2.25s/it] 17%|█████████████▊                                                                    | 3374/20117 [2:05:59<10:49:14,  2.33s/it] 17%|█████████████▊                                                                    | 3375/20117 [2:06:01<10:38:35,  2.29s/it] 17%|█████████████▊                                                                    | 3376/20117 [2:06:03<10:31:34,  2.26s/it] 17%|█████████████▊                                                                    | 3377/20117 [2:06:05<10:28:01,  2.25s/it] 17%|█████████████▊                                                                    | 3378/20117 [2:06:08<10:32:24,  2.27s/it] 17%|█████████████▊                                                                    | 3379/20117 [2:06:10<10:30:08,  2.26s/it] 17%|█████████████▊                                                                    | 3380/20117 [2:06:12<10:31:39,  2.26s/it]                                                                                                                                 {'loss': 0.1966, 'grad_norm': 0.398548424243927, 'learning_rate': 0.00018704765333428367, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 310.88, 'epoch': 0.34}
 17%|█████████████▊                                                                    | 3380/20117 [2:06:12<10:31:39,  2.26s/it] 17%|█████████████▊                                                                    | 3381/20117 [2:06:14<10:33:49,  2.27s/it] 17%|█████████████▊                                                                    | 3382/20117 [2:06:17<10:34:29,  2.27s/it] 17%|█████████████▊                                                                    | 3383/20117 [2:06:19<10:29:36,  2.26s/it] 17%|█████████████▊                                                                    | 3384/20117 [2:06:21<10:28:29,  2.25s/it] 17%|█████████████▊                                                                    | 3385/20117 [2:06:23<10:23:47,  2.24s/it] 17%|█████████████▊                                                                    | 3386/20117 [2:06:25<10:22:10,  2.23s/it] 17%|█████████████▊                                                                    | 3387/20117 [2:06:28<10:22:54,  2.23s/it] 17%|█████████████▊                                                                    | 3388/20117 [2:06:30<10:28:31,  2.25s/it] 17%|█████████████▊                                                                    | 3389/20117 [2:06:32<10:24:37,  2.24s/it] 17%|█████████████▊                                                                    | 3390/20117 [2:06:34<10:24:42,  2.24s/it]                                                                                                                                 {'loss': 0.2189, 'grad_norm': 0.24172648787498474, 'learning_rate': 0.00018697029566716705, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 311.63, 'epoch': 0.34}
 17%|█████████████▊                                                                    | 3390/20117 [2:06:34<10:24:42,  2.24s/it] 17%|█████████████▊                                                                    | 3391/20117 [2:06:37<10:26:32,  2.25s/it] 17%|█████████████▊                                                                    | 3392/20117 [2:06:39<10:39:35,  2.29s/it] 17%|█████████████▊                                                                    | 3393/20117 [2:06:42<10:54:02,  2.35s/it] 17%|█████████████▊                                                                    | 3394/20117 [2:06:44<11:03:31,  2.38s/it] 17%|█████████████▊                                                                    | 3395/20117 [2:06:46<10:56:39,  2.36s/it] 17%|█████████████▊                                                                    | 3396/20117 [2:06:49<10:44:15,  2.31s/it] 17%|█████████████▊                                                                    | 3397/20117 [2:06:51<10:41:29,  2.30s/it] 17%|█████████████▊                                                                    | 3398/20117 [2:06:53<10:30:53,  2.26s/it] 17%|█████████████▊                                                                    | 3399/20117 [2:06:55<10:26:15,  2.25s/it] 17%|█████████████▊                                                                    | 3400/20117 [2:06:57<10:26:27,  2.25s/it]                                                                                                                                 {'loss': 0.3093, 'grad_norm': 0.46132785081863403, 'learning_rate': 0.00018689272377383064, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 411.8, 'epoch': 0.34}
 17%|█████████████▊                                                                    | 3400/20117 [2:06:57<10:26:27,  2.25s/it] 17%|█████████████▊                                                                    | 3401/20117 [2:07:00<10:32:28,  2.27s/it] 17%|█████████████▊                                                                    | 3402/20117 [2:07:02<10:26:35,  2.25s/it] 17%|█████████████▊                                                                    | 3403/20117 [2:07:04<10:23:11,  2.24s/it] 17%|█████████████▉                                                                    | 3404/20117 [2:07:06<10:24:12,  2.24s/it] 17%|█████████████▉                                                                    | 3405/20117 [2:07:09<10:26:27,  2.25s/it] 17%|█████████████▉                                                                    | 3406/20117 [2:07:11<10:22:09,  2.23s/it] 17%|█████████████▉                                                                    | 3407/20117 [2:07:13<10:17:13,  2.22s/it] 17%|█████████████▉                                                                    | 3408/20117 [2:07:16<10:34:53,  2.28s/it] 17%|█████████████▉                                                                    | 3409/20117 [2:07:18<10:43:17,  2.31s/it] 17%|█████████████▉                                                                    | 3410/20117 [2:07:20<10:47:40,  2.33s/it]                                                                                                                                 {'loss': 0.2558, 'grad_norm': 0.3627679944038391, 'learning_rate': 0.00018681493784535036, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 319.21, 'epoch': 0.34}
 17%|█████████████▉                                                                    | 3410/20117 [2:07:20<10:47:40,  2.33s/it] 17%|█████████████▉                                                                    | 3411/20117 [2:07:23<10:43:36,  2.31s/it] 17%|█████████████▉                                                                    | 3412/20117 [2:07:25<10:49:45,  2.33s/it] 17%|█████████████▉                                                                    | 3413/20117 [2:07:27<10:48:51,  2.33s/it] 17%|█████████████▉                                                                    | 3414/20117 [2:07:30<10:44:03,  2.31s/it] 17%|█████████████▉                                                                    | 3415/20117 [2:07:32<10:43:21,  2.31s/it] 17%|█████████████▉                                                                    | 3416/20117 [2:07:34<10:42:15,  2.31s/it] 17%|█████████████▉                                                                    | 3417/20117 [2:07:36<10:39:20,  2.30s/it] 17%|█████████████▉                                                                    | 3418/20117 [2:07:39<10:29:11,  2.26s/it] 17%|█████████████▉                                                                    | 3419/20117 [2:07:41<10:26:48,  2.25s/it] 17%|█████████████▉                                                                    | 3420/20117 [2:07:43<10:22:19,  2.24s/it]                                                                                                                                 {'loss': 0.228, 'grad_norm': 1.1992244720458984, 'learning_rate': 0.00018673693807332945, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 354.78, 'epoch': 0.34}
 17%|█████████████▉                                                                    | 3420/20117 [2:07:43<10:22:19,  2.24s/it] 17%|█████████████▉                                                                    | 3421/20117 [2:07:45<10:25:36,  2.25s/it] 17%|█████████████▉                                                                    | 3422/20117 [2:07:48<10:31:17,  2.27s/it] 17%|█████████████▉                                                                    | 3423/20117 [2:07:50<10:30:21,  2.27s/it] 17%|█████████████▉                                                                    | 3424/20117 [2:07:52<10:28:06,  2.26s/it] 17%|█████████████▉                                                                    | 3425/20117 [2:07:54<10:29:23,  2.26s/it] 17%|█████████████▉                                                                    | 3426/20117 [2:07:57<10:26:38,  2.25s/it] 17%|█████████████▉                                                                    | 3427/20117 [2:07:59<10:37:10,  2.29s/it] 17%|█████████████▉                                                                    | 3428/20117 [2:08:02<11:04:35,  2.39s/it] 17%|█████████████▉                                                                    | 3429/20117 [2:08:04<10:53:55,  2.35s/it] 17%|█████████████▉                                                                    | 3430/20117 [2:08:06<10:54:06,  2.35s/it]                                                                                                                                 {'loss': 0.1874, 'grad_norm': 0.26419004797935486, 'learning_rate': 0.00018665872464989773, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 369.02, 'epoch': 0.34}
 17%|█████████████▉                                                                    | 3430/20117 [2:08:06<10:54:06,  2.35s/it] 17%|█████████████▉                                                                    | 3431/20117 [2:08:08<10:41:28,  2.31s/it] 17%|█████████████▉                                                                    | 3432/20117 [2:08:11<10:35:05,  2.28s/it] 17%|█████████████▉                                                                    | 3433/20117 [2:08:13<10:29:32,  2.26s/it] 17%|█████████████▉                                                                    | 3434/20117 [2:08:15<10:27:09,  2.26s/it] 17%|██████████████                                                                    | 3435/20117 [2:08:17<10:22:09,  2.24s/it] 17%|██████████████                                                                    | 3436/20117 [2:08:20<10:21:46,  2.24s/it] 17%|██████████████                                                                    | 3437/20117 [2:08:22<10:26:04,  2.25s/it] 17%|██████████████                                                                    | 3438/20117 [2:08:24<10:27:41,  2.26s/it] 17%|██████████████                                                                    | 3439/20117 [2:08:26<10:26:19,  2.25s/it] 17%|██████████████                                                                    | 3440/20117 [2:08:29<10:28:33,  2.26s/it]                                                                                                                                 {'loss': 0.2231, 'grad_norm': 0.3501751720905304, 'learning_rate': 0.00018658029776771152, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 358.64, 'epoch': 0.34}
 17%|██████████████                                                                    | 3440/20117 [2:08:29<10:28:33,  2.26s/it] 17%|██████████████                                                                    | 3441/20117 [2:08:31<10:27:11,  2.26s/it] 17%|██████████████                                                                    | 3442/20117 [2:08:33<10:26:27,  2.25s/it] 17%|██████████████                                                                    | 3443/20117 [2:08:35<10:21:23,  2.24s/it] 17%|██████████████                                                                    | 3444/20117 [2:08:38<10:22:20,  2.24s/it] 17%|██████████████                                                                    | 3445/20117 [2:08:40<10:21:46,  2.24s/it] 17%|██████████████                                                                    | 3446/20117 [2:08:42<10:26:47,  2.26s/it] 17%|██████████████                                                                    | 3447/20117 [2:08:44<10:25:57,  2.25s/it] 17%|██████████████                                                                    | 3448/20117 [2:08:47<10:20:54,  2.23s/it] 17%|██████████████                                                                    | 3449/20117 [2:08:49<10:18:18,  2.23s/it] 17%|██████████████                                                                    | 3450/20117 [2:08:51<10:21:51,  2.24s/it]                                                                                                                                 {'loss': 0.2456, 'grad_norm': 0.4123583137989044, 'learning_rate': 0.0001865016576199527, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 399.31, 'epoch': 0.34}
 17%|██████████████                                                                    | 3450/20117 [2:08:51<10:21:51,  2.24s/it] 17%|██████████████                                                                    | 3451/20117 [2:08:53<10:18:58,  2.23s/it] 17%|██████████████                                                                    | 3452/20117 [2:08:55<10:20:01,  2.23s/it] 17%|██████████████                                                                    | 3453/20117 [2:08:58<10:16:26,  2.22s/it] 17%|██████████████                                                                    | 3454/20117 [2:09:00<10:17:36,  2.22s/it] 17%|██████████████                                                                    | 3455/20117 [2:09:02<10:24:14,  2.25s/it] 17%|██████████████                                                                    | 3456/20117 [2:09:04<10:27:01,  2.26s/it] 17%|██████████████                                                                    | 3457/20117 [2:09:07<10:22:55,  2.24s/it] 17%|██████████████                                                                    | 3458/20117 [2:09:09<10:28:03,  2.26s/it] 17%|██████████████                                                                    | 3459/20117 [2:09:11<10:30:48,  2.27s/it] 17%|██████████████                                                                    | 3460/20117 [2:09:14<10:28:02,  2.26s/it]                                                                                                                                 {'loss': 0.2716, 'grad_norm': 0.4507691264152527, 'learning_rate': 0.00018642280440032863, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 384.1, 'epoch': 0.34}
 17%|██████████████                                                                    | 3460/20117 [2:09:14<10:28:02,  2.26s/it] 17%|██████████████                                                                    | 3461/20117 [2:09:16<10:23:27,  2.25s/it] 17%|██████████████                                                                    | 3462/20117 [2:09:18<10:19:17,  2.23s/it] 17%|██████████████                                                                    | 3463/20117 [2:09:20<10:16:04,  2.22s/it] 17%|██████████████                                                                    | 3464/20117 [2:09:22<10:17:38,  2.23s/it] 17%|██████████████                                                                    | 3465/20117 [2:09:25<10:19:51,  2.23s/it] 17%|██████████████▏                                                                   | 3466/20117 [2:09:27<10:22:16,  2.24s/it] 17%|██████████████▏                                                                   | 3467/20117 [2:09:29<10:31:31,  2.28s/it] 17%|██████████████▏                                                                   | 3468/20117 [2:09:31<10:28:13,  2.26s/it] 17%|██████████████▏                                                                   | 3469/20117 [2:09:34<10:24:51,  2.25s/it] 17%|██████████████▏                                                                   | 3470/20117 [2:09:36<10:25:50,  2.26s/it]                                                                                                                                 {'loss': 0.2352, 'grad_norm': 0.43500733375549316, 'learning_rate': 0.00018634373830307146, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 319.67, 'epoch': 0.34}
 17%|██████████████▏                                                                   | 3470/20117 [2:09:36<10:25:50,  2.26s/it] 17%|██████████████▏                                                                   | 3471/20117 [2:09:38<10:26:30,  2.26s/it] 17%|██████████████▏                                                                   | 3472/20117 [2:09:40<10:25:21,  2.25s/it] 17%|██████████████▏                                                                   | 3473/20117 [2:09:43<10:26:07,  2.26s/it] 17%|██████████████▏                                                                   | 3474/20117 [2:09:45<10:24:55,  2.25s/it] 17%|██████████████▏                                                                   | 3475/20117 [2:09:47<10:27:11,  2.26s/it] 17%|██████████████▏                                                                   | 3476/20117 [2:09:49<10:24:45,  2.25s/it] 17%|██████████████▏                                                                   | 3477/20117 [2:09:52<10:25:46,  2.26s/it] 17%|██████████████▏                                                                   | 3478/20117 [2:09:54<10:25:00,  2.25s/it] 17%|██████████████▏                                                                   | 3479/20117 [2:09:56<10:23:51,  2.25s/it] 17%|██████████████▏                                                                   | 3480/20117 [2:09:59<10:54:34,  2.36s/it]                                                                                                                                 {'loss': 0.2623, 'grad_norm': 0.40590760111808777, 'learning_rate': 0.00018626445952293766, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 335.37, 'epoch': 0.35}
 17%|██████████████▏                                                                   | 3480/20117 [2:09:59<10:54:34,  2.36s/it] 17%|██████████████▏                                                                   | 3481/20117 [2:10:01<10:42:07,  2.32s/it] 17%|██████████████▏                                                                   | 3482/20117 [2:10:03<10:36:45,  2.30s/it] 17%|██████████████▏                                                                   | 3483/20117 [2:10:06<10:37:09,  2.30s/it] 17%|██████████████▏                                                                   | 3484/20117 [2:10:08<10:32:18,  2.28s/it] 17%|██████████████▏                                                                   | 3485/20117 [2:10:10<10:35:09,  2.29s/it] 17%|██████████████▏                                                                   | 3486/20117 [2:10:12<10:34:14,  2.29s/it] 17%|██████████████▏                                                                   | 3487/20117 [2:10:15<10:34:41,  2.29s/it] 17%|██████████████▏                                                                   | 3488/20117 [2:10:17<10:28:59,  2.27s/it] 17%|██████████████▏                                                                   | 3489/20117 [2:10:19<10:32:01,  2.28s/it] 17%|██████████████▏                                                                   | 3490/20117 [2:10:22<10:30:15,  2.27s/it]                                                                                                                                 {'loss': 0.2245, 'grad_norm': 0.2494644969701767, 'learning_rate': 0.00018618496825520767, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 347.95, 'epoch': 0.35}
 17%|██████████████▏                                                                   | 3490/20117 [2:10:22<10:30:15,  2.27s/it] 17%|██████████████▏                                                                   | 3491/20117 [2:10:24<10:37:39,  2.30s/it] 17%|██████████████▏                                                                   | 3492/20117 [2:10:26<10:36:48,  2.30s/it] 17%|██████████████▏                                                                   | 3493/20117 [2:10:28<10:35:09,  2.29s/it] 17%|██████████████▏                                                                   | 3494/20117 [2:10:31<10:28:39,  2.27s/it] 17%|██████████████▏                                                                   | 3495/20117 [2:10:33<10:23:07,  2.25s/it] 17%|██████████████▎                                                                   | 3496/20117 [2:10:35<10:25:04,  2.26s/it] 17%|██████████████▎                                                                   | 3497/20117 [2:10:37<10:21:45,  2.24s/it] 17%|██████████████▎                                                                   | 3498/20117 [2:10:40<10:25:00,  2.26s/it] 17%|██████████████▎                                                                   | 3499/20117 [2:10:42<10:25:37,  2.26s/it] 17%|██████████████▎                                                                   | 3500/20117 [2:10:44<10:24:43,  2.26s/it]                                                                                                                                 {'loss': 0.2775, 'grad_norm': 0.47100207209587097, 'learning_rate': 0.00018610526469568526, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 393.01, 'epoch': 0.35}
 17%|██████████████▎                                                                   | 3500/20117 [2:10:44<10:24:43,  2.26s/it] 17%|██████████████▎                                                                   | 3501/20117 [2:10:46<10:20:15,  2.24s/it] 17%|██████████████▎                                                                   | 3502/20117 [2:10:49<10:15:52,  2.22s/it] 17%|██████████████▎                                                                   | 3503/20117 [2:10:51<10:18:55,  2.24s/it] 17%|██████████████▎                                                                   | 3504/20117 [2:10:53<10:16:25,  2.23s/it] 17%|██████████████▎                                                                   | 3505/20117 [2:10:55<10:26:03,  2.26s/it] 17%|██████████████▎                                                                   | 3506/20117 [2:10:58<10:23:51,  2.25s/it] 17%|██████████████▎                                                                   | 3507/20117 [2:11:00<10:28:17,  2.27s/it] 17%|██████████████▎                                                                   | 3508/20117 [2:11:03<11:02:32,  2.39s/it] 17%|██████████████▎                                                                   | 3509/20117 [2:11:05<11:16:35,  2.44s/it] 17%|██████████████▎                                                                   | 3510/20117 [2:11:07<10:57:19,  2.37s/it]                                                                                                                                 {'loss': 0.3007, 'grad_norm': 0.5600543022155762, 'learning_rate': 0.00018602534904069712, 'memory/max_active (GiB)': 20.44, 'memory/max_allocated (GiB)': 20.44, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 399.51, 'epoch': 0.35}
 17%|██████████████▎                                                                   | 3510/20117 [2:11:07<10:57:19,  2.37s/it] 17%|██████████████▎                                                                   | 3511/20117 [2:11:10<10:48:25,  2.34s/it] 17%|██████████████▎                                                                   | 3512/20117 [2:11:12<10:41:21,  2.32s/it] 17%|██████████████▎                                                                   | 3513/20117 [2:11:14<10:36:33,  2.30s/it] 17%|██████████████▎                                                                   | 3514/20117 [2:11:16<10:34:10,  2.29s/it] 17%|██████████████▎                                                                   | 3515/20117 [2:11:19<10:26:42,  2.26s/it] 17%|██████████████▎                                                                   | 3516/20117 [2:11:21<10:23:00,  2.25s/it] 17%|██████████████▎                                                                   | 3517/20117 [2:11:23<10:20:37,  2.24s/it] 17%|██████████████▎                                                                   | 3518/20117 [2:11:25<10:30:08,  2.28s/it] 17%|██████████████▎                                                                   | 3519/20117 [2:11:28<10:33:13,  2.29s/it] 17%|██████████████▎                                                                   | 3520/20117 [2:11:30<10:34:03,  2.29s/it]                                                                                                                                 {'loss': 0.2134, 'grad_norm': 0.25791943073272705, 'learning_rate': 0.00018594522148709244, 'memory/max_active (GiB)': 18.84, 'memory/max_allocated (GiB)': 18.84, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 341.75, 'epoch': 0.35}
 17%|██████████████▎                                                                   | 3520/20117 [2:11:30<10:34:03,  2.29s/it] 18%|██████████████▎                                                                   | 3521/20117 [2:11:32<10:36:48,  2.30s/it] 18%|██████████████▎                                                                   | 3522/20117 [2:11:35<10:31:43,  2.28s/it] 18%|██████████████▎                                                                   | 3523/20117 [2:11:37<10:30:00,  2.28s/it] 18%|██████████████▎                                                                   | 3524/20117 [2:11:39<10:29:23,  2.28s/it] 18%|██████████████▎                                                                   | 3525/20117 [2:11:41<10:31:43,  2.28s/it] 18%|██████████████▎                                                                   | 3526/20117 [2:11:44<10:32:19,  2.29s/it] 18%|██████████████▍                                                                   | 3527/20117 [2:11:46<10:31:03,  2.28s/it] 18%|██████████████▍                                                                   | 3528/20117 [2:11:48<10:30:22,  2.28s/it] 18%|██████████████▍                                                                   | 3529/20117 [2:11:51<10:32:00,  2.29s/it] 18%|██████████████▍                                                                   | 3530/20117 [2:11:53<10:30:42,  2.28s/it]                                                                                                                                 {'loss': 0.1919, 'grad_norm': 0.2849276661872864, 'learning_rate': 0.00018586488223224228, 'memory/max_active (GiB)': 20.44, 'memory/max_allocated (GiB)': 20.44, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 369.12, 'epoch': 0.35}
 18%|██████████████▍                                                                   | 3530/20117 [2:11:53<10:30:42,  2.28s/it] 18%|██████████████▍                                                                   | 3531/20117 [2:11:55<10:25:59,  2.26s/it] 18%|██████████████▍                                                                   | 3532/20117 [2:11:58<10:56:44,  2.38s/it] 18%|██████████████▍                                                                   | 3533/20117 [2:12:00<10:52:16,  2.36s/it] 18%|██████████████▍                                                                   | 3534/20117 [2:12:02<10:47:17,  2.34s/it] 18%|██████████████▍                                                                   | 3535/20117 [2:12:05<10:39:58,  2.32s/it] 18%|██████████████▍                                                                   | 3536/20117 [2:12:07<10:36:10,  2.30s/it] 18%|██████████████▍                                                                   | 3537/20117 [2:12:09<10:33:08,  2.29s/it] 18%|██████████████▍                                                                   | 3538/20117 [2:12:11<10:30:53,  2.28s/it] 18%|██████████████▍                                                                   | 3539/20117 [2:12:14<10:29:10,  2.28s/it] 18%|██████████████▍                                                                   | 3540/20117 [2:12:16<10:30:34,  2.28s/it]                                                                                                                                 {'loss': 0.2192, 'grad_norm': 0.32681041955947876, 'learning_rate': 0.00018578433147403925, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 309.14, 'epoch': 0.35}
 18%|██████████████▍                                                                   | 3540/20117 [2:12:16<10:30:34,  2.28s/it] 18%|██████████████▍                                                                   | 3541/20117 [2:12:18<10:32:14,  2.29s/it] 18%|██████████████▍                                                                   | 3542/20117 [2:12:21<10:33:59,  2.29s/it] 18%|██████████████▍                                                                   | 3543/20117 [2:12:23<10:36:13,  2.30s/it] 18%|██████████████▍                                                                   | 3544/20117 [2:12:25<10:31:38,  2.29s/it] 18%|██████████████▍                                                                   | 3545/20117 [2:12:27<10:35:30,  2.30s/it] 18%|██████████████▍                                                                   | 3546/20117 [2:12:30<10:40:39,  2.32s/it] 18%|██████████████▍                                                                   | 3547/20117 [2:12:32<10:44:30,  2.33s/it] 18%|██████████████▍                                                                   | 3548/20117 [2:12:35<10:42:01,  2.32s/it] 18%|██████████████▍                                                                   | 3549/20117 [2:12:37<10:34:42,  2.30s/it] 18%|██████████████▍                                                                   | 3550/20117 [2:12:39<10:39:45,  2.32s/it]                                                                                                                                 {'loss': 0.2775, 'grad_norm': 0.5691526532173157, 'learning_rate': 0.00018570356941089686, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 357.84, 'epoch': 0.35}
 18%|██████████████▍                                                                   | 3550/20117 [2:12:39<10:39:45,  2.32s/it] 18%|██████████████▍                                                                   | 3551/20117 [2:12:41<10:33:57,  2.30s/it] 18%|██████████████▍                                                                   | 3552/20117 [2:12:44<10:26:25,  2.27s/it] 18%|██████████████▍                                                                   | 3553/20117 [2:12:46<10:24:38,  2.26s/it] 18%|██████████████▍                                                                   | 3554/20117 [2:12:48<10:26:46,  2.27s/it] 18%|██████████████▍                                                                   | 3555/20117 [2:12:50<10:28:17,  2.28s/it] 18%|██████████████▍                                                                   | 3556/20117 [2:12:53<10:26:04,  2.27s/it] 18%|██████████████▍                                                                   | 3557/20117 [2:12:55<10:30:42,  2.29s/it] 18%|██████████████▌                                                                   | 3558/20117 [2:12:57<10:33:16,  2.29s/it] 18%|██████████████▌                                                                   | 3559/20117 [2:13:00<10:36:21,  2.31s/it] 18%|██████████████▌                                                                   | 3560/20117 [2:13:02<10:35:32,  2.30s/it]                                                                                                                                 {'loss': 0.2285, 'grad_norm': 0.35383832454681396, 'learning_rate': 0.00018562259624174915, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 341.0, 'epoch': 0.35}
 18%|██████████████▌                                                                   | 3560/20117 [2:13:02<10:35:32,  2.30s/it] 18%|██████████████▌                                                                   | 3561/20117 [2:13:04<10:29:56,  2.28s/it] 18%|██████████████▌                                                                   | 3562/20117 [2:13:07<10:38:17,  2.31s/it] 18%|██████████████▌                                                                   | 3563/20117 [2:13:09<10:37:31,  2.31s/it] 18%|██████████████▌                                                                   | 3564/20117 [2:13:11<10:39:33,  2.32s/it] 18%|██████████████▌                                                                   | 3565/20117 [2:13:14<10:47:08,  2.35s/it] 18%|██████████████▌                                                                   | 3566/20117 [2:13:16<10:35:12,  2.30s/it] 18%|██████████████▌                                                                   | 3567/20117 [2:13:18<10:29:18,  2.28s/it] 18%|██████████████▌                                                                   | 3568/20117 [2:13:20<10:26:39,  2.27s/it] 18%|██████████████▌                                                                   | 3569/20117 [2:13:23<10:28:28,  2.28s/it] 18%|██████████████▌                                                                   | 3570/20117 [2:13:25<10:24:14,  2.26s/it]                                                                                                                                 {'loss': 0.2216, 'grad_norm': 0.4128180742263794, 'learning_rate': 0.00018554141216605016, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 310.04, 'epoch': 0.35}
 18%|██████████████▌                                                                   | 3570/20117 [2:13:25<10:24:14,  2.26s/it] 18%|██████████████▌                                                                   | 3571/20117 [2:13:27<10:24:04,  2.26s/it] 18%|██████████████▌                                                                   | 3572/20117 [2:13:29<10:22:02,  2.26s/it] 18%|██████████████▌                                                                   | 3573/20117 [2:13:32<10:33:43,  2.30s/it] 18%|██████████████▌                                                                   | 3574/20117 [2:13:34<10:30:46,  2.29s/it] 18%|██████████████▌                                                                   | 3575/20117 [2:13:36<10:36:12,  2.31s/it] 18%|██████████████▌                                                                   | 3576/20117 [2:13:39<10:32:05,  2.29s/it] 18%|██████████████▌                                                                   | 3577/20117 [2:13:41<10:27:21,  2.28s/it] 18%|██████████████▌                                                                   | 3578/20117 [2:13:43<10:25:05,  2.27s/it] 18%|██████████████▌                                                                   | 3579/20117 [2:13:45<10:22:17,  2.26s/it] 18%|██████████████▌                                                                   | 3580/20117 [2:13:47<10:18:23,  2.24s/it]                                                                                                                                 {'loss': 0.3354, 'grad_norm': 0.4811583459377289, 'learning_rate': 0.00018546001738377338, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 446.97, 'epoch': 0.36}
 18%|██████████████▌                                                                   | 3580/20117 [2:13:47<10:18:23,  2.24s/it] 18%|██████████████▌                                                                   | 3581/20117 [2:13:50<10:15:58,  2.24s/it] 18%|██████████████▌                                                                   | 3582/20117 [2:13:52<10:14:42,  2.23s/it] 18%|██████████████▌                                                                   | 3583/20117 [2:13:54<10:17:16,  2.24s/it] 18%|██████████████▌                                                                   | 3584/20117 [2:13:57<10:38:36,  2.32s/it] 18%|██████████████▌                                                                   | 3585/20117 [2:13:59<10:34:12,  2.30s/it] 18%|██████████████▌                                                                   | 3586/20117 [2:14:01<10:29:05,  2.28s/it] 18%|██████████████▌                                                                   | 3587/20117 [2:14:03<10:25:14,  2.27s/it] 18%|██████████████▋                                                                   | 3588/20117 [2:14:06<10:21:41,  2.26s/it] 18%|██████████████▋                                                                   | 3589/20117 [2:14:08<10:12:45,  2.22s/it] 18%|██████████████▋                                                                   | 3590/20117 [2:14:10<10:07:43,  2.21s/it]                                                                                                                                 {'loss': 0.216, 'grad_norm': 0.4148445725440979, 'learning_rate': 0.0001853784120954114, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 327.38, 'epoch': 0.36}
 18%|██████████████▋                                                                   | 3590/20117 [2:14:10<10:07:43,  2.21s/it] 18%|██████████████▋                                                                   | 3591/20117 [2:14:12<10:06:29,  2.20s/it] 18%|██████████████▋                                                                   | 3592/20117 [2:14:14<10:02:15,  2.19s/it] 18%|██████████████▋                                                                   | 3593/20117 [2:14:17<10:03:42,  2.19s/it] 18%|██████████████▋                                                                   | 3594/20117 [2:14:19<10:05:44,  2.20s/it] 18%|██████████████▋                                                                   | 3595/20117 [2:14:21<10:15:55,  2.24s/it] 18%|██████████████▋                                                                   | 3596/20117 [2:14:23<10:22:47,  2.26s/it] 18%|██████████████▋                                                                   | 3597/20117 [2:14:26<10:37:01,  2.31s/it] 18%|██████████████▋                                                                   | 3598/20117 [2:14:28<10:40:26,  2.33s/it] 18%|██████████████▋                                                                   | 3599/20117 [2:14:30<10:40:08,  2.33s/it] 18%|██████████████▋                                                                   | 3600/20117 [2:14:33<10:38:47,  2.32s/it]                                                                                                                                 {'loss': 0.2762, 'grad_norm': 0.5083706378936768, 'learning_rate': 0.0001852965965019753, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 371.89, 'epoch': 0.36}
 18%|██████████████▋                                                                   | 3600/20117 [2:14:33<10:38:47,  2.32s/it] 18%|██████████████▋                                                                   | 3601/20117 [2:14:35<10:51:31,  2.37s/it] 18%|██████████████▋                                                                   | 3602/20117 [2:14:38<10:52:17,  2.37s/it] 18%|██████████████▋                                                                   | 3603/20117 [2:14:40<11:00:32,  2.40s/it] 18%|██████████████▋                                                                   | 3604/20117 [2:14:42<10:51:41,  2.37s/it] 18%|██████████████▋                                                                   | 3605/20117 [2:14:45<10:41:41,  2.33s/it] 18%|██████████████▋                                                                   | 3606/20117 [2:14:47<10:32:53,  2.30s/it] 18%|██████████████▋                                                                   | 3607/20117 [2:14:49<10:31:57,  2.30s/it] 18%|██████████████▋                                                                   | 3608/20117 [2:14:51<10:28:41,  2.28s/it] 18%|██████████████▋                                                                   | 3609/20117 [2:14:54<10:33:32,  2.30s/it] 18%|██████████████▋                                                                   | 3610/20117 [2:14:56<10:32:53,  2.30s/it]                                                                                                                                 {'loss': 0.2455, 'grad_norm': 0.4946528673171997, 'learning_rate': 0.00018521457080499418, 'memory/max_active (GiB)': 19.77, 'memory/max_allocated (GiB)': 19.77, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 361.51, 'epoch': 0.36}
 18%|██████████████▋                                                                   | 3610/20117 [2:14:56<10:32:53,  2.30s/it] 18%|██████████████▋                                                                   | 3611/20117 [2:14:58<10:32:18,  2.30s/it] 18%|██████████████▋                                                                   | 3612/20117 [2:15:01<10:42:27,  2.34s/it] 18%|██████████████▋                                                                   | 3613/20117 [2:15:03<10:37:32,  2.32s/it] 18%|██████████████▋                                                                   | 3614/20117 [2:15:05<10:36:31,  2.31s/it] 18%|██████████████▋                                                                   | 3615/20117 [2:15:08<10:33:49,  2.30s/it] 18%|██████████████▋                                                                   | 3616/20117 [2:15:10<10:32:08,  2.30s/it] 18%|██████████████▋                                                                   | 3617/20117 [2:15:12<10:29:10,  2.29s/it] 18%|██████████████▋                                                                   | 3618/20117 [2:15:14<10:24:00,  2.27s/it] 18%|██████████████▊                                                                   | 3619/20117 [2:15:17<10:28:16,  2.28s/it] 18%|██████████████▊                                                                   | 3620/20117 [2:15:19<10:27:58,  2.28s/it]                                                                                                                                 {'loss': 0.2299, 'grad_norm': 0.4880548417568207, 'learning_rate': 0.00018513233520651466, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 310.24, 'epoch': 0.36}
 18%|██████████████▊                                                                   | 3620/20117 [2:15:19<10:27:58,  2.28s/it] 18%|██████████████▊                                                                   | 3621/20117 [2:15:21<10:25:55,  2.28s/it] 18%|██████████████▊                                                                   | 3622/20117 [2:15:24<10:28:52,  2.29s/it] 18%|██████████████▊                                                                   | 3623/20117 [2:15:26<10:24:36,  2.27s/it] 18%|██████████████▊                                                                   | 3624/20117 [2:15:28<10:25:04,  2.27s/it] 18%|██████████████▊                                                                   | 3625/20117 [2:15:30<10:21:18,  2.26s/it] 18%|██████████████▊                                                                   | 3626/20117 [2:15:33<10:29:04,  2.29s/it] 18%|██████████████▊                                                                   | 3627/20117 [2:15:35<10:27:34,  2.28s/it] 18%|██████████████▊                                                                   | 3628/20117 [2:15:37<10:24:20,  2.27s/it] 18%|██████████████▊                                                                   | 3629/20117 [2:15:40<10:27:15,  2.28s/it] 18%|██████████████▊                                                                   | 3630/20117 [2:15:42<10:30:12,  2.29s/it]                                                                                                                                 {'loss': 0.2325, 'grad_norm': 0.18661662936210632, 'learning_rate': 0.00018504988990910036, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 360.45, 'epoch': 0.36}
 18%|██████████████▊                                                                   | 3630/20117 [2:15:42<10:30:12,  2.29s/it] 18%|██████████████▊                                                                   | 3631/20117 [2:15:44<10:26:01,  2.28s/it] 18%|██████████████▊                                                                   | 3632/20117 [2:15:46<10:26:01,  2.28s/it] 18%|██████████████▊                                                                   | 3633/20117 [2:15:49<10:26:44,  2.28s/it] 18%|██████████████▊                                                                   | 3634/20117 [2:15:51<10:27:02,  2.28s/it] 18%|██████████████▊                                                                   | 3635/20117 [2:15:53<10:21:40,  2.26s/it] 18%|██████████████▊                                                                   | 3636/20117 [2:15:55<10:28:04,  2.29s/it] 18%|██████████████▊                                                                   | 3637/20117 [2:15:58<10:59:29,  2.40s/it] 18%|██████████████▊                                                                   | 3638/20117 [2:16:00<10:53:20,  2.38s/it] 18%|██████████████▊                                                                   | 3639/20117 [2:16:03<10:43:40,  2.34s/it] 18%|██████████████▊                                                                   | 3640/20117 [2:16:05<10:37:20,  2.32s/it]                                                                                                                                 {'loss': 0.2312, 'grad_norm': 0.49652183055877686, 'learning_rate': 0.00018496723511583153, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 347.3, 'epoch': 0.36}
 18%|██████████████▊                                                                   | 3640/20117 [2:16:05<10:37:20,  2.32s/it] 18%|██████████████▊                                                                   | 3641/20117 [2:16:07<10:32:52,  2.30s/it] 18%|██████████████▊                                                                   | 3642/20117 [2:16:09<10:23:43,  2.27s/it] 18%|██████████████▊                                                                   | 3643/20117 [2:16:12<10:28:36,  2.29s/it] 18%|██████████████▊                                                                   | 3644/20117 [2:16:14<10:22:22,  2.27s/it] 18%|██████████████▊                                                                   | 3645/20117 [2:16:16<10:21:17,  2.26s/it] 18%|██████████████▊                                                                   | 3646/20117 [2:16:19<10:26:47,  2.28s/it] 18%|██████████████▊                                                                   | 3647/20117 [2:16:21<10:20:02,  2.26s/it] 18%|██████████████▊                                                                   | 3648/20117 [2:16:23<10:17:57,  2.25s/it] 18%|██████████████▊                                                                   | 3649/20117 [2:16:25<10:17:39,  2.25s/it] 18%|██████████████▉                                                                   | 3650/20117 [2:16:28<10:15:27,  2.24s/it]                                                                                                                                 {'loss': 0.154, 'grad_norm': 0.35343873500823975, 'learning_rate': 0.0001848843710303044, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 334.16, 'epoch': 0.36}
 18%|██████████████▉                                                                   | 3650/20117 [2:16:28<10:15:27,  2.24s/it] 18%|██████████████▉                                                                   | 3651/20117 [2:16:30<10:19:52,  2.26s/it] 18%|██████████████▉                                                                   | 3652/20117 [2:16:32<10:16:30,  2.25s/it] 18%|██████████████▉                                                                   | 3653/20117 [2:16:34<10:18:28,  2.25s/it] 18%|██████████████▉                                                                   | 3654/20117 [2:16:37<10:19:59,  2.26s/it] 18%|██████████████▉                                                                   | 3655/20117 [2:16:39<10:26:33,  2.28s/it] 18%|██████████████▉                                                                   | 3656/20117 [2:16:41<10:29:11,  2.29s/it] 18%|██████████████▉                                                                   | 3657/20117 [2:16:43<10:23:49,  2.27s/it] 18%|██████████████▉                                                                   | 3658/20117 [2:16:46<10:21:21,  2.27s/it] 18%|██████████████▉                                                                   | 3659/20117 [2:16:48<10:18:45,  2.26s/it] 18%|██████████████▉                                                                   | 3660/20117 [2:16:50<10:17:31,  2.25s/it]                                                                                                                                 {'loss': 0.2677, 'grad_norm': 0.5269297361373901, 'learning_rate': 0.0001848012978566307, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 343.01, 'epoch': 0.36}
 18%|██████████████▉                                                                   | 3660/20117 [2:16:50<10:17:31,  2.25s/it] 18%|██████████████▉                                                                   | 3661/20117 [2:16:52<10:14:22,  2.24s/it] 18%|██████████████▉                                                                   | 3662/20117 [2:16:55<10:12:40,  2.23s/it] 18%|██████████████▉                                                                   | 3663/20117 [2:16:57<10:16:18,  2.25s/it] 18%|██████████████▉                                                                   | 3664/20117 [2:16:59<10:14:49,  2.24s/it] 18%|██████████████▉                                                                   | 3665/20117 [2:17:01<10:15:20,  2.24s/it] 18%|██████████████▉                                                                   | 3666/20117 [2:17:04<10:18:33,  2.26s/it] 18%|██████████████▉                                                                   | 3667/20117 [2:17:06<10:20:42,  2.26s/it] 18%|██████████████▉                                                                   | 3668/20117 [2:17:08<10:19:24,  2.26s/it] 18%|██████████████▉                                                                   | 3669/20117 [2:17:10<10:21:46,  2.27s/it] 18%|██████████████▉                                                                   | 3670/20117 [2:17:13<10:29:45,  2.30s/it]                                                                                                                                 {'loss': 0.3083, 'grad_norm': 0.4809168875217438, 'learning_rate': 0.00018471801579943717, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 359.61, 'epoch': 0.36}
 18%|██████████████▉                                                                   | 3670/20117 [2:17:13<10:29:45,  2.30s/it] 18%|██████████████▉                                                                   | 3671/20117 [2:17:15<10:27:06,  2.29s/it] 18%|██████████████▉                                                                   | 3672/20117 [2:17:18<10:39:07,  2.33s/it] 18%|██████████████▉                                                                   | 3673/20117 [2:17:20<10:41:10,  2.34s/it] 18%|██████████████▉                                                                   | 3674/20117 [2:17:22<10:31:40,  2.30s/it] 18%|██████████████▉                                                                   | 3675/20117 [2:17:24<10:22:43,  2.27s/it] 18%|██████████████▉                                                                   | 3676/20117 [2:17:27<10:25:32,  2.28s/it] 18%|██████████████▉                                                                   | 3677/20117 [2:17:29<10:17:13,  2.25s/it] 18%|██████████████▉                                                                   | 3678/20117 [2:17:31<10:12:39,  2.24s/it] 18%|██████████████▉                                                                   | 3679/20117 [2:17:33<10:10:37,  2.23s/it] 18%|███████████████                                                                   | 3680/20117 [2:17:35<10:11:08,  2.23s/it]                                                                                                                                 {'loss': 0.2711, 'grad_norm': 0.4312402904033661, 'learning_rate': 0.0001846345250638652, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 343.63, 'epoch': 0.37}
 18%|███████████████                                                                   | 3680/20117 [2:17:35<10:11:08,  2.23s/it] 18%|███████████████                                                                   | 3681/20117 [2:17:38<10:14:20,  2.24s/it] 18%|███████████████                                                                   | 3682/20117 [2:17:40<10:15:20,  2.25s/it] 18%|███████████████                                                                   | 3683/20117 [2:17:42<10:13:36,  2.24s/it] 18%|███████████████                                                                   | 3684/20117 [2:17:44<10:12:31,  2.24s/it] 18%|███████████████                                                                   | 3685/20117 [2:17:47<10:09:25,  2.23s/it] 18%|███████████████                                                                   | 3686/20117 [2:17:49<10:11:11,  2.23s/it] 18%|███████████████                                                                   | 3687/20117 [2:17:51<10:14:03,  2.24s/it] 18%|███████████████                                                                   | 3688/20117 [2:17:53<10:16:05,  2.25s/it] 18%|███████████████                                                                   | 3689/20117 [2:17:56<10:20:38,  2.27s/it] 18%|███████████████                                                                   | 3690/20117 [2:17:58<10:17:03,  2.25s/it]                                                                                                                                 {'loss': 0.2629, 'grad_norm': 0.44654685258865356, 'learning_rate': 0.0001845508258555701, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 380.56, 'epoch': 0.37}
 18%|███████████████                                                                   | 3690/20117 [2:17:58<10:17:03,  2.25s/it] 18%|███████████████                                                                   | 3691/20117 [2:18:00<10:41:14,  2.34s/it] 18%|███████████████                                                                   | 3692/20117 [2:18:03<10:31:10,  2.31s/it] 18%|███████████████                                                                   | 3693/20117 [2:18:05<10:24:53,  2.28s/it] 18%|███████████████                                                                   | 3694/20117 [2:18:07<10:18:25,  2.26s/it] 18%|███████████████                                                                   | 3695/20117 [2:18:09<10:20:37,  2.27s/it] 18%|███████████████                                                                   | 3696/20117 [2:18:12<10:19:55,  2.27s/it] 18%|███████████████                                                                   | 3697/20117 [2:18:14<10:17:51,  2.26s/it] 18%|███████████████                                                                   | 3698/20117 [2:18:16<10:14:06,  2.24s/it] 18%|███████████████                                                                   | 3699/20117 [2:18:18<10:13:30,  2.24s/it] 18%|███████████████                                                                   | 3700/20117 [2:18:21<10:18:32,  2.26s/it]                                                                                                                                 {'loss': 0.2451, 'grad_norm': 0.19989164173603058, 'learning_rate': 0.00018446691838072067, 'memory/max_active (GiB)': 20.72, 'memory/max_allocated (GiB)': 20.72, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 316.7, 'epoch': 0.37}
 18%|███████████████                                                                   | 3700/20117 [2:18:21<10:18:32,  2.26s/it] 18%|███████████████                                                                   | 3701/20117 [2:18:23<10:13:45,  2.24s/it] 18%|███████████████                                                                   | 3702/20117 [2:18:25<10:11:54,  2.24s/it] 18%|███████████████                                                                   | 3703/20117 [2:18:27<10:11:26,  2.24s/it] 18%|███████████████                                                                   | 3704/20117 [2:18:30<10:12:36,  2.24s/it] 18%|███████████████                                                                   | 3705/20117 [2:18:32<10:15:32,  2.25s/it] 18%|███████████████                                                                   | 3706/20117 [2:18:34<10:16:55,  2.26s/it] 18%|███████████████                                                                   | 3707/20117 [2:18:36<10:23:26,  2.28s/it] 18%|███████████████                                                                   | 3708/20117 [2:18:39<10:20:18,  2.27s/it] 18%|███████████████                                                                   | 3709/20117 [2:18:41<10:19:03,  2.26s/it] 18%|███████████████                                                                   | 3710/20117 [2:18:43<10:28:44,  2.30s/it]                                                                                                                                 {'loss': 0.2172, 'grad_norm': 0.268655925989151, 'learning_rate': 0.00018438280284599877, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 270.99, 'epoch': 0.37}
 18%|███████████████                                                                   | 3710/20117 [2:18:43<10:28:44,  2.30s/it] 18%|███████████████▏                                                                  | 3711/20117 [2:18:46<10:27:30,  2.29s/it] 18%|███████████████▏                                                                  | 3712/20117 [2:18:48<10:21:49,  2.27s/it] 18%|███████████████▏                                                                  | 3713/20117 [2:18:50<10:20:01,  2.27s/it] 18%|███████████████▏                                                                  | 3714/20117 [2:18:52<10:18:38,  2.26s/it] 18%|███████████████▏                                                                  | 3715/20117 [2:18:55<10:15:44,  2.25s/it] 18%|███████████████▏                                                                  | 3716/20117 [2:18:57<10:15:04,  2.25s/it] 18%|███████████████▏                                                                  | 3717/20117 [2:18:59<10:13:04,  2.24s/it] 18%|███████████████▏                                                                  | 3718/20117 [2:19:01<10:14:46,  2.25s/it] 18%|███████████████▏                                                                  | 3719/20117 [2:19:04<10:20:11,  2.27s/it] 18%|███████████████▏                                                                  | 3720/20117 [2:19:06<10:26:43,  2.29s/it]                                                                                                                                 {'loss': 0.2505, 'grad_norm': 0.3657344579696655, 'learning_rate': 0.00018429847945859872, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 319.93, 'epoch': 0.37}
 18%|███████████████▏                                                                  | 3720/20117 [2:19:06<10:26:43,  2.29s/it] 18%|███████████████▏                                                                  | 3721/20117 [2:19:08<10:31:01,  2.31s/it] 19%|███████████████▏                                                                  | 3722/20117 [2:19:11<10:34:54,  2.32s/it] 19%|███████████████▏                                                                  | 3723/20117 [2:19:13<10:32:15,  2.31s/it] 19%|███████████████▏                                                                  | 3724/20117 [2:19:15<10:21:41,  2.28s/it] 19%|███████████████▏                                                                  | 3725/20117 [2:19:17<10:18:05,  2.26s/it] 19%|███████████████▏                                                                  | 3726/20117 [2:19:20<10:23:14,  2.28s/it] 19%|███████████████▏                                                                  | 3727/20117 [2:19:22<10:14:55,  2.25s/it] 19%|███████████████▏                                                                  | 3728/20117 [2:19:24<10:16:03,  2.26s/it] 19%|███████████████▏                                                                  | 3729/20117 [2:19:26<10:16:24,  2.26s/it] 19%|███████████████▏                                                                  | 3730/20117 [2:19:29<10:15:28,  2.25s/it]                                                                                                                                 {'loss': 0.2462, 'grad_norm': 0.40177711844444275, 'learning_rate': 0.00018421394842622695, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 344.24, 'epoch': 0.37}
 19%|███████████████▏                                                                  | 3730/20117 [2:19:29<10:15:28,  2.25s/it] 19%|███████████████▏                                                                  | 3731/20117 [2:19:31<10:16:21,  2.26s/it] 19%|███████████████▏                                                                  | 3732/20117 [2:19:33<10:14:39,  2.25s/it] 19%|███████████████▏                                                                  | 3733/20117 [2:19:35<10:10:58,  2.24s/it] 19%|███████████████▏                                                                  | 3734/20117 [2:19:38<10:08:12,  2.23s/it] 19%|███████████████▏                                                                  | 3735/20117 [2:19:40<10:10:01,  2.23s/it] 19%|███████████████▏                                                                  | 3736/20117 [2:19:42<10:10:09,  2.23s/it] 19%|███████████████▏                                                                  | 3737/20117 [2:19:44<10:12:44,  2.24s/it] 19%|███████████████▏                                                                  | 3738/20117 [2:19:47<10:19:54,  2.27s/it] 19%|███████████████▏                                                                  | 3739/20117 [2:19:49<10:13:43,  2.25s/it] 19%|███████████████▏                                                                  | 3740/20117 [2:19:51<10:11:03,  2.24s/it]                                                                                                                                 {'loss': 0.2827, 'grad_norm': 0.48767533898353577, 'learning_rate': 0.00018412920995710113, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 380.45, 'epoch': 0.37}
 19%|███████████████▏                                                                  | 3740/20117 [2:19:51<10:11:03,  2.24s/it] 19%|███████████████▏                                                                  | 3741/20117 [2:19:53<10:09:52,  2.23s/it] 19%|███████████████▎                                                                  | 3742/20117 [2:19:56<10:09:55,  2.23s/it] 19%|███████████████▎                                                                  | 3743/20117 [2:19:58<10:37:02,  2.33s/it] 19%|███████████████▎                                                                  | 3744/20117 [2:20:00<10:37:24,  2.34s/it] 19%|███████████████▎                                                                  | 3745/20117 [2:20:03<10:23:07,  2.28s/it] 19%|███████████████▎                                                                  | 3746/20117 [2:20:05<10:22:13,  2.28s/it] 19%|███████████████▎                                                                  | 3747/20117 [2:20:07<10:15:32,  2.26s/it] 19%|███████████████▎                                                                  | 3748/20117 [2:20:09<10:15:25,  2.26s/it] 19%|███████████████▎                                                                  | 3749/20117 [2:20:12<10:16:02,  2.26s/it] 19%|███████████████▎                                                                  | 3750/20117 [2:20:14<10:16:31,  2.26s/it]                                                                                                                                 {'loss': 0.2355, 'grad_norm': 0.45828619599342346, 'learning_rate': 0.00018404426425995007, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 388.78, 'epoch': 0.37}
 19%|███████████████▎                                                                  | 3750/20117 [2:20:14<10:16:31,  2.26s/it] 19%|███████████████▎                                                                  | 3751/20117 [2:20:16<10:14:17,  2.25s/it] 19%|███████████████▎                                                                  | 3752/20117 [2:20:18<10:16:17,  2.26s/it] 19%|███████████████▎                                                                  | 3753/20117 [2:20:21<10:13:17,  2.25s/it] 19%|███████████████▎                                                                  | 3754/20117 [2:20:23<10:23:20,  2.29s/it] 19%|███████████████▎                                                                  | 3755/20117 [2:20:25<10:17:12,  2.26s/it] 19%|███████████████▎                                                                  | 3756/20117 [2:20:27<10:12:21,  2.25s/it] 19%|███████████████▎                                                                  | 3757/20117 [2:20:30<10:12:05,  2.24s/it] 19%|███████████████▎                                                                  | 3758/20117 [2:20:32<10:08:41,  2.23s/it] 19%|███████████████▎                                                                  | 3759/20117 [2:20:34<10:07:53,  2.23s/it] 19%|███████████████▎                                                                  | 3760/20117 [2:20:36<10:08:48,  2.23s/it]                                                                                                                                 {'loss': 0.2813, 'grad_norm': 0.49931567907333374, 'learning_rate': 0.000183959111544013, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 361.55, 'epoch': 0.37}
 19%|███████████████▎                                                                  | 3760/20117 [2:20:36<10:08:48,  2.23s/it] 19%|███████████████▎                                                                  | 3761/20117 [2:20:39<10:06:52,  2.23s/it] 19%|███████████████▎                                                                  | 3762/20117 [2:20:41<10:09:27,  2.24s/it] 19%|███████████████▎                                                                  | 3763/20117 [2:20:43<10:06:30,  2.23s/it] 19%|███████████████▎                                                                  | 3764/20117 [2:20:45<10:11:52,  2.24s/it] 19%|███████████████▎                                                                  | 3765/20117 [2:20:48<10:11:07,  2.24s/it] 19%|███████████████▎                                                                  | 3766/20117 [2:20:50<10:10:26,  2.24s/it] 19%|███████████████▎                                                                  | 3767/20117 [2:20:52<10:15:16,  2.26s/it] 19%|███████████████▎                                                                  | 3768/20117 [2:20:54<10:19:20,  2.27s/it] 19%|███████████████▎                                                                  | 3769/20117 [2:20:57<10:15:41,  2.26s/it] 19%|███████████████▎                                                                  | 3770/20117 [2:20:59<10:21:10,  2.28s/it]                                                                                                                                 {'loss': 0.2488, 'grad_norm': 0.3232674300670624, 'learning_rate': 0.00018387375201903903, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 326.01, 'epoch': 0.37}
 19%|███████████████▎                                                                  | 3770/20117 [2:20:59<10:21:10,  2.28s/it] 19%|███████████████▎                                                                  | 3771/20117 [2:21:01<10:16:39,  2.26s/it] 19%|███████████████▍                                                                  | 3772/20117 [2:21:03<10:17:42,  2.27s/it] 19%|███████████████▍                                                                  | 3773/20117 [2:21:06<10:18:07,  2.27s/it] 19%|███████████████▍                                                                  | 3774/20117 [2:21:08<10:21:31,  2.28s/it] 19%|███████████████▍                                                                  | 3775/20117 [2:21:10<10:20:50,  2.28s/it] 19%|███████████████▍                                                                  | 3776/20117 [2:21:13<10:19:10,  2.27s/it] 19%|███████████████▍                                                                  | 3777/20117 [2:21:15<10:13:51,  2.25s/it] 19%|███████████████▍                                                                  | 3778/20117 [2:21:17<10:12:36,  2.25s/it] 19%|███████████████▍                                                                  | 3779/20117 [2:21:19<10:08:03,  2.23s/it] 19%|███████████████▍                                                                  | 3780/20117 [2:21:21<10:03:37,  2.22s/it]                                                                                                                                 {'loss': 0.3117, 'grad_norm': 0.41870149970054626, 'learning_rate': 0.0001837881858952867, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 356.89, 'epoch': 0.38}
 19%|███████████████▍                                                                  | 3780/20117 [2:21:21<10:03:37,  2.22s/it] 19%|███████████████▍                                                                  | 3781/20117 [2:21:24<10:01:37,  2.21s/it] 19%|███████████████▍                                                                  | 3782/20117 [2:21:26<10:00:45,  2.21s/it] 19%|███████████████▍                                                                  | 3783/20117 [2:21:28<10:12:22,  2.25s/it] 19%|███████████████▍                                                                  | 3784/20117 [2:21:30<10:16:51,  2.27s/it] 19%|███████████████▍                                                                  | 3785/20117 [2:21:33<10:20:14,  2.28s/it] 19%|███████████████▍                                                                  | 3786/20117 [2:21:35<10:24:50,  2.30s/it] 19%|███████████████▍                                                                  | 3787/20117 [2:21:37<10:27:52,  2.31s/it] 19%|███████████████▍                                                                  | 3788/20117 [2:21:40<10:28:19,  2.31s/it] 19%|███████████████▍                                                                  | 3789/20117 [2:21:42<10:32:46,  2.33s/it] 19%|███████████████▍                                                                  | 3790/20117 [2:21:44<10:33:40,  2.33s/it]                                                                                                                                 {'loss': 0.3046, 'grad_norm': 0.396383672952652, 'learning_rate': 0.00018370241338352348, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 397.36, 'epoch': 0.38}
 19%|███████████████▍                                                                  | 3790/20117 [2:21:44<10:33:40,  2.33s/it] 19%|███████████████▍                                                                  | 3791/20117 [2:21:47<10:32:54,  2.33s/it] 19%|███████████████▍                                                                  | 3792/20117 [2:21:49<10:30:51,  2.32s/it] 19%|███████████████▍                                                                  | 3793/20117 [2:21:51<10:32:12,  2.32s/it] 19%|███████████████▍                                                                  | 3794/20117 [2:21:54<10:32:54,  2.33s/it] 19%|███████████████▍                                                                  | 3795/20117 [2:21:56<10:29:56,  2.32s/it] 19%|███████████████▍                                                                  | 3796/20117 [2:21:58<10:45:12,  2.37s/it] 19%|███████████████▍                                                                  | 3797/20117 [2:22:01<10:36:11,  2.34s/it] 19%|███████████████▍                                                                  | 3798/20117 [2:22:03<10:27:47,  2.31s/it] 19%|███████████████▍                                                                  | 3799/20117 [2:22:05<10:31:35,  2.32s/it] 19%|███████████████▍                                                                  | 3800/20117 [2:22:08<10:32:36,  2.33s/it]                                                                                                                                 {'loss': 0.2074, 'grad_norm': 0.33363988995552063, 'learning_rate': 0.00018361643469502517, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 320.36, 'epoch': 0.38}
 19%|███████████████▍                                                                  | 3800/20117 [2:22:08<10:32:36,  2.33s/it] 19%|███████████████▍                                                                  | 3801/20117 [2:22:10<10:24:52,  2.30s/it] 19%|███████████████▍                                                                  | 3802/20117 [2:22:12<10:18:58,  2.28s/it] 19%|███████████████▌                                                                  | 3803/20117 [2:22:14<10:16:28,  2.27s/it] 19%|███████████████▌                                                                  | 3804/20117 [2:22:17<10:14:28,  2.26s/it] 19%|███████████████▌                                                                  | 3805/20117 [2:22:19<10:16:27,  2.27s/it] 19%|███████████████▌                                                                  | 3806/20117 [2:22:21<10:12:32,  2.25s/it] 19%|███████████████▌                                                                  | 3807/20117 [2:22:23<10:17:13,  2.27s/it] 19%|███████████████▌                                                                  | 3808/20117 [2:22:26<10:12:17,  2.25s/it] 19%|███████████████▌                                                                  | 3809/20117 [2:22:28<10:11:15,  2.25s/it] 19%|███████████████▌                                                                  | 3810/20117 [2:22:30<10:09:15,  2.24s/it]                                                                                                                                 {'loss': 0.2449, 'grad_norm': 0.34591570496559143, 'learning_rate': 0.00018353025004157552, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 370.18, 'epoch': 0.38}
 19%|███████████████▌                                                                  | 3810/20117 [2:22:30<10:09:15,  2.24s/it] 19%|███████████████▌                                                                  | 3811/20117 [2:22:32<10:08:17,  2.24s/it] 19%|███████████████▌                                                                  | 3812/20117 [2:22:35<10:09:39,  2.24s/it] 19%|███████████████▌                                                                  | 3813/20117 [2:22:37<10:05:45,  2.23s/it] 19%|███████████████▌                                                                  | 3814/20117 [2:22:39<10:08:22,  2.24s/it] 19%|███████████████▌                                                                  | 3815/20117 [2:22:41<10:07:40,  2.24s/it] 19%|███████████████▌                                                                  | 3816/20117 [2:22:44<10:10:42,  2.25s/it] 19%|███████████████▌                                                                  | 3817/20117 [2:22:46<10:17:42,  2.27s/it] 19%|███████████████▌                                                                  | 3818/20117 [2:22:48<10:15:46,  2.27s/it] 19%|███████████████▌                                                                  | 3819/20117 [2:22:50<10:12:51,  2.26s/it] 19%|███████████████▌                                                                  | 3820/20117 [2:22:53<10:11:22,  2.25s/it]                                                                                                                                 {'loss': 0.2017, 'grad_norm': 0.4369080066680908, 'learning_rate': 0.00018344385963546547, 'memory/max_active (GiB)': 20.63, 'memory/max_allocated (GiB)': 20.63, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 346.43, 'epoch': 0.38}
 19%|███████████████▌                                                                  | 3820/20117 [2:22:53<10:11:22,  2.25s/it] 19%|███████████████▌                                                                  | 3821/20117 [2:22:55<10:06:48,  2.23s/it] 19%|███████████████▌                                                                  | 3822/20117 [2:22:57<10:06:28,  2.23s/it] 19%|███████████████▌                                                                  | 3823/20117 [2:22:59<10:05:28,  2.23s/it] 19%|███████████████▌                                                                  | 3824/20117 [2:23:01<10:04:30,  2.23s/it] 19%|███████████████▌                                                                  | 3825/20117 [2:23:04<10:09:05,  2.24s/it] 19%|███████████████▌                                                                  | 3826/20117 [2:23:06<10:08:16,  2.24s/it] 19%|███████████████▌                                                                  | 3827/20117 [2:23:08<10:02:58,  2.22s/it] 19%|███████████████▌                                                                  | 3828/20117 [2:23:10<10:04:16,  2.23s/it] 19%|███████████████▌                                                                  | 3829/20117 [2:23:13<10:11:02,  2.25s/it] 19%|███████████████▌                                                                  | 3830/20117 [2:23:15<10:07:24,  2.24s/it]                                                                                                                                 {'loss': 0.2987, 'grad_norm': 0.4190782308578491, 'learning_rate': 0.00018335726368949286, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 381.05, 'epoch': 0.38}
 19%|███████████████▌                                                                  | 3830/20117 [2:23:15<10:07:24,  2.24s/it] 19%|███████████████▌                                                                  | 3831/20117 [2:23:17<10:10:28,  2.25s/it] 19%|███████████████▌                                                                  | 3832/20117 [2:23:19<10:11:34,  2.25s/it] 19%|███████████████▌                                                                  | 3833/20117 [2:23:22<10:14:04,  2.26s/it] 19%|███████████████▋                                                                  | 3834/20117 [2:23:24<10:12:46,  2.26s/it] 19%|███████████████▋                                                                  | 3835/20117 [2:23:26<10:10:00,  2.25s/it] 19%|███████████████▋                                                                  | 3836/20117 [2:23:29<10:14:23,  2.26s/it] 19%|███████████████▋                                                                  | 3837/20117 [2:23:31<10:19:12,  2.28s/it] 19%|███████████████▋                                                                  | 3838/20117 [2:23:33<10:20:20,  2.29s/it] 19%|███████████████▋                                                                  | 3839/20117 [2:23:35<10:16:51,  2.27s/it] 19%|███████████████▋                                                                  | 3840/20117 [2:23:38<10:13:22,  2.26s/it]                                                                                                                                 {'loss': 0.2992, 'grad_norm': 0.4989373981952667, 'learning_rate': 0.00018327046241696184, 'memory/max_active (GiB)': 19.77, 'memory/max_allocated (GiB)': 19.77, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 363.52, 'epoch': 0.38}
 19%|███████████████▋                                                                  | 3840/20117 [2:23:38<10:13:22,  2.26s/it] 19%|███████████████▋                                                                  | 3841/20117 [2:23:40<10:13:10,  2.26s/it] 19%|███████████████▋                                                                  | 3842/20117 [2:23:42<10:15:17,  2.27s/it] 19%|███████████████▋                                                                  | 3843/20117 [2:23:44<10:17:35,  2.28s/it] 19%|███████████████▋                                                                  | 3844/20117 [2:23:47<10:31:21,  2.33s/it] 19%|███████████████▋                                                                  | 3845/20117 [2:23:49<10:37:32,  2.35s/it] 19%|███████████████▋                                                                  | 3846/20117 [2:23:52<10:32:12,  2.33s/it] 19%|███████████████▋                                                                  | 3847/20117 [2:23:54<10:33:56,  2.34s/it] 19%|███████████████▋                                                                  | 3848/20117 [2:23:56<10:42:00,  2.37s/it] 19%|███████████████▋                                                                  | 3849/20117 [2:23:59<10:34:03,  2.34s/it] 19%|███████████████▋                                                                  | 3850/20117 [2:24:01<11:07:28,  2.46s/it]                                                                                                                                 {'loss': 0.2311, 'grad_norm': 0.3604322671890259, 'learning_rate': 0.00018318345603168226, 'memory/max_active (GiB)': 20.72, 'memory/max_allocated (GiB)': 20.72, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 286.25, 'epoch': 0.38}
 19%|███████████████▋                                                                  | 3850/20117 [2:24:01<11:07:28,  2.46s/it] 19%|███████████████▋                                                                  | 3851/20117 [2:24:04<11:01:43,  2.44s/it] 19%|███████████████▋                                                                  | 3852/20117 [2:24:06<10:44:32,  2.38s/it] 19%|███████████████▋                                                                  | 3853/20117 [2:24:08<10:33:51,  2.34s/it] 19%|███████████████▋                                                                  | 3854/20117 [2:24:10<10:22:15,  2.30s/it] 19%|███████████████▋                                                                  | 3855/20117 [2:24:13<10:21:37,  2.29s/it] 19%|███████████████▋                                                                  | 3856/20117 [2:24:15<10:16:39,  2.28s/it] 19%|███████████████▋                                                                  | 3857/20117 [2:24:18<10:36:57,  2.35s/it] 19%|███████████████▋                                                                  | 3858/20117 [2:24:20<10:48:41,  2.39s/it] 19%|███████████████▋                                                                  | 3859/20117 [2:24:22<10:50:58,  2.40s/it] 19%|███████████████▋                                                                  | 3860/20117 [2:24:25<10:37:21,  2.35s/it]                                                                                                                                 {'loss': 0.1952, 'grad_norm': 0.303365021944046, 'learning_rate': 0.00018309624474796926, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 381.31, 'epoch': 0.38}
 19%|███████████████▋                                                                  | 3860/20117 [2:24:25<10:37:21,  2.35s/it] 19%|███████████████▋                                                                  | 3861/20117 [2:24:27<10:29:56,  2.33s/it] 19%|███████████████▋                                                                  | 3862/20117 [2:24:29<10:23:24,  2.30s/it] 19%|███████████████▋                                                                  | 3863/20117 [2:24:31<10:23:45,  2.30s/it] 19%|███████████████▊                                                                  | 3864/20117 [2:24:34<10:22:13,  2.30s/it] 19%|███████████████▊                                                                  | 3865/20117 [2:24:36<10:19:48,  2.29s/it] 19%|███████████████▊                                                                  | 3866/20117 [2:24:38<10:16:52,  2.28s/it] 19%|███████████████▊                                                                  | 3867/20117 [2:24:41<10:18:39,  2.28s/it] 19%|███████████████▊                                                                  | 3868/20117 [2:24:43<10:19:06,  2.29s/it] 19%|███████████████▊                                                                  | 3869/20117 [2:24:45<10:19:26,  2.29s/it] 19%|███████████████▊                                                                  | 3870/20117 [2:24:47<10:17:55,  2.28s/it]                                                                                                                                 {'loss': 0.2694, 'grad_norm': 0.5674360990524292, 'learning_rate': 0.00018300882878064266, 'memory/max_active (GiB)': 18.82, 'memory/max_allocated (GiB)': 18.82, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 346.22, 'epoch': 0.38}
 19%|███████████████▊                                                                  | 3870/20117 [2:24:47<10:17:55,  2.28s/it] 19%|███████████████▊                                                                  | 3871/20117 [2:24:50<10:14:57,  2.27s/it] 19%|███████████████▊                                                                  | 3872/20117 [2:24:52<10:24:40,  2.31s/it] 19%|███████████████▊                                                                  | 3873/20117 [2:24:54<10:27:30,  2.32s/it] 19%|███████████████▊                                                                  | 3874/20117 [2:24:57<10:27:33,  2.32s/it] 19%|███████████████▊                                                                  | 3875/20117 [2:24:59<10:22:42,  2.30s/it] 19%|███████████████▊                                                                  | 3876/20117 [2:25:01<10:18:18,  2.28s/it] 19%|███████████████▊                                                                  | 3877/20117 [2:25:04<10:18:20,  2.28s/it] 19%|███████████████▊                                                                  | 3878/20117 [2:25:06<10:16:33,  2.28s/it] 19%|███████████████▊                                                                  | 3879/20117 [2:25:08<10:23:21,  2.30s/it] 19%|███████████████▊                                                                  | 3880/20117 [2:25:11<10:35:57,  2.35s/it]                                                                                                                                 {'loss': 0.2825, 'grad_norm': 0.4402889609336853, 'learning_rate': 0.00018292120834502643, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 339.23, 'epoch': 0.39}
 19%|███████████████▊                                                                  | 3880/20117 [2:25:11<10:35:57,  2.35s/it] 19%|███████████████▊                                                                  | 3881/20117 [2:25:13<10:26:48,  2.32s/it] 19%|███████████████▊                                                                  | 3882/20117 [2:25:15<10:21:42,  2.30s/it] 19%|███████████████▊                                                                  | 3883/20117 [2:25:17<10:13:46,  2.27s/it] 19%|███████████████▊                                                                  | 3884/20117 [2:25:20<10:21:22,  2.30s/it] 19%|███████████████▊                                                                  | 3885/20117 [2:25:22<10:19:26,  2.29s/it] 19%|███████████████▊                                                                  | 3886/20117 [2:25:24<10:25:43,  2.31s/it] 19%|███████████████▊                                                                  | 3887/20117 [2:25:27<10:18:00,  2.28s/it] 19%|███████████████▊                                                                  | 3888/20117 [2:25:29<10:14:07,  2.27s/it] 19%|███████████████▊                                                                  | 3889/20117 [2:25:31<10:13:08,  2.27s/it] 19%|███████████████▊                                                                  | 3890/20117 [2:25:33<10:13:09,  2.27s/it]                                                                                                                                 {'loss': 0.2294, 'grad_norm': 0.3922783136367798, 'learning_rate': 0.00018283338365694825, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 289.83, 'epoch': 0.39}
 19%|███████████████▊                                                                  | 3890/20117 [2:25:33<10:13:09,  2.27s/it] 19%|███████████████▊                                                                  | 3891/20117 [2:25:36<10:17:40,  2.28s/it] 19%|███████████████▊                                                                  | 3892/20117 [2:25:38<10:14:17,  2.27s/it] 19%|███████████████▊                                                                  | 3893/20117 [2:25:40<10:14:11,  2.27s/it] 19%|███████████████▊                                                                  | 3894/20117 [2:25:42<10:14:06,  2.27s/it] 19%|███████████████▉                                                                  | 3895/20117 [2:25:45<10:09:54,  2.26s/it] 19%|███████████████▉                                                                  | 3896/20117 [2:25:47<10:11:59,  2.26s/it] 19%|███████████████▉                                                                  | 3897/20117 [2:25:49<10:10:12,  2.26s/it] 19%|███████████████▉                                                                  | 3898/20117 [2:25:51<10:12:41,  2.27s/it] 19%|███████████████▉                                                                  | 3899/20117 [2:25:54<10:11:38,  2.26s/it] 19%|███████████████▉                                                                  | 3900/20117 [2:25:56<10:12:21,  2.27s/it]                                                                                                                                 {'loss': 0.2244, 'grad_norm': 0.6003592014312744, 'learning_rate': 0.00018274535493273893, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 350.12, 'epoch': 0.39}
 19%|███████████████▉                                                                  | 3900/20117 [2:25:56<10:12:21,  2.27s/it] 19%|███████████████▉                                                                  | 3901/20117 [2:25:59<10:41:00,  2.37s/it] 19%|███████████████▉                                                                  | 3902/20117 [2:26:01<10:38:07,  2.36s/it] 19%|███████████████▉                                                                  | 3903/20117 [2:26:03<10:33:40,  2.34s/it] 19%|███████████████▉                                                                  | 3904/20117 [2:26:05<10:26:34,  2.32s/it] 19%|███████████████▉                                                                  | 3905/20117 [2:26:08<10:20:21,  2.30s/it] 19%|███████████████▉                                                                  | 3906/20117 [2:26:10<10:24:53,  2.31s/it] 19%|███████████████▉                                                                  | 3907/20117 [2:26:12<10:21:02,  2.30s/it] 19%|███████████████▉                                                                  | 3908/20117 [2:26:15<10:18:39,  2.29s/it] 19%|███████████████▉                                                                  | 3909/20117 [2:26:17<10:15:46,  2.28s/it] 19%|███████████████▉                                                                  | 3910/20117 [2:26:19<10:13:13,  2.27s/it]                                                                                                                                 {'loss': 0.2341, 'grad_norm': 0.436212956905365, 'learning_rate': 0.00018265712238923175, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 343.17, 'epoch': 0.39}
 19%|███████████████▉                                                                  | 3910/20117 [2:26:19<10:13:13,  2.27s/it] 19%|███████████████▉                                                                  | 3911/20117 [2:26:21<10:13:40,  2.27s/it] 19%|███████████████▉                                                                  | 3912/20117 [2:26:24<10:29:39,  2.33s/it] 19%|███████████████▉                                                                  | 3913/20117 [2:26:26<10:20:48,  2.30s/it] 19%|███████████████▉                                                                  | 3914/20117 [2:26:28<10:13:57,  2.27s/it] 19%|███████████████▉                                                                  | 3915/20117 [2:26:31<10:13:46,  2.27s/it] 19%|███████████████▉                                                                  | 3916/20117 [2:26:33<10:35:14,  2.35s/it] 19%|███████████████▉                                                                  | 3917/20117 [2:26:35<10:35:29,  2.35s/it] 19%|███████████████▉                                                                  | 3918/20117 [2:26:38<10:43:32,  2.38s/it] 19%|███████████████▉                                                                  | 3919/20117 [2:26:40<10:41:45,  2.38s/it] 19%|███████████████▉                                                                  | 3920/20117 [2:26:43<10:39:03,  2.37s/it]                                                                                                                                 {'loss': 0.2647, 'grad_norm': 0.2501852810382843, 'learning_rate': 0.00018256868624376215, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 396.07, 'epoch': 0.39}
 19%|███████████████▉                                                                  | 3920/20117 [2:26:43<10:39:03,  2.37s/it] 19%|███████████████▉                                                                  | 3921/20117 [2:26:45<10:28:11,  2.33s/it] 19%|███████████████▉                                                                  | 3922/20117 [2:26:47<10:21:42,  2.30s/it] 20%|███████████████▉                                                                  | 3923/20117 [2:26:49<10:15:28,  2.28s/it] 20%|███████████████▉                                                                  | 3924/20117 [2:26:52<10:11:46,  2.27s/it] 20%|███████████████▉                                                                  | 3925/20117 [2:26:54<10:11:07,  2.26s/it] 20%|████████████████                                                                  | 3926/20117 [2:26:56<10:14:25,  2.28s/it] 20%|████████████████                                                                  | 3927/20117 [2:26:58<10:10:39,  2.26s/it] 20%|████████████████                                                                  | 3928/20117 [2:27:01<10:07:02,  2.25s/it] 20%|████████████████                                                                  | 3929/20117 [2:27:03<10:04:57,  2.24s/it] 20%|████████████████                                                                  | 3930/20117 [2:27:05<10:08:32,  2.26s/it]                                                                                                                                 {'loss': 0.2664, 'grad_norm': 0.36171767115592957, 'learning_rate': 0.00018248004671416704, 'memory/max_active (GiB)': 19.69, 'memory/max_allocated (GiB)': 19.69, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 426.89, 'epoch': 0.39}
 20%|████████████████                                                                  | 3930/20117 [2:27:05<10:08:32,  2.26s/it] 20%|████████████████                                                                  | 3931/20117 [2:27:07<10:08:09,  2.25s/it] 20%|████████████████                                                                  | 3932/20117 [2:27:10<10:09:49,  2.26s/it] 20%|████████████████                                                                  | 3933/20117 [2:27:12<10:08:29,  2.26s/it] 20%|████████████████                                                                  | 3934/20117 [2:27:14<10:10:17,  2.26s/it] 20%|████████████████                                                                  | 3935/20117 [2:27:16<10:09:33,  2.26s/it] 20%|████████████████                                                                  | 3936/20117 [2:27:19<10:08:24,  2.26s/it] 20%|████████████████                                                                  | 3937/20117 [2:27:21<10:12:31,  2.27s/it] 20%|████████████████                                                                  | 3938/20117 [2:27:23<10:12:46,  2.27s/it] 20%|████████████████                                                                  | 3939/20117 [2:27:26<10:16:57,  2.29s/it] 20%|████████████████                                                                  | 3940/20117 [2:27:28<10:14:34,  2.28s/it]                                                                                                                                 {'loss': 0.3584, 'grad_norm': 0.47077932953834534, 'learning_rate': 0.00018239120401878432, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 452.63, 'epoch': 0.39}
 20%|████████████████                                                                  | 3940/20117 [2:27:28<10:14:34,  2.28s/it] 20%|████████████████                                                                  | 3941/20117 [2:27:30<10:12:19,  2.27s/it] 20%|████████████████                                                                  | 3942/20117 [2:27:32<10:16:27,  2.29s/it] 20%|████████████████                                                                  | 3943/20117 [2:27:35<10:18:52,  2.30s/it] 20%|████████████████                                                                  | 3944/20117 [2:27:37<10:13:46,  2.28s/it] 20%|████████████████                                                                  | 3945/20117 [2:27:39<10:13:37,  2.28s/it] 20%|████████████████                                                                  | 3946/20117 [2:27:41<10:08:27,  2.26s/it] 20%|████████████████                                                                  | 3947/20117 [2:27:44<10:11:54,  2.27s/it] 20%|████████████████                                                                  | 3948/20117 [2:27:46<10:09:50,  2.26s/it] 20%|████████████████                                                                  | 3949/20117 [2:27:48<10:04:14,  2.24s/it] 20%|████████████████                                                                  | 3950/20117 [2:27:50<10:01:17,  2.23s/it]                                                                                                                                 {'loss': 0.2715, 'grad_norm': 0.413924902677536, 'learning_rate': 0.00018230215837645232, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 424.3, 'epoch': 0.39}
 20%|████████████████                                                                  | 3950/20117 [2:27:50<10:01:17,  2.23s/it] 20%|████████████████                                                                  | 3951/20117 [2:27:53<10:05:14,  2.25s/it] 20%|████████████████                                                                  | 3952/20117 [2:27:55<10:07:24,  2.25s/it] 20%|████████████████                                                                  | 3953/20117 [2:27:57<10:33:51,  2.35s/it] 20%|████████████████                                                                  | 3954/20117 [2:28:00<10:20:40,  2.30s/it] 20%|████████████████                                                                  | 3955/20117 [2:28:02<10:12:50,  2.28s/it] 20%|████████████████▏                                                                 | 3956/20117 [2:28:04<10:09:06,  2.26s/it] 20%|████████████████▏                                                                 | 3957/20117 [2:28:06<10:05:14,  2.25s/it] 20%|████████████████▏                                                                 | 3958/20117 [2:28:09<10:05:25,  2.25s/it] 20%|████████████████▏                                                                 | 3959/20117 [2:28:11<10:03:44,  2.24s/it] 20%|████████████████▏                                                                 | 3960/20117 [2:28:13<10:01:29,  2.23s/it]                                                                                                                                 {'loss': 0.2855, 'grad_norm': 0.40877413749694824, 'learning_rate': 0.00018221291000650928, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 406.29, 'epoch': 0.39}
 20%|████████████████▏                                                                 | 3960/20117 [2:28:13<10:01:29,  2.23s/it] 20%|████████████████▏                                                                 | 3961/20117 [2:28:15<10:03:09,  2.24s/it] 20%|████████████████▏                                                                 | 3962/20117 [2:28:17<10:01:02,  2.23s/it] 20%|████████████████▏                                                                 | 3963/20117 [2:28:20<10:03:23,  2.24s/it] 20%|████████████████▏                                                                 | 3964/20117 [2:28:22<10:00:16,  2.23s/it] 20%|████████████████▏                                                                 | 3965/20117 [2:28:24<10:01:05,  2.23s/it] 20%|████████████████▎                                                                  | 3966/20117 [2:28:26<9:56:28,  2.22s/it] 20%|████████████████▎                                                                  | 3967/20117 [2:28:29<9:55:08,  2.21s/it] 20%|████████████████▎                                                                  | 3968/20117 [2:28:31<9:51:29,  2.20s/it] 20%|████████████████▍                                                                  | 3969/20117 [2:28:33<9:49:47,  2.19s/it] 20%|████████████████▍                                                                  | 3970/20117 [2:28:35<9:48:42,  2.19s/it]                                                                                                                                 {'loss': 0.1893, 'grad_norm': 0.4080711007118225, 'learning_rate': 0.0001821234591287928, 'memory/max_active (GiB)': 19.22, 'memory/max_allocated (GiB)': 19.22, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 347.04, 'epoch': 0.39}
 20%|████████████████▍                                                                  | 3970/20117 [2:28:35<9:48:42,  2.19s/it] 20%|████████████████▍                                                                  | 3971/20117 [2:28:37<9:46:58,  2.18s/it] 20%|████████████████▍                                                                  | 3972/20117 [2:28:39<9:49:45,  2.19s/it] 20%|████████████████▍                                                                  | 3973/20117 [2:28:42<9:51:14,  2.20s/it] 20%|████████████████▏                                                                 | 3974/20117 [2:28:44<10:00:03,  2.23s/it] 20%|████████████████▏                                                                 | 3975/20117 [2:28:46<10:05:39,  2.25s/it] 20%|████████████████▏                                                                 | 3976/20117 [2:28:49<10:16:55,  2.29s/it] 20%|████████████████▏                                                                 | 3977/20117 [2:28:51<10:18:11,  2.30s/it] 20%|████████████████▏                                                                 | 3978/20117 [2:28:53<10:19:48,  2.30s/it] 20%|████████████████▏                                                                 | 3979/20117 [2:28:56<10:22:20,  2.31s/it] 20%|████████████████▏                                                                 | 3980/20117 [2:28:58<10:24:27,  2.32s/it]                                                                                                                                 {'loss': 0.2328, 'grad_norm': 0.2622958719730377, 'learning_rate': 0.00018203380596363932, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 323.13, 'epoch': 0.4}
 20%|████████████████▏                                                                 | 3980/20117 [2:28:58<10:24:27,  2.32s/it] 20%|████████████████▏                                                                 | 3981/20117 [2:29:00<10:26:56,  2.33s/it] 20%|████████████████▏                                                                 | 3982/20117 [2:29:03<10:24:13,  2.32s/it] 20%|████████████████▏                                                                 | 3983/20117 [2:29:05<10:18:17,  2.30s/it] 20%|████████████████▏                                                                 | 3984/20117 [2:29:07<10:14:49,  2.29s/it] 20%|████████████████▏                                                                 | 3985/20117 [2:29:09<10:11:11,  2.27s/it] 20%|████████████████▏                                                                 | 3986/20117 [2:29:12<10:06:00,  2.25s/it] 20%|████████████████▍                                                                  | 3987/20117 [2:29:14<9:57:02,  2.22s/it] 20%|████████████████▍                                                                  | 3988/20117 [2:29:16<9:53:43,  2.21s/it] 20%|████████████████▍                                                                  | 3989/20117 [2:29:18<9:56:17,  2.22s/it] 20%|████████████████▎                                                                 | 3990/20117 [2:29:20<10:04:31,  2.25s/it]                                                                                                                                 {'loss': 0.196, 'grad_norm': 0.32758989930152893, 'learning_rate': 0.0001819439507318835, 'memory/max_active (GiB)': 19.77, 'memory/max_allocated (GiB)': 19.77, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 255.43, 'epoch': 0.4}
 20%|████████████████▎                                                                 | 3990/20117 [2:29:20<10:04:31,  2.25s/it] 20%|████████████████▎                                                                 | 3991/20117 [2:29:23<10:06:11,  2.26s/it] 20%|████████████████▎                                                                 | 3992/20117 [2:29:25<10:06:11,  2.26s/it] 20%|████████████████▎                                                                 | 3993/20117 [2:29:27<10:03:22,  2.25s/it] 20%|████████████████▎                                                                 | 3994/20117 [2:29:29<10:01:47,  2.24s/it] 20%|████████████████▍                                                                  | 3995/20117 [2:29:32<9:59:16,  2.23s/it] 20%|████████████████▍                                                                  | 3996/20117 [2:29:34<9:59:47,  2.23s/it] 20%|████████████████▎                                                                 | 3997/20117 [2:29:36<10:03:05,  2.24s/it] 20%|████████████████▎                                                                 | 3998/20117 [2:29:38<10:04:02,  2.25s/it] 20%|████████████████▎                                                                 | 3999/20117 [2:29:41<10:04:04,  2.25s/it] 20%|████████████████▎                                                                 | 4000/20117 [2:29:43<10:03:11,  2.25s/it]                                                                                                                                 {'loss': 0.2874, 'grad_norm': 0.4135094881057739, 'learning_rate': 0.00018185389365485774, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 352.42, 'epoch': 0.4}
 20%|████████████████▎                                                                 | 4000/20117 [2:29:43<10:03:11,  2.25s/it] 20%|████████████████▎                                                                 | 4001/20117 [2:29:45<10:03:36,  2.25s/it] 20%|████████████████▎                                                                 | 4002/20117 [2:29:47<10:01:23,  2.24s/it] 20%|████████████████▎                                                                 | 4003/20117 [2:29:50<10:01:19,  2.24s/it] 20%|████████████████▌                                                                  | 4004/20117 [2:29:52<9:57:43,  2.23s/it] 20%|████████████████▎                                                                 | 4005/20117 [2:29:54<10:27:19,  2.34s/it] 20%|████████████████▎                                                                 | 4006/20117 [2:29:57<10:26:50,  2.33s/it] 20%|████████████████▎                                                                 | 4007/20117 [2:29:59<10:17:11,  2.30s/it] 20%|████████████████▎                                                                 | 4008/20117 [2:30:01<10:08:16,  2.27s/it] 20%|████████████████▎                                                                 | 4009/20117 [2:30:03<10:08:05,  2.27s/it] 20%|████████████████▎                                                                 | 4010/20117 [2:30:06<10:06:54,  2.26s/it]                                                                                                                                 {'loss': 0.2796, 'grad_norm': 0.4753275215625763, 'learning_rate': 0.00018176363495439173, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 300.74, 'epoch': 0.4}
 20%|████████████████▎                                                                 | 4010/20117 [2:30:06<10:06:54,  2.26s/it] 20%|████████████████▎                                                                 | 4011/20117 [2:30:08<10:07:56,  2.26s/it] 20%|████████████████▎                                                                 | 4012/20117 [2:30:10<10:17:05,  2.30s/it] 20%|████████████████▎                                                                 | 4013/20117 [2:30:13<10:13:35,  2.29s/it] 20%|████████████████▎                                                                 | 4014/20117 [2:30:15<10:13:56,  2.29s/it] 20%|████████████████▎                                                                 | 4015/20117 [2:30:17<10:10:04,  2.27s/it] 20%|████████████████▎                                                                 | 4016/20117 [2:30:19<10:05:10,  2.26s/it] 20%|████████████████▎                                                                 | 4017/20117 [2:30:22<10:00:45,  2.24s/it] 20%|████████████████▍                                                                 | 4018/20117 [2:30:24<10:04:25,  2.25s/it] 20%|████████████████▍                                                                 | 4019/20117 [2:30:26<10:02:33,  2.25s/it] 20%|████████████████▍                                                                 | 4020/20117 [2:30:28<10:09:27,  2.27s/it]                                                                                                                                 {'loss': 0.3278, 'grad_norm': 0.41000860929489136, 'learning_rate': 0.00018167317485281168, 'memory/max_active (GiB)': 20.63, 'memory/max_allocated (GiB)': 20.63, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 381.42, 'epoch': 0.4}
 20%|████████████████▍                                                                 | 4020/20117 [2:30:28<10:09:27,  2.27s/it] 20%|████████████████▍                                                                 | 4021/20117 [2:30:31<10:15:15,  2.29s/it] 20%|████████████████▍                                                                 | 4022/20117 [2:30:33<10:15:28,  2.29s/it] 20%|████████████████▍                                                                 | 4023/20117 [2:30:35<10:13:28,  2.29s/it] 20%|████████████████▍                                                                 | 4024/20117 [2:30:38<10:08:11,  2.27s/it] 20%|████████████████▍                                                                 | 4025/20117 [2:30:40<10:11:54,  2.28s/it] 20%|████████████████▍                                                                 | 4026/20117 [2:30:42<10:11:30,  2.28s/it] 20%|████████████████▍                                                                 | 4027/20117 [2:30:44<10:07:36,  2.27s/it] 20%|████████████████▍                                                                 | 4028/20117 [2:30:47<10:11:58,  2.28s/it] 20%|████████████████▍                                                                 | 4029/20117 [2:30:49<10:10:33,  2.28s/it] 20%|████████████████▍                                                                 | 4030/20117 [2:30:51<10:08:57,  2.27s/it]                                                                                                                                 {'loss': 0.2514, 'grad_norm': 0.3132636845111847, 'learning_rate': 0.00018158251357293996, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 340.93, 'epoch': 0.4}
 20%|████████████████▍                                                                 | 4030/20117 [2:30:51<10:08:57,  2.27s/it] 20%|████████████████▍                                                                 | 4031/20117 [2:30:53<10:09:36,  2.27s/it] 20%|████████████████▍                                                                 | 4032/20117 [2:30:56<10:18:37,  2.31s/it] 20%|████████████████▍                                                                 | 4033/20117 [2:30:58<10:21:18,  2.32s/it] 20%|████████████████▍                                                                 | 4034/20117 [2:31:00<10:13:00,  2.29s/it] 20%|████████████████▍                                                                 | 4035/20117 [2:31:03<10:10:52,  2.28s/it] 20%|████████████████▍                                                                 | 4036/20117 [2:31:05<10:10:10,  2.28s/it] 20%|████████████████▍                                                                 | 4037/20117 [2:31:07<10:07:27,  2.27s/it] 20%|████████████████▍                                                                 | 4038/20117 [2:31:09<10:04:38,  2.26s/it] 20%|████████████████▍                                                                 | 4039/20117 [2:31:12<10:02:15,  2.25s/it] 20%|████████████████▍                                                                 | 4040/20117 [2:31:14<10:04:47,  2.26s/it]                                                                                                                                 {'loss': 0.219, 'grad_norm': 0.3330332338809967, 'learning_rate': 0.00018149165133809442, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 331.95, 'epoch': 0.4}
 20%|████████████████▍                                                                 | 4040/20117 [2:31:14<10:04:47,  2.26s/it] 20%|████████████████▍                                                                 | 4041/20117 [2:31:16<10:07:11,  2.27s/it] 20%|████████████████▍                                                                 | 4042/20117 [2:31:19<10:11:35,  2.28s/it] 20%|████████████████▍                                                                 | 4043/20117 [2:31:21<10:06:40,  2.26s/it] 20%|████████████████▍                                                                 | 4044/20117 [2:31:23<10:02:36,  2.25s/it] 20%|████████████████▍                                                                 | 4045/20117 [2:31:25<10:01:48,  2.25s/it] 20%|████████████████▍                                                                 | 4046/20117 [2:31:27<10:05:08,  2.26s/it] 20%|████████████████▍                                                                 | 4047/20117 [2:31:30<10:03:35,  2.25s/it] 20%|████████████████▌                                                                 | 4048/20117 [2:31:32<10:02:30,  2.25s/it] 20%|████████████████▋                                                                  | 4049/20117 [2:31:34<9:56:55,  2.23s/it] 20%|████████████████▋                                                                  | 4050/20117 [2:31:36<9:55:55,  2.23s/it]                                                                                                                                 {'loss': 0.3451, 'grad_norm': 0.5818430781364441, 'learning_rate': 0.000181400588372088, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 433.43, 'epoch': 0.4}
 20%|████████████████▋                                                                  | 4050/20117 [2:31:36<9:55:55,  2.23s/it] 20%|████████████████▋                                                                  | 4051/20117 [2:31:39<9:56:56,  2.23s/it] 20%|████████████████▋                                                                  | 4052/20117 [2:31:41<9:53:54,  2.22s/it] 20%|████████████████▋                                                                  | 4053/20117 [2:31:43<9:57:39,  2.23s/it] 20%|████████████████▌                                                                 | 4054/20117 [2:31:45<10:00:47,  2.24s/it] 20%|████████████████▌                                                                 | 4055/20117 [2:31:48<10:15:30,  2.30s/it] 20%|████████████████▌                                                                 | 4056/20117 [2:31:51<10:51:31,  2.43s/it] 20%|████████████████▌                                                                 | 4057/20117 [2:31:53<10:34:50,  2.37s/it] 20%|████████████████▌                                                                 | 4058/20117 [2:31:55<10:20:44,  2.32s/it] 20%|████████████████▌                                                                 | 4059/20117 [2:31:57<10:11:18,  2.28s/it] 20%|████████████████▌                                                                 | 4060/20117 [2:31:59<10:04:46,  2.26s/it]                                                                                                                                 {'loss': 0.1907, 'grad_norm': 0.4116646945476532, 'learning_rate': 0.00018130932489922804, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 326.32, 'epoch': 0.4}
 20%|████████████████▌                                                                 | 4060/20117 [2:31:59<10:04:46,  2.26s/it] 20%|████████████████▌                                                                 | 4061/20117 [2:32:02<10:03:27,  2.26s/it] 20%|████████████████▌                                                                 | 4062/20117 [2:32:04<10:03:39,  2.26s/it] 20%|████████████████▊                                                                  | 4063/20117 [2:32:06<9:59:23,  2.24s/it] 20%|████████████████▊                                                                  | 4064/20117 [2:32:08<9:55:59,  2.23s/it] 20%|████████████████▌                                                                 | 4065/20117 [2:32:11<10:00:05,  2.24s/it] 20%|████████████████▊                                                                  | 4066/20117 [2:32:13<9:55:58,  2.23s/it] 20%|████████████████▊                                                                  | 4067/20117 [2:32:15<9:56:15,  2.23s/it] 20%|████████████████▊                                                                  | 4068/20117 [2:32:17<9:57:42,  2.23s/it] 20%|████████████████▊                                                                  | 4069/20117 [2:32:19<9:56:24,  2.23s/it] 20%|████████████████▌                                                                 | 4070/20117 [2:32:22<10:04:01,  2.26s/it]                                                                                                                                 {'loss': 0.18, 'grad_norm': 0.31805434823036194, 'learning_rate': 0.0001812178611443157, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 284.15, 'epoch': 0.4}
 20%|████████████████▌                                                                 | 4070/20117 [2:32:22<10:04:01,  2.26s/it] 20%|████████████████▌                                                                 | 4071/20117 [2:32:24<10:06:47,  2.27s/it] 20%|████████████████▌                                                                 | 4072/20117 [2:32:26<10:04:04,  2.26s/it] 20%|████████████████▌                                                                 | 4073/20117 [2:32:29<10:07:12,  2.27s/it] 20%|████████████████▌                                                                 | 4074/20117 [2:32:31<10:02:56,  2.25s/it] 20%|████████████████▌                                                                 | 4075/20117 [2:32:33<10:08:52,  2.28s/it] 20%|████████████████▌                                                                 | 4076/20117 [2:32:35<10:06:13,  2.27s/it] 20%|████████████████▌                                                                 | 4077/20117 [2:32:38<10:04:20,  2.26s/it] 20%|████████████████▌                                                                 | 4078/20117 [2:32:40<10:02:03,  2.25s/it] 20%|████████████████▋                                                                 | 4079/20117 [2:32:42<10:02:19,  2.25s/it] 20%|████████████████▊                                                                  | 4080/20117 [2:32:44<9:59:35,  2.24s/it]                                                                                                                                 {'loss': 0.2596, 'grad_norm': 0.5397758483886719, 'learning_rate': 0.0001811261973326456, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 369.06, 'epoch': 0.41}
 20%|████████████████▊                                                                  | 4080/20117 [2:32:44<9:59:35,  2.24s/it] 20%|████████████████▊                                                                  | 4081/20117 [2:32:47<9:58:33,  2.24s/it] 20%|████████████████▋                                                                 | 4082/20117 [2:32:49<10:03:17,  2.26s/it] 20%|████████████████▋                                                                 | 4083/20117 [2:32:51<10:00:31,  2.25s/it] 20%|████████████████▋                                                                 | 4084/20117 [2:32:53<10:01:07,  2.25s/it] 20%|████████████████▋                                                                 | 4085/20117 [2:32:56<10:06:04,  2.27s/it] 20%|████████████████▋                                                                 | 4086/20117 [2:32:58<10:08:24,  2.28s/it] 20%|████████████████▋                                                                 | 4087/20117 [2:33:00<10:05:08,  2.27s/it] 20%|████████████████▋                                                                 | 4088/20117 [2:33:03<10:11:21,  2.29s/it] 20%|████████████████▋                                                                 | 4089/20117 [2:33:05<10:10:04,  2.28s/it] 20%|████████████████▋                                                                 | 4090/20117 [2:33:07<10:08:36,  2.28s/it]                                                                                                                                 {'loss': 0.2464, 'grad_norm': 0.4468703866004944, 'learning_rate': 0.00018103433369000502, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 343.56, 'epoch': 0.41}
 20%|████████████████▋                                                                 | 4090/20117 [2:33:07<10:08:36,  2.28s/it] 20%|████████████████▋                                                                 | 4091/20117 [2:33:09<10:11:13,  2.29s/it] 20%|████████████████▋                                                                 | 4092/20117 [2:33:12<10:10:11,  2.28s/it] 20%|████████████████▋                                                                 | 4093/20117 [2:33:14<10:11:24,  2.29s/it] 20%|████████████████▋                                                                 | 4094/20117 [2:33:16<10:06:55,  2.27s/it] 20%|████████████████▋                                                                 | 4095/20117 [2:33:18<10:06:11,  2.27s/it] 20%|████████████████▋                                                                 | 4096/20117 [2:33:21<10:05:43,  2.27s/it] 20%|████████████████▋                                                                 | 4097/20117 [2:33:23<10:02:57,  2.26s/it] 20%|████████████████▋                                                                 | 4098/20117 [2:33:25<10:01:42,  2.25s/it] 20%|████████████████▋                                                                 | 4099/20117 [2:33:27<10:04:13,  2.26s/it] 20%|████████████████▋                                                                 | 4100/20117 [2:33:30<10:02:26,  2.26s/it]                                                                                                                                 {'loss': 0.2, 'grad_norm': 0.3228696286678314, 'learning_rate': 0.0001809422704426736, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 301.0, 'epoch': 0.41}
 20%|████████████████▋                                                                 | 4100/20117 [2:33:30<10:02:26,  2.26s/it] 20%|████████████████▋                                                                 | 4101/20117 [2:33:32<10:01:14,  2.25s/it] 20%|████████████████▋                                                                 | 4102/20117 [2:33:34<10:01:07,  2.25s/it] 20%|████████████████▋                                                                 | 4103/20117 [2:33:37<10:11:11,  2.29s/it] 20%|████████████████▋                                                                 | 4104/20117 [2:33:39<10:10:55,  2.29s/it] 20%|████████████████▋                                                                 | 4105/20117 [2:33:41<10:05:42,  2.27s/it] 20%|████████████████▋                                                                 | 4106/20117 [2:33:43<10:00:46,  2.25s/it] 20%|████████████████▋                                                                 | 4107/20117 [2:33:46<10:21:03,  2.33s/it] 20%|████████████████▋                                                                 | 4108/20117 [2:33:48<10:09:55,  2.29s/it] 20%|████████████████▋                                                                 | 4109/20117 [2:33:50<10:03:45,  2.26s/it] 20%|████████████████▊                                                                 | 4110/20117 [2:33:52<10:04:41,  2.27s/it]                                                                                                                                 {'loss': 0.2642, 'grad_norm': 0.5811059474945068, 'learning_rate': 0.00018085000781742252, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 366.31, 'epoch': 0.41}
 20%|████████████████▊                                                                 | 4110/20117 [2:33:52<10:04:41,  2.27s/it] 20%|████████████████▊                                                                 | 4111/20117 [2:33:55<10:08:50,  2.28s/it] 20%|████████████████▊                                                                 | 4112/20117 [2:33:57<10:03:35,  2.26s/it] 20%|████████████████▊                                                                 | 4113/20117 [2:33:59<10:02:10,  2.26s/it] 20%|████████████████▉                                                                  | 4114/20117 [2:34:01<9:57:59,  2.24s/it] 20%|████████████████▉                                                                  | 4115/20117 [2:34:04<9:56:16,  2.24s/it] 20%|████████████████▉                                                                  | 4116/20117 [2:34:06<9:54:06,  2.23s/it] 20%|████████████████▉                                                                  | 4117/20117 [2:34:08<9:56:34,  2.24s/it] 20%|████████████████▊                                                                 | 4118/20117 [2:34:11<10:07:02,  2.28s/it] 20%|████████████████▊                                                                 | 4119/20117 [2:34:13<10:02:16,  2.26s/it] 20%|████████████████▊                                                                 | 4120/20117 [2:34:15<10:01:01,  2.25s/it]                                                                                                                                 {'loss': 0.2658, 'grad_norm': 0.6858221292495728, 'learning_rate': 0.00018075754604151415, 'memory/max_active (GiB)': 19.68, 'memory/max_allocated (GiB)': 19.68, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 340.12, 'epoch': 0.41}
 20%|████████████████▊                                                                 | 4120/20117 [2:34:15<10:01:01,  2.25s/it] 20%|████████████████▊                                                                 | 4121/20117 [2:34:17<10:01:43,  2.26s/it] 20%|████████████████▊                                                                 | 4122/20117 [2:34:19<10:00:07,  2.25s/it] 20%|█████████████████                                                                  | 4123/20117 [2:34:22<9:55:20,  2.23s/it] 21%|█████████████████                                                                  | 4124/20117 [2:34:24<9:53:25,  2.23s/it] 21%|█████████████████                                                                  | 4125/20117 [2:34:26<9:50:58,  2.22s/it] 21%|████████████████▊                                                                 | 4126/20117 [2:34:28<10:00:17,  2.25s/it] 21%|████████████████▊                                                                 | 4127/20117 [2:34:31<10:09:05,  2.29s/it] 21%|████████████████▊                                                                 | 4128/20117 [2:34:33<10:09:56,  2.29s/it] 21%|████████████████▊                                                                 | 4129/20117 [2:34:35<10:14:08,  2.30s/it] 21%|████████████████▊                                                                 | 4130/20117 [2:34:38<10:06:10,  2.28s/it]                                                                                                                                 {'loss': 0.2542, 'grad_norm': 0.3095184862613678, 'learning_rate': 0.00018066488534270142, 'memory/max_active (GiB)': 19.8, 'memory/max_allocated (GiB)': 19.8, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 399.71, 'epoch': 0.41}
 21%|████████████████▊                                                                 | 4130/20117 [2:34:38<10:06:10,  2.28s/it] 21%|████████████████▊                                                                 | 4131/20117 [2:34:40<10:02:29,  2.26s/it] 21%|█████████████████                                                                  | 4132/20117 [2:34:42<9:57:20,  2.24s/it] 21%|█████████████████                                                                  | 4133/20117 [2:34:44<9:57:33,  2.24s/it] 21%|█████████████████                                                                  | 4134/20117 [2:34:47<9:57:26,  2.24s/it] 21%|█████████████████                                                                  | 4135/20117 [2:34:49<9:52:25,  2.22s/it] 21%|█████████████████                                                                  | 4136/20117 [2:34:51<9:50:48,  2.22s/it] 21%|█████████████████                                                                  | 4137/20117 [2:34:53<9:56:32,  2.24s/it] 21%|████████████████▊                                                                 | 4138/20117 [2:34:56<10:03:37,  2.27s/it] 21%|████████████████▊                                                                 | 4139/20117 [2:34:58<10:12:36,  2.30s/it] 21%|████████████████▉                                                                 | 4140/20117 [2:35:00<10:05:03,  2.27s/it]                                                                                                                                 {'loss': 0.289, 'grad_norm': 0.5910835266113281, 'learning_rate': 0.0001805720259492271, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 366.05, 'epoch': 0.41}
 21%|████████████████▉                                                                 | 4140/20117 [2:35:00<10:05:03,  2.27s/it] 21%|████████████████▉                                                                 | 4141/20117 [2:35:02<10:04:05,  2.27s/it] 21%|████████████████▉                                                                 | 4142/20117 [2:35:05<10:23:06,  2.34s/it] 21%|████████████████▉                                                                 | 4143/20117 [2:35:07<10:35:31,  2.39s/it] 21%|████████████████▉                                                                 | 4144/20117 [2:35:10<10:52:39,  2.45s/it] 21%|████████████████▉                                                                 | 4145/20117 [2:35:13<10:58:41,  2.47s/it] 21%|████████████████▉                                                                 | 4146/20117 [2:35:15<10:44:52,  2.42s/it] 21%|████████████████▉                                                                 | 4147/20117 [2:35:17<10:27:27,  2.36s/it] 21%|████████████████▉                                                                 | 4148/20117 [2:35:19<10:15:12,  2.31s/it] 21%|████████████████▉                                                                 | 4149/20117 [2:35:21<10:09:54,  2.29s/it] 21%|████████████████▉                                                                 | 4150/20117 [2:35:24<10:11:10,  2.30s/it]                                                                                                                                 {'loss': 0.2581, 'grad_norm': 0.3704458773136139, 'learning_rate': 0.00018047896808982364, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 348.57, 'epoch': 0.41}
 21%|████████████████▉                                                                 | 4150/20117 [2:35:24<10:11:10,  2.30s/it] 21%|████████████████▉                                                                 | 4151/20117 [2:35:26<10:02:35,  2.26s/it] 21%|█████████████████▏                                                                 | 4152/20117 [2:35:28<9:58:55,  2.25s/it] 21%|█████████████████▏                                                                 | 4153/20117 [2:35:30<9:57:03,  2.24s/it] 21%|████████████████▉                                                                 | 4154/20117 [2:35:33<10:00:17,  2.26s/it] 21%|█████████████████▏                                                                 | 4155/20117 [2:35:35<9:56:22,  2.24s/it] 21%|█████████████████▏                                                                 | 4156/20117 [2:35:37<9:54:02,  2.23s/it] 21%|█████████████████▏                                                                 | 4157/20117 [2:35:39<9:52:52,  2.23s/it] 21%|█████████████████▏                                                                 | 4158/20117 [2:35:42<9:50:16,  2.22s/it] 21%|█████████████████▏                                                                 | 4159/20117 [2:35:44<9:47:03,  2.21s/it] 21%|████████████████▉                                                                 | 4160/20117 [2:35:46<10:06:32,  2.28s/it]                                                                                                                                 {'loss': 0.2207, 'grad_norm': 0.4034087359905243, 'learning_rate': 0.00018038571199371215, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 323.67, 'epoch': 0.41}
 21%|████████████████▉                                                                 | 4160/20117 [2:35:46<10:06:32,  2.28s/it] 21%|████████████████▉                                                                 | 4161/20117 [2:35:48<10:02:17,  2.26s/it] 21%|█████████████████▏                                                                 | 4162/20117 [2:35:51<9:57:55,  2.25s/it] 21%|█████████████████▏                                                                 | 4163/20117 [2:35:53<9:55:59,  2.24s/it] 21%|█████████████████▏                                                                 | 4164/20117 [2:35:55<9:59:43,  2.26s/it] 21%|█████████████████▏                                                                 | 4165/20117 [2:35:57<9:55:56,  2.24s/it] 21%|█████████████████▏                                                                 | 4166/20117 [2:36:00<9:58:45,  2.25s/it] 21%|████████████████▉                                                                 | 4167/20117 [2:36:02<10:00:52,  2.26s/it] 21%|████████████████▉                                                                 | 4168/20117 [2:36:04<10:04:00,  2.27s/it] 21%|████████████████▉                                                                 | 4169/20117 [2:36:06<10:04:56,  2.28s/it] 21%|████████████████▉                                                                 | 4170/20117 [2:36:09<10:17:25,  2.32s/it]                                                                                                                                 {'loss': 0.2927, 'grad_norm': 0.5003114342689514, 'learning_rate': 0.0001802922578906021, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 325.75, 'epoch': 0.41}
 21%|████████████████▉                                                                 | 4170/20117 [2:36:09<10:17:25,  2.32s/it] 21%|█████████████████                                                                 | 4171/20117 [2:36:11<10:17:32,  2.32s/it] 21%|█████████████████                                                                 | 4172/20117 [2:36:14<10:20:41,  2.34s/it] 21%|█████████████████                                                                 | 4173/20117 [2:36:16<10:17:04,  2.32s/it] 21%|█████████████████                                                                 | 4174/20117 [2:36:18<10:11:20,  2.30s/it] 21%|█████████████████                                                                 | 4175/20117 [2:36:20<10:12:53,  2.31s/it] 21%|█████████████████                                                                 | 4176/20117 [2:36:23<10:10:14,  2.30s/it] 21%|█████████████████                                                                 | 4177/20117 [2:36:25<10:04:53,  2.28s/it] 21%|█████████████████                                                                 | 4178/20117 [2:36:27<10:02:16,  2.27s/it] 21%|█████████████████▏                                                                 | 4179/20117 [2:36:29<9:57:50,  2.25s/it] 21%|█████████████████                                                                 | 4180/20117 [2:36:32<10:09:52,  2.30s/it]                                                                                                                                 {'loss': 0.2906, 'grad_norm': 0.29613539576530457, 'learning_rate': 0.0001801986060106907, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 335.06, 'epoch': 0.42}
 21%|█████████████████                                                                 | 4180/20117 [2:36:32<10:09:52,  2.30s/it] 21%|█████████████████                                                                 | 4181/20117 [2:36:34<10:14:17,  2.31s/it] 21%|█████████████████                                                                 | 4182/20117 [2:36:36<10:13:33,  2.31s/it] 21%|█████████████████                                                                 | 4183/20117 [2:36:39<10:11:21,  2.30s/it] 21%|█████████████████                                                                 | 4184/20117 [2:36:41<10:11:33,  2.30s/it] 21%|█████████████████                                                                 | 4185/20117 [2:36:43<10:08:51,  2.29s/it] 21%|█████████████████                                                                 | 4186/20117 [2:36:46<10:08:27,  2.29s/it] 21%|█████████████████                                                                 | 4187/20117 [2:36:48<10:04:49,  2.28s/it] 21%|█████████████████                                                                 | 4188/20117 [2:36:50<10:02:30,  2.27s/it] 21%|█████████████████                                                                 | 4189/20117 [2:36:52<10:04:04,  2.28s/it] 21%|█████████████████                                                                 | 4190/20117 [2:36:55<10:10:54,  2.30s/it]                                                                                                                                 {'loss': 0.2853, 'grad_norm': 0.49740076065063477, 'learning_rate': 0.00018010475658466235, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 420.46, 'epoch': 0.42}
 21%|█████████████████                                                                 | 4190/20117 [2:36:55<10:10:54,  2.30s/it] 21%|█████████████████                                                                 | 4191/20117 [2:36:57<10:18:33,  2.33s/it] 21%|█████████████████                                                                 | 4192/20117 [2:36:59<10:10:04,  2.30s/it] 21%|█████████████████                                                                 | 4193/20117 [2:37:02<10:04:34,  2.28s/it] 21%|█████████████████                                                                 | 4194/20117 [2:37:04<10:07:30,  2.29s/it] 21%|█████████████████                                                                 | 4195/20117 [2:37:06<10:08:10,  2.29s/it] 21%|█████████████████                                                                 | 4196/20117 [2:37:08<10:05:34,  2.28s/it] 21%|█████████████████                                                                 | 4197/20117 [2:37:11<10:09:22,  2.30s/it] 21%|█████████████████                                                                 | 4198/20117 [2:37:13<10:01:42,  2.27s/it] 21%|█████████████████▎                                                                 | 4199/20117 [2:37:15<9:59:10,  2.26s/it] 21%|█████████████████▎                                                                 | 4200/20117 [2:37:17<9:58:00,  2.25s/it]                                                                                                                                 {'loss': 0.1658, 'grad_norm': 0.32316842675209045, 'learning_rate': 0.000180010709843688, 'memory/max_active (GiB)': 18.83, 'memory/max_allocated (GiB)': 18.83, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 288.98, 'epoch': 0.42}
 21%|█████████████████▎                                                                 | 4200/20117 [2:37:18<9:58:00,  2.25s/it] 21%|█████████████████▎                                                                 | 4201/20117 [2:37:20<9:53:24,  2.24s/it] 21%|█████████████████▎                                                                 | 4202/20117 [2:37:22<9:54:20,  2.24s/it] 21%|█████████████████▎                                                                 | 4203/20117 [2:37:24<9:56:29,  2.25s/it] 21%|█████████████████▎                                                                 | 4204/20117 [2:37:26<9:56:15,  2.25s/it] 21%|█████████████████▎                                                                 | 4205/20117 [2:37:29<9:58:58,  2.26s/it] 21%|█████████████████▎                                                                 | 4206/20117 [2:37:31<9:58:48,  2.26s/it] 21%|█████████████████▏                                                                | 4207/20117 [2:37:33<10:03:06,  2.27s/it] 21%|█████████████████▏                                                                | 4208/20117 [2:37:36<10:03:41,  2.28s/it] 21%|█████████████████▏                                                                | 4209/20117 [2:37:38<10:17:37,  2.33s/it] 21%|█████████████████▏                                                                | 4210/20117 [2:37:40<10:12:59,  2.31s/it]                                                                                                                                 {'loss': 0.2719, 'grad_norm': 0.29272300004959106, 'learning_rate': 0.00017991646601942467, 'memory/max_active (GiB)': 19.23, 'memory/max_allocated (GiB)': 19.23, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 364.92, 'epoch': 0.42}
 21%|█████████████████▏                                                                | 4210/20117 [2:37:40<10:12:59,  2.31s/it] 21%|█████████████████▏                                                                | 4211/20117 [2:37:43<10:11:05,  2.31s/it] 21%|█████████████████▏                                                                | 4212/20117 [2:37:45<10:06:21,  2.29s/it] 21%|█████████████████▏                                                                | 4213/20117 [2:37:47<10:27:00,  2.37s/it] 21%|█████████████████▏                                                                | 4214/20117 [2:37:50<10:14:20,  2.32s/it] 21%|█████████████████▏                                                                | 4215/20117 [2:37:52<10:12:55,  2.31s/it] 21%|█████████████████▏                                                                | 4216/20117 [2:37:54<10:05:20,  2.28s/it] 21%|█████████████████▏                                                                | 4217/20117 [2:37:56<10:01:11,  2.27s/it] 21%|█████████████████▍                                                                 | 4218/20117 [2:37:59<9:57:28,  2.25s/it] 21%|█████████████████▍                                                                 | 4219/20117 [2:38:01<9:59:34,  2.26s/it] 21%|█████████████████▍                                                                 | 4220/20117 [2:38:03<9:56:35,  2.25s/it]                                                                                                                                 {'loss': 0.2755, 'grad_norm': 0.5684819221496582, 'learning_rate': 0.0001798220253440148, 'memory/max_active (GiB)': 19.22, 'memory/max_allocated (GiB)': 19.22, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 317.91, 'epoch': 0.42}
 21%|█████████████████▍                                                                 | 4220/20117 [2:38:03<9:56:35,  2.25s/it] 21%|█████████████████▍                                                                 | 4221/20117 [2:38:05<9:55:23,  2.25s/it] 21%|█████████████████▍                                                                 | 4222/20117 [2:38:08<9:56:17,  2.25s/it] 21%|█████████████████▏                                                                | 4223/20117 [2:38:10<10:02:39,  2.28s/it] 21%|█████████████████▍                                                                 | 4224/20117 [2:38:12<9:58:53,  2.26s/it] 21%|█████████████████▍                                                                 | 4225/20117 [2:38:14<9:57:53,  2.26s/it] 21%|█████████████████▍                                                                 | 4226/20117 [2:38:17<9:52:39,  2.24s/it] 21%|█████████████████▍                                                                 | 4227/20117 [2:38:19<9:56:02,  2.25s/it] 21%|█████████████████▍                                                                 | 4228/20117 [2:38:21<9:53:52,  2.24s/it] 21%|█████████████████▍                                                                 | 4229/20117 [2:38:23<9:55:39,  2.25s/it] 21%|█████████████████▍                                                                 | 4230/20117 [2:38:26<9:57:20,  2.26s/it]                                                                                                                                 {'loss': 0.2131, 'grad_norm': 0.4488314986228943, 'learning_rate': 0.00017972738805008574, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 328.07, 'epoch': 0.42}
 21%|█████████████████▍                                                                 | 4230/20117 [2:38:26<9:57:20,  2.26s/it] 21%|█████████████████▍                                                                 | 4231/20117 [2:38:28<9:58:59,  2.26s/it] 21%|█████████████████▍                                                                 | 4232/20117 [2:38:30<9:57:03,  2.26s/it] 21%|█████████████████▍                                                                 | 4233/20117 [2:38:32<9:57:02,  2.26s/it] 21%|█████████████████▎                                                                | 4234/20117 [2:38:35<10:03:57,  2.28s/it] 21%|█████████████████▎                                                                | 4235/20117 [2:38:37<10:05:39,  2.29s/it] 21%|█████████████████▍                                                                 | 4236/20117 [2:38:39<9:59:22,  2.26s/it] 21%|█████████████████▍                                                                 | 4237/20117 [2:38:42<9:59:06,  2.26s/it] 21%|█████████████████▎                                                                | 4238/20117 [2:38:44<10:01:46,  2.27s/it] 21%|█████████████████▎                                                                | 4239/20117 [2:38:46<10:01:51,  2.27s/it] 21%|█████████████████▍                                                                 | 4240/20117 [2:38:48<9:59:44,  2.27s/it]                                                                                                                                 {'loss': 0.2987, 'grad_norm': 0.3249848484992981, 'learning_rate': 0.0001796325543707491, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 421.46, 'epoch': 0.42}
 21%|█████████████████▍                                                                 | 4240/20117 [2:38:48<9:59:44,  2.27s/it] 21%|█████████████████▍                                                                 | 4241/20117 [2:38:51<9:56:22,  2.25s/it] 21%|█████████████████▌                                                                 | 4242/20117 [2:38:53<9:59:21,  2.27s/it] 21%|█████████████████▌                                                                 | 4243/20117 [2:38:55<9:58:13,  2.26s/it] 21%|█████████████████▌                                                                 | 4244/20117 [2:38:57<9:52:33,  2.24s/it] 21%|█████████████████▌                                                                 | 4245/20117 [2:39:00<9:51:32,  2.24s/it] 21%|█████████████████▌                                                                 | 4246/20117 [2:39:02<9:52:01,  2.24s/it] 21%|█████████████████▌                                                                 | 4247/20117 [2:39:04<9:52:14,  2.24s/it] 21%|█████████████████▌                                                                 | 4248/20117 [2:39:06<9:55:55,  2.25s/it] 21%|█████████████████▌                                                                 | 4249/20117 [2:39:08<9:50:51,  2.23s/it] 21%|█████████████████▌                                                                 | 4250/20117 [2:39:11<9:52:08,  2.24s/it]                                                                                                                                 {'loss': 0.2498, 'grad_norm': 0.6481621265411377, 'learning_rate': 0.00017953752453960038, 'memory/max_active (GiB)': 20.43, 'memory/max_allocated (GiB)': 20.43, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 363.83, 'epoch': 0.42}
 21%|█████████████████▌                                                                 | 4250/20117 [2:39:11<9:52:08,  2.24s/it] 21%|█████████████████▌                                                                 | 4251/20117 [2:39:13<9:52:49,  2.24s/it] 21%|█████████████████▌                                                                 | 4252/20117 [2:39:15<9:57:34,  2.26s/it] 21%|█████████████████▎                                                                | 4253/20117 [2:39:18<10:02:13,  2.28s/it] 21%|█████████████████▎                                                                | 4254/20117 [2:39:20<10:01:06,  2.27s/it] 21%|█████████████████▌                                                                 | 4255/20117 [2:39:22<9:58:07,  2.26s/it] 21%|█████████████████▌                                                                 | 4256/20117 [2:39:24<9:54:46,  2.25s/it] 21%|█████████████████▌                                                                 | 4257/20117 [2:39:27<9:53:45,  2.25s/it] 21%|█████████████████▌                                                                 | 4258/20117 [2:39:29<9:56:10,  2.26s/it] 21%|█████████████████▎                                                                | 4259/20117 [2:39:31<10:00:24,  2.27s/it] 21%|█████████████████▌                                                                 | 4260/20117 [2:39:33<9:59:20,  2.27s/it]                                                                                                                                 {'loss': 0.2295, 'grad_norm': 0.3045104146003723, 'learning_rate': 0.00017944229879071806, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 398.66, 'epoch': 0.42}
 21%|█████████████████▌                                                                 | 4260/20117 [2:39:33<9:59:20,  2.27s/it] 21%|█████████████████▌                                                                 | 4261/20117 [2:39:36<9:56:26,  2.26s/it] 21%|█████████████████▌                                                                 | 4262/20117 [2:39:38<9:57:12,  2.26s/it] 21%|█████████████████▌                                                                 | 4263/20117 [2:39:40<9:57:50,  2.26s/it] 21%|█████████████████▌                                                                 | 4264/20117 [2:39:42<9:59:09,  2.27s/it] 21%|█████████████████▌                                                                 | 4265/20117 [2:39:45<9:58:47,  2.27s/it] 21%|█████████████████▍                                                                | 4266/20117 [2:39:47<10:01:50,  2.28s/it] 21%|█████████████████▍                                                                | 4267/20117 [2:39:49<10:04:06,  2.29s/it] 21%|█████████████████▍                                                                | 4268/20117 [2:39:52<10:38:07,  2.42s/it] 21%|█████████████████▍                                                                | 4269/20117 [2:39:54<10:30:03,  2.39s/it] 21%|█████████████████▍                                                                | 4270/20117 [2:39:57<10:16:31,  2.33s/it]                                                                                                                                 {'loss': 0.2406, 'grad_norm': 0.32762956619262695, 'learning_rate': 0.0001793468773586633, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 399.72, 'epoch': 0.42}
 21%|█████████████████▍                                                                | 4270/20117 [2:39:57<10:16:31,  2.33s/it] 21%|█████████████████▍                                                                | 4271/20117 [2:39:59<10:06:52,  2.30s/it] 21%|█████████████████▍                                                                | 4272/20117 [2:40:01<10:05:30,  2.29s/it] 21%|█████████████████▍                                                                | 4273/20117 [2:40:03<10:06:52,  2.30s/it] 21%|█████████████████▍                                                                | 4274/20117 [2:40:06<10:04:51,  2.29s/it] 21%|█████████████████▍                                                                | 4275/20117 [2:40:08<10:02:59,  2.28s/it] 21%|█████████████████▍                                                                | 4276/20117 [2:40:10<10:00:49,  2.28s/it] 21%|█████████████████▍                                                                | 4277/20117 [2:40:12<10:03:15,  2.29s/it] 21%|█████████████████▍                                                                | 4278/20117 [2:40:15<10:01:37,  2.28s/it] 21%|█████████████████▍                                                                | 4279/20117 [2:40:17<10:01:16,  2.28s/it] 21%|█████████████████▋                                                                 | 4280/20117 [2:40:19<9:58:53,  2.27s/it]                                                                                                                                 {'loss': 0.2523, 'grad_norm': 0.6278170347213745, 'learning_rate': 0.00017925126047847924, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 324.65, 'epoch': 0.43}
 21%|█████████████████▋                                                                 | 4280/20117 [2:40:19<9:58:53,  2.27s/it] 21%|█████████████████▋                                                                 | 4281/20117 [2:40:22<9:56:16,  2.26s/it] 21%|█████████████████▋                                                                 | 4282/20117 [2:40:24<9:53:12,  2.25s/it] 21%|█████████████████▋                                                                 | 4283/20117 [2:40:26<9:50:08,  2.24s/it] 21%|█████████████████▋                                                                 | 4284/20117 [2:40:28<9:53:01,  2.25s/it] 21%|█████████████████▋                                                                 | 4285/20117 [2:40:30<9:54:33,  2.25s/it] 21%|█████████████████▋                                                                 | 4286/20117 [2:40:33<9:54:11,  2.25s/it] 21%|█████████████████▋                                                                 | 4287/20117 [2:40:35<9:53:21,  2.25s/it] 21%|█████████████████▋                                                                 | 4288/20117 [2:40:37<9:55:38,  2.26s/it] 21%|█████████████████▋                                                                 | 4289/20117 [2:40:40<9:55:10,  2.26s/it] 21%|█████████████████▋                                                                 | 4290/20117 [2:40:42<9:55:07,  2.26s/it]                                                                                                                                 {'loss': 0.2615, 'grad_norm': 0.45905986428260803, 'learning_rate': 0.00017915544838569052, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 353.66, 'epoch': 0.43}
 21%|█████████████████▋                                                                 | 4290/20117 [2:40:42<9:55:07,  2.26s/it] 21%|█████████████████▋                                                                 | 4291/20117 [2:40:44<9:52:29,  2.25s/it] 21%|█████████████████▋                                                                 | 4292/20117 [2:40:46<9:51:29,  2.24s/it] 21%|█████████████████▋                                                                 | 4293/20117 [2:40:48<9:50:18,  2.24s/it] 21%|█████████████████▋                                                                 | 4294/20117 [2:40:51<9:48:06,  2.23s/it] 21%|█████████████████▋                                                                 | 4295/20117 [2:40:53<9:53:34,  2.25s/it] 21%|█████████████████▋                                                                 | 4296/20117 [2:40:55<9:52:55,  2.25s/it] 21%|█████████████████▋                                                                 | 4297/20117 [2:40:57<9:56:18,  2.26s/it] 21%|█████████████████▋                                                                 | 4298/20117 [2:41:00<9:53:49,  2.25s/it] 21%|█████████████████▋                                                                 | 4299/20117 [2:41:02<9:51:36,  2.24s/it] 21%|█████████████████▋                                                                 | 4300/20117 [2:41:04<9:52:51,  2.25s/it]                                                                                                                                 {'loss': 0.2519, 'grad_norm': 0.48581770062446594, 'learning_rate': 0.00017905944131630253, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 321.73, 'epoch': 0.43}
 21%|█████████████████▋                                                                 | 4300/20117 [2:41:04<9:52:51,  2.25s/it] 21%|█████████████████▋                                                                 | 4301/20117 [2:41:06<9:53:11,  2.25s/it] 21%|█████████████████▋                                                                 | 4302/20117 [2:41:09<9:54:44,  2.26s/it] 21%|█████████████████▊                                                                 | 4303/20117 [2:41:11<9:56:31,  2.26s/it] 21%|█████████████████▌                                                                | 4304/20117 [2:41:13<10:03:03,  2.29s/it] 21%|█████████████████▌                                                                | 4305/20117 [2:41:16<10:05:00,  2.30s/it] 21%|█████████████████▌                                                                | 4306/20117 [2:41:18<10:04:32,  2.29s/it] 21%|█████████████████▌                                                                | 4307/20117 [2:41:20<10:01:36,  2.28s/it] 21%|█████████████████▊                                                                 | 4308/20117 [2:41:22<9:58:14,  2.27s/it] 21%|█████████████████▊                                                                 | 4309/20117 [2:41:25<9:59:51,  2.28s/it] 21%|█████████████████▊                                                                 | 4310/20117 [2:41:27<9:55:15,  2.26s/it]                                                                                                                                 {'loss': 0.2382, 'grad_norm': 0.49877023696899414, 'learning_rate': 0.00017896323950680098, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 353.68, 'epoch': 0.43}
 21%|█████████████████▊                                                                 | 4310/20117 [2:41:27<9:55:15,  2.26s/it] 21%|█████████████████▊                                                                 | 4311/20117 [2:41:29<9:57:23,  2.27s/it] 21%|█████████████████▊                                                                 | 4312/20117 [2:41:32<9:58:59,  2.27s/it] 21%|█████████████████▌                                                                | 4313/20117 [2:41:34<10:04:31,  2.30s/it] 21%|█████████████████▌                                                                | 4314/20117 [2:41:36<10:03:08,  2.29s/it] 21%|█████████████████▊                                                                 | 4315/20117 [2:41:38<9:58:07,  2.27s/it] 21%|█████████████████▊                                                                 | 4316/20117 [2:41:41<9:55:58,  2.26s/it] 21%|█████████████████▊                                                                 | 4317/20117 [2:41:43<9:50:40,  2.24s/it] 21%|█████████████████▊                                                                 | 4318/20117 [2:41:45<9:49:19,  2.24s/it] 21%|█████████████████▌                                                                | 4319/20117 [2:41:48<10:19:31,  2.35s/it] 21%|█████████████████▌                                                                | 4320/20117 [2:41:50<10:12:10,  2.33s/it]                                                                                                                                 {'loss': 0.2478, 'grad_norm': 0.580008327960968, 'learning_rate': 0.00017886684319415127, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 368.72, 'epoch': 0.43}
 21%|█████████████████▌                                                                | 4320/20117 [2:41:50<10:12:10,  2.33s/it] 21%|█████████████████▌                                                                | 4321/20117 [2:41:52<10:06:39,  2.30s/it] 21%|█████████████████▌                                                                | 4322/20117 [2:41:54<10:06:25,  2.30s/it] 21%|█████████████████▌                                                                | 4323/20117 [2:41:57<10:05:34,  2.30s/it] 21%|█████████████████▊                                                                 | 4324/20117 [2:41:59<9:59:19,  2.28s/it] 21%|█████████████████▊                                                                 | 4325/20117 [2:42:01<9:55:55,  2.26s/it] 22%|█████████████████▊                                                                 | 4326/20117 [2:42:03<9:49:32,  2.24s/it] 22%|█████████████████▊                                                                 | 4327/20117 [2:42:06<9:51:04,  2.25s/it] 22%|█████████████████▊                                                                 | 4328/20117 [2:42:08<9:52:50,  2.25s/it] 22%|█████████████████▊                                                                 | 4329/20117 [2:42:10<9:50:14,  2.24s/it] 22%|█████████████████▊                                                                 | 4330/20117 [2:42:12<9:49:43,  2.24s/it]                                                                                                                                 {'loss': 0.2202, 'grad_norm': 0.3998342454433441, 'learning_rate': 0.00017877025261579788, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 395.43, 'epoch': 0.43}
 22%|█████████████████▊                                                                 | 4330/20117 [2:42:12<9:49:43,  2.24s/it] 22%|█████████████████▊                                                                 | 4331/20117 [2:42:15<9:53:45,  2.26s/it] 22%|█████████████████▊                                                                 | 4332/20117 [2:42:17<9:55:41,  2.26s/it] 22%|█████████████████▉                                                                 | 4333/20117 [2:42:19<9:51:33,  2.25s/it] 22%|█████████████████▉                                                                 | 4334/20117 [2:42:21<9:50:27,  2.24s/it] 22%|█████████████████▉                                                                 | 4335/20117 [2:42:24<9:51:14,  2.25s/it] 22%|█████████████████▋                                                                | 4336/20117 [2:42:26<10:01:33,  2.29s/it] 22%|█████████████████▋                                                                | 4337/20117 [2:42:28<10:04:03,  2.30s/it] 22%|█████████████████▋                                                                | 4338/20117 [2:42:31<10:09:19,  2.32s/it] 22%|█████████████████▋                                                                | 4339/20117 [2:42:33<10:04:39,  2.30s/it] 22%|█████████████████▋                                                                | 4340/20117 [2:42:35<10:07:56,  2.31s/it]                                                                                                                                 {'loss': 0.2521, 'grad_norm': 0.38088393211364746, 'learning_rate': 0.00017867346800966383, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 427.5, 'epoch': 0.43}
 22%|█████████████████▋                                                                | 4340/20117 [2:42:35<10:07:56,  2.31s/it] 22%|█████████████████▋                                                                | 4341/20117 [2:42:38<10:03:00,  2.29s/it] 22%|█████████████████▋                                                                | 4342/20117 [2:42:40<10:00:18,  2.28s/it] 22%|█████████████████▉                                                                 | 4343/20117 [2:42:42<9:58:26,  2.28s/it] 22%|█████████████████▉                                                                 | 4344/20117 [2:42:44<9:59:34,  2.28s/it] 22%|█████████████████▉                                                                 | 4345/20117 [2:42:47<9:55:56,  2.27s/it] 22%|█████████████████▉                                                                 | 4346/20117 [2:42:49<9:49:50,  2.24s/it] 22%|█████████████████▉                                                                 | 4347/20117 [2:42:51<9:51:43,  2.25s/it] 22%|█████████████████▉                                                                 | 4348/20117 [2:42:53<9:52:45,  2.26s/it] 22%|█████████████████▉                                                                 | 4349/20117 [2:42:56<9:50:03,  2.25s/it] 22%|█████████████████▉                                                                 | 4350/20117 [2:42:58<9:49:04,  2.24s/it]                                                                                                                                 {'loss': 0.2353, 'grad_norm': 0.3859814703464508, 'learning_rate': 0.00017857648961415004, 'memory/max_active (GiB)': 21.39, 'memory/max_allocated (GiB)': 21.39, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 376.77, 'epoch': 0.43}
 22%|█████████████████▉                                                                 | 4350/20117 [2:42:58<9:49:04,  2.24s/it] 22%|█████████████████▉                                                                 | 4351/20117 [2:43:00<9:46:06,  2.23s/it] 22%|█████████████████▉                                                                 | 4352/20117 [2:43:02<9:42:17,  2.22s/it] 22%|█████████████████▉                                                                 | 4353/20117 [2:43:04<9:42:35,  2.22s/it] 22%|█████████████████▉                                                                 | 4354/20117 [2:43:07<9:38:30,  2.20s/it] 22%|█████████████████▉                                                                 | 4355/20117 [2:43:09<9:38:48,  2.20s/it] 22%|█████████████████▉                                                                 | 4356/20117 [2:43:11<9:43:08,  2.22s/it] 22%|█████████████████▉                                                                 | 4357/20117 [2:43:13<9:50:26,  2.25s/it] 22%|█████████████████▉                                                                 | 4358/20117 [2:43:16<9:52:36,  2.26s/it] 22%|█████████████████▉                                                                 | 4359/20117 [2:43:18<9:55:34,  2.27s/it] 22%|█████████████████▉                                                                 | 4360/20117 [2:43:20<9:56:39,  2.27s/it]                                                                                                                                 {'loss': 0.2567, 'grad_norm': 0.34923896193504333, 'learning_rate': 0.00017847931766813482, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 368.67, 'epoch': 0.43}
 22%|█████████████████▉                                                                 | 4360/20117 [2:43:20<9:56:39,  2.27s/it] 22%|█████████████████▉                                                                 | 4361/20117 [2:43:22<9:55:31,  2.27s/it] 22%|█████████████████▊                                                                | 4362/20117 [2:43:25<10:01:38,  2.29s/it] 22%|██████████████████                                                                 | 4363/20117 [2:43:27<9:58:01,  2.28s/it] 22%|██████████████████                                                                 | 4364/20117 [2:43:29<9:59:30,  2.28s/it] 22%|██████████████████                                                                 | 4365/20117 [2:43:32<9:58:51,  2.28s/it] 22%|█████████████████▊                                                                | 4366/20117 [2:43:34<10:02:23,  2.29s/it] 22%|█████████████████▊                                                                | 4367/20117 [2:43:36<10:08:20,  2.32s/it] 22%|█████████████████▊                                                                | 4368/20117 [2:43:39<10:08:36,  2.32s/it] 22%|█████████████████▊                                                                | 4369/20117 [2:43:41<10:00:32,  2.29s/it] 22%|██████████████████                                                                 | 4370/20117 [2:43:43<9:58:29,  2.28s/it]                                                                                                                                 {'loss': 0.2291, 'grad_norm': 0.34750089049339294, 'learning_rate': 0.0001783819524109732, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 379.66, 'epoch': 0.43}
 22%|██████████████████                                                                 | 4370/20117 [2:43:43<9:58:29,  2.28s/it] 22%|██████████████████                                                                 | 4371/20117 [2:43:45<9:55:26,  2.27s/it] 22%|█████████████████▊                                                                | 4372/20117 [2:43:48<10:02:10,  2.29s/it] 22%|█████████████████▊                                                                | 4373/20117 [2:43:51<10:38:23,  2.43s/it] 22%|█████████████████▊                                                                | 4374/20117 [2:43:53<10:30:23,  2.40s/it] 22%|█████████████████▊                                                                | 4375/20117 [2:43:55<10:22:26,  2.37s/it] 22%|█████████████████▊                                                                | 4376/20117 [2:43:57<10:16:08,  2.35s/it] 22%|█████████████████▊                                                                | 4377/20117 [2:44:00<10:19:10,  2.36s/it] 22%|█████████████████▊                                                                | 4378/20117 [2:44:02<10:15:48,  2.35s/it] 22%|█████████████████▊                                                                | 4379/20117 [2:44:05<10:16:51,  2.35s/it] 22%|█████████████████▊                                                                | 4380/20117 [2:44:07<10:09:13,  2.32s/it]                                                                                                                                 {'loss': 0.2866, 'grad_norm': 0.36663779616355896, 'learning_rate': 0.0001782843940824964, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 411.45, 'epoch': 0.44}
 22%|█████████████████▊                                                                | 4380/20117 [2:44:07<10:09:13,  2.32s/it] 22%|█████████████████▊                                                                | 4381/20117 [2:44:09<10:05:27,  2.31s/it] 22%|█████████████████▊                                                                | 4382/20117 [2:44:11<10:11:06,  2.33s/it] 22%|█████████████████▊                                                                | 4383/20117 [2:44:14<10:14:13,  2.34s/it] 22%|█████████████████▊                                                                | 4384/20117 [2:44:16<10:05:39,  2.31s/it] 22%|██████████████████                                                                 | 4385/20117 [2:44:18<9:56:51,  2.28s/it] 22%|█████████████████▉                                                                | 4386/20117 [2:44:21<10:00:23,  2.29s/it] 22%|██████████████████                                                                 | 4387/20117 [2:44:23<9:55:03,  2.27s/it] 22%|██████████████████                                                                 | 4388/20117 [2:44:25<9:51:55,  2.26s/it] 22%|██████████████████                                                                 | 4389/20117 [2:44:27<9:55:06,  2.27s/it] 22%|██████████████████                                                                 | 4390/20117 [2:44:30<9:59:45,  2.29s/it]                                                                                                                                 {'loss': 0.2563, 'grad_norm': 0.3929060697555542, 'learning_rate': 0.00017818664292301118, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 381.12, 'epoch': 0.44}
 22%|██████████████████                                                                 | 4390/20117 [2:44:30<9:59:45,  2.29s/it] 22%|██████████████████                                                                 | 4391/20117 [2:44:32<9:54:34,  2.27s/it] 22%|██████████████████                                                                 | 4392/20117 [2:44:34<9:50:44,  2.25s/it] 22%|██████████████████                                                                 | 4393/20117 [2:44:36<9:50:41,  2.25s/it] 22%|██████████████████▏                                                                | 4394/20117 [2:44:39<9:50:38,  2.25s/it] 22%|██████████████████▏                                                                | 4395/20117 [2:44:41<9:49:26,  2.25s/it] 22%|██████████████████▏                                                                | 4396/20117 [2:44:43<9:50:47,  2.25s/it] 22%|██████████████████▏                                                                | 4397/20117 [2:44:45<9:51:57,  2.26s/it] 22%|██████████████████▏                                                                | 4398/20117 [2:44:48<9:55:02,  2.27s/it] 22%|██████████████████▏                                                                | 4399/20117 [2:44:50<9:54:52,  2.27s/it] 22%|██████████████████▏                                                                | 4400/20117 [2:44:52<9:57:31,  2.28s/it]                                                                                                                                 {'loss': 0.2268, 'grad_norm': 0.4182446599006653, 'learning_rate': 0.0001780886991732993, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 297.67, 'epoch': 0.44}
 22%|██████████████████▏                                                                | 4400/20117 [2:44:52<9:57:31,  2.28s/it] 22%|█████████████████▉                                                                | 4401/20117 [2:44:55<10:00:37,  2.29s/it] 22%|█████████████████▉                                                                | 4402/20117 [2:44:57<10:12:17,  2.34s/it] 22%|█████████████████▉                                                                | 4403/20117 [2:44:59<10:10:42,  2.33s/it] 22%|█████████████████▉                                                                | 4404/20117 [2:45:02<10:03:20,  2.30s/it] 22%|█████████████████▉                                                                | 4405/20117 [2:45:04<10:05:42,  2.31s/it] 22%|█████████████████▉                                                                | 4406/20117 [2:45:06<10:02:32,  2.30s/it] 22%|██████████████████▏                                                                | 4407/20117 [2:45:08<9:56:50,  2.28s/it] 22%|██████████████████▏                                                                | 4408/20117 [2:45:11<9:57:35,  2.28s/it] 22%|██████████████████▏                                                                | 4409/20117 [2:45:13<9:57:33,  2.28s/it] 22%|██████████████████▏                                                                | 4410/20117 [2:45:15<9:52:56,  2.27s/it]                                                                                                                                 {'loss': 0.2629, 'grad_norm': 0.5998858213424683, 'learning_rate': 0.00017799056307461696, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 344.26, 'epoch': 0.44}
 22%|██████████████████▏                                                                | 4410/20117 [2:45:15<9:52:56,  2.27s/it] 22%|██████████████████▏                                                                | 4411/20117 [2:45:17<9:55:47,  2.28s/it] 22%|█████████████████▉                                                                | 4412/20117 [2:45:20<10:11:41,  2.34s/it] 22%|█████████████████▉                                                                | 4413/20117 [2:45:22<10:13:38,  2.34s/it] 22%|█████████████████▉                                                                | 4414/20117 [2:45:25<10:09:36,  2.33s/it] 22%|█████████████████▉                                                                | 4415/20117 [2:45:27<10:05:28,  2.31s/it] 22%|██████████████████                                                                | 4416/20117 [2:45:29<10:09:09,  2.33s/it] 22%|██████████████████                                                                | 4417/20117 [2:45:32<10:04:02,  2.31s/it] 22%|██████████████████▏                                                                | 4418/20117 [2:45:34<9:57:26,  2.28s/it] 22%|██████████████████                                                                | 4419/20117 [2:45:36<10:00:00,  2.29s/it] 22%|██████████████████                                                                | 4420/20117 [2:45:38<10:00:01,  2.29s/it]                                                                                                                                 {'loss': 0.2523, 'grad_norm': 0.4282694160938263, 'learning_rate': 0.0001778922348686941, 'memory/max_active (GiB)': 20.44, 'memory/max_allocated (GiB)': 20.44, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 328.11, 'epoch': 0.44}
 22%|██████████████████                                                                | 4420/20117 [2:45:38<10:00:01,  2.29s/it] 22%|██████████████████▏                                                                | 4421/20117 [2:45:41<9:59:51,  2.29s/it] 22%|██████████████████                                                                | 4422/20117 [2:45:43<10:02:27,  2.30s/it] 22%|██████████████████▏                                                                | 4423/20117 [2:45:45<9:59:43,  2.29s/it] 22%|██████████████████                                                                | 4424/20117 [2:45:48<10:19:54,  2.37s/it] 22%|██████████████████                                                                | 4425/20117 [2:45:50<10:09:56,  2.33s/it] 22%|██████████████████                                                                | 4426/20117 [2:45:52<10:03:45,  2.31s/it] 22%|██████████████████                                                                | 4427/20117 [2:45:55<10:02:43,  2.30s/it] 22%|██████████████████                                                                | 4428/20117 [2:45:57<10:00:13,  2.30s/it] 22%|██████████████████▎                                                                | 4429/20117 [2:45:59<9:57:25,  2.28s/it] 22%|██████████████████▎                                                                | 4430/20117 [2:46:01<9:54:13,  2.27s/it]                                                                                                                                 {'loss': 0.27, 'grad_norm': 0.5925072431564331, 'learning_rate': 0.00017779371479773382, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 336.68, 'epoch': 0.44}
 22%|██████████████████▎                                                                | 4430/20117 [2:46:01<9:54:13,  2.27s/it] 22%|██████████████████▎                                                                | 4431/20117 [2:46:04<9:50:12,  2.26s/it] 22%|██████████████████▎                                                                | 4432/20117 [2:46:06<9:54:42,  2.27s/it] 22%|██████████████████▎                                                                | 4433/20117 [2:46:08<9:52:05,  2.27s/it] 22%|██████████████████▎                                                                | 4434/20117 [2:46:10<9:50:57,  2.26s/it] 22%|██████████████████▎                                                                | 4435/20117 [2:46:13<9:48:33,  2.25s/it] 22%|██████████████████▎                                                                | 4436/20117 [2:46:15<9:47:56,  2.25s/it] 22%|██████████████████▎                                                                | 4437/20117 [2:46:17<9:49:57,  2.26s/it] 22%|██████████████████▎                                                                | 4438/20117 [2:46:19<9:48:44,  2.25s/it] 22%|██████████████████▎                                                                | 4439/20117 [2:46:22<9:46:31,  2.24s/it] 22%|██████████████████▎                                                                | 4440/20117 [2:46:24<9:45:46,  2.24s/it]                                                                                                                                 {'loss': 0.3033, 'grad_norm': 0.5149052739143372, 'learning_rate': 0.00017769500310441192, 'memory/max_active (GiB)': 21.39, 'memory/max_allocated (GiB)': 21.39, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 370.47, 'epoch': 0.44}
 22%|██████████████████▎                                                                | 4440/20117 [2:46:24<9:45:46,  2.24s/it] 22%|██████████████████▎                                                                | 4441/20117 [2:46:26<9:49:03,  2.25s/it] 22%|██████████████████▎                                                                | 4442/20117 [2:46:28<9:49:22,  2.26s/it] 22%|██████████████████▎                                                                | 4443/20117 [2:46:31<9:51:03,  2.26s/it] 22%|██████████████████▎                                                                | 4444/20117 [2:46:33<9:56:18,  2.28s/it] 22%|██████████████████▎                                                                | 4445/20117 [2:46:35<9:54:27,  2.28s/it] 22%|██████████████████▎                                                                | 4446/20117 [2:46:38<9:58:52,  2.29s/it] 22%|██████████████████▎                                                                | 4447/20117 [2:46:40<9:55:46,  2.28s/it] 22%|██████████████████▎                                                                | 4448/20117 [2:46:42<9:55:51,  2.28s/it] 22%|██████████████████▎                                                                | 4449/20117 [2:46:44<9:59:38,  2.30s/it] 22%|██████████████████▎                                                                | 4450/20117 [2:46:47<9:56:28,  2.28s/it]                                                                                                                                 {'loss': 0.2193, 'grad_norm': 0.418197363615036, 'learning_rate': 0.00017759610003187617, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 352.55, 'epoch': 0.44}
 22%|██████████████████▎                                                                | 4450/20117 [2:46:47<9:56:28,  2.28s/it] 22%|██████████████████▎                                                                | 4451/20117 [2:46:49<9:52:14,  2.27s/it] 22%|██████████████████▎                                                                | 4452/20117 [2:46:51<9:55:01,  2.28s/it] 22%|██████████████████▎                                                                | 4453/20117 [2:46:53<9:51:37,  2.27s/it] 22%|██████████████████▍                                                                | 4454/20117 [2:46:56<9:47:14,  2.25s/it] 22%|██████████████████▍                                                                | 4455/20117 [2:46:58<9:47:10,  2.25s/it] 22%|██████████████████▍                                                                | 4456/20117 [2:47:00<9:51:47,  2.27s/it] 22%|██████████████████▍                                                                | 4457/20117 [2:47:02<9:49:17,  2.26s/it] 22%|██████████████████▍                                                                | 4458/20117 [2:47:05<9:50:23,  2.26s/it] 22%|██████████████████▍                                                                | 4459/20117 [2:47:07<9:54:31,  2.28s/it] 22%|██████████████████▍                                                                | 4460/20117 [2:47:09<9:53:38,  2.27s/it]                                                                                                                                 {'loss': 0.1978, 'grad_norm': 0.4415562152862549, 'learning_rate': 0.00017749700582374574, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 343.08, 'epoch': 0.44}
 22%|██████████████████▍                                                                | 4460/20117 [2:47:09<9:53:38,  2.27s/it] 22%|██████████████████▍                                                                | 4461/20117 [2:47:12<9:59:57,  2.30s/it] 22%|██████████████████▏                                                               | 4462/20117 [2:47:14<10:01:15,  2.30s/it] 22%|██████████████████▏                                                               | 4463/20117 [2:47:16<10:00:02,  2.30s/it] 22%|██████████████████▍                                                                | 4464/20117 [2:47:18<9:50:22,  2.26s/it] 22%|██████████████████▍                                                                | 4465/20117 [2:47:21<9:50:52,  2.27s/it] 22%|██████████████████▍                                                                | 4466/20117 [2:47:23<9:45:36,  2.24s/it] 22%|██████████████████▍                                                                | 4467/20117 [2:47:25<9:44:34,  2.24s/it] 22%|██████████████████▍                                                                | 4468/20117 [2:47:27<9:43:39,  2.24s/it] 22%|██████████████████▍                                                                | 4469/20117 [2:47:30<9:40:10,  2.22s/it] 22%|██████████████████▍                                                                | 4470/20117 [2:47:32<9:37:11,  2.21s/it]                                                                                                                                 {'loss': 0.2448, 'grad_norm': 0.32262691855430603, 'learning_rate': 0.0001773977207241106, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 319.4, 'epoch': 0.44}
 22%|██████████████████▍                                                                | 4470/20117 [2:47:32<9:37:11,  2.21s/it] 22%|██████████████████▍                                                                | 4471/20117 [2:47:34<9:41:01,  2.23s/it] 22%|██████████████████▍                                                                | 4472/20117 [2:47:36<9:43:31,  2.24s/it] 22%|██████████████████▍                                                                | 4473/20117 [2:47:39<9:46:17,  2.25s/it] 22%|██████████████████▍                                                                | 4474/20117 [2:47:41<9:44:54,  2.24s/it] 22%|██████████████████▍                                                                | 4475/20117 [2:47:43<9:47:41,  2.25s/it] 22%|██████████████████▍                                                                | 4476/20117 [2:47:45<9:43:08,  2.24s/it] 22%|██████████████████▏                                                               | 4477/20117 [2:47:48<10:12:29,  2.35s/it] 22%|██████████████████▎                                                               | 4478/20117 [2:47:50<10:05:52,  2.32s/it] 22%|██████████████████▎                                                               | 4479/20117 [2:47:52<10:01:30,  2.31s/it] 22%|██████████████████▍                                                                | 4480/20117 [2:47:55<9:56:05,  2.29s/it]                                                                                                                                 {'loss': 0.2772, 'grad_norm': 0.49002590775489807, 'learning_rate': 0.00017729824497753093, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 331.65, 'epoch': 0.45}
 22%|██████████████████▍                                                                | 4480/20117 [2:47:55<9:56:05,  2.29s/it] 22%|██████████████████▍                                                                | 4481/20117 [2:47:57<9:57:18,  2.29s/it] 22%|██████████████████▍                                                                | 4482/20117 [2:47:59<9:53:14,  2.28s/it] 22%|██████████████████▍                                                                | 4483/20117 [2:48:02<9:56:57,  2.29s/it] 22%|██████████████████▌                                                                | 4484/20117 [2:48:04<9:56:15,  2.29s/it] 22%|██████████████████▌                                                                | 4485/20117 [2:48:06<9:50:46,  2.27s/it] 22%|██████████████████▌                                                                | 4486/20117 [2:48:08<9:52:28,  2.27s/it] 22%|██████████████████▌                                                                | 4487/20117 [2:48:11<9:55:14,  2.29s/it] 22%|██████████████████▌                                                                | 4488/20117 [2:48:13<9:56:33,  2.29s/it] 22%|██████████████████▌                                                                | 4489/20117 [2:48:15<9:50:07,  2.27s/it] 22%|██████████████████▌                                                                | 4490/20117 [2:48:17<9:50:25,  2.27s/it]                                                                                                                                 {'loss': 0.2557, 'grad_norm': 0.4270131587982178, 'learning_rate': 0.0001771985788290365, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 300.31, 'epoch': 0.45}
 22%|██████████████████▌                                                                | 4490/20117 [2:48:17<9:50:25,  2.27s/it] 22%|██████████████████▌                                                                | 4491/20117 [2:48:20<9:50:16,  2.27s/it] 22%|██████████████████▌                                                                | 4492/20117 [2:48:22<9:50:00,  2.27s/it] 22%|██████████████████▌                                                                | 4493/20117 [2:48:24<9:51:24,  2.27s/it] 22%|██████████████████▌                                                                | 4494/20117 [2:48:27<9:51:21,  2.27s/it] 22%|██████████████████▌                                                                | 4495/20117 [2:48:29<9:49:04,  2.26s/it] 22%|██████████████████▌                                                                | 4496/20117 [2:48:31<9:49:10,  2.26s/it] 22%|██████████████████▌                                                                | 4497/20117 [2:48:33<9:58:28,  2.30s/it] 22%|██████████████████▌                                                                | 4498/20117 [2:48:36<9:54:22,  2.28s/it] 22%|██████████████████▎                                                               | 4499/20117 [2:48:38<10:01:26,  2.31s/it] 22%|██████████████████▌                                                                | 4500/20117 [2:48:40<9:59:46,  2.30s/it]                                                                                                                                 {'loss': 0.2696, 'grad_norm': 0.5524002909660339, 'learning_rate': 0.00017709872252412616, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 348.17, 'epoch': 0.45}
 22%|██████████████████▌                                                                | 4500/20117 [2:48:40<9:59:46,  2.30s/it] 22%|██████████████████▎                                                               | 4501/20117 [2:48:43<10:05:06,  2.32s/it] 22%|██████████████████▎                                                               | 4502/20117 [2:48:45<10:02:25,  2.31s/it] 22%|██████████████████▌                                                                | 4503/20117 [2:48:47<9:58:28,  2.30s/it] 22%|██████████████████▌                                                                | 4504/20117 [2:48:49<9:54:45,  2.29s/it] 22%|██████████████████▌                                                                | 4505/20117 [2:48:52<9:51:34,  2.27s/it] 22%|██████████████████▌                                                                | 4506/20117 [2:48:54<9:52:56,  2.28s/it] 22%|██████████████████▌                                                                | 4507/20117 [2:48:56<9:55:16,  2.29s/it] 22%|██████████████████▌                                                                | 4508/20117 [2:48:59<9:56:37,  2.29s/it] 22%|██████████████████▌                                                                | 4509/20117 [2:49:01<9:56:21,  2.29s/it] 22%|██████████████████▌                                                                | 4510/20117 [2:49:03<9:55:12,  2.29s/it]                                                                                                                                 {'loss': 0.1997, 'grad_norm': 0.32532012462615967, 'learning_rate': 0.00017699867630876703, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 315.46, 'epoch': 0.45}
 22%|██████████████████▌                                                                | 4510/20117 [2:49:03<9:55:12,  2.29s/it] 22%|██████████████████▌                                                                | 4511/20117 [2:49:05<9:53:43,  2.28s/it] 22%|██████████████████▌                                                                | 4512/20117 [2:49:08<9:50:30,  2.27s/it] 22%|██████████████████▌                                                                | 4513/20117 [2:49:10<9:51:34,  2.27s/it] 22%|██████████████████▌                                                                | 4514/20117 [2:49:12<9:48:31,  2.26s/it] 22%|██████████████████▋                                                                | 4515/20117 [2:49:15<9:48:01,  2.26s/it] 22%|██████████████████▋                                                                | 4516/20117 [2:49:17<9:47:58,  2.26s/it] 22%|██████████████████▋                                                                | 4517/20117 [2:49:19<9:52:27,  2.28s/it] 22%|██████████████████▋                                                                | 4518/20117 [2:49:21<9:50:02,  2.27s/it] 22%|██████████████████▋                                                                | 4519/20117 [2:49:24<9:46:57,  2.26s/it] 22%|██████████████████▋                                                                | 4520/20117 [2:49:26<9:45:11,  2.25s/it]                                                                                                                                 {'loss': 0.2824, 'grad_norm': 0.49136292934417725, 'learning_rate': 0.0001768984404293941, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 391.14, 'epoch': 0.45}
 22%|██████████████████▋                                                                | 4520/20117 [2:49:26<9:45:11,  2.25s/it] 22%|██████████████████▋                                                                | 4521/20117 [2:49:28<9:50:09,  2.27s/it] 22%|██████████████████▋                                                                | 4522/20117 [2:49:30<9:49:01,  2.27s/it] 22%|██████████████████▋                                                                | 4523/20117 [2:49:33<9:47:45,  2.26s/it] 22%|██████████████████▋                                                                | 4524/20117 [2:49:35<9:44:43,  2.25s/it] 22%|██████████████████▋                                                                | 4525/20117 [2:49:37<9:48:05,  2.26s/it] 22%|██████████████████▋                                                                | 4526/20117 [2:49:39<9:50:29,  2.27s/it] 23%|██████████████████▋                                                                | 4527/20117 [2:49:42<9:49:30,  2.27s/it] 23%|██████████████████▋                                                                | 4528/20117 [2:49:44<9:47:07,  2.26s/it] 23%|██████████████████▋                                                                | 4529/20117 [2:49:46<9:47:43,  2.26s/it] 23%|██████████████████▋                                                                | 4530/20117 [2:49:48<9:45:34,  2.25s/it]                                                                                                                                 {'loss': 0.2931, 'grad_norm': 0.43822959065437317, 'learning_rate': 0.00017679801513290956, 'memory/max_active (GiB)': 18.83, 'memory/max_allocated (GiB)': 18.83, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 382.92, 'epoch': 0.45}
 23%|██████████████████▋                                                                | 4530/20117 [2:49:48<9:45:34,  2.25s/it] 23%|██████████████████▍                                                               | 4531/20117 [2:49:51<10:13:04,  2.36s/it] 23%|██████████████████▍                                                               | 4532/20117 [2:49:53<10:06:08,  2.33s/it] 23%|██████████████████▋                                                                | 4533/20117 [2:49:56<9:57:53,  2.30s/it] 23%|██████████████████▋                                                                | 4534/20117 [2:49:58<9:52:21,  2.28s/it] 23%|██████████████████▋                                                                | 4535/20117 [2:50:00<9:51:28,  2.28s/it] 23%|██████████████████▋                                                                | 4536/20117 [2:50:02<9:51:59,  2.28s/it] 23%|██████████████████▋                                                                | 4537/20117 [2:50:05<9:51:09,  2.28s/it] 23%|██████████████████▋                                                                | 4538/20117 [2:50:07<9:45:54,  2.26s/it] 23%|██████████████████▋                                                                | 4539/20117 [2:50:09<9:43:11,  2.25s/it] 23%|██████████████████▋                                                                | 4540/20117 [2:50:11<9:50:43,  2.28s/it]                                                                                                                                 {'loss': 0.2444, 'grad_norm': 0.448585569858551, 'learning_rate': 0.00017669740066668214, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 364.48, 'epoch': 0.45}
 23%|██████████████████▋                                                                | 4540/20117 [2:50:11<9:50:43,  2.28s/it] 23%|██████████████████▋                                                                | 4541/20117 [2:50:14<9:44:54,  2.25s/it] 23%|██████████████████▋                                                                | 4542/20117 [2:50:16<9:39:43,  2.23s/it] 23%|██████████████████▋                                                                | 4543/20117 [2:50:18<9:32:49,  2.21s/it] 23%|██████████████████▋                                                                | 4544/20117 [2:50:20<9:34:22,  2.21s/it] 23%|██████████████████▊                                                                | 4545/20117 [2:50:22<9:35:52,  2.22s/it] 23%|██████████████████▊                                                                | 4546/20117 [2:50:25<9:48:58,  2.27s/it] 23%|██████████████████▊                                                                | 4547/20117 [2:50:27<9:57:40,  2.30s/it] 23%|██████████████████▌                                                               | 4548/20117 [2:50:30<10:11:33,  2.36s/it] 23%|██████████████████▌                                                               | 4549/20117 [2:50:32<10:19:32,  2.39s/it] 23%|██████████████████▌                                                               | 4550/20117 [2:50:34<10:10:58,  2.35s/it]                                                                                                                                 {'loss': 0.2227, 'grad_norm': 0.3125511407852173, 'learning_rate': 0.0001765965972785465, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 360.51, 'epoch': 0.45}
 23%|██████████████████▌                                                               | 4550/20117 [2:50:34<10:10:58,  2.35s/it] 23%|██████████████████▌                                                               | 4551/20117 [2:50:37<10:06:14,  2.34s/it] 23%|██████████████████▌                                                               | 4552/20117 [2:50:39<10:03:07,  2.32s/it] 23%|██████████████████▊                                                                | 4553/20117 [2:50:41<9:58:52,  2.31s/it] 23%|██████████████████▊                                                                | 4554/20117 [2:50:44<9:56:39,  2.30s/it] 23%|██████████████████▊                                                                | 4555/20117 [2:50:46<9:55:32,  2.30s/it] 23%|██████████████████▊                                                                | 4556/20117 [2:50:48<9:53:18,  2.29s/it] 23%|██████████████████▊                                                                | 4557/20117 [2:50:50<9:52:13,  2.28s/it] 23%|██████████████████▊                                                                | 4558/20117 [2:50:53<9:51:31,  2.28s/it] 23%|██████████████████▊                                                                | 4559/20117 [2:50:55<9:44:48,  2.26s/it] 23%|██████████████████▊                                                                | 4560/20117 [2:50:57<9:44:37,  2.25s/it]                                                                                                                                 {'loss': 0.3157, 'grad_norm': 0.6492682695388794, 'learning_rate': 0.00017649560521680266, 'memory/max_active (GiB)': 18.83, 'memory/max_allocated (GiB)': 18.83, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 361.61, 'epoch': 0.45}
 23%|██████████████████▊                                                                | 4560/20117 [2:50:57<9:44:37,  2.25s/it] 23%|██████████████████▊                                                                | 4561/20117 [2:50:59<9:41:53,  2.24s/it] 23%|██████████████████▊                                                                | 4562/20117 [2:51:02<9:42:04,  2.25s/it] 23%|██████████████████▊                                                                | 4563/20117 [2:51:04<9:42:42,  2.25s/it] 23%|██████████████████▊                                                                | 4564/20117 [2:51:06<9:48:31,  2.27s/it] 23%|██████████████████▊                                                                | 4565/20117 [2:51:08<9:50:21,  2.28s/it] 23%|██████████████████▊                                                                | 4566/20117 [2:51:11<9:49:23,  2.27s/it] 23%|██████████████████▊                                                                | 4567/20117 [2:51:13<9:52:23,  2.29s/it] 23%|██████████████████▊                                                                | 4568/20117 [2:51:15<9:57:28,  2.31s/it] 23%|██████████████████▊                                                                | 4569/20117 [2:51:18<9:53:32,  2.29s/it] 23%|██████████████████▊                                                                | 4570/20117 [2:51:20<9:50:35,  2.28s/it]                                                                                                                                 {'loss': 0.2644, 'grad_norm': 0.23529411852359772, 'learning_rate': 0.0001763944247302155, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 333.05, 'epoch': 0.45}
 23%|██████████████████▊                                                                | 4570/20117 [2:51:20<9:50:35,  2.28s/it] 23%|██████████████████▊                                                                | 4571/20117 [2:51:22<9:52:50,  2.29s/it] 23%|██████████████████▊                                                                | 4572/20117 [2:51:24<9:54:34,  2.29s/it] 23%|██████████████████▊                                                                | 4573/20117 [2:51:27<9:51:35,  2.28s/it] 23%|██████████████████▊                                                                | 4574/20117 [2:51:29<9:50:17,  2.28s/it] 23%|██████████████████▉                                                                | 4575/20117 [2:51:31<9:57:56,  2.31s/it] 23%|██████████████████▉                                                                | 4576/20117 [2:51:34<9:56:29,  2.30s/it] 23%|██████████████████▉                                                                | 4577/20117 [2:51:36<9:52:07,  2.29s/it] 23%|██████████████████▉                                                                | 4578/20117 [2:51:38<9:52:54,  2.29s/it] 23%|██████████████████▉                                                                | 4579/20117 [2:51:40<9:54:37,  2.30s/it] 23%|██████████████████▉                                                                | 4580/20117 [2:51:43<9:53:06,  2.29s/it]                                                                                                                                 {'loss': 0.1995, 'grad_norm': 0.37334245443344116, 'learning_rate': 0.00017629305606801387, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 380.12, 'epoch': 0.46}
 23%|██████████████████▉                                                                | 4580/20117 [2:51:43<9:53:06,  2.29s/it] 23%|██████████████████▉                                                                | 4581/20117 [2:51:45<9:54:06,  2.29s/it] 23%|██████████████████▋                                                               | 4582/20117 [2:51:48<10:13:06,  2.37s/it] 23%|██████████████████▋                                                               | 4583/20117 [2:51:50<10:03:32,  2.33s/it] 23%|██████████████████▉                                                                | 4584/20117 [2:51:52<9:52:41,  2.29s/it] 23%|██████████████████▉                                                                | 4585/20117 [2:51:54<9:46:19,  2.26s/it] 23%|██████████████████▉                                                                | 4586/20117 [2:51:57<9:44:54,  2.26s/it] 23%|██████████████████▉                                                                | 4587/20117 [2:51:59<9:46:19,  2.27s/it] 23%|██████████████████▉                                                                | 4588/20117 [2:52:01<9:39:25,  2.24s/it] 23%|██████████████████▉                                                                | 4589/20117 [2:52:03<9:41:07,  2.25s/it] 23%|██████████████████▉                                                                | 4590/20117 [2:52:05<9:41:02,  2.25s/it]                                                                                                                                 {'loss': 0.201, 'grad_norm': 0.26320332288742065, 'learning_rate': 0.00017619149947989028, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 333.8, 'epoch': 0.46}
 23%|██████████████████▉                                                                | 4590/20117 [2:52:05<9:41:02,  2.25s/it] 23%|██████████████████▉                                                                | 4591/20117 [2:52:08<9:40:48,  2.24s/it] 23%|██████████████████▉                                                                | 4592/20117 [2:52:10<9:46:50,  2.27s/it] 23%|██████████████████▉                                                                | 4593/20117 [2:52:12<9:47:45,  2.27s/it] 23%|██████████████████▉                                                                | 4594/20117 [2:52:15<9:45:53,  2.26s/it] 23%|██████████████████▉                                                                | 4595/20117 [2:52:17<9:48:51,  2.28s/it] 23%|██████████████████▉                                                                | 4596/20117 [2:52:19<9:46:23,  2.27s/it] 23%|██████████████████▉                                                                | 4597/20117 [2:52:21<9:40:25,  2.24s/it] 23%|██████████████████▉                                                                | 4598/20117 [2:52:24<9:42:58,  2.25s/it] 23%|██████████████████▉                                                                | 4599/20117 [2:52:26<9:41:02,  2.25s/it] 23%|██████████████████▉                                                                | 4600/20117 [2:52:28<9:43:12,  2.26s/it]                                                                                                                                 {'loss': 0.3037, 'grad_norm': 0.4711815416812897, 'learning_rate': 0.000176089755216, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 398.88, 'epoch': 0.46}
 23%|██████████████████▉                                                                | 4600/20117 [2:52:28<9:43:12,  2.26s/it] 23%|██████████████████▉                                                                | 4601/20117 [2:52:30<9:41:27,  2.25s/it] 23%|██████████████████▉                                                                | 4602/20117 [2:52:33<9:42:10,  2.25s/it] 23%|██████████████████▉                                                                | 4603/20117 [2:52:35<9:41:27,  2.25s/it] 23%|██████████████████▉                                                                | 4604/20117 [2:52:37<9:51:14,  2.29s/it] 23%|██████████████████▉                                                                | 4605/20117 [2:52:39<9:50:27,  2.28s/it] 23%|███████████████████                                                                | 4606/20117 [2:52:42<9:55:29,  2.30s/it] 23%|███████████████████                                                                | 4607/20117 [2:52:44<9:50:52,  2.29s/it] 23%|███████████████████                                                                | 4608/20117 [2:52:46<9:50:52,  2.29s/it] 23%|██████████████████▊                                                               | 4609/20117 [2:52:49<10:03:56,  2.34s/it] 23%|███████████████████                                                                | 4610/20117 [2:52:51<9:59:58,  2.32s/it]                                                                                                                                 {'loss': 0.2393, 'grad_norm': 0.3597434461116791, 'learning_rate': 0.0001759878235269607, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 340.42, 'epoch': 0.46}
 23%|███████████████████                                                                | 4610/20117 [2:52:51<9:59:58,  2.32s/it] 23%|███████████████████                                                                | 4611/20117 [2:52:53<9:55:07,  2.30s/it] 23%|███████████████████                                                                | 4612/20117 [2:52:56<9:54:34,  2.30s/it] 23%|███████████████████                                                                | 4613/20117 [2:52:58<9:52:49,  2.29s/it] 23%|███████████████████                                                                | 4614/20117 [2:53:00<9:53:16,  2.30s/it] 23%|███████████████████                                                                | 4615/20117 [2:53:03<9:54:25,  2.30s/it] 23%|███████████████████                                                                | 4616/20117 [2:53:05<9:56:29,  2.31s/it] 23%|███████████████████                                                                | 4617/20117 [2:53:07<9:50:32,  2.29s/it] 23%|███████████████████                                                                | 4618/20117 [2:53:09<9:48:17,  2.28s/it] 23%|███████████████████                                                                | 4619/20117 [2:53:12<9:48:38,  2.28s/it] 23%|███████████████████                                                                | 4620/20117 [2:53:14<9:44:37,  2.26s/it]                                                                                                                                 {'loss': 0.314, 'grad_norm': 0.5157446265220642, 'learning_rate': 0.00017588570466385166, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 374.07, 'epoch': 0.46}
 23%|███████████████████                                                                | 4620/20117 [2:53:14<9:44:37,  2.26s/it] 23%|███████████████████                                                                | 4621/20117 [2:53:16<9:45:33,  2.27s/it] 23%|███████████████████                                                                | 4622/20117 [2:53:18<9:45:06,  2.27s/it] 23%|███████████████████                                                                | 4623/20117 [2:53:21<9:47:27,  2.27s/it] 23%|███████████████████                                                                | 4624/20117 [2:53:23<9:43:53,  2.26s/it] 23%|███████████████████                                                                | 4625/20117 [2:53:25<9:48:38,  2.28s/it] 23%|███████████████████                                                                | 4626/20117 [2:53:28<9:47:02,  2.27s/it] 23%|███████████████████                                                                | 4627/20117 [2:53:30<9:44:33,  2.26s/it] 23%|███████████████████                                                                | 4628/20117 [2:53:32<9:43:52,  2.26s/it] 23%|███████████████████                                                                | 4629/20117 [2:53:34<9:47:10,  2.27s/it] 23%|███████████████████                                                                | 4630/20117 [2:53:37<9:48:06,  2.28s/it]                                                                                                                                 {'loss': 0.2606, 'grad_norm': 0.4747403562068939, 'learning_rate': 0.0001757833988782132, 'memory/max_active (GiB)': 21.47, 'memory/max_allocated (GiB)': 21.47, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 333.75, 'epoch': 0.46}
 23%|███████████████████                                                                | 4630/20117 [2:53:37<9:48:06,  2.28s/it] 23%|███████████████████                                                                | 4631/20117 [2:53:39<9:45:57,  2.27s/it] 23%|███████████████████                                                                | 4632/20117 [2:53:41<9:43:30,  2.26s/it] 23%|███████████████████                                                                | 4633/20117 [2:53:43<9:38:49,  2.24s/it] 23%|██████████████████▉                                                               | 4634/20117 [2:53:46<10:03:40,  2.34s/it] 23%|███████████████████                                                                | 4635/20117 [2:53:48<9:56:25,  2.31s/it] 23%|███████████████████▏                                                               | 4636/20117 [2:53:50<9:53:36,  2.30s/it] 23%|███████████████████▏                                                               | 4637/20117 [2:53:53<9:46:26,  2.27s/it] 23%|███████████████████▏                                                               | 4638/20117 [2:53:55<9:49:37,  2.29s/it] 23%|███████████████████▏                                                               | 4639/20117 [2:53:57<9:49:06,  2.28s/it] 23%|███████████████████▏                                                               | 4640/20117 [2:53:59<9:42:41,  2.26s/it]                                                                                                                                 {'loss': 0.2106, 'grad_norm': 0.45278453826904297, 'learning_rate': 0.00017568090642204612, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 352.11, 'epoch': 0.46}
 23%|███████████████████▏                                                               | 4640/20117 [2:53:59<9:42:41,  2.26s/it] 23%|███████████████████▏                                                               | 4641/20117 [2:54:02<9:41:53,  2.26s/it] 23%|███████████████████▏                                                               | 4642/20117 [2:54:04<9:41:22,  2.25s/it] 23%|███████████████████▏                                                               | 4643/20117 [2:54:06<9:40:35,  2.25s/it] 23%|███████████████████▏                                                               | 4644/20117 [2:54:08<9:39:40,  2.25s/it] 23%|███████████████████▏                                                               | 4645/20117 [2:54:11<9:39:54,  2.25s/it] 23%|███████████████████▏                                                               | 4646/20117 [2:54:13<9:39:02,  2.25s/it] 23%|███████████████████▏                                                               | 4647/20117 [2:54:15<9:41:15,  2.25s/it] 23%|███████████████████▏                                                               | 4648/20117 [2:54:17<9:44:32,  2.27s/it] 23%|███████████████████▏                                                               | 4649/20117 [2:54:20<9:44:01,  2.27s/it] 23%|███████████████████▏                                                               | 4650/20117 [2:54:22<9:40:08,  2.25s/it]                                                                                                                                 {'loss': 0.2457, 'grad_norm': 0.7563058137893677, 'learning_rate': 0.00017557822754781102, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 370.21, 'epoch': 0.46}
 23%|███████████████████▏                                                               | 4650/20117 [2:54:22<9:40:08,  2.25s/it] 23%|███████████████████▏                                                               | 4651/20117 [2:54:24<9:36:53,  2.24s/it] 23%|███████████████████▏                                                               | 4652/20117 [2:54:26<9:40:01,  2.25s/it] 23%|███████████████████▏                                                               | 4653/20117 [2:54:29<9:39:20,  2.25s/it] 23%|███████████████████▏                                                               | 4654/20117 [2:54:31<9:43:15,  2.26s/it] 23%|███████████████████▏                                                               | 4655/20117 [2:54:33<9:50:02,  2.29s/it] 23%|███████████████████▏                                                               | 4656/20117 [2:54:36<9:52:29,  2.30s/it] 23%|███████████████████▏                                                               | 4657/20117 [2:54:38<9:51:40,  2.30s/it] 23%|███████████████████▏                                                               | 4658/20117 [2:54:40<9:51:35,  2.30s/it] 23%|███████████████████▏                                                               | 4659/20117 [2:54:43<9:59:44,  2.33s/it] 23%|███████████████████▏                                                               | 4660/20117 [2:54:45<9:59:14,  2.33s/it]                                                                                                                                 {'loss': 0.2659, 'grad_norm': 0.27436432242393494, 'learning_rate': 0.00017547536250842765, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 347.64, 'epoch': 0.46}
 23%|███████████████████▏                                                               | 4660/20117 [2:54:45<9:59:14,  2.33s/it] 23%|███████████████████▏                                                               | 4661/20117 [2:54:47<9:51:26,  2.30s/it] 23%|███████████████████▏                                                               | 4662/20117 [2:54:49<9:53:38,  2.30s/it] 23%|███████████████████▏                                                               | 4663/20117 [2:54:52<9:51:43,  2.30s/it] 23%|███████████████████▏                                                               | 4664/20117 [2:54:54<9:51:24,  2.30s/it] 23%|███████████████████▏                                                               | 4665/20117 [2:54:56<9:45:47,  2.27s/it] 23%|███████████████████▎                                                               | 4666/20117 [2:54:59<9:43:37,  2.27s/it] 23%|███████████████████▎                                                               | 4667/20117 [2:55:01<9:39:21,  2.25s/it] 23%|███████████████████▎                                                               | 4668/20117 [2:55:03<9:45:36,  2.27s/it] 23%|███████████████████▎                                                               | 4669/20117 [2:55:05<9:47:29,  2.28s/it] 23%|███████████████████▎                                                               | 4670/20117 [2:55:08<9:51:19,  2.30s/it]                                                                                                                                 {'loss': 0.2744, 'grad_norm': 0.3469400107860565, 'learning_rate': 0.00017537231155727428, 'memory/max_active (GiB)': 21.38, 'memory/max_allocated (GiB)': 21.38, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 354.35, 'epoch': 0.46}
 23%|███████████████████▎                                                               | 4670/20117 [2:55:08<9:51:19,  2.30s/it] 23%|███████████████████▎                                                               | 4671/20117 [2:55:10<9:50:57,  2.30s/it] 23%|███████████████████▎                                                               | 4672/20117 [2:55:12<9:59:06,  2.33s/it] 23%|███████████████████▎                                                               | 4673/20117 [2:55:15<9:50:58,  2.30s/it] 23%|███████████████████▎                                                               | 4674/20117 [2:55:17<9:50:25,  2.29s/it] 23%|███████████████████▎                                                               | 4675/20117 [2:55:19<9:45:24,  2.27s/it] 23%|███████████████████▎                                                               | 4676/20117 [2:55:21<9:38:33,  2.25s/it] 23%|███████████████████▎                                                               | 4677/20117 [2:55:24<9:36:37,  2.24s/it] 23%|███████████████████▎                                                               | 4678/20117 [2:55:26<9:34:20,  2.23s/it] 23%|███████████████████▎                                                               | 4679/20117 [2:55:28<9:34:00,  2.23s/it] 23%|███████████████████▎                                                               | 4680/20117 [2:55:30<9:34:48,  2.23s/it]                                                                                                                                 {'loss': 0.236, 'grad_norm': 0.29021984338760376, 'learning_rate': 0.0001752690749481873, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 357.93, 'epoch': 0.47}
 23%|███████████████████▎                                                               | 4680/20117 [2:55:30<9:34:48,  2.23s/it] 23%|███████████████████▎                                                               | 4681/20117 [2:55:33<9:42:53,  2.27s/it] 23%|███████████████████▎                                                               | 4682/20117 [2:55:35<9:39:39,  2.25s/it] 23%|███████████████████▎                                                               | 4683/20117 [2:55:37<9:40:16,  2.26s/it] 23%|███████████████████▎                                                               | 4684/20117 [2:55:39<9:39:01,  2.25s/it] 23%|███████████████████▎                                                               | 4685/20117 [2:55:42<9:37:11,  2.24s/it] 23%|███████████████████                                                               | 4686/20117 [2:55:44<10:14:56,  2.39s/it] 23%|███████████████████                                                               | 4687/20117 [2:55:46<10:01:52,  2.34s/it] 23%|███████████████████                                                               | 4688/20117 [2:55:49<10:00:07,  2.33s/it] 23%|███████████████████▎                                                               | 4689/20117 [2:55:51<9:54:42,  2.31s/it] 23%|███████████████████▎                                                               | 4690/20117 [2:55:53<9:48:18,  2.29s/it]                                                                                                                                 {'loss': 0.2694, 'grad_norm': 0.28982868790626526, 'learning_rate': 0.00017516565293546025, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 433.34, 'epoch': 0.47}
 23%|███████████████████▎                                                               | 4690/20117 [2:55:53<9:48:18,  2.29s/it] 23%|███████████████████▎                                                               | 4691/20117 [2:55:56<9:48:15,  2.29s/it] 23%|███████████████████▎                                                               | 4692/20117 [2:55:58<9:44:24,  2.27s/it] 23%|███████████████████▎                                                               | 4693/20117 [2:56:00<9:46:24,  2.28s/it] 23%|███████████████████▎                                                               | 4694/20117 [2:56:02<9:44:40,  2.27s/it] 23%|███████████████████▎                                                               | 4695/20117 [2:56:05<9:45:22,  2.28s/it] 23%|███████████████████▍                                                               | 4696/20117 [2:56:07<9:42:30,  2.27s/it] 23%|███████████████████▍                                                               | 4697/20117 [2:56:09<9:38:49,  2.25s/it] 23%|███████████████████▍                                                               | 4698/20117 [2:56:11<9:39:37,  2.26s/it] 23%|███████████████████▍                                                               | 4699/20117 [2:56:14<9:38:07,  2.25s/it] 23%|███████████████████▍                                                               | 4700/20117 [2:56:16<9:37:28,  2.25s/it]                                                                                                                                 {'loss': 0.2209, 'grad_norm': 0.39744624495506287, 'learning_rate': 0.00017506204577384337, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 380.58, 'epoch': 0.47}
 23%|███████████████████▍                                                               | 4700/20117 [2:56:16<9:37:28,  2.25s/it] 23%|███████████████████▍                                                               | 4701/20117 [2:56:18<9:36:33,  2.24s/it] 23%|███████████████████▍                                                               | 4702/20117 [2:56:20<9:31:47,  2.23s/it] 23%|███████████████████▍                                                               | 4703/20117 [2:56:22<9:31:51,  2.23s/it] 23%|███████████████████▍                                                               | 4704/20117 [2:56:25<9:39:29,  2.26s/it] 23%|███████████████████▍                                                               | 4705/20117 [2:56:27<9:40:11,  2.26s/it] 23%|███████████████████▍                                                               | 4706/20117 [2:56:29<9:41:20,  2.26s/it] 23%|███████████████████▍                                                               | 4707/20117 [2:56:32<9:35:40,  2.24s/it] 23%|███████████████████▍                                                               | 4708/20117 [2:56:34<9:31:41,  2.23s/it] 23%|███████████████████▍                                                               | 4709/20117 [2:56:36<9:33:45,  2.23s/it] 23%|███████████████████▍                                                               | 4710/20117 [2:56:38<9:36:36,  2.25s/it]                                                                                                                                 {'loss': 0.2147, 'grad_norm': 0.14510154724121094, 'learning_rate': 0.00017495825371854302, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 342.34, 'epoch': 0.47}
 23%|███████████████████▍                                                               | 4710/20117 [2:56:38<9:36:36,  2.25s/it] 23%|███████████████████▍                                                               | 4711/20117 [2:56:41<9:37:22,  2.25s/it] 23%|███████████████████▍                                                               | 4712/20117 [2:56:43<9:36:04,  2.24s/it] 23%|███████████████████▍                                                               | 4713/20117 [2:56:45<9:36:55,  2.25s/it] 23%|███████████████████▍                                                               | 4714/20117 [2:56:47<9:39:07,  2.26s/it] 23%|███████████████████▍                                                               | 4715/20117 [2:56:50<9:40:49,  2.26s/it] 23%|███████████████████▍                                                               | 4716/20117 [2:56:52<9:37:12,  2.25s/it] 23%|███████████████████▍                                                               | 4717/20117 [2:56:54<9:36:53,  2.25s/it] 23%|███████████████████▍                                                               | 4718/20117 [2:56:56<9:35:04,  2.24s/it] 23%|███████████████████▍                                                               | 4719/20117 [2:56:58<9:32:41,  2.23s/it] 23%|███████████████████▍                                                               | 4720/20117 [2:57:01<9:36:25,  2.25s/it]                                                                                                                                 {'loss': 0.2411, 'grad_norm': 0.26011091470718384, 'learning_rate': 0.000174854277025221, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 395.32, 'epoch': 0.47}
 23%|███████████████████▍                                                               | 4720/20117 [2:57:01<9:36:25,  2.25s/it] 23%|███████████████████▍                                                               | 4721/20117 [2:57:03<9:38:50,  2.26s/it] 23%|███████████████████▍                                                               | 4722/20117 [2:57:05<9:35:00,  2.24s/it] 23%|███████████████████▍                                                               | 4723/20117 [2:57:07<9:36:47,  2.25s/it] 23%|███████████████████▍                                                               | 4724/20117 [2:57:10<9:40:32,  2.26s/it] 23%|███████████████████▍                                                               | 4725/20117 [2:57:12<9:41:23,  2.27s/it] 23%|███████████████████▍                                                               | 4726/20117 [2:57:14<9:38:47,  2.26s/it] 23%|███████████████████▌                                                               | 4727/20117 [2:57:17<9:37:21,  2.25s/it] 24%|███████████████████▌                                                               | 4728/20117 [2:57:19<9:39:06,  2.26s/it] 24%|███████████████████▌                                                               | 4729/20117 [2:57:21<9:37:01,  2.25s/it] 24%|███████████████████▌                                                               | 4730/20117 [2:57:23<9:37:08,  2.25s/it]                                                                                                                                 {'loss': 0.2466, 'grad_norm': 0.5020186901092529, 'learning_rate': 0.00017475011594999385, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 385.96, 'epoch': 0.47}
 24%|███████████████████▌                                                               | 4730/20117 [2:57:23<9:37:08,  2.25s/it] 24%|███████████████████▌                                                               | 4731/20117 [2:57:25<9:32:08,  2.23s/it] 24%|███████████████████▌                                                               | 4732/20117 [2:57:28<9:28:31,  2.22s/it] 24%|███████████████████▌                                                               | 4733/20117 [2:57:30<9:24:42,  2.20s/it] 24%|███████████████████▌                                                               | 4734/20117 [2:57:32<9:21:50,  2.19s/it] 24%|███████████████████▌                                                               | 4735/20117 [2:57:34<9:21:28,  2.19s/it] 24%|███████████████████▌                                                               | 4736/20117 [2:57:36<9:30:10,  2.22s/it] 24%|███████████████████▌                                                               | 4737/20117 [2:57:39<9:53:37,  2.32s/it] 24%|███████████████████▌                                                               | 4738/20117 [2:57:41<9:49:05,  2.30s/it] 24%|███████████████████▌                                                               | 4739/20117 [2:57:44<9:51:38,  2.31s/it] 24%|███████████████████▌                                                               | 4740/20117 [2:57:46<9:55:34,  2.32s/it]                                                                                                                                 {'loss': 0.3094, 'grad_norm': 0.5160934925079346, 'learning_rate': 0.0001746457707494323, 'memory/max_active (GiB)': 18.17, 'memory/max_allocated (GiB)': 18.17, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 343.47, 'epoch': 0.47}
 24%|███████████████████▌                                                               | 4740/20117 [2:57:46<9:55:34,  2.32s/it] 24%|███████████████████▌                                                               | 4741/20117 [2:57:48<9:57:29,  2.33s/it] 24%|███████████████████▌                                                               | 4742/20117 [2:57:51<9:52:27,  2.31s/it] 24%|███████████████████▌                                                               | 4743/20117 [2:57:53<9:48:49,  2.30s/it] 24%|███████████████████▌                                                               | 4744/20117 [2:57:55<9:50:42,  2.31s/it] 24%|███████████████████▌                                                               | 4745/20117 [2:57:57<9:51:12,  2.31s/it] 24%|███████████████████▌                                                               | 4746/20117 [2:58:00<9:51:55,  2.31s/it] 24%|███████████████████▌                                                               | 4747/20117 [2:58:02<9:48:57,  2.30s/it] 24%|███████████████████▌                                                               | 4748/20117 [2:58:04<9:46:53,  2.29s/it] 24%|███████████████████▌                                                               | 4749/20117 [2:58:07<9:46:41,  2.29s/it] 24%|███████████████████▌                                                               | 4750/20117 [2:58:09<9:38:45,  2.26s/it]                                                                                                                                 {'loss': 0.2324, 'grad_norm': 0.43567317724227905, 'learning_rate': 0.00017454124168056066, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 346.39, 'epoch': 0.47}
 24%|███████████████████▌                                                               | 4750/20117 [2:58:09<9:38:45,  2.26s/it] 24%|███████████████████▌                                                               | 4751/20117 [2:58:11<9:37:44,  2.26s/it] 24%|███████████████████▌                                                               | 4752/20117 [2:58:13<9:44:22,  2.28s/it] 24%|███████████████████▌                                                               | 4753/20117 [2:58:16<9:41:41,  2.27s/it] 24%|███████████████████▌                                                               | 4754/20117 [2:58:18<9:45:11,  2.29s/it] 24%|███████████████████▌                                                               | 4755/20117 [2:58:20<9:47:12,  2.29s/it] 24%|███████████████████▌                                                               | 4756/20117 [2:58:23<9:46:09,  2.29s/it] 24%|███████████████████▋                                                               | 4757/20117 [2:58:25<9:47:14,  2.29s/it] 24%|███████████████████▋                                                               | 4758/20117 [2:58:27<9:50:32,  2.31s/it] 24%|███████████████████▋                                                               | 4759/20117 [2:58:29<9:44:04,  2.28s/it] 24%|███████████████████▋                                                               | 4760/20117 [2:58:32<9:39:33,  2.26s/it]                                                                                                                                 {'loss': 0.1983, 'grad_norm': 0.39488813281059265, 'learning_rate': 0.0001744365290008561, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 363.53, 'epoch': 0.47}
 24%|███████████████████▋                                                               | 4760/20117 [2:58:32<9:39:33,  2.26s/it] 24%|███████████████████▋                                                               | 4761/20117 [2:58:34<9:36:45,  2.25s/it] 24%|███████████████████▋                                                               | 4762/20117 [2:58:36<9:37:08,  2.26s/it] 24%|███████████████████▋                                                               | 4763/20117 [2:58:38<9:39:02,  2.26s/it] 24%|███████████████████▋                                                               | 4764/20117 [2:58:41<9:42:48,  2.28s/it] 24%|███████████████████▋                                                               | 4765/20117 [2:58:43<9:36:49,  2.25s/it] 24%|███████████████████▋                                                               | 4766/20117 [2:58:45<9:39:02,  2.26s/it] 24%|███████████████████▋                                                               | 4767/20117 [2:58:48<9:41:27,  2.27s/it] 24%|███████████████████▋                                                               | 4768/20117 [2:58:50<9:36:04,  2.25s/it] 24%|███████████████████▋                                                               | 4769/20117 [2:58:52<9:42:26,  2.28s/it] 24%|███████████████████▋                                                               | 4770/20117 [2:58:54<9:39:55,  2.27s/it]                                                                                                                                 {'loss': 0.2783, 'grad_norm': 0.2595832943916321, 'learning_rate': 0.00017433163296824808, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 431.69, 'epoch': 0.47}
 24%|███████████████████▋                                                               | 4770/20117 [2:58:54<9:39:55,  2.27s/it] 24%|███████████████████▋                                                               | 4771/20117 [2:58:57<9:40:21,  2.27s/it] 24%|███████████████████▋                                                               | 4772/20117 [2:58:59<9:41:59,  2.28s/it] 24%|███████████████████▋                                                               | 4773/20117 [2:59:01<9:38:15,  2.26s/it] 24%|███████████████████▋                                                               | 4774/20117 [2:59:03<9:32:49,  2.24s/it] 24%|███████████████████▋                                                               | 4775/20117 [2:59:06<9:31:24,  2.23s/it] 24%|███████████████████▋                                                               | 4776/20117 [2:59:08<9:25:45,  2.21s/it] 24%|███████████████████▋                                                               | 4777/20117 [2:59:10<9:30:37,  2.23s/it] 24%|███████████████████▋                                                               | 4778/20117 [2:59:12<9:32:06,  2.24s/it] 24%|███████████████████▋                                                               | 4779/20117 [2:59:14<9:34:51,  2.25s/it] 24%|███████████████████▋                                                               | 4780/20117 [2:59:17<9:32:38,  2.24s/it]                                                                                                                                 {'loss': 0.2223, 'grad_norm': 0.3657257556915283, 'learning_rate': 0.00017422655384111772, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 333.82, 'epoch': 0.48}
 24%|███████████████████▋                                                               | 4780/20117 [2:59:17<9:32:38,  2.24s/it] 24%|███████████████████▋                                                               | 4781/20117 [2:59:19<9:31:19,  2.24s/it] 24%|███████████████████▋                                                               | 4782/20117 [2:59:21<9:30:32,  2.23s/it] 24%|███████████████████▋                                                               | 4783/20117 [2:59:23<9:36:40,  2.26s/it] 24%|███████████████████▋                                                               | 4784/20117 [2:59:26<9:33:46,  2.25s/it] 24%|███████████████████▋                                                               | 4785/20117 [2:59:28<9:34:42,  2.25s/it] 24%|███████████████████▋                                                               | 4786/20117 [2:59:30<9:33:35,  2.24s/it] 24%|███████████████████▊                                                               | 4787/20117 [2:59:32<9:37:31,  2.26s/it] 24%|███████████████████▊                                                               | 4788/20117 [2:59:35<9:34:30,  2.25s/it] 24%|███████████████████▊                                                               | 4789/20117 [2:59:37<9:36:39,  2.26s/it] 24%|███████████████████▊                                                               | 4790/20117 [2:59:39<9:38:12,  2.26s/it]                                                                                                                                 {'loss': 0.2042, 'grad_norm': 0.3269219994544983, 'learning_rate': 0.00017412129187829712, 'memory/max_active (GiB)': 20.53, 'memory/max_allocated (GiB)': 20.53, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 356.0, 'epoch': 0.48}
 24%|███████████████████▊                                                               | 4790/20117 [2:59:39<9:38:12,  2.26s/it] 24%|███████████████████▊                                                               | 4791/20117 [2:59:42<9:58:23,  2.34s/it] 24%|███████████████████▊                                                               | 4792/20117 [2:59:44<9:53:43,  2.32s/it] 24%|███████████████████▊                                                               | 4793/20117 [2:59:46<9:51:56,  2.32s/it] 24%|███████████████████▊                                                               | 4794/20117 [2:59:49<9:53:36,  2.32s/it] 24%|███████████████████▊                                                               | 4795/20117 [2:59:51<9:49:28,  2.31s/it] 24%|███████████████████▊                                                               | 4796/20117 [2:59:53<9:51:56,  2.32s/it] 24%|███████████████████▊                                                               | 4797/20117 [2:59:56<9:43:59,  2.29s/it] 24%|███████████████████▊                                                               | 4798/20117 [2:59:58<9:43:03,  2.28s/it] 24%|███████████████████▊                                                               | 4799/20117 [3:00:00<9:40:24,  2.27s/it] 24%|███████████████████▊                                                               | 4800/20117 [3:00:02<9:39:05,  2.27s/it]                                                                                                                                 {'loss': 0.216, 'grad_norm': 0.35581299662590027, 'learning_rate': 0.00017401584733906872, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 306.32, 'epoch': 0.48}
 24%|███████████████████▊                                                               | 4800/20117 [3:00:02<9:39:05,  2.27s/it] 24%|███████████████████▊                                                               | 4801/20117 [3:00:05<9:40:34,  2.27s/it] 24%|███████████████████▊                                                               | 4802/20117 [3:00:07<9:43:11,  2.28s/it] 24%|███████████████████▊                                                               | 4803/20117 [3:00:09<9:46:01,  2.30s/it] 24%|███████████████████▊                                                               | 4804/20117 [3:00:11<9:41:28,  2.28s/it] 24%|███████████████████▊                                                               | 4805/20117 [3:00:14<9:36:38,  2.26s/it] 24%|███████████████████▊                                                               | 4806/20117 [3:00:16<9:38:05,  2.27s/it] 24%|███████████████████▊                                                               | 4807/20117 [3:00:18<9:33:09,  2.25s/it] 24%|███████████████████▊                                                               | 4808/20117 [3:00:20<9:33:00,  2.25s/it] 24%|███████████████████▊                                                               | 4809/20117 [3:00:23<9:31:53,  2.24s/it] 24%|███████████████████▊                                                               | 4810/20117 [3:00:25<9:30:59,  2.24s/it]                                                                                                                                 {'loss': 0.3306, 'grad_norm': 0.5693733096122742, 'learning_rate': 0.00017391022048316476, 'memory/max_active (GiB)': 19.19, 'memory/max_allocated (GiB)': 19.19, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 359.5, 'epoch': 0.48}
 24%|███████████████████▊                                                               | 4810/20117 [3:00:25<9:30:59,  2.24s/it] 24%|███████████████████▊                                                               | 4811/20117 [3:00:27<9:28:03,  2.23s/it] 24%|███████████████████▊                                                               | 4812/20117 [3:00:29<9:30:13,  2.24s/it] 24%|███████████████████▊                                                               | 4813/20117 [3:00:32<9:28:14,  2.23s/it] 24%|███████████████████▊                                                               | 4814/20117 [3:00:34<9:28:50,  2.23s/it] 24%|███████████████████▊                                                               | 4815/20117 [3:00:36<9:30:51,  2.24s/it] 24%|███████████████████▊                                                               | 4816/20117 [3:00:38<9:27:25,  2.23s/it] 24%|███████████████████▊                                                               | 4817/20117 [3:00:40<9:29:31,  2.23s/it] 24%|███████████████████▉                                                               | 4818/20117 [3:00:43<9:32:48,  2.25s/it] 24%|███████████████████▉                                                               | 4819/20117 [3:00:45<9:31:15,  2.24s/it] 24%|███████████████████▉                                                               | 4820/20117 [3:00:47<9:37:36,  2.27s/it]                                                                                                                                 {'loss': 0.2469, 'grad_norm': 0.33154231309890747, 'learning_rate': 0.00017380441157076643, 'memory/max_active (GiB)': 19.22, 'memory/max_allocated (GiB)': 19.22, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 363.91, 'epoch': 0.48}
 24%|███████████████████▉                                                               | 4820/20117 [3:00:47<9:37:36,  2.27s/it] 24%|███████████████████▉                                                               | 4821/20117 [3:00:50<9:38:36,  2.27s/it] 24%|███████████████████▉                                                               | 4822/20117 [3:00:52<9:35:22,  2.26s/it] 24%|███████████████████▉                                                               | 4823/20117 [3:00:54<9:34:42,  2.25s/it] 24%|███████████████████▉                                                               | 4824/20117 [3:00:56<9:38:28,  2.27s/it] 24%|███████████████████▉                                                               | 4825/20117 [3:00:59<9:33:44,  2.25s/it] 24%|███████████████████▉                                                               | 4826/20117 [3:01:01<9:29:43,  2.24s/it] 24%|███████████████████▉                                                               | 4827/20117 [3:01:03<9:30:56,  2.24s/it] 24%|███████████████████▉                                                               | 4828/20117 [3:01:06<9:51:56,  2.32s/it] 24%|███████████████████▉                                                               | 4829/20117 [3:01:08<9:51:26,  2.32s/it] 24%|███████████████████▋                                                              | 4830/20117 [3:01:10<10:12:41,  2.40s/it]                                                                                                                                 {'loss': 0.2286, 'grad_norm': 0.417501837015152, 'learning_rate': 0.00017369842086250347, 'memory/max_active (GiB)': 19.2, 'memory/max_allocated (GiB)': 19.2, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 261.89, 'epoch': 0.48}
 24%|███████████████████▋                                                              | 4830/20117 [3:01:10<10:12:41,  2.40s/it] 24%|███████████████████▋                                                              | 4831/20117 [3:01:13<10:16:10,  2.42s/it] 24%|███████████████████▋                                                              | 4832/20117 [3:01:15<10:02:06,  2.36s/it] 24%|███████████████████▉                                                               | 4833/20117 [3:01:17<9:53:22,  2.33s/it] 24%|███████████████████▉                                                               | 4834/20117 [3:01:20<9:44:18,  2.29s/it] 24%|███████████████████▉                                                               | 4835/20117 [3:01:22<9:42:12,  2.29s/it] 24%|███████████████████▉                                                               | 4836/20117 [3:01:24<9:42:34,  2.29s/it] 24%|███████████████████▉                                                               | 4837/20117 [3:01:26<9:36:32,  2.26s/it] 24%|███████████████████▉                                                               | 4838/20117 [3:01:29<9:32:07,  2.25s/it] 24%|███████████████████▉                                                               | 4839/20117 [3:01:31<9:29:39,  2.24s/it] 24%|███████████████████▉                                                               | 4840/20117 [3:01:33<9:25:16,  2.22s/it]                                                                                                                                 {'loss': 0.2415, 'grad_norm': 0.2794663608074188, 'learning_rate': 0.00017359224861945345, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 350.79, 'epoch': 0.48}
 24%|███████████████████▉                                                               | 4840/20117 [3:01:33<9:25:16,  2.22s/it] 24%|███████████████████▉                                                               | 4841/20117 [3:01:35<9:23:55,  2.21s/it] 24%|███████████████████▉                                                               | 4842/20117 [3:01:37<9:26:53,  2.23s/it] 24%|███████████████████▉                                                               | 4843/20117 [3:01:40<9:50:15,  2.32s/it] 24%|███████████████████▉                                                               | 4844/20117 [3:01:42<9:45:23,  2.30s/it] 24%|███████████████████▉                                                               | 4845/20117 [3:01:45<9:45:27,  2.30s/it] 24%|███████████████████▊                                                              | 4846/20117 [3:01:47<10:00:20,  2.36s/it] 24%|███████████████████▊                                                              | 4847/20117 [3:01:50<10:11:14,  2.40s/it] 24%|███████████████████▊                                                              | 4848/20117 [3:01:52<10:04:03,  2.37s/it] 24%|████████████████████                                                               | 4849/20117 [3:01:54<9:51:21,  2.32s/it] 24%|████████████████████                                                               | 4850/20117 [3:01:56<9:43:21,  2.29s/it]                                                                                                                                 {'loss': 0.2396, 'grad_norm': 0.31123244762420654, 'learning_rate': 0.00017348589510314096, 'memory/max_active (GiB)': 19.2, 'memory/max_allocated (GiB)': 19.2, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 380.51, 'epoch': 0.48}
 24%|████████████████████                                                               | 4850/20117 [3:01:56<9:43:21,  2.29s/it] 24%|████████████████████                                                               | 4851/20117 [3:01:59<9:41:09,  2.28s/it] 24%|████████████████████                                                               | 4852/20117 [3:02:01<9:37:42,  2.27s/it] 24%|████████████████████                                                               | 4853/20117 [3:02:03<9:37:29,  2.27s/it] 24%|████████████████████                                                               | 4854/20117 [3:02:05<9:32:25,  2.25s/it] 24%|████████████████████                                                               | 4855/20117 [3:02:07<9:32:09,  2.25s/it] 24%|████████████████████                                                               | 4856/20117 [3:02:10<9:29:41,  2.24s/it] 24%|████████████████████                                                               | 4857/20117 [3:02:12<9:31:41,  2.25s/it] 24%|████████████████████                                                               | 4858/20117 [3:02:14<9:30:48,  2.24s/it] 24%|████████████████████                                                               | 4859/20117 [3:02:16<9:32:05,  2.25s/it] 24%|████████████████████                                                               | 4860/20117 [3:02:19<9:35:24,  2.26s/it]                                                                                                                                 {'loss': 0.2286, 'grad_norm': 0.21615763008594513, 'learning_rate': 0.00017337936057553726, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 430.28, 'epoch': 0.48}
 24%|████████████████████                                                               | 4860/20117 [3:02:19<9:35:24,  2.26s/it] 24%|████████████████████                                                               | 4861/20117 [3:02:21<9:40:16,  2.28s/it] 24%|████████████████████                                                               | 4862/20117 [3:02:23<9:39:16,  2.28s/it] 24%|████████████████████                                                               | 4863/20117 [3:02:26<9:38:09,  2.27s/it] 24%|████████████████████                                                               | 4864/20117 [3:02:28<9:34:31,  2.26s/it] 24%|████████████████████                                                               | 4865/20117 [3:02:30<9:39:15,  2.28s/it] 24%|████████████████████                                                               | 4866/20117 [3:02:32<9:34:41,  2.26s/it] 24%|████████████████████                                                               | 4867/20117 [3:02:35<9:39:16,  2.28s/it] 24%|████████████████████                                                               | 4868/20117 [3:02:37<9:35:45,  2.27s/it] 24%|████████████████████                                                               | 4869/20117 [3:02:39<9:39:40,  2.28s/it] 24%|████████████████████                                                               | 4870/20117 [3:02:42<9:44:33,  2.30s/it]                                                                                                                                 {'loss': 0.2213, 'grad_norm': 0.38201475143432617, 'learning_rate': 0.0001732726452990594, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 337.65, 'epoch': 0.48}
 24%|████████████████████                                                               | 4870/20117 [3:02:42<9:44:33,  2.30s/it] 24%|████████████████████                                                               | 4871/20117 [3:02:44<9:47:11,  2.31s/it] 24%|████████████████████                                                               | 4872/20117 [3:02:46<9:44:03,  2.30s/it] 24%|████████████████████                                                               | 4873/20117 [3:02:48<9:40:27,  2.28s/it] 24%|████████████████████                                                               | 4874/20117 [3:02:51<9:39:49,  2.28s/it] 24%|████████████████████                                                               | 4875/20117 [3:02:53<9:40:46,  2.29s/it] 24%|████████████████████                                                               | 4876/20117 [3:02:55<9:37:52,  2.27s/it] 24%|████████████████████                                                               | 4877/20117 [3:02:58<9:42:17,  2.29s/it] 24%|████████████████████▏                                                              | 4878/20117 [3:03:00<9:47:17,  2.31s/it] 24%|████████████████████▏                                                              | 4879/20117 [3:03:02<9:46:28,  2.31s/it] 24%|████████████████████▏                                                              | 4880/20117 [3:03:05<9:46:48,  2.31s/it]                                                                                                                                 {'loss': 0.2696, 'grad_norm': 0.4545513987541199, 'learning_rate': 0.00017316574953656958, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 388.83, 'epoch': 0.49}
 24%|████████████████████▏                                                              | 4880/20117 [3:03:05<9:46:48,  2.31s/it] 24%|████████████████████▏                                                              | 4881/20117 [3:03:07<9:42:53,  2.30s/it] 24%|████████████████████▏                                                              | 4882/20117 [3:03:09<9:36:04,  2.27s/it] 24%|████████████████████▏                                                              | 4883/20117 [3:03:11<9:35:02,  2.26s/it] 24%|████████████████████▏                                                              | 4884/20117 [3:03:14<9:30:56,  2.25s/it] 24%|████████████████████▏                                                              | 4885/20117 [3:03:16<9:30:15,  2.25s/it] 24%|████████████████████▏                                                              | 4886/20117 [3:03:18<9:34:55,  2.26s/it] 24%|████████████████████▏                                                              | 4887/20117 [3:03:20<9:36:14,  2.27s/it] 24%|████████████████████▏                                                              | 4888/20117 [3:03:23<9:42:43,  2.30s/it] 24%|████████████████████▏                                                              | 4889/20117 [3:03:25<9:40:28,  2.29s/it] 24%|████████████████████▏                                                              | 4890/20117 [3:03:27<9:36:02,  2.27s/it]                                                                                                                                 {'loss': 0.1962, 'grad_norm': 0.2672022879123688, 'learning_rate': 0.00017305867355137475, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 318.92, 'epoch': 0.49}
 24%|████████████████████▏                                                              | 4890/20117 [3:03:27<9:36:02,  2.27s/it] 24%|████████████████████▏                                                              | 4891/20117 [3:03:29<9:38:36,  2.28s/it] 24%|████████████████████▏                                                              | 4892/20117 [3:03:32<9:36:29,  2.27s/it] 24%|████████████████████▏                                                              | 4893/20117 [3:03:34<9:34:31,  2.26s/it] 24%|████████████████████▏                                                              | 4894/20117 [3:03:36<9:34:54,  2.27s/it] 24%|████████████████████▏                                                              | 4895/20117 [3:03:39<9:34:19,  2.26s/it] 24%|████████████████████▏                                                              | 4896/20117 [3:03:41<9:57:07,  2.35s/it] 24%|████████████████████▏                                                              | 4897/20117 [3:03:43<9:43:32,  2.30s/it] 24%|████████████████████▏                                                              | 4898/20117 [3:03:46<9:41:27,  2.29s/it] 24%|████████████████████▏                                                              | 4899/20117 [3:03:48<9:35:42,  2.27s/it] 24%|████████████████████▏                                                              | 4900/20117 [3:03:50<9:33:25,  2.26s/it]                                                                                                                                 {'loss': 0.2107, 'grad_norm': 0.4752904772758484, 'learning_rate': 0.00017295141760722567, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 361.56, 'epoch': 0.49}
 24%|████████████████████▏                                                              | 4900/20117 [3:03:50<9:33:25,  2.26s/it] 24%|████████████████████▏                                                              | 4901/20117 [3:03:52<9:33:03,  2.26s/it] 24%|████████████████████▏                                                              | 4902/20117 [3:03:55<9:33:41,  2.26s/it] 24%|████████████████████▏                                                              | 4903/20117 [3:03:57<9:34:36,  2.27s/it] 24%|████████████████████▏                                                              | 4904/20117 [3:03:59<9:34:41,  2.27s/it] 24%|████████████████████▏                                                              | 4905/20117 [3:04:01<9:31:37,  2.25s/it] 24%|████████████████████▏                                                              | 4906/20117 [3:04:03<9:27:01,  2.24s/it] 24%|████████████████████▏                                                              | 4907/20117 [3:04:06<9:28:39,  2.24s/it] 24%|████████████████████▏                                                              | 4908/20117 [3:04:08<9:28:06,  2.24s/it] 24%|████████████████████▎                                                              | 4909/20117 [3:04:10<9:26:20,  2.23s/it] 24%|████████████████████▎                                                              | 4910/20117 [3:04:13<9:32:36,  2.26s/it]                                                                                                                                 {'loss': 0.2648, 'grad_norm': 0.4459490180015564, 'learning_rate': 0.0001728439819683164, 'memory/max_active (GiB)': 21.39, 'memory/max_allocated (GiB)': 21.39, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 302.7, 'epoch': 0.49}
 24%|████████████████████▎                                                              | 4910/20117 [3:04:13<9:32:36,  2.26s/it] 24%|████████████████████▎                                                              | 4911/20117 [3:04:15<9:39:07,  2.29s/it] 24%|████████████████████▎                                                              | 4912/20117 [3:04:17<9:32:40,  2.26s/it] 24%|████████████████████▎                                                              | 4913/20117 [3:04:19<9:32:06,  2.26s/it] 24%|████████████████████▎                                                              | 4914/20117 [3:04:22<9:31:01,  2.25s/it] 24%|████████████████████▎                                                              | 4915/20117 [3:04:24<9:32:51,  2.26s/it] 24%|████████████████████▎                                                              | 4916/20117 [3:04:26<9:29:28,  2.25s/it] 24%|████████████████████▎                                                              | 4917/20117 [3:04:28<9:28:09,  2.24s/it] 24%|████████████████████▎                                                              | 4918/20117 [3:04:31<9:38:27,  2.28s/it] 24%|████████████████████▎                                                              | 4919/20117 [3:04:33<9:44:00,  2.31s/it] 24%|████████████████████▎                                                              | 4920/20117 [3:04:35<9:35:06,  2.27s/it]                                                                                                                                 {'loss': 0.2714, 'grad_norm': 0.6155937314033508, 'learning_rate': 0.00017273636689928357, 'memory/max_active (GiB)': 20.63, 'memory/max_allocated (GiB)': 20.63, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 379.02, 'epoch': 0.49}
 24%|████████████████████▎                                                              | 4920/20117 [3:04:35<9:35:06,  2.27s/it] 24%|████████████████████▎                                                              | 4921/20117 [3:04:37<9:36:34,  2.28s/it] 24%|████████████████████▎                                                              | 4922/20117 [3:04:40<9:34:52,  2.27s/it] 24%|████████████████████▎                                                              | 4923/20117 [3:04:42<9:26:23,  2.24s/it] 24%|████████████████████▎                                                              | 4924/20117 [3:04:44<9:24:50,  2.23s/it] 24%|████████████████████▎                                                              | 4925/20117 [3:04:46<9:18:04,  2.20s/it] 24%|████████████████████▎                                                              | 4926/20117 [3:04:49<9:22:50,  2.22s/it] 24%|████████████████████▎                                                              | 4927/20117 [3:04:51<9:20:39,  2.21s/it] 24%|████████████████████▎                                                              | 4928/20117 [3:04:53<9:26:24,  2.24s/it] 25%|████████████████████▎                                                              | 4929/20117 [3:04:55<9:25:44,  2.23s/it] 25%|████████████████████▎                                                              | 4930/20117 [3:04:58<9:29:35,  2.25s/it]                                                                                                                                 {'loss': 0.1966, 'grad_norm': 0.4557485282421112, 'learning_rate': 0.00017262857266520595, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 369.72, 'epoch': 0.49}
 25%|████████████████████▎                                                              | 4930/20117 [3:04:58<9:29:35,  2.25s/it] 25%|████████████████████▎                                                              | 4931/20117 [3:05:00<9:35:56,  2.28s/it] 25%|████████████████████▎                                                              | 4932/20117 [3:05:02<9:38:16,  2.28s/it] 25%|████████████████████▎                                                              | 4933/20117 [3:05:05<9:42:27,  2.30s/it] 25%|████████████████████▎                                                              | 4934/20117 [3:05:07<9:38:02,  2.28s/it] 25%|████████████████████▎                                                              | 4935/20117 [3:05:09<9:46:32,  2.32s/it] 25%|████████████████████▎                                                              | 4936/20117 [3:05:11<9:45:45,  2.32s/it] 25%|████████████████████▎                                                              | 4937/20117 [3:05:14<9:44:13,  2.31s/it] 25%|████████████████████▎                                                              | 4938/20117 [3:05:16<9:41:33,  2.30s/it] 25%|████████████████████▍                                                              | 4939/20117 [3:05:18<9:45:48,  2.32s/it] 25%|████████████████████▍                                                              | 4940/20117 [3:05:21<9:43:06,  2.31s/it]                                                                                                                                 {'loss': 0.2803, 'grad_norm': 0.4636807143688202, 'learning_rate': 0.0001725205995316034, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 373.68, 'epoch': 0.49}
 25%|████████████████████▍                                                              | 4940/20117 [3:05:21<9:43:06,  2.31s/it] 25%|████████████████████▍                                                              | 4941/20117 [3:05:23<9:40:08,  2.29s/it] 25%|████████████████████▍                                                              | 4942/20117 [3:05:25<9:35:04,  2.27s/it] 25%|████████████████████▍                                                              | 4943/20117 [3:05:27<9:33:54,  2.27s/it] 25%|████████████████████▍                                                              | 4944/20117 [3:05:30<9:36:53,  2.28s/it] 25%|████████████████████▍                                                              | 4945/20117 [3:05:32<9:33:41,  2.27s/it] 25%|████████████████████▍                                                              | 4946/20117 [3:05:34<9:32:40,  2.26s/it] 25%|████████████████████▍                                                              | 4947/20117 [3:05:37<9:36:19,  2.28s/it] 25%|████████████████████▍                                                              | 4948/20117 [3:05:39<9:36:24,  2.28s/it] 25%|████████████████████▍                                                              | 4949/20117 [3:05:41<9:44:55,  2.31s/it] 25%|████████████████████▏                                                             | 4950/20117 [3:05:44<10:01:43,  2.38s/it]                                                                                                                                 {'loss': 0.3439, 'grad_norm': 0.45894595980644226, 'learning_rate': 0.00017241244776443666, 'memory/max_active (GiB)': 20.62, 'memory/max_allocated (GiB)': 20.62, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 426.86, 'epoch': 0.49}
 25%|████████████████████▏                                                             | 4950/20117 [3:05:44<10:01:43,  2.38s/it] 25%|████████████████████▍                                                              | 4951/20117 [3:05:46<9:49:24,  2.33s/it] 25%|████████████████████▍                                                              | 4952/20117 [3:05:48<9:40:19,  2.30s/it] 25%|████████████████████▍                                                              | 4953/20117 [3:05:50<9:41:03,  2.30s/it] 25%|████████████████████▍                                                              | 4954/20117 [3:05:53<9:35:47,  2.28s/it] 25%|████████████████████▍                                                              | 4955/20117 [3:05:55<9:31:51,  2.26s/it] 25%|████████████████████▍                                                              | 4956/20117 [3:05:57<9:33:18,  2.27s/it] 25%|████████████████████▍                                                              | 4957/20117 [3:05:59<9:33:20,  2.27s/it] 25%|████████████████████▍                                                              | 4958/20117 [3:06:02<9:33:42,  2.27s/it] 25%|████████████████████▍                                                              | 4959/20117 [3:06:04<9:30:44,  2.26s/it] 25%|████████████████████▍                                                              | 4960/20117 [3:06:06<9:30:14,  2.26s/it]                                                                                                                                 {'loss': 0.3035, 'grad_norm': 0.39921844005584717, 'learning_rate': 0.0001723041176301063, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 375.09, 'epoch': 0.49}
 25%|████████████████████▍                                                              | 4960/20117 [3:06:06<9:30:14,  2.26s/it] 25%|████████████████████▍                                                              | 4961/20117 [3:06:08<9:26:59,  2.24s/it] 25%|████████████████████▍                                                              | 4962/20117 [3:06:11<9:30:00,  2.26s/it] 25%|████████████████████▍                                                              | 4963/20117 [3:06:13<9:27:32,  2.25s/it] 25%|████████████████████▍                                                              | 4964/20117 [3:06:15<9:24:23,  2.23s/it] 25%|████████████████████▍                                                              | 4965/20117 [3:06:17<9:25:00,  2.24s/it] 25%|████████████████████▍                                                              | 4966/20117 [3:06:20<9:30:03,  2.26s/it] 25%|████████████████████▍                                                              | 4967/20117 [3:06:22<9:27:27,  2.25s/it] 25%|████████████████████▍                                                              | 4968/20117 [3:06:24<9:28:06,  2.25s/it] 25%|████████████████████▌                                                              | 4969/20117 [3:06:26<9:25:59,  2.24s/it] 25%|████████████████████▌                                                              | 4970/20117 [3:06:29<9:22:28,  2.23s/it]                                                                                                                                 {'loss': 0.2043, 'grad_norm': 0.28210845589637756, 'learning_rate': 0.00017219560939545246, 'memory/max_active (GiB)': 19.2, 'memory/max_allocated (GiB)': 19.2, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 320.21, 'epoch': 0.49}
 25%|████████████████████▌                                                              | 4970/20117 [3:06:29<9:22:28,  2.23s/it] 25%|████████████████████▌                                                              | 4971/20117 [3:06:31<9:21:14,  2.22s/it] 25%|████████████████████▌                                                              | 4972/20117 [3:06:33<9:24:29,  2.24s/it] 25%|████████████████████▌                                                              | 4973/20117 [3:06:35<9:26:18,  2.24s/it] 25%|████████████████████▌                                                              | 4974/20117 [3:06:38<9:28:22,  2.25s/it] 25%|████████████████████▌                                                              | 4975/20117 [3:06:40<9:26:40,  2.25s/it] 25%|████████████████████▌                                                              | 4976/20117 [3:06:42<9:22:50,  2.23s/it] 25%|████████████████████▌                                                              | 4977/20117 [3:06:44<9:23:08,  2.23s/it] 25%|████████████████████▌                                                              | 4978/20117 [3:06:47<9:28:43,  2.25s/it] 25%|████████████████████▌                                                              | 4979/20117 [3:06:49<9:28:04,  2.25s/it] 25%|████████████████████▌                                                              | 4980/20117 [3:06:51<9:25:07,  2.24s/it]                                                                                                                                 {'loss': 0.2293, 'grad_norm': 0.5301778316497803, 'learning_rate': 0.00017208692332775375, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 327.97, 'epoch': 0.5}
 25%|████████████████████▌                                                              | 4980/20117 [3:06:51<9:25:07,  2.24s/it] 25%|████████████████████▌                                                              | 4981/20117 [3:06:54<9:51:12,  2.34s/it] 25%|████████████████████▌                                                              | 4982/20117 [3:06:56<9:44:58,  2.32s/it] 25%|████████████████████▌                                                              | 4983/20117 [3:06:58<9:39:57,  2.30s/it] 25%|████████████████████▌                                                              | 4984/20117 [3:07:00<9:38:35,  2.29s/it] 25%|████████████████████▌                                                              | 4985/20117 [3:07:03<9:36:42,  2.29s/it] 25%|████████████████████▌                                                              | 4986/20117 [3:07:05<9:34:32,  2.28s/it] 25%|████████████████████▌                                                              | 4987/20117 [3:07:07<9:32:02,  2.27s/it] 25%|████████████████████▌                                                              | 4988/20117 [3:07:10<9:53:04,  2.35s/it] 25%|████████████████████▌                                                              | 4989/20117 [3:07:12<9:49:11,  2.34s/it] 25%|████████████████████▌                                                              | 4990/20117 [3:07:14<9:49:04,  2.34s/it]                                                                                                                                 {'loss': 0.3101, 'grad_norm': 0.40421542525291443, 'learning_rate': 0.000171978059694727, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 395.91, 'epoch': 0.5}
 25%|████████████████████▌                                                              | 4990/20117 [3:07:14<9:49:04,  2.34s/it] 25%|████████████████████▌                                                              | 4991/20117 [3:07:17<9:41:07,  2.31s/it] 25%|████████████████████▌                                                              | 4992/20117 [3:07:19<9:41:35,  2.31s/it] 25%|████████████████████▌                                                              | 4993/20117 [3:07:21<9:36:44,  2.29s/it] 25%|████████████████████▌                                                              | 4994/20117 [3:07:23<9:33:37,  2.28s/it] 25%|████████████████████▌                                                              | 4995/20117 [3:07:26<9:31:36,  2.27s/it] 25%|████████████████████▌                                                              | 4996/20117 [3:07:28<9:34:26,  2.28s/it] 25%|████████████████████▌                                                              | 4997/20117 [3:07:30<9:32:51,  2.27s/it] 25%|████████████████████▌                                                              | 4998/20117 [3:07:33<9:35:07,  2.28s/it] 25%|████████████████████▋                                                              | 4999/20117 [3:07:35<9:37:40,  2.29s/it] 25%|████████████████████▋                                                              | 5000/20117 [3:07:37<9:31:36,  2.27s/it]                                                                                                                                 {'loss': 0.2447, 'grad_norm': 0.383989155292511, 'learning_rate': 0.0001718690187645263, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 344.59, 'epoch': 0.5}
 25%|████████████████████▋                                                              | 5000/20117 [3:07:37<9:31:36,  2.27s/it] 25%|████████████████████▋                                                              | 5001/20117 [3:07:39<9:33:07,  2.27s/it] 25%|████████████████████▋                                                              | 5002/20117 [3:07:42<9:50:47,  2.35s/it] 25%|████████████████████▋                                                              | 5003/20117 [3:07:44<9:43:50,  2.32s/it] 25%|████████████████████▋                                                              | 5004/20117 [3:07:46<9:42:12,  2.31s/it] 25%|████████████████████▋                                                              | 5005/20117 [3:07:49<9:39:58,  2.30s/it] 25%|████████████████████▋                                                              | 5006/20117 [3:07:51<9:40:07,  2.30s/it] 25%|████████████████████▋                                                              | 5007/20117 [3:07:53<9:39:14,  2.30s/it] 25%|████████████████████▋                                                              | 5008/20117 [3:07:56<9:31:56,  2.27s/it] 25%|████████████████████▋                                                              | 5009/20117 [3:07:58<9:29:11,  2.26s/it] 25%|████████████████████▋                                                              | 5010/20117 [3:08:00<9:29:19,  2.26s/it]                                                                                                                                 {'loss': 0.3176, 'grad_norm': 0.559518039226532, 'learning_rate': 0.00017175980080574247, 'memory/max_active (GiB)': 20.53, 'memory/max_allocated (GiB)': 20.53, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 378.55, 'epoch': 0.5}
 25%|████████████████████▋                                                              | 5010/20117 [3:08:00<9:29:19,  2.26s/it] 25%|████████████████████▋                                                              | 5011/20117 [3:08:02<9:28:32,  2.26s/it] 25%|████████████████████▋                                                              | 5012/20117 [3:08:04<9:23:23,  2.24s/it] 25%|████████████████████▋                                                              | 5013/20117 [3:08:07<9:25:05,  2.24s/it] 25%|████████████████████▋                                                              | 5014/20117 [3:08:09<9:26:34,  2.25s/it] 25%|████████████████████▋                                                              | 5015/20117 [3:08:11<9:31:02,  2.27s/it] 25%|████████████████████▋                                                              | 5016/20117 [3:08:14<9:26:44,  2.25s/it] 25%|████████████████████▋                                                              | 5017/20117 [3:08:16<9:26:19,  2.25s/it] 25%|████████████████████▋                                                              | 5018/20117 [3:08:18<9:24:59,  2.25s/it] 25%|████████████████████▋                                                              | 5019/20117 [3:08:20<9:31:56,  2.27s/it] 25%|████████████████████▋                                                              | 5020/20117 [3:08:23<9:29:52,  2.26s/it]                                                                                                                                 {'loss': 0.2006, 'grad_norm': 0.46650487184524536, 'learning_rate': 0.00017165040608740255, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 334.42, 'epoch': 0.5}
 25%|████████████████████▋                                                              | 5020/20117 [3:08:23<9:29:52,  2.26s/it] 25%|████████████████████▋                                                              | 5021/20117 [3:08:25<9:35:39,  2.29s/it] 25%|████████████████████▋                                                              | 5022/20117 [3:08:27<9:38:52,  2.30s/it] 25%|████████████████████▋                                                              | 5023/20117 [3:08:30<9:37:02,  2.29s/it] 25%|████████████████████▋                                                              | 5024/20117 [3:08:32<9:40:11,  2.31s/it] 25%|████████████████████▋                                                              | 5025/20117 [3:08:34<9:39:27,  2.30s/it] 25%|████████████████████▋                                                              | 5026/20117 [3:08:36<9:38:47,  2.30s/it] 25%|████████████████████▋                                                              | 5027/20117 [3:08:39<9:40:21,  2.31s/it] 25%|████████████████████▋                                                              | 5028/20117 [3:08:41<9:38:41,  2.30s/it] 25%|████████████████████▋                                                              | 5029/20117 [3:08:43<9:39:30,  2.30s/it] 25%|████████████████████▊                                                              | 5030/20117 [3:08:46<9:35:58,  2.29s/it]                                                                                                                                 {'loss': 0.2542, 'grad_norm': 0.3541397750377655, 'learning_rate': 0.00017154083487896872, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 360.31, 'epoch': 0.5}
 25%|████████████████████▊                                                              | 5030/20117 [3:08:46<9:35:58,  2.29s/it] 25%|████████████████████▊                                                              | 5031/20117 [3:08:48<9:33:59,  2.28s/it] 25%|████████████████████▊                                                              | 5032/20117 [3:08:50<9:30:14,  2.27s/it] 25%|████████████████████▊                                                              | 5033/20117 [3:08:52<9:24:34,  2.25s/it] 25%|████████████████████▊                                                              | 5034/20117 [3:08:55<9:20:56,  2.23s/it] 25%|████████████████████▊                                                              | 5035/20117 [3:08:57<9:21:50,  2.24s/it] 25%|████████████████████▊                                                              | 5036/20117 [3:08:59<9:35:16,  2.29s/it] 25%|████████████████████▊                                                              | 5037/20117 [3:09:01<9:30:44,  2.27s/it] 25%|████████████████████▊                                                              | 5038/20117 [3:09:04<9:28:58,  2.26s/it] 25%|████████████████████▊                                                              | 5039/20117 [3:09:06<9:28:07,  2.26s/it] 25%|████████████████████▊                                                              | 5040/20117 [3:09:08<9:30:30,  2.27s/it]                                                                                                                                 {'loss': 0.2133, 'grad_norm': 0.48084208369255066, 'learning_rate': 0.00017143108745033811, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 281.78, 'epoch': 0.5}
 25%|████████████████████▊                                                              | 5040/20117 [3:09:08<9:30:30,  2.27s/it] 25%|████████████████████▊                                                              | 5041/20117 [3:09:10<9:29:03,  2.26s/it] 25%|████████████████████▊                                                              | 5042/20117 [3:09:13<9:25:04,  2.25s/it] 25%|████████████████████▊                                                              | 5043/20117 [3:09:15<9:33:03,  2.28s/it] 25%|████████████████████▊                                                              | 5044/20117 [3:09:17<9:29:10,  2.27s/it] 25%|████████████████████▊                                                              | 5045/20117 [3:09:19<9:24:35,  2.25s/it] 25%|████████████████████▊                                                              | 5046/20117 [3:09:22<9:23:14,  2.24s/it] 25%|████████████████████▊                                                              | 5047/20117 [3:09:24<9:20:13,  2.23s/it] 25%|████████████████████▊                                                              | 5048/20117 [3:09:26<9:19:46,  2.23s/it] 25%|████████████████████▊                                                              | 5049/20117 [3:09:28<9:18:26,  2.22s/it] 25%|████████████████████▊                                                              | 5050/20117 [3:09:31<9:16:44,  2.22s/it]                                                                                                                                 {'loss': 0.246, 'grad_norm': 0.28656521439552307, 'learning_rate': 0.0001713211640718418, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 334.53, 'epoch': 0.5}
 25%|████████████████████▊                                                              | 5050/20117 [3:09:31<9:16:44,  2.22s/it] 25%|████████████████████▊                                                              | 5051/20117 [3:09:33<9:19:16,  2.23s/it] 25%|████████████████████▊                                                              | 5052/20117 [3:09:35<9:20:17,  2.23s/it] 25%|████████████████████▊                                                              | 5053/20117 [3:09:37<9:17:01,  2.22s/it] 25%|████████████████████▊                                                              | 5054/20117 [3:09:39<9:19:05,  2.23s/it] 25%|████████████████████▊                                                              | 5055/20117 [3:09:42<9:40:46,  2.31s/it] 25%|████████████████████▊                                                              | 5056/20117 [3:09:44<9:34:41,  2.29s/it] 25%|████████████████████▊                                                              | 5057/20117 [3:09:46<9:33:30,  2.28s/it] 25%|████████████████████▊                                                              | 5058/20117 [3:09:49<9:34:52,  2.29s/it] 25%|████████████████████▊                                                              | 5059/20117 [3:09:51<9:27:08,  2.26s/it] 25%|████████████████████▉                                                              | 5060/20117 [3:09:53<9:20:58,  2.24s/it]                                                                                                                                 {'loss': 0.2497, 'grad_norm': 0.48624783754348755, 'learning_rate': 0.0001712110650142443, 'memory/max_active (GiB)': 20.65, 'memory/max_allocated (GiB)': 20.65, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 397.02, 'epoch': 0.5}
 25%|████████████████████▉                                                              | 5060/20117 [3:09:53<9:20:58,  2.24s/it] 25%|████████████████████▉                                                              | 5061/20117 [3:09:55<9:19:52,  2.23s/it] 25%|████████████████████▉                                                              | 5062/20117 [3:09:58<9:22:21,  2.24s/it] 25%|████████████████████▉                                                              | 5063/20117 [3:10:00<9:20:56,  2.24s/it] 25%|████████████████████▉                                                              | 5064/20117 [3:10:02<9:23:55,  2.25s/it] 25%|████████████████████▉                                                              | 5065/20117 [3:10:04<9:24:37,  2.25s/it] 25%|████████████████████▉                                                              | 5066/20117 [3:10:07<9:25:06,  2.25s/it] 25%|████████████████████▉                                                              | 5067/20117 [3:10:09<9:29:57,  2.27s/it] 25%|████████████████████▉                                                              | 5068/20117 [3:10:11<9:27:50,  2.26s/it] 25%|████████████████████▉                                                              | 5069/20117 [3:10:13<9:25:07,  2.25s/it] 25%|████████████████████▉                                                              | 5070/20117 [3:10:16<9:24:38,  2.25s/it]                                                                                                                                 {'loss': 0.2366, 'grad_norm': 0.3334798216819763, 'learning_rate': 0.00017110079054874288, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 371.25, 'epoch': 0.5}
 25%|████████████████████▉                                                              | 5070/20117 [3:10:16<9:24:38,  2.25s/it] 25%|████████████████████▉                                                              | 5071/20117 [3:10:18<9:19:47,  2.23s/it] 25%|████████████████████▉                                                              | 5072/20117 [3:10:20<9:18:06,  2.23s/it] 25%|████████████████████▉                                                              | 5073/20117 [3:10:23<9:37:14,  2.30s/it] 25%|████████████████████▉                                                              | 5074/20117 [3:10:25<9:31:07,  2.28s/it] 25%|████████████████████▉                                                              | 5075/20117 [3:10:27<9:29:25,  2.27s/it] 25%|████████████████████▉                                                              | 5076/20117 [3:10:29<9:23:57,  2.25s/it] 25%|████████████████████▉                                                              | 5077/20117 [3:10:32<9:24:31,  2.25s/it] 25%|████████████████████▉                                                              | 5078/20117 [3:10:34<9:27:58,  2.27s/it] 25%|████████████████████▉                                                              | 5079/20117 [3:10:36<9:28:49,  2.27s/it] 25%|████████████████████▉                                                              | 5080/20117 [3:10:38<9:23:37,  2.25s/it]                                                                                                                                 {'loss': 0.2104, 'grad_norm': 0.4432489275932312, 'learning_rate': 0.00017099034094696685, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 384.5, 'epoch': 0.51}
 25%|████████████████████▉                                                              | 5080/20117 [3:10:38<9:23:37,  2.25s/it] 25%|████████████████████▉                                                              | 5081/20117 [3:10:41<9:22:58,  2.25s/it] 25%|████████████████████▉                                                              | 5082/20117 [3:10:43<9:26:19,  2.26s/it] 25%|████████████████████▉                                                              | 5083/20117 [3:10:45<9:28:55,  2.27s/it] 25%|████████████████████▉                                                              | 5084/20117 [3:10:47<9:24:10,  2.25s/it] 25%|████████████████████▉                                                              | 5085/20117 [3:10:50<9:24:32,  2.25s/it] 25%|████████████████████▉                                                              | 5086/20117 [3:10:52<9:29:15,  2.27s/it] 25%|████████████████████▉                                                              | 5087/20117 [3:10:54<9:23:17,  2.25s/it] 25%|████████████████████▉                                                              | 5088/20117 [3:10:56<9:25:46,  2.26s/it] 25%|████████████████████▉                                                              | 5089/20117 [3:10:59<9:25:31,  2.26s/it] 25%|█████████████████████                                                              | 5090/20117 [3:11:01<9:24:10,  2.25s/it]                                                                                                                                 {'loss': 0.2292, 'grad_norm': 0.5348425507545471, 'learning_rate': 0.00017087971648097693, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 311.48, 'epoch': 0.51}
 25%|█████████████████████                                                              | 5090/20117 [3:11:01<9:24:10,  2.25s/it] 25%|█████████████████████                                                              | 5091/20117 [3:11:03<9:23:59,  2.25s/it] 25%|█████████████████████                                                              | 5092/20117 [3:11:05<9:19:57,  2.24s/it] 25%|█████████████████████                                                              | 5093/20117 [3:11:07<9:15:04,  2.22s/it] 25%|█████████████████████                                                              | 5094/20117 [3:11:10<9:17:22,  2.23s/it] 25%|█████████████████████                                                              | 5095/20117 [3:11:12<9:19:24,  2.23s/it] 25%|█████████████████████                                                              | 5096/20117 [3:11:14<9:18:46,  2.23s/it] 25%|█████████████████████                                                              | 5097/20117 [3:11:16<9:18:23,  2.23s/it] 25%|█████████████████████                                                              | 5098/20117 [3:11:19<9:20:00,  2.24s/it] 25%|█████████████████████                                                              | 5099/20117 [3:11:21<9:21:41,  2.24s/it] 25%|█████████████████████                                                              | 5100/20117 [3:11:23<9:22:10,  2.25s/it]                                                                                                                                 {'loss': 0.2297, 'grad_norm': 0.4587235748767853, 'learning_rate': 0.00017076891742326452, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 371.52, 'epoch': 0.51}
 25%|█████████████████████                                                              | 5100/20117 [3:11:23<9:22:10,  2.25s/it] 25%|█████████████████████                                                              | 5101/20117 [3:11:26<9:27:43,  2.27s/it] 25%|█████████████████████                                                              | 5102/20117 [3:11:28<9:25:22,  2.26s/it] 25%|█████████████████████                                                              | 5103/20117 [3:11:30<9:20:44,  2.24s/it] 25%|█████████████████████                                                              | 5104/20117 [3:11:32<9:17:44,  2.23s/it] 25%|█████████████████████                                                              | 5105/20117 [3:11:34<9:20:35,  2.24s/it] 25%|█████████████████████                                                              | 5106/20117 [3:11:37<9:20:49,  2.24s/it] 25%|█████████████████████                                                              | 5107/20117 [3:11:39<9:22:47,  2.25s/it] 25%|█████████████████████                                                              | 5108/20117 [3:11:41<9:19:45,  2.24s/it] 25%|█████████████████████                                                              | 5109/20117 [3:11:44<9:39:44,  2.32s/it] 25%|█████████████████████                                                              | 5110/20117 [3:11:46<9:32:13,  2.29s/it]                                                                                                                                 {'loss': 0.2447, 'grad_norm': 0.3222121298313141, 'learning_rate': 0.00017065794404675112, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 346.06, 'epoch': 0.51}
 25%|█████████████████████                                                              | 5110/20117 [3:11:46<9:32:13,  2.29s/it] 25%|█████████████████████                                                              | 5111/20117 [3:11:48<9:26:22,  2.26s/it] 25%|█████████████████████                                                              | 5112/20117 [3:11:50<9:30:43,  2.28s/it] 25%|█████████████████████                                                              | 5113/20117 [3:11:53<9:25:44,  2.26s/it] 25%|█████████████████████                                                              | 5114/20117 [3:11:55<9:22:16,  2.25s/it] 25%|█████████████████████                                                              | 5115/20117 [3:11:57<9:18:47,  2.23s/it] 25%|█████████████████████                                                              | 5116/20117 [3:11:59<9:13:27,  2.21s/it] 25%|█████████████████████                                                              | 5117/20117 [3:12:01<9:09:44,  2.20s/it] 25%|█████████████████████                                                              | 5118/20117 [3:12:04<9:09:49,  2.20s/it] 25%|█████████████████████                                                              | 5119/20117 [3:12:06<9:11:52,  2.21s/it] 25%|█████████████████████                                                              | 5120/20117 [3:12:08<9:12:39,  2.21s/it]                                                                                                                                 {'loss': 0.192, 'grad_norm': 0.38898178935050964, 'learning_rate': 0.0001705467966247877, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 388.93, 'epoch': 0.51}
 25%|█████████████████████                                                              | 5120/20117 [3:12:08<9:12:39,  2.21s/it] 25%|█████████████████████▏                                                             | 5121/20117 [3:12:10<9:17:48,  2.23s/it] 25%|█████████████████████▏                                                             | 5122/20117 [3:12:13<9:20:39,  2.24s/it] 25%|█████████████████████▏                                                             | 5123/20117 [3:12:15<9:24:29,  2.26s/it] 25%|█████████████████████▏                                                             | 5124/20117 [3:12:17<9:31:26,  2.29s/it] 25%|█████████████████████▏                                                             | 5125/20117 [3:12:20<9:31:24,  2.29s/it] 25%|█████████████████████▏                                                             | 5126/20117 [3:12:22<9:27:08,  2.27s/it] 25%|█████████████████████▏                                                             | 5127/20117 [3:12:24<9:41:36,  2.33s/it] 25%|█████████████████████▏                                                             | 5128/20117 [3:12:27<9:40:19,  2.32s/it] 25%|█████████████████████▏                                                             | 5129/20117 [3:12:29<9:44:12,  2.34s/it] 26%|█████████████████████▏                                                             | 5130/20117 [3:12:31<9:40:14,  2.32s/it]                                                                                                                                 {'loss': 0.2604, 'grad_norm': 0.49159225821495056, 'learning_rate': 0.00017043547543115373, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 348.77, 'epoch': 0.51}
 26%|█████████████████████▏                                                             | 5130/20117 [3:12:31<9:40:14,  2.32s/it] 26%|█████████████████████▏                                                             | 5131/20117 [3:12:33<9:38:34,  2.32s/it] 26%|█████████████████████▏                                                             | 5132/20117 [3:12:36<9:30:09,  2.28s/it] 26%|█████████████████████▏                                                             | 5133/20117 [3:12:38<9:24:24,  2.26s/it] 26%|█████████████████████▏                                                             | 5134/20117 [3:12:40<9:23:09,  2.26s/it] 26%|█████████████████████▏                                                             | 5135/20117 [3:12:42<9:25:57,  2.27s/it] 26%|█████████████████████▏                                                             | 5136/20117 [3:12:45<9:25:44,  2.27s/it] 26%|█████████████████████▏                                                             | 5137/20117 [3:12:47<9:28:37,  2.28s/it] 26%|█████████████████████▏                                                             | 5138/20117 [3:12:49<9:27:37,  2.27s/it] 26%|█████████████████████▏                                                             | 5139/20117 [3:12:51<9:23:58,  2.26s/it] 26%|█████████████████████▏                                                             | 5140/20117 [3:12:54<9:24:34,  2.26s/it]                                                                                                                                 {'loss': 0.1843, 'grad_norm': 0.34262073040008545, 'learning_rate': 0.0001703239807400569, 'memory/max_active (GiB)': 18.83, 'memory/max_allocated (GiB)': 18.83, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 361.68, 'epoch': 0.51}
 26%|█████████████████████▏                                                             | 5140/20117 [3:12:54<9:24:34,  2.26s/it] 26%|█████████████████████▏                                                             | 5141/20117 [3:12:56<9:22:49,  2.25s/it] 26%|█████████████████████▏                                                             | 5142/20117 [3:12:58<9:25:35,  2.27s/it] 26%|█████████████████████▏                                                             | 5143/20117 [3:13:00<9:21:30,  2.25s/it] 26%|█████████████████████▏                                                             | 5144/20117 [3:13:03<9:23:56,  2.26s/it] 26%|█████████████████████▏                                                             | 5145/20117 [3:13:05<9:31:29,  2.29s/it] 26%|█████████████████████▏                                                             | 5146/20117 [3:13:07<9:24:28,  2.26s/it] 26%|█████████████████████▏                                                             | 5147/20117 [3:13:10<9:29:20,  2.28s/it] 26%|█████████████████████▏                                                             | 5148/20117 [3:13:12<9:24:38,  2.26s/it] 26%|█████████████████████▏                                                             | 5149/20117 [3:13:14<9:22:22,  2.25s/it] 26%|█████████████████████▏                                                             | 5150/20117 [3:13:16<9:21:34,  2.25s/it]                                                                                                                                 {'loss': 0.2527, 'grad_norm': 0.5016794800758362, 'learning_rate': 0.00017021231282613223, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 385.36, 'epoch': 0.51}
 26%|█████████████████████▏                                                             | 5150/20117 [3:13:16<9:21:34,  2.25s/it] 26%|█████████████████████▎                                                             | 5151/20117 [3:13:19<9:18:28,  2.24s/it] 26%|█████████████████████▎                                                             | 5152/20117 [3:13:21<9:22:28,  2.26s/it] 26%|█████████████████████▎                                                             | 5153/20117 [3:13:23<9:19:37,  2.24s/it] 26%|█████████████████████▎                                                             | 5154/20117 [3:13:25<9:18:51,  2.24s/it] 26%|█████████████████████▎                                                             | 5155/20117 [3:13:28<9:20:35,  2.25s/it] 26%|█████████████████████▎                                                             | 5156/20117 [3:13:30<9:22:47,  2.26s/it] 26%|█████████████████████▎                                                             | 5157/20117 [3:13:32<9:25:19,  2.27s/it] 26%|█████████████████████▎                                                             | 5158/20117 [3:13:34<9:26:36,  2.27s/it] 26%|█████████████████████▎                                                             | 5159/20117 [3:13:37<9:24:51,  2.27s/it] 26%|█████████████████████▎                                                             | 5160/20117 [3:13:39<9:49:21,  2.36s/it]                                                                                                                                 {'loss': 0.2349, 'grad_norm': 0.44959282875061035, 'learning_rate': 0.00017010047196444137, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 319.58, 'epoch': 0.51}
 26%|█████████████████████▎                                                             | 5170/20117 [3:14:02<9:30:36,  2.29s/it]
{'loss': 0.2564, 'grad_norm': 0.27737611532211304, 'learning_rate': 0.00016998845843047193, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 325.49, 'epoch': 0.51}
 26%|█████████████████████▎                                                             | 5180/20117 [3:14:24<9:21:53,  2.26s/it]
{'loss': 0.2375, 'grad_norm': 0.5175918340682983, 'learning_rate': 0.00016987627250013702, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 334.38, 'epoch': 0.51}
 26%|█████████████████████▍                                                             | 5190/20117 [3:14:47<9:20:41,  2.25s/it]
{'loss': 0.2478, 'grad_norm': 0.5541722178459167, 'learning_rate': 0.00016976391444977425, 'memory/max_active (GiB)': 19.81, 'memory/max_allocated (GiB)': 19.81, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 316.46, 'epoch': 0.52}
 26%|█████████████████████▍                                                             | 5200/20117 [3:15:10<9:19:04,  2.25s/it]
{'loss': 0.2371, 'grad_norm': 1.0252865552902222, 'learning_rate': 0.00016965138455614525, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 339.79, 'epoch': 0.52}
 26%|█████████████████████▍                                                             | 5210/20117 [3:15:32<9:06:54,  2.20s/it]
{'loss': 0.2311, 'grad_norm': 0.582073450088501, 'learning_rate': 0.00016953868309643491, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 359.2, 'epoch': 0.52}
 26%|█████████████████████▌                                                             | 5220/20117 [3:15:55<9:23:40,  2.27s/it]
{'loss': 0.2693, 'grad_norm': 0.5147042870521545, 'learning_rate': 0.0001694258103482508, 'memory/max_active (GiB)': 21.38, 'memory/max_allocated (GiB)': 21.38, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 379.55, 'epoch': 0.52}
 26%|█████████████████████▌                                                             | 5230/20117 [3:16:17<9:10:02,  2.22s/it]
{'loss': 0.2228, 'grad_norm': 0.6816319823265076, 'learning_rate': 0.0001693127665896223, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 296.31, 'epoch': 0.52}
 26%|█████████████████████▌                                                             | 5240/20117 [3:16:39<9:16:44,  2.25s/it]
{'loss': 0.2764, 'grad_norm': 0.765450656414032, 'learning_rate': 0.00016919955209900012, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 389.55, 'epoch': 0.52}
 26%|█████████████████████▋                                                             | 5250/20117 [3:17:02<9:22:04,  2.27s/it]
{'loss': 0.3197, 'grad_norm': 0.3426934778690338, 'learning_rate': 0.00016908616715525544, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 404.47, 'epoch': 0.52}
 26%|█████████████████████▋                                                             | 5260/20117 [3:17:24<9:14:37,  2.24s/it]
{'loss': 0.2624, 'grad_norm': 0.45074665546417236, 'learning_rate': 0.0001689726120376794, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 362.3, 'epoch': 0.52}
 26%|█████████████████████▋                                                             | 5270/20117 [3:17:47<9:29:28,  2.30s/it]
{'loss': 0.2068, 'grad_norm': 0.4062357246875763, 'learning_rate': 0.00016885888702598218, 'memory/max_active (GiB)': 20.52, 'memory/max_allocated (GiB)': 20.52, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 383.64, 'epoch': 0.52}
 26%|█████████████████████▊                                                             | 5280/20117 [3:18:09<9:21:21,  2.27s/it]
{'loss': 0.2886, 'grad_norm': 0.4385395050048828, 'learning_rate': 0.00016874499240029253, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 332.1, 'epoch': 0.52}
 26%|█████████████████████▊                                                             | 5290/20117 [3:18:32<9:26:02,  2.29s/it]
{'loss': 0.2305, 'grad_norm': 0.4379644989967346, 'learning_rate': 0.0001686309284411571, 'memory/max_active (GiB)': 19.08, 'memory/max_allocated (GiB)': 19.08, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 328.75, 'epoch': 0.53}
 26%|█████████████████████▊                                                             | 5300/20117 [3:18:54<9:12:20,  2.24s/it]
{'loss': 0.2526, 'grad_norm': 0.5357609987258911, 'learning_rate': 0.00016851669542953935, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 396.9, 'epoch': 0.53}
 26%|█████████████████████▉                                                             | 5310/20117 [3:19:17<9:05:15,  2.21s/it]
{'loss': 0.193, 'grad_norm': 0.5790386199951172, 'learning_rate': 0.00016840229364681948, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 313.5, 'epoch': 0.53}
 26%|█████████████████████▉                                                             | 5320/20117 [3:19:39<9:41:30,  2.36s/it]
{'loss': 0.2071, 'grad_norm': 0.4063149690628052, 'learning_rate': 0.00016828772337479318, 'memory/max_active (GiB)': 18.83, 'memory/max_allocated (GiB)': 18.83, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 290.65, 'epoch': 0.53}
 26%|█████████████████████▉                                                             | 5330/20117 [3:20:02<9:12:59,  2.24s/it]
{'loss': 0.2086, 'grad_norm': 0.3840952515602112, 'learning_rate': 0.00016817298489567127, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 372.36, 'epoch': 0.53}
 27%|██████████████████████                                                             | 5340/20117 [3:20:25<9:28:05,  2.31s/it]
{'loss': 0.2556, 'grad_norm': 0.36974212527275085, 'learning_rate': 0.0001680580784920789, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 369.63, 'epoch': 0.53}
 27%|██████████████████████                                                             | 5340/20117 [3:20:25<9:28:05,  2.31s/it] 27%|██████████████████████                                                             | 5341/20117 [3:20:27<9:28:21,  2.31s/it] 27%|██████████████████████                                                             | 5342/20117 [3:20:30<9:24:33,  2.29s/it] 27%|██████████████████████                                                             | 5343/20117 [3:20:32<9:20:55,  2.28s/it] 27%|██████████████████████                                                             | 5344/20117 [3:20:34<9:19:21,  2.27s/it] 27%|██████████████████████                                                             | 5345/20117 [3:20:36<9:22:32,  2.28s/it] 27%|██████████████████████                                                             | 5346/20117 [3:20:39<9:21:17,  2.28s/it] 27%|██████████████████████                                                             | 5347/20117 [3:20:41<9:18:01,  2.27s/it] 27%|██████████████████████                                                             | 5348/20117 [3:20:43<9:16:19,  2.26s/it] 27%|██████████████████████                                                             | 5349/20117 [3:20:45<9:14:49,  2.25s/it] 27%|██████████████████████                                                             | 5350/20117 [3:20:48<9:17:01,  2.26s/it]                                                                                                                                 {'loss': 0.2667, 'grad_norm': 0.5495437979698181, 'learning_rate': 0.00016794300444705477, 'memory/max_active (GiB)': 19.77, 'memory/max_allocated (GiB)': 19.77, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 327.36, 'epoch': 0.53}
 27%|██████████████████████                                                             | 5350/20117 [3:20:48<9:17:01,  2.26s/it] 27%|██████████████████████                                                             | 5351/20117 [3:20:50<9:21:46,  2.28s/it] 27%|██████████████████████                                                             | 5352/20117 [3:20:52<9:23:52,  2.29s/it] 27%|██████████████████████                                                             | 5353/20117 [3:20:55<9:22:22,  2.29s/it] 27%|██████████████████████                                                             | 5354/20117 [3:20:57<9:27:41,  2.31s/it] 27%|██████████████████████                                                             | 5355/20117 [3:20:59<9:27:46,  2.31s/it] 27%|██████████████████████                                                             | 5356/20117 [3:21:02<9:26:10,  2.30s/it] 27%|██████████████████████                                                             | 5357/20117 [3:21:04<9:24:34,  2.30s/it] 27%|██████████████████████                                                             | 5358/20117 [3:21:06<9:28:10,  2.31s/it] 27%|██████████████████████                                                             | 5359/20117 [3:21:09<9:28:15,  2.31s/it] 27%|██████████████████████                                                             | 5360/20117 [3:21:11<9:28:04,  2.31s/it]                                                                                                                                 {'loss': 0.2311, 'grad_norm': 0.5131349563598633, 'learning_rate': 0.0001678277630440506, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 357.39, 'epoch': 0.53}
 27%|██████████████████████                                                             | 5360/20117 [3:21:11<9:28:04,  2.31s/it] 27%|██████████████████████                                                             | 5361/20117 [3:21:13<9:30:14,  2.32s/it] 27%|██████████████████████                                                             | 5362/20117 [3:21:16<9:28:58,  2.31s/it] 27%|██████████████████████▏                                                            | 5363/20117 [3:21:18<9:29:09,  2.31s/it] 27%|██████████████████████▏                                                            | 5364/20117 [3:21:20<9:25:26,  2.30s/it] 27%|██████████████████████▏                                                            | 5365/20117 [3:21:22<9:22:00,  2.29s/it] 27%|██████████████████████▏                                                            | 5366/20117 [3:21:25<9:19:53,  2.28s/it] 27%|██████████████████████▏                                                            | 5367/20117 [3:21:27<9:23:51,  2.29s/it] 27%|██████████████████████▏                                                            | 5368/20117 [3:21:29<9:23:21,  2.29s/it] 27%|██████████████████████▏                                                            | 5369/20117 [3:21:32<9:22:42,  2.29s/it] 27%|██████████████████████▏                                                            | 5370/20117 [3:21:34<9:17:34,  2.27s/it]                                                                                                                                 {'loss': 0.2532, 'grad_norm': 0.28221410512924194, 'learning_rate': 0.00016771235456693035, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 385.44, 'epoch': 0.53}
 27%|██████████████████████▏                                                            | 5370/20117 [3:21:34<9:17:34,  2.27s/it] 27%|██████████████████████▏                                                            | 5371/20117 [3:21:36<9:21:03,  2.28s/it] 27%|██████████████████████▏                                                            | 5372/20117 [3:21:38<9:20:44,  2.28s/it] 27%|██████████████████████▏                                                            | 5373/20117 [3:21:41<9:44:24,  2.38s/it] 27%|██████████████████████▏                                                            | 5374/20117 [3:21:43<9:36:13,  2.35s/it] 27%|██████████████████████▏                                                            | 5375/20117 [3:21:45<9:26:06,  2.30s/it] 27%|██████████████████████▏                                                            | 5376/20117 [3:21:48<9:22:05,  2.29s/it] 27%|██████████████████████▏                                                            | 5377/20117 [3:21:50<9:23:08,  2.29s/it] 27%|██████████████████████▏                                                            | 5378/20117 [3:21:52<9:21:39,  2.29s/it] 27%|██████████████████████▏                                                            | 5379/20117 [3:21:54<9:17:18,  2.27s/it] 27%|██████████████████████▏                                                            | 5380/20117 [3:21:57<9:12:39,  2.25s/it]                                                                                                                                 {'loss': 0.2999, 'grad_norm': 0.5543202757835388, 'learning_rate': 0.0001675967792999695, 'memory/max_active (GiB)': 19.19, 'memory/max_allocated (GiB)': 19.19, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 393.99, 'epoch': 0.53}
 27%|██████████████████████▏                                                            | 5380/20117 [3:21:57<9:12:39,  2.25s/it] 27%|██████████████████████▏                                                            | 5381/20117 [3:21:59<9:15:35,  2.26s/it] 27%|██████████████████████▏                                                            | 5382/20117 [3:22:01<9:16:39,  2.27s/it] 27%|██████████████████████▏                                                            | 5383/20117 [3:22:03<9:16:36,  2.27s/it] 27%|██████████████████████▏                                                            | 5384/20117 [3:22:06<9:16:55,  2.27s/it] 27%|██████████████████████▏                                                            | 5385/20117 [3:22:08<9:17:39,  2.27s/it] 27%|██████████████████████▏                                                            | 5386/20117 [3:22:10<9:21:10,  2.29s/it] 27%|██████████████████████▏                                                            | 5387/20117 [3:22:13<9:17:23,  2.27s/it] 27%|██████████████████████▏                                                            | 5388/20117 [3:22:15<9:19:52,  2.28s/it] 27%|██████████████████████▏                                                            | 5389/20117 [3:22:17<9:21:24,  2.29s/it] 27%|██████████████████████▏                                                            | 5390/20117 [3:22:20<9:22:24,  2.29s/it]                                                                                                                                 {'loss': 0.2071, 'grad_norm': 0.3077482283115387, 'learning_rate': 0.00016748103752785426, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 372.33, 'epoch': 0.54}
 27%|██████████████████████▏                                                            | 5390/20117 [3:22:20<9:22:24,  2.29s/it] 27%|██████████████████████▏                                                            | 5391/20117 [3:22:22<9:19:05,  2.28s/it] 27%|██████████████████████▏                                                            | 5392/20117 [3:22:24<9:16:45,  2.27s/it] 27%|██████████████████████▎                                                            | 5393/20117 [3:22:26<9:15:32,  2.26s/it] 27%|██████████████████████▎                                                            | 5394/20117 [3:22:29<9:19:15,  2.28s/it] 27%|██████████████████████▎                                                            | 5395/20117 [3:22:31<9:13:38,  2.26s/it] 27%|██████████████████████▎                                                            | 5396/20117 [3:22:33<9:16:11,  2.27s/it] 27%|██████████████████████▎                                                            | 5397/20117 [3:22:35<9:17:33,  2.27s/it] 27%|██████████████████████▎                                                            | 5398/20117 [3:22:38<9:12:41,  2.25s/it] 27%|██████████████████████▎                                                            | 5399/20117 [3:22:40<9:18:39,  2.28s/it] 27%|██████████████████████▎                                                            | 5400/20117 [3:22:42<9:15:08,  2.26s/it]                                                                                                                                 {'loss': 0.1986, 'grad_norm': 0.6371411681175232, 'learning_rate': 0.00016736512953568117, 'memory/max_active (GiB)': 20.43, 'memory/max_allocated (GiB)': 20.43, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 325.81, 'epoch': 0.54}
 27%|██████████████████████▎                                                            | 5400/20117 [3:22:42<9:15:08,  2.26s/it] 27%|██████████████████████▎                                                            | 5401/20117 [3:22:44<9:18:35,  2.28s/it] 27%|██████████████████████▎                                                            | 5402/20117 [3:22:47<9:17:45,  2.27s/it] 27%|██████████████████████▎                                                            | 5403/20117 [3:22:49<9:15:21,  2.26s/it] 27%|██████████████████████▎                                                            | 5404/20117 [3:22:51<9:14:50,  2.26s/it] 27%|██████████████████████▎                                                            | 5405/20117 [3:22:53<9:17:02,  2.27s/it] 27%|██████████████████████▎                                                            | 5406/20117 [3:22:56<9:19:36,  2.28s/it] 27%|██████████████████████▎                                                            | 5407/20117 [3:22:58<9:21:25,  2.29s/it] 27%|██████████████████████▎                                                            | 5408/20117 [3:23:00<9:18:36,  2.28s/it] 27%|██████████████████████▎                                                            | 5409/20117 [3:23:03<9:22:12,  2.29s/it] 27%|██████████████████████▎                                                            | 5410/20117 [3:23:05<9:19:08,  2.28s/it]                                                                                                                                 {'loss': 0.253, 'grad_norm': 0.4577232003211975, 'learning_rate': 0.0001672490556089561, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 388.89, 'epoch': 0.54}
 27%|██████████████████████▎                                                            | 5410/20117 [3:23:05<9:19:08,  2.28s/it] 27%|██████████████████████▎                                                            | 5411/20117 [3:23:07<9:15:55,  2.27s/it] 27%|██████████████████████▎                                                            | 5412/20117 [3:23:09<9:17:07,  2.27s/it] 27%|██████████████████████▎                                                            | 5413/20117 [3:23:12<9:21:30,  2.29s/it] 27%|██████████████████████▎                                                            | 5414/20117 [3:23:14<9:20:49,  2.29s/it] 27%|██████████████████████▎                                                            | 5415/20117 [3:23:16<9:22:49,  2.30s/it] 27%|██████████████████████▎                                                            | 5416/20117 [3:23:19<9:21:18,  2.29s/it] 27%|██████████████████████▎                                                            | 5417/20117 [3:23:21<9:23:25,  2.30s/it] 27%|██████████████████████▎                                                            | 5418/20117 [3:23:23<9:21:56,  2.29s/it] 27%|██████████████████████▎                                                            | 5419/20117 [3:23:26<9:18:33,  2.28s/it] 27%|██████████████████████▎                                                            | 5420/20117 [3:23:28<9:23:03,  2.30s/it]                                                                                                                                 {'loss': 0.1994, 'grad_norm': 0.2831481099128723, 'learning_rate': 0.00016713281603359366, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 373.61, 'epoch': 0.54}
 27%|██████████████████████▎                                                            | 5420/20117 [3:23:28<9:23:03,  2.30s/it] 27%|██████████████████████▎                                                            | 5421/20117 [3:23:30<9:21:13,  2.29s/it] 27%|██████████████████████▎                                                            | 5422/20117 [3:23:32<9:17:49,  2.28s/it] 27%|██████████████████████▎                                                            | 5423/20117 [3:23:35<9:17:57,  2.28s/it] 27%|██████████████████████▍                                                            | 5424/20117 [3:23:37<9:19:47,  2.29s/it] 27%|██████████████████████▍                                                            | 5425/20117 [3:23:39<9:20:34,  2.29s/it] 27%|██████████████████████▍                                                            | 5426/20117 [3:23:42<9:42:05,  2.38s/it] 27%|██████████████████████▍                                                            | 5427/20117 [3:23:44<9:32:16,  2.34s/it] 27%|██████████████████████▍                                                            | 5428/20117 [3:23:46<9:21:04,  2.29s/it] 27%|██████████████████████▍                                                            | 5429/20117 [3:23:49<9:23:00,  2.30s/it] 27%|██████████████████████▍                                                            | 5430/20117 [3:23:51<9:20:31,  2.29s/it]                                                                                                                                 {'loss': 0.1997, 'grad_norm': 0.45899850130081177, 'learning_rate': 0.00016701641109591648, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 386.1, 'epoch': 0.54}
 27%|██████████████████████▍                                                            | 5430/20117 [3:23:51<9:20:31,  2.29s/it] 27%|██████████████████████▍                                                            | 5431/20117 [3:23:53<9:20:52,  2.29s/it] 27%|██████████████████████▍                                                            | 5432/20117 [3:23:55<9:21:52,  2.30s/it] 27%|██████████████████████▍                                                            | 5433/20117 [3:23:58<9:22:06,  2.30s/it] 27%|██████████████████████▍                                                            | 5434/20117 [3:24:00<9:23:13,  2.30s/it] 27%|██████████████████████▍                                                            | 5435/20117 [3:24:02<9:19:59,  2.29s/it] 27%|██████████████████████▍                                                            | 5436/20117 [3:24:05<9:25:11,  2.31s/it] 27%|██████████████████████▍                                                            | 5437/20117 [3:24:07<9:19:23,  2.29s/it] 27%|██████████████████████▍                                                            | 5438/20117 [3:24:09<9:21:27,  2.29s/it] 27%|██████████████████████▍                                                            | 5439/20117 [3:24:12<9:19:14,  2.29s/it] 27%|██████████████████████▍                                                            | 5440/20117 [3:24:14<9:19:54,  2.29s/it]                                                                                                                                 {'loss': 0.276, 'grad_norm': 0.2898833453655243, 'learning_rate': 0.0001668998410826545, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 374.49, 'epoch': 0.54}
 27%|██████████████████████▍                                                            | 5440/20117 [3:24:14<9:19:54,  2.29s/it] 27%|██████████████████████▍                                                            | 5441/20117 [3:24:16<9:20:25,  2.29s/it] 27%|██████████████████████▍                                                            | 5442/20117 [3:24:18<9:17:30,  2.28s/it] 27%|██████████████████████▍                                                            | 5443/20117 [3:24:21<9:18:48,  2.28s/it] 27%|██████████████████████▍                                                            | 5444/20117 [3:24:23<9:18:28,  2.28s/it] 27%|██████████████████████▍                                                            | 5445/20117 [3:24:25<9:18:27,  2.28s/it] 27%|██████████████████████▍                                                            | 5446/20117 [3:24:28<9:19:38,  2.29s/it] 27%|██████████████████████▍                                                            | 5447/20117 [3:24:30<9:18:50,  2.29s/it] 27%|██████████████████████▍                                                            | 5448/20117 [3:24:32<9:20:04,  2.29s/it] 27%|██████████████████████▍                                                            | 5449/20117 [3:24:34<9:23:37,  2.31s/it] 27%|██████████████████████▍                                                            | 5450/20117 [3:24:37<9:21:39,  2.30s/it]                                                                                                                                 {'loss': 0.2529, 'grad_norm': 0.37996944785118103, 'learning_rate': 0.00016678310628094438, 'memory/max_active (GiB)': 21.38, 'memory/max_allocated (GiB)': 21.38, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 359.7, 'epoch': 0.54}
 27%|██████████████████████▍                                                            | 5450/20117 [3:24:37<9:21:39,  2.30s/it] 27%|██████████████████████▍                                                            | 5451/20117 [3:24:39<9:22:26,  2.30s/it] 27%|██████████████████████▍                                                            | 5452/20117 [3:24:41<9:17:47,  2.28s/it] 27%|██████████████████████▍                                                            | 5453/20117 [3:24:43<9:13:34,  2.27s/it] 27%|██████████████████████▌                                                            | 5454/20117 [3:24:46<9:18:39,  2.29s/it] 27%|██████████████████████▌                                                            | 5455/20117 [3:24:48<9:15:46,  2.27s/it] 27%|██████████████████████▌                                                            | 5456/20117 [3:24:50<9:14:03,  2.27s/it] 27%|██████████████████████▌                                                            | 5457/20117 [3:24:53<9:12:40,  2.26s/it] 27%|██████████████████████▌                                                            | 5458/20117 [3:24:55<9:12:59,  2.26s/it] 27%|██████████████████████▌                                                            | 5459/20117 [3:24:57<9:15:27,  2.27s/it] 27%|██████████████████████▌                                                            | 5460/20117 [3:24:59<9:19:31,  2.29s/it]                                                                                                                                 {'loss': 0.2682, 'grad_norm': 0.255087673664093, 'learning_rate': 0.0001666662069783285, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 346.66, 'epoch': 0.54}
 27%|██████████████████████▌                                                            | 5460/20117 [3:24:59<9:19:31,  2.29s/it] 27%|██████████████████████▌                                                            | 5461/20117 [3:25:02<9:16:41,  2.28s/it] 27%|██████████████████████▌                                                            | 5462/20117 [3:25:04<9:19:03,  2.29s/it] 27%|██████████████████████▌                                                            | 5463/20117 [3:25:06<9:18:09,  2.29s/it] 27%|██████████████████████▌                                                            | 5464/20117 [3:25:09<9:16:51,  2.28s/it] 27%|██████████████████████▌                                                            | 5465/20117 [3:25:11<9:14:13,  2.27s/it] 27%|██████████████████████▌                                                            | 5466/20117 [3:25:13<9:13:14,  2.27s/it] 27%|██████████████████████▌                                                            | 5467/20117 [3:25:15<9:10:50,  2.26s/it] 27%|██████████████████████▌                                                            | 5468/20117 [3:25:18<9:10:04,  2.25s/it] 27%|██████████████████████▌                                                            | 5469/20117 [3:25:20<9:14:08,  2.27s/it] 27%|██████████████████████▌                                                            | 5470/20117 [3:25:22<9:13:21,  2.27s/it]                                                                                                                                 {'loss': 0.2516, 'grad_norm': 0.4574085474014282, 'learning_rate': 0.00016654914346275466, 'memory/max_active (GiB)': 20.57, 'memory/max_allocated (GiB)': 20.57, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 350.42, 'epoch': 0.54}
 27%|██████████████████████▌                                                            | 5470/20117 [3:25:22<9:13:21,  2.27s/it] 27%|██████████████████████▌                                                            | 5471/20117 [3:25:24<9:18:22,  2.29s/it] 27%|██████████████████████▌                                                            | 5472/20117 [3:25:27<9:16:32,  2.28s/it] 27%|██████████████████████▌                                                            | 5473/20117 [3:25:29<9:20:59,  2.30s/it] 27%|██████████████████████▌                                                            | 5474/20117 [3:25:31<9:22:27,  2.30s/it] 27%|██████████████████████▌                                                            | 5475/20117 [3:25:34<9:21:13,  2.30s/it] 27%|██████████████████████▌                                                            | 5476/20117 [3:25:36<9:18:55,  2.29s/it] 27%|██████████████████████▌                                                            | 5477/20117 [3:25:38<9:14:07,  2.27s/it] 27%|██████████████████████▌                                                            | 5478/20117 [3:25:40<9:13:55,  2.27s/it] 27%|██████████████████████▌                                                            | 5479/20117 [3:25:43<9:19:05,  2.29s/it] 27%|██████████████████████▌                                                            | 5480/20117 [3:25:45<9:43:03,  2.39s/it]                                                                                                                                 {'loss': 0.2549, 'grad_norm': 0.5208483934402466, 'learning_rate': 0.00016643191602257496, 'memory/max_active (GiB)': 20.72, 'memory/max_allocated (GiB)': 20.72, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 320.05, 'epoch': 0.54}
 27%|██████████████████████▌                                                            | 5480/20117 [3:25:45<9:43:03,  2.39s/it] 27%|██████████████████████▌                                                            | 5481/20117 [3:25:48<9:36:37,  2.36s/it] 27%|██████████████████████▌                                                            | 5482/20117 [3:25:50<9:32:07,  2.35s/it] 27%|██████████████████████▌                                                            | 5483/20117 [3:25:52<9:32:56,  2.35s/it] 27%|██████████████████████▋                                                            | 5484/20117 [3:25:55<9:25:37,  2.32s/it] 27%|██████████████████████▋                                                            | 5485/20117 [3:25:57<9:22:17,  2.31s/it] 27%|██████████████████████▋                                                            | 5486/20117 [3:25:59<9:17:03,  2.28s/it] 27%|██████████████████████▋                                                            | 5487/20117 [3:26:01<9:15:48,  2.28s/it] 27%|██████████████████████▋                                                            | 5488/20117 [3:26:04<9:14:50,  2.28s/it] 27%|██████████████████████▋                                                            | 5489/20117 [3:26:06<9:10:51,  2.26s/it] 27%|██████████████████████▋                                                            | 5490/20117 [3:26:08<9:07:43,  2.25s/it]                                                                                                                                 {'loss': 0.2151, 'grad_norm': 0.31626808643341064, 'learning_rate': 0.00016631452494654541, 'memory/max_active (GiB)': 20.63, 'memory/max_allocated (GiB)': 20.63, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 335.31, 'epoch': 0.55}
 27%|██████████████████████▋                                                            | 5490/20117 [3:26:08<9:07:43,  2.25s/it] 27%|██████████████████████▋                                                            | 5491/20117 [3:26:10<9:09:58,  2.26s/it] 27%|██████████████████████▋                                                            | 5492/20117 [3:26:13<9:09:21,  2.25s/it] 27%|██████████████████████▋                                                            | 5493/20117 [3:26:15<9:05:11,  2.24s/it] 27%|██████████████████████▋                                                            | 5494/20117 [3:26:17<9:08:05,  2.25s/it] 27%|██████████████████████▋                                                            | 5495/20117 [3:26:19<9:07:11,  2.25s/it] 27%|██████████████████████▋                                                            | 5496/20117 [3:26:22<9:09:54,  2.26s/it] 27%|██████████████████████▋                                                            | 5497/20117 [3:26:24<9:07:54,  2.25s/it] 27%|██████████████████████▋                                                            | 5498/20117 [3:26:26<9:13:40,  2.27s/it] 27%|██████████████████████▋                                                            | 5499/20117 [3:26:28<9:07:59,  2.25s/it] 27%|██████████████████████▋                                                            | 5500/20117 [3:26:31<9:04:37,  2.24s/it]                                                                                                                                 {'loss': 0.2847, 'grad_norm': 0.28286507725715637, 'learning_rate': 0.000166196970523825, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 406.77, 'epoch': 0.55}
 27%|██████████████████████▋                                                            | 5500/20117 [3:26:31<9:04:37,  2.24s/it] 27%|██████████████████████▋                                                            | 5501/20117 [3:26:33<8:59:39,  2.22s/it] 27%|██████████████████████▋                                                            | 5502/20117 [3:26:35<9:00:06,  2.22s/it] 27%|██████████████████████▋                                                            | 5503/20117 [3:26:37<8:58:05,  2.21s/it] 27%|██████████████████████▋                                                            | 5504/20117 [3:26:39<8:58:55,  2.21s/it] 27%|██████████████████████▋                                                            | 5505/20117 [3:26:42<8:59:14,  2.21s/it] 27%|██████████████████████▋                                                            | 5506/20117 [3:26:44<9:00:35,  2.22s/it] 27%|██████████████████████▋                                                            | 5507/20117 [3:26:46<9:11:24,  2.26s/it] 27%|██████████████████████▋                                                            | 5508/20117 [3:26:49<9:20:51,  2.30s/it] 27%|██████████████████████▋                                                            | 5509/20117 [3:26:51<9:27:37,  2.33s/it] 27%|██████████████████████▋                                                            | 5510/20117 [3:26:53<9:25:11,  2.32s/it]                                                                                                                                 {'loss': 0.1912, 'grad_norm': 0.3647201955318451, 'learning_rate': 0.00016607925304397517, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 334.31, 'epoch': 0.55}
 27%|██████████████████████▋                                                            | 5510/20117 [3:26:53<9:25:11,  2.32s/it] 27%|██████████████████████▋                                                            | 5511/20117 [3:26:56<9:22:47,  2.31s/it] 27%|██████████████████████▋                                                            | 5512/20117 [3:26:58<9:24:21,  2.32s/it] 27%|██████████████████████▋                                                            | 5513/20117 [3:27:00<9:24:56,  2.32s/it] 27%|██████████████████████▊                                                            | 5514/20117 [3:27:03<9:29:56,  2.34s/it] 27%|██████████████████████▊                                                            | 5515/20117 [3:27:05<9:26:44,  2.33s/it] 27%|██████████████████████▊                                                            | 5516/20117 [3:27:07<9:30:28,  2.34s/it] 27%|██████████████████████▊                                                            | 5517/20117 [3:27:10<9:30:30,  2.34s/it] 27%|██████████████████████▊                                                            | 5518/20117 [3:27:12<9:23:22,  2.32s/it] 27%|██████████████████████▊                                                            | 5519/20117 [3:27:14<9:17:20,  2.29s/it] 27%|██████████████████████▊                                                            | 5520/20117 [3:27:16<9:14:40,  2.28s/it]                                                                                                                                 {'loss': 0.2336, 'grad_norm': 0.48184719681739807, 'learning_rate': 0.0001659613727969589, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 355.21, 'epoch': 0.55}
 27%|██████████████████████▊                                                            | 5520/20117 [3:27:16<9:14:40,  2.28s/it] 27%|██████████████████████▊                                                            | 5521/20117 [3:27:19<9:22:36,  2.31s/it] 27%|██████████████████████▊                                                            | 5522/20117 [3:27:21<9:21:07,  2.31s/it] 27%|██████████████████████▊                                                            | 5523/20117 [3:27:23<9:19:06,  2.30s/it] 27%|██████████████████████▊                                                            | 5524/20117 [3:27:26<9:27:07,  2.33s/it] 27%|██████████████████████▊                                                            | 5525/20117 [3:27:28<9:30:56,  2.35s/it] 27%|██████████████████████▊                                                            | 5526/20117 [3:27:30<9:31:17,  2.35s/it] 27%|██████████████████████▊                                                            | 5527/20117 [3:27:33<9:28:34,  2.34s/it] 27%|██████████████████████▊                                                            | 5528/20117 [3:27:35<9:26:05,  2.33s/it] 27%|██████████████████████▊                                                            | 5529/20117 [3:27:37<9:22:55,  2.32s/it] 27%|██████████████████████▊                                                            | 5530/20117 [3:27:40<9:22:28,  2.31s/it]                                                                                                                                 {'loss': 0.2764, 'grad_norm': 0.5385854244232178, 'learning_rate': 0.00016584333007314017, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 384.64, 'epoch': 0.55}
 27%|██████████████████████▊                                                            | 5530/20117 [3:27:40<9:22:28,  2.31s/it] 27%|██████████████████████▊                                                            | 5531/20117 [3:27:42<9:21:42,  2.31s/it] 27%|██████████████████████▊                                                            | 5532/20117 [3:27:45<9:43:53,  2.40s/it] 28%|██████████████████████▊                                                            | 5533/20117 [3:27:47<9:35:12,  2.37s/it] 28%|██████████████████████▊                                                            | 5534/20117 [3:27:49<9:28:49,  2.34s/it] 28%|██████████████████████▊                                                            | 5535/20117 [3:27:51<9:24:44,  2.32s/it] 28%|██████████████████████▊                                                            | 5536/20117 [3:27:54<9:24:40,  2.32s/it] 28%|██████████████████████▊                                                            | 5537/20117 [3:27:56<9:20:37,  2.31s/it] 28%|██████████████████████▊                                                            | 5538/20117 [3:27:58<9:16:56,  2.29s/it] 28%|██████████████████████▊                                                            | 5539/20117 [3:28:01<9:16:34,  2.29s/it] 28%|██████████████████████▊                                                            | 5540/20117 [3:28:03<9:10:36,  2.27s/it]                                                                                                                                 {'loss': 0.3002, 'grad_norm': 0.2823385000228882, 'learning_rate': 0.00016572512516328317, 'memory/max_active (GiB)': 18.82, 'memory/max_allocated (GiB)': 18.82, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 417.12, 'epoch': 0.55}
 28%|██████████████████████▊                                                            | 5540/20117 [3:28:03<9:10:36,  2.27s/it] 28%|██████████████████████▊                                                            | 5541/20117 [3:28:05<9:11:45,  2.27s/it] 28%|██████████████████████▊                                                            | 5542/20117 [3:28:07<9:10:04,  2.26s/it] 28%|██████████████████████▊                                                            | 5543/20117 [3:28:10<9:07:09,  2.25s/it] 28%|██████████████████████▊                                                            | 5544/20117 [3:28:12<9:12:16,  2.27s/it] 28%|██████████████████████▉                                                            | 5545/20117 [3:28:14<9:12:31,  2.27s/it] 28%|██████████████████████▉                                                            | 5546/20117 [3:28:16<9:14:41,  2.28s/it] 28%|██████████████████████▉                                                            | 5547/20117 [3:28:19<9:12:40,  2.28s/it] 28%|██████████████████████▉                                                            | 5548/20117 [3:28:21<9:11:26,  2.27s/it] 28%|██████████████████████▉                                                            | 5549/20117 [3:28:23<9:11:53,  2.27s/it] 28%|██████████████████████▉                                                            | 5550/20117 [3:28:26<9:17:42,  2.30s/it]                                                                                                                                 {'loss': 0.2066, 'grad_norm': 0.4133371114730835, 'learning_rate': 0.0001656067583585516, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 314.99, 'epoch': 0.55}
 28%|██████████████████████▉                                                            | 5550/20117 [3:28:26<9:17:42,  2.30s/it] 28%|██████████████████████▉                                                            | 5551/20117 [3:28:28<9:17:15,  2.30s/it] 28%|██████████████████████▉                                                            | 5552/20117 [3:28:30<9:16:33,  2.29s/it] 28%|██████████████████████▉                                                            | 5553/20117 [3:28:32<9:10:10,  2.27s/it] 28%|██████████████████████▉                                                            | 5554/20117 [3:28:35<9:06:44,  2.25s/it] 28%|██████████████████████▉                                                            | 5555/20117 [3:28:37<9:06:21,  2.25s/it] 28%|██████████████████████▉                                                            | 5556/20117 [3:28:39<9:11:09,  2.27s/it] 28%|██████████████████████▉                                                            | 5557/20117 [3:28:42<9:15:43,  2.29s/it] 28%|██████████████████████▉                                                            | 5558/20117 [3:28:44<9:16:05,  2.29s/it] 28%|██████████████████████▉                                                            | 5559/20117 [3:28:46<9:12:32,  2.28s/it] 28%|██████████████████████▉                                                            | 5560/20117 [3:28:48<9:09:00,  2.26s/it]                                                                                                                                 {'loss': 0.2582, 'grad_norm': 0.2247128188610077, 'learning_rate': 0.00016548822995050787, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 341.74, 'epoch': 0.55}
 28%|██████████████████████▉                                                            | 5560/20117 [3:28:48<9:09:00,  2.26s/it] 28%|██████████████████████▉                                                            | 5561/20117 [3:28:51<9:13:42,  2.28s/it] 28%|██████████████████████▉                                                            | 5562/20117 [3:28:53<9:19:50,  2.31s/it] 28%|██████████████████████▉                                                            | 5563/20117 [3:28:55<9:17:47,  2.30s/it] 28%|██████████████████████▉                                                            | 5564/20117 [3:28:58<9:17:32,  2.30s/it] 28%|██████████████████████▉                                                            | 5565/20117 [3:29:00<9:15:50,  2.29s/it] 28%|██████████████████████▉                                                            | 5566/20117 [3:29:02<9:14:18,  2.29s/it] 28%|██████████████████████▉                                                            | 5567/20117 [3:29:04<9:17:18,  2.30s/it] 28%|██████████████████████▉                                                            | 5568/20117 [3:29:07<9:25:18,  2.33s/it] 28%|██████████████████████▉                                                            | 5569/20117 [3:29:09<9:22:06,  2.32s/it] 28%|██████████████████████▉                                                            | 5570/20117 [3:29:11<9:18:40,  2.30s/it]                                                                                                                                 {'loss': 0.3169, 'grad_norm': 0.5088145136833191, 'learning_rate': 0.0001653695402311125, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 385.48, 'epoch': 0.55}
 28%|██████████████████████▉                                                            | 5570/20117 [3:29:11<9:18:40,  2.30s/it] 28%|██████████████████████▉                                                            | 5571/20117 [3:29:14<9:17:24,  2.30s/it] 28%|██████████████████████▉                                                            | 5572/20117 [3:29:16<9:19:52,  2.31s/it] 28%|██████████████████████▉                                                            | 5573/20117 [3:29:18<9:16:42,  2.30s/it] 28%|██████████████████████▉                                                            | 5574/20117 [3:29:21<9:14:25,  2.29s/it] 28%|███████████████████████                                                            | 5575/20117 [3:29:23<9:13:57,  2.29s/it] 28%|███████████████████████                                                            | 5576/20117 [3:29:25<9:11:14,  2.27s/it] 28%|███████████████████████                                                            | 5577/20117 [3:29:27<9:15:41,  2.29s/it] 28%|███████████████████████                                                            | 5578/20117 [3:29:30<9:16:28,  2.30s/it] 28%|███████████████████████                                                            | 5579/20117 [3:29:32<9:14:56,  2.29s/it] 28%|███████████████████████                                                            | 5580/20117 [3:29:34<9:16:21,  2.30s/it]                                                                                                                                 {'loss': 0.3391, 'grad_norm': 0.600646436214447, 'learning_rate': 0.0001652506894927234, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 338.0, 'epoch': 0.55}
 28%|███████████████████████                                                            | 5580/20117 [3:29:34<9:16:21,  2.30s/it] 28%|███████████████████████                                                            | 5581/20117 [3:29:37<9:13:04,  2.28s/it] 28%|███████████████████████                                                            | 5582/20117 [3:29:39<9:12:29,  2.28s/it] 28%|███████████████████████                                                            | 5583/20117 [3:29:41<9:11:18,  2.28s/it] 28%|███████████████████████                                                            | 5584/20117 [3:29:43<9:10:15,  2.27s/it] 28%|███████████████████████                                                            | 5585/20117 [3:29:46<9:34:58,  2.37s/it] 28%|███████████████████████                                                            | 5586/20117 [3:29:48<9:27:00,  2.34s/it] 28%|███████████████████████                                                            | 5587/20117 [3:29:51<9:25:11,  2.33s/it] 28%|███████████████████████                                                            | 5588/20117 [3:29:53<9:25:11,  2.33s/it] 28%|███████████████████████                                                            | 5589/20117 [3:29:55<9:26:01,  2.34s/it] 28%|███████████████████████                                                            | 5590/20117 [3:29:58<9:24:00,  2.33s/it]                                                                                                                                 {'loss': 0.2003, 'grad_norm': 0.34408292174339294, 'learning_rate': 0.00016513167802809502, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 341.35, 'epoch': 0.56}
 28%|███████████████████████                                                            | 5590/20117 [3:29:58<9:24:00,  2.33s/it] 28%|███████████████████████                                                            | 5591/20117 [3:30:00<9:21:33,  2.32s/it] 28%|███████████████████████                                                            | 5592/20117 [3:30:02<9:24:01,  2.33s/it] 28%|███████████████████████                                                            | 5593/20117 [3:30:05<9:27:58,  2.35s/it] 28%|███████████████████████                                                            | 5594/20117 [3:30:07<9:21:55,  2.32s/it] 28%|███████████████████████                                                            | 5595/20117 [3:30:09<9:18:40,  2.31s/it] 28%|███████████████████████                                                            | 5596/20117 [3:30:11<9:18:20,  2.31s/it] 28%|███████████████████████                                                            | 5597/20117 [3:30:14<9:17:06,  2.30s/it] 28%|███████████████████████                                                            | 5598/20117 [3:30:16<9:12:19,  2.28s/it] 28%|███████████████████████                                                            | 5599/20117 [3:30:18<9:10:31,  2.28s/it] 28%|███████████████████████                                                            | 5600/20117 [3:30:20<9:07:12,  2.26s/it]                                                                                                                                 {'loss': 0.2494, 'grad_norm': 0.44201189279556274, 'learning_rate': 0.0001650125061303778, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 386.01, 'epoch': 0.56}
 28%|███████████████████████                                                            | 5600/20117 [3:30:20<9:07:12,  2.26s/it] 28%|███████████████████████                                                            | 5601/20117 [3:30:23<9:11:33,  2.28s/it] 28%|███████████████████████                                                            | 5602/20117 [3:30:25<9:09:14,  2.27s/it] 28%|███████████████████████                                                            | 5603/20117 [3:30:27<9:08:16,  2.27s/it] 28%|███████████████████████                                                            | 5604/20117 [3:30:30<9:08:57,  2.27s/it] 28%|███████████████████████▏                                                           | 5605/20117 [3:30:32<9:09:53,  2.27s/it] 28%|███████████████████████▏                                                           | 5606/20117 [3:30:34<9:11:49,  2.28s/it] 28%|███████████████████████▏                                                           | 5607/20117 [3:30:36<9:10:48,  2.28s/it] 28%|███████████████████████▏                                                           | 5608/20117 [3:30:39<9:09:22,  2.27s/it] 28%|███████████████████████▏                                                           | 5609/20117 [3:30:41<9:10:38,  2.28s/it] 28%|███████████████████████▏                                                           | 5610/20117 [3:30:43<9:09:18,  2.27s/it]                                                                                                                                 {'loss': 0.2881, 'grad_norm': 0.5484967827796936, 'learning_rate': 0.00016489317409311717, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 327.21, 'epoch': 0.56}
 28%|███████████████████████▏                                                           | 5610/20117 [3:30:43<9:09:18,  2.27s/it] 28%|███████████████████████▏                                                           | 5611/20117 [3:30:46<9:10:37,  2.28s/it] 28%|███████████████████████▏                                                           | 5612/20117 [3:30:48<9:06:38,  2.26s/it] 28%|███████████████████████▏                                                           | 5613/20117 [3:30:50<9:05:06,  2.26s/it] 28%|███████████████████████▏                                                           | 5614/20117 [3:30:52<9:12:13,  2.28s/it] 28%|███████████████████████▏                                                           | 5615/20117 [3:30:55<9:10:55,  2.28s/it] 28%|███████████████████████▏                                                           | 5616/20117 [3:30:57<9:10:22,  2.28s/it] 28%|███████████████████████▏                                                           | 5617/20117 [3:30:59<9:14:30,  2.29s/it] 28%|███████████████████████▏                                                           | 5618/20117 [3:31:02<9:16:51,  2.30s/it] 28%|███████████████████████▏                                                           | 5619/20117 [3:31:04<9:20:36,  2.32s/it] 28%|███████████████████████▏                                                           | 5620/20117 [3:31:06<9:18:20,  2.31s/it]                                                                                                                                 {'loss': 0.2377, 'grad_norm': 0.45674219727516174, 'learning_rate': 0.00016477368221025333, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 365.25, 'epoch': 0.56}
 28%|███████████████████████▏                                                           | 5620/20117 [3:31:06<9:18:20,  2.31s/it] 28%|███████████████████████▏                                                           | 5621/20117 [3:31:08<9:18:06,  2.31s/it] 28%|███████████████████████▏                                                           | 5622/20117 [3:31:11<9:15:22,  2.30s/it] 28%|███████████████████████▏                                                           | 5623/20117 [3:31:13<9:18:17,  2.31s/it] 28%|███████████████████████▏                                                           | 5624/20117 [3:31:15<9:17:30,  2.31s/it] 28%|███████████████████████▏                                                           | 5625/20117 [3:31:18<9:19:12,  2.32s/it] 28%|███████████████████████▏                                                           | 5626/20117 [3:31:20<9:15:25,  2.30s/it] 28%|███████████████████████▏                                                           | 5627/20117 [3:31:22<9:12:56,  2.29s/it] 28%|███████████████████████▏                                                           | 5628/20117 [3:31:24<9:07:48,  2.27s/it] 28%|███████████████████████▏                                                           | 5629/20117 [3:31:27<9:09:58,  2.28s/it] 28%|███████████████████████▏                                                           | 5630/20117 [3:31:29<9:10:03,  2.28s/it]                                                                                                                                 {'loss': 0.2251, 'grad_norm': 0.3383237421512604, 'learning_rate': 0.00016465403077612001, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 357.99, 'epoch': 0.56}
 28%|███████████████████████▏                                                           | 5630/20117 [3:31:29<9:10:03,  2.28s/it] 28%|███████████████████████▏                                                           | 5631/20117 [3:31:31<9:06:02,  2.26s/it] 28%|███████████████████████▏                                                           | 5632/20117 [3:31:34<9:03:32,  2.25s/it] 28%|███████████████████████▏                                                           | 5633/20117 [3:31:36<9:05:20,  2.26s/it] 28%|███████████████████████▏                                                           | 5634/20117 [3:31:38<9:03:41,  2.25s/it] 28%|███████████████████████▏                                                           | 5635/20117 [3:31:40<9:06:21,  2.26s/it] 28%|███████████████████████▎                                                           | 5636/20117 [3:31:43<9:06:41,  2.27s/it] 28%|███████████████████████▎                                                           | 5637/20117 [3:31:45<9:03:10,  2.25s/it] 28%|███████████████████████▎                                                           | 5638/20117 [3:31:47<9:34:34,  2.38s/it] 28%|███████████████████████▎                                                           | 5639/20117 [3:31:50<9:23:34,  2.34s/it] 28%|███████████████████████▎                                                           | 5640/20117 [3:31:52<9:15:04,  2.30s/it]                                                                                                                                 {'loss': 0.2279, 'grad_norm': 0.3748398423194885, 'learning_rate': 0.00016453422008544388, 'memory/max_active (GiB)': 20.63, 'memory/max_allocated (GiB)': 20.63, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 335.12, 'epoch': 0.56}
 28%|███████████████████████▎                                                           | 5640/20117 [3:31:52<9:15:04,  2.30s/it] 28%|███████████████████████▎                                                           | 5641/20117 [3:31:54<9:13:07,  2.29s/it] 28%|███████████████████████▎                                                           | 5642/20117 [3:31:56<9:10:20,  2.28s/it] 28%|███████████████████████▎                                                           | 5643/20117 [3:31:59<9:10:38,  2.28s/it] 28%|███████████████████████▎                                                           | 5644/20117 [3:32:01<9:14:12,  2.30s/it] 28%|███████████████████████▎                                                           | 5645/20117 [3:32:03<9:11:02,  2.28s/it] 28%|███████████████████████▎                                                           | 5646/20117 [3:32:06<9:11:24,  2.29s/it] 28%|███████████████████████▎                                                           | 5647/20117 [3:32:08<9:07:24,  2.27s/it] 28%|███████████████████████▎                                                           | 5648/20117 [3:32:10<9:05:47,  2.26s/it] 28%|███████████████████████▎                                                           | 5649/20117 [3:32:12<9:06:26,  2.27s/it] 28%|███████████████████████▎                                                           | 5650/20117 [3:32:15<9:08:27,  2.27s/it]                                                                                                                                 {'loss': 0.261, 'grad_norm': 0.4504337012767792, 'learning_rate': 0.00016441425043334413, 'memory/max_active (GiB)': 20.44, 'memory/max_allocated (GiB)': 20.44, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 336.53, 'epoch': 0.56}
 28%|███████████████████████▎                                                           | 5650/20117 [3:32:15<9:08:27,  2.27s/it] 28%|███████████████████████▎                                                           | 5651/20117 [3:32:17<9:08:19,  2.27s/it] 28%|███████████████████████▎                                                           | 5652/20117 [3:32:19<9:04:54,  2.26s/it] 28%|███████████████████████▎                                                           | 5653/20117 [3:32:21<9:01:32,  2.25s/it] 28%|███████████████████████▎                                                           | 5654/20117 [3:32:24<9:02:17,  2.25s/it] 28%|███████████████████████▎                                                           | 5655/20117 [3:32:26<9:05:13,  2.26s/it] 28%|███████████████████████▎                                                           | 5656/20117 [3:32:28<9:07:44,  2.27s/it] 28%|███████████████████████▎                                                           | 5657/20117 [3:32:31<9:11:23,  2.29s/it] 28%|███████████████████████▎                                                           | 5658/20117 [3:32:33<9:10:47,  2.29s/it] 28%|███████████████████████▎                                                           | 5659/20117 [3:32:35<9:16:36,  2.31s/it] 28%|███████████████████████▎                                                           | 5660/20117 [3:32:37<9:15:30,  2.31s/it]                                                                                                                                 {'loss': 0.1855, 'grad_norm': 0.3367341458797455, 'learning_rate': 0.00016429412211533127, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 347.1, 'epoch': 0.56}
 28%|███████████████████████▎                                                           | 5660/20117 [3:32:37<9:15:30,  2.31s/it] 28%|███████████████████████▎                                                           | 5661/20117 [3:32:40<9:13:26,  2.30s/it] 28%|███████████████████████▎                                                           | 5662/20117 [3:32:42<9:18:12,  2.32s/it] 28%|███████████████████████▎                                                           | 5663/20117 [3:32:45<9:22:46,  2.34s/it] 28%|███████████████████████▎                                                           | 5664/20117 [3:32:47<9:20:02,  2.32s/it] 28%|███████████████████████▎                                                           | 5665/20117 [3:32:49<9:14:26,  2.30s/it] 28%|███████████████████████▍                                                           | 5666/20117 [3:32:51<9:11:55,  2.29s/it] 28%|███████████████████████▍                                                           | 5667/20117 [3:32:54<9:08:00,  2.28s/it] 28%|███████████████████████▍                                                           | 5668/20117 [3:32:56<9:22:42,  2.34s/it] 28%|███████████████████████▍                                                           | 5669/20117 [3:32:58<9:18:20,  2.32s/it] 28%|███████████████████████▍                                                           | 5670/20117 [3:33:01<9:14:32,  2.30s/it]                                                                                                                                 {'loss': 0.2428, 'grad_norm': 0.3390282988548279, 'learning_rate': 0.00016417383542730675, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 385.32, 'epoch': 0.56}
 28%|███████████████████████▍                                                           | 5670/20117 [3:33:01<9:14:32,  2.30s/it] 28%|███████████████████████▍                                                           | 5671/20117 [3:33:03<9:19:39,  2.32s/it] 28%|███████████████████████▍                                                           | 5672/20117 [3:33:05<9:18:26,  2.32s/it] 28%|███████████████████████▍                                                           | 5673/20117 [3:33:08<9:14:24,  2.30s/it] 28%|███████████████████████▍                                                           | 5674/20117 [3:33:10<9:13:02,  2.30s/it] 28%|███████████████████████▍                                                           | 5675/20117 [3:33:12<9:10:58,  2.29s/it] 28%|███████████████████████▍                                                           | 5676/20117 [3:33:14<9:13:31,  2.30s/it] 28%|███████████████████████▍                                                           | 5677/20117 [3:33:17<9:14:41,  2.30s/it] 28%|███████████████████████▍                                                           | 5678/20117 [3:33:19<9:25:57,  2.35s/it] 28%|███████████████████████▍                                                           | 5679/20117 [3:33:21<9:20:27,  2.33s/it] 28%|███████████████████████▍                                                           | 5680/20117 [3:33:24<9:16:05,  2.31s/it]                                                                                                                                 {'loss': 0.3651, 'grad_norm': 0.6450189352035522, 'learning_rate': 0.00016405339066556212, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 427.22, 'epoch': 0.56}
 28%|███████████████████████▍                                                           | 5680/20117 [3:33:24<9:16:05,  2.31s/it] 28%|███████████████████████▍                                                           | 5681/20117 [3:33:26<9:14:32,  2.30s/it] 28%|███████████████████████▍                                                           | 5682/20117 [3:33:28<9:19:06,  2.32s/it] 28%|███████████████████████▍                                                           | 5683/20117 [3:33:31<9:14:09,  2.30s/it] 28%|███████████████████████▍                                                           | 5684/20117 [3:33:33<9:09:51,  2.29s/it] 28%|███████████████████████▍                                                           | 5685/20117 [3:33:35<9:08:39,  2.28s/it] 28%|███████████████████████▍                                                           | 5686/20117 [3:33:37<9:06:03,  2.27s/it] 28%|███████████████████████▍                                                           | 5687/20117 [3:33:40<9:07:25,  2.28s/it] 28%|███████████████████████▍                                                           | 5688/20117 [3:33:42<9:06:09,  2.27s/it] 28%|███████████████████████▍                                                           | 5689/20117 [3:33:44<9:04:48,  2.27s/it] 28%|███████████████████████▍                                                           | 5690/20117 [3:33:47<9:21:50,  2.34s/it]                                                                                                                                 {'loss': 0.2204, 'grad_norm': 0.5655054450035095, 'learning_rate': 0.0001639327881267783, 'memory/max_active (GiB)': 20.72, 'memory/max_allocated (GiB)': 20.72, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 310.84, 'epoch': 0.57}
 28%|███████████████████████▍                                                           | 5690/20117 [3:33:47<9:21:50,  2.34s/it] 28%|███████████████████████▍                                                           | 5691/20117 [3:33:49<9:11:13,  2.29s/it] 28%|███████████████████████▍                                                           | 5692/20117 [3:33:51<9:02:15,  2.26s/it] 28%|███████████████████████▍                                                           | 5693/20117 [3:33:53<8:56:57,  2.23s/it] 28%|███████████████████████▍                                                           | 5694/20117 [3:33:55<8:56:18,  2.23s/it] 28%|███████████████████████▍                                                           | 5695/20117 [3:33:58<8:54:33,  2.22s/it] 28%|███████████████████████▌                                                           | 5696/20117 [3:34:00<8:58:47,  2.24s/it] 28%|███████████████████████▌                                                           | 5697/20117 [3:34:02<8:54:17,  2.22s/it] 28%|███████████████████████▌                                                           | 5698/20117 [3:34:04<9:01:17,  2.25s/it] 28%|███████████████████████▌                                                           | 5699/20117 [3:34:07<9:03:00,  2.26s/it] 28%|███████████████████████▌                                                           | 5700/20117 [3:34:09<9:03:27,  2.26s/it]                                                                                                                                 {'loss': 0.2294, 'grad_norm': 0.4713475704193115, 'learning_rate': 0.00016381202810802483, 'memory/max_active (GiB)': 21.47, 'memory/max_allocated (GiB)': 21.47, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 363.21, 'epoch': 0.57}
 28%|███████████████████████▌                                                           | 5700/20117 [3:34:09<9:03:27,  2.26s/it] 28%|███████████████████████▌                                                           | 5701/20117 [3:34:11<9:04:33,  2.27s/it] 28%|███████████████████████▌                                                           | 5702/20117 [3:34:14<9:05:24,  2.27s/it] 28%|███████████████████████▌                                                           | 5703/20117 [3:34:16<9:08:59,  2.29s/it] 28%|███████████████████████▌                                                           | 5704/20117 [3:34:18<9:12:01,  2.30s/it] 28%|███████████████████████▌                                                           | 5705/20117 [3:34:20<9:10:00,  2.29s/it] 28%|███████████████████████▌                                                           | 5706/20117 [3:34:23<9:09:12,  2.29s/it] 28%|███████████████████████▌                                                           | 5707/20117 [3:34:25<9:13:36,  2.31s/it] 28%|███████████████████████▌                                                           | 5708/20117 [3:34:27<9:09:58,  2.29s/it] 28%|███████████████████████▌                                                           | 5709/20117 [3:34:30<9:08:01,  2.28s/it] 28%|███████████████████████▌                                                           | 5710/20117 [3:34:32<9:00:20,  2.25s/it]                                                                                                                                 {'loss': 0.2522, 'grad_norm': 0.19377703964710236, 'learning_rate': 0.00016369111090675916, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 412.63, 'epoch': 0.57}
 28%|███████████████████████▌                                                           | 5710/20117 [3:34:32<9:00:20,  2.25s/it] 28%|███████████████████████▌                                                           | 5711/20117 [3:34:34<9:03:21,  2.26s/it] 28%|███████████████████████▌                                                           | 5712/20117 [3:34:36<9:07:07,  2.28s/it] 28%|███████████████████████▌                                                           | 5713/20117 [3:34:39<9:07:32,  2.28s/it] 28%|███████████████████████▌                                                           | 5714/20117 [3:34:41<9:09:34,  2.29s/it] 28%|███████████████████████▌                                                           | 5715/20117 [3:34:43<9:07:32,  2.28s/it] 28%|███████████████████████▌                                                           | 5716/20117 [3:34:46<9:07:03,  2.28s/it] 28%|███████████████████████▌                                                           | 5717/20117 [3:34:48<9:05:10,  2.27s/it] 28%|███████████████████████▌                                                           | 5718/20117 [3:34:50<9:08:05,  2.28s/it] 28%|███████████████████████▌                                                           | 5719/20117 [3:34:52<9:07:12,  2.28s/it] 28%|███████████████████████▌                                                           | 5720/20117 [3:34:55<9:06:37,  2.28s/it]                                                                                                                                 {'loss': 0.3132, 'grad_norm': 0.5475621819496155, 'learning_rate': 0.0001635700368208259, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 367.47, 'epoch': 0.57}
 28%|███████████████████████▌                                                           | 5720/20117 [3:34:55<9:06:37,  2.28s/it] 28%|███████████████████████▌                                                           | 5721/20117 [3:34:57<9:07:10,  2.28s/it] 28%|███████████████████████▌                                                           | 5722/20117 [3:34:59<9:03:54,  2.27s/it] 28%|███████████████████████▌                                                           | 5723/20117 [3:35:01<9:06:01,  2.28s/it] 28%|███████████████████████▌                                                           | 5724/20117 [3:35:04<9:10:13,  2.29s/it] 28%|███████████████████████▌                                                           | 5725/20117 [3:35:06<9:10:04,  2.29s/it] 28%|███████████████████████▌                                                           | 5726/20117 [3:35:08<9:08:26,  2.29s/it] 28%|███████████████████████▋                                                           | 5727/20117 [3:35:11<9:03:59,  2.27s/it] 28%|███████████████████████▋                                                           | 5728/20117 [3:35:13<9:03:17,  2.27s/it] 28%|███████████████████████▋                                                           | 5729/20117 [3:35:15<9:07:50,  2.28s/it] 28%|███████████████████████▋                                                           | 5730/20117 [3:35:18<9:09:45,  2.29s/it]                                                                                                                                 {'loss': 0.2623, 'grad_norm': 0.3860287368297577, 'learning_rate': 0.00016344880614845608, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 381.97, 'epoch': 0.57}
 28%|███████████████████████▋                                                           | 5730/20117 [3:35:18<9:09:45,  2.29s/it] 28%|███████████████████████▋                                                           | 5731/20117 [3:35:20<9:05:37,  2.28s/it] 28%|███████████████████████▋                                                           | 5732/20117 [3:35:22<9:05:24,  2.27s/it] 28%|███████████████████████▋                                                           | 5733/20117 [3:35:24<9:03:57,  2.27s/it] 29%|███████████████████████▋                                                           | 5734/20117 [3:35:27<9:05:36,  2.28s/it] 29%|███████████████████████▋                                                           | 5735/20117 [3:35:29<9:12:28,  2.30s/it] 29%|███████████████████████▋                                                           | 5736/20117 [3:35:31<9:04:59,  2.27s/it] 29%|███████████████████████▋                                                           | 5737/20117 [3:35:33<9:07:13,  2.28s/it] 29%|███████████████████████▋                                                           | 5738/20117 [3:35:36<9:07:50,  2.29s/it] 29%|███████████████████████▋                                                           | 5739/20117 [3:35:38<9:06:33,  2.28s/it] 29%|███████████████████████▋                                                           | 5740/20117 [3:35:40<9:06:44,  2.28s/it]                                                                                                                                 {'loss': 0.2365, 'grad_norm': 0.3809770345687866, 'learning_rate': 0.00016332741918826654, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 326.73, 'epoch': 0.57}
 29%|███████████████████████▋                                                           | 5740/20117 [3:35:40<9:06:44,  2.28s/it] 29%|███████████████████████▋                                                           | 5741/20117 [3:35:43<9:09:31,  2.29s/it] 29%|███████████████████████▋                                                           | 5742/20117 [3:35:45<9:36:53,  2.41s/it] 29%|███████████████████████▋                                                           | 5743/20117 [3:35:48<9:28:16,  2.37s/it] 29%|███████████████████████▋                                                           | 5744/20117 [3:35:50<9:22:44,  2.35s/it] 29%|███████████████████████▋                                                           | 5745/20117 [3:35:52<9:19:14,  2.33s/it] 29%|███████████████████████▋                                                           | 5746/20117 [3:35:54<9:17:57,  2.33s/it] 29%|███████████████████████▋                                                           | 5747/20117 [3:35:57<9:14:39,  2.32s/it] 29%|███████████████████████▋                                                           | 5748/20117 [3:35:59<9:09:20,  2.29s/it] 29%|███████████████████████▋                                                           | 5749/20117 [3:36:01<9:08:58,  2.29s/it] 29%|███████████████████████▋                                                           | 5750/20117 [3:36:04<9:09:59,  2.30s/it]                                                                                                                                 {'loss': 0.2661, 'grad_norm': 0.5006048083305359, 'learning_rate': 0.00016320587623925895, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 334.44, 'epoch': 0.57}
 29%|███████████████████████▋                                                           | 5750/20117 [3:36:04<9:09:59,  2.30s/it] 29%|███████████████████████▋                                                           | 5751/20117 [3:36:06<9:09:17,  2.29s/it] 29%|███████████████████████▋                                                           | 5752/20117 [3:36:08<9:07:25,  2.29s/it] 29%|███████████████████████▋                                                           | 5753/20117 [3:36:10<9:07:24,  2.29s/it] 29%|███████████████████████▋                                                           | 5754/20117 [3:36:13<9:07:43,  2.29s/it] 29%|███████████████████████▋                                                           | 5755/20117 [3:36:15<9:03:51,  2.27s/it] 29%|███████████████████████▋                                                           | 5756/20117 [3:36:17<9:02:37,  2.27s/it] 29%|███████████████████████▊                                                           | 5757/20117 [3:36:20<9:04:03,  2.27s/it] 29%|███████████████████████▊                                                           | 5758/20117 [3:36:22<9:03:04,  2.27s/it] 29%|███████████████████████▊                                                           | 5759/20117 [3:36:24<9:09:01,  2.29s/it] 29%|███████████████████████▊                                                           | 5760/20117 [3:36:26<9:06:50,  2.29s/it]                                                                                                                                 {'loss': 0.1923, 'grad_norm': 0.2004530131816864, 'learning_rate': 0.00016308417760081936, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 377.56, 'epoch': 0.57}
 29%|███████████████████████▊                                                           | 5760/20117 [3:36:26<9:06:50,  2.29s/it] 29%|███████████████████████▊                                                           | 5761/20117 [3:36:29<9:10:16,  2.30s/it] 29%|███████████████████████▊                                                           | 5762/20117 [3:36:31<9:12:47,  2.31s/it] 29%|███████████████████████▊                                                           | 5763/20117 [3:36:33<9:11:37,  2.31s/it] 29%|███████████████████████▊                                                           | 5764/20117 [3:36:36<9:13:42,  2.31s/it] 29%|███████████████████████▊                                                           | 5765/20117 [3:36:38<9:20:09,  2.34s/it] 29%|███████████████████████▊                                                           | 5766/20117 [3:36:40<9:14:49,  2.32s/it] 29%|███████████████████████▊                                                           | 5767/20117 [3:36:43<9:13:15,  2.31s/it] 29%|███████████████████████▊                                                           | 5768/20117 [3:36:45<9:12:32,  2.31s/it] 29%|███████████████████████▊                                                           | 5769/20117 [3:36:47<9:11:00,  2.30s/it] 29%|███████████████████████▊                                                           | 5770/20117 [3:36:50<9:12:03,  2.31s/it]                                                                                                                                 {'loss': 0.2089, 'grad_norm': 0.30563804507255554, 'learning_rate': 0.00016296232357271718, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 351.29, 'epoch': 0.57}
 29%|███████████████████████▊                                                           | 5770/20117 [3:36:50<9:12:03,  2.31s/it] 29%|███████████████████████▊                                                           | 5771/20117 [3:36:52<9:07:43,  2.29s/it] 29%|███████████████████████▊                                                           | 5772/20117 [3:36:54<9:04:18,  2.28s/it] 29%|███████████████████████▊                                                           | 5773/20117 [3:36:56<9:03:54,  2.28s/it] 29%|███████████████████████▊                                                           | 5774/20117 [3:36:59<9:04:42,  2.28s/it] 29%|███████████████████████▊                                                           | 5775/20117 [3:37:01<9:02:16,  2.27s/it] 29%|███████████████████████▊                                                           | 5776/20117 [3:37:03<9:05:14,  2.28s/it] 29%|███████████████████████▊                                                           | 5777/20117 [3:37:05<9:06:37,  2.29s/it] 29%|███████████████████████▊                                                           | 5778/20117 [3:37:08<9:04:18,  2.28s/it] 29%|███████████████████████▊                                                           | 5779/20117 [3:37:10<9:06:09,  2.29s/it] 29%|███████████████████████▊                                                           | 5780/20117 [3:37:12<9:14:57,  2.32s/it]                                                                                                                                 {'loss': 0.1931, 'grad_norm': 0.3142051100730896, 'learning_rate': 0.00016284031445510465, 'memory/max_active (GiB)': 21.38, 'memory/max_allocated (GiB)': 21.38, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 327.41, 'epoch': 0.57}
 29%|███████████████████████▊                                                           | 5780/20117 [3:37:12<9:14:57,  2.32s/it] 29%|███████████████████████▊                                                           | 5781/20117 [3:37:15<9:15:02,  2.32s/it] 29%|███████████████████████▊                                                           | 5782/20117 [3:37:17<9:14:54,  2.32s/it] 29%|███████████████████████▊                                                           | 5783/20117 [3:37:19<9:15:18,  2.32s/it] 29%|███████████████████████▊                                                           | 5784/20117 [3:37:22<9:13:21,  2.32s/it] 29%|███████████████████████▊                                                           | 5785/20117 [3:37:24<9:14:39,  2.32s/it] 29%|███████████████████████▊                                                           | 5786/20117 [3:37:26<9:12:42,  2.31s/it] 29%|███████████████████████▉                                                           | 5787/20117 [3:37:29<9:09:51,  2.30s/it] 29%|███████████████████████▉                                                           | 5788/20117 [3:37:31<9:15:08,  2.32s/it] 29%|███████████████████████▉                                                           | 5789/20117 [3:37:33<9:11:21,  2.31s/it] 29%|███████████████████████▉                                                           | 5790/20117 [3:37:36<9:11:14,  2.31s/it]                                                                                                                                 {'loss': 0.3011, 'grad_norm': 0.4678092300891876, 'learning_rate': 0.000162718150548516, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 367.12, 'epoch': 0.58}
 29%|███████████████████████▉                                                           | 5790/20117 [3:37:36<9:11:14,  2.31s/it] 29%|███████████████████████▉                                                           | 5791/20117 [3:37:38<9:10:22,  2.31s/it] 29%|███████████████████████▉                                                           | 5792/20117 [3:37:40<9:11:20,  2.31s/it] 29%|███████████████████████▉                                                           | 5793/20117 [3:37:43<9:28:18,  2.38s/it] 29%|███████████████████████▉                                                           | 5794/20117 [3:37:45<9:20:58,  2.35s/it] 29%|███████████████████████▉                                                           | 5795/20117 [3:37:47<9:14:02,  2.32s/it] 29%|███████████████████████▉                                                           | 5796/20117 [3:37:50<9:12:53,  2.32s/it] 29%|███████████████████████▉                                                           | 5797/20117 [3:37:52<9:11:01,  2.31s/it] 29%|███████████████████████▉                                                           | 5798/20117 [3:37:54<9:15:40,  2.33s/it] 29%|███████████████████████▉                                                           | 5799/20117 [3:37:57<9:12:31,  2.32s/it] 29%|███████████████████████▉                                                           | 5800/20117 [3:37:59<9:11:30,  2.31s/it]                                                                                                                                 {'loss': 0.2855, 'grad_norm': 0.44859957695007324, 'learning_rate': 0.00016259583215386675, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 377.51, 'epoch': 0.58}
 29%|███████████████████████▉                                                           | 5800/20117 [3:37:59<9:11:30,  2.31s/it] 29%|███████████████████████▉                                                           | 5801/20117 [3:38:01<9:08:21,  2.30s/it] 29%|███████████████████████▉                                                           | 5802/20117 [3:38:03<9:05:58,  2.29s/it] 29%|███████████████████████▉                                                           | 5803/20117 [3:38:06<9:05:52,  2.29s/it] 29%|███████████████████████▉                                                           | 5804/20117 [3:38:08<9:03:32,  2.28s/it] 29%|███████████████████████▉                                                           | 5805/20117 [3:38:10<9:06:33,  2.29s/it] 29%|███████████████████████▉                                                           | 5806/20117 [3:38:13<9:04:46,  2.28s/it] 29%|███████████████████████▉                                                           | 5807/20117 [3:38:15<9:04:18,  2.28s/it] 29%|███████████████████████▉                                                           | 5808/20117 [3:38:17<9:03:42,  2.28s/it] 29%|███████████████████████▉                                                           | 5809/20117 [3:38:19<9:03:06,  2.28s/it] 29%|███████████████████████▉                                                           | 5810/20117 [3:38:22<9:03:11,  2.28s/it]                                                                                                                                 {'loss': 0.2181, 'grad_norm': 0.3087036609649658, 'learning_rate': 0.00016247335957245303, 'memory/max_active (GiB)': 18.21, 'memory/max_allocated (GiB)': 18.21, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 292.23, 'epoch': 0.58}
 29%|███████████████████████▉                                                           | 5810/20117 [3:38:22<9:03:11,  2.28s/it] 29%|███████████████████████▉                                                           | 5811/20117 [3:38:24<9:03:31,  2.28s/it] 29%|███████████████████████▉                                                           | 5812/20117 [3:38:26<9:10:32,  2.31s/it] 29%|███████████████████████▉                                                           | 5813/20117 [3:38:29<9:08:47,  2.30s/it] 29%|███████████████████████▉                                                           | 5814/20117 [3:38:31<9:09:31,  2.31s/it] 29%|███████████████████████▉                                                           | 5815/20117 [3:38:33<9:10:05,  2.31s/it] 29%|███████████████████████▉                                                           | 5816/20117 [3:38:36<9:23:32,  2.36s/it] 29%|████████████████████████                                                           | 5817/20117 [3:38:38<9:26:41,  2.38s/it] 29%|████████████████████████                                                           | 5818/20117 [3:38:40<9:26:44,  2.38s/it] 29%|████████████████████████                                                           | 5819/20117 [3:38:43<9:22:22,  2.36s/it] 29%|████████████████████████                                                           | 5820/20117 [3:38:45<9:21:37,  2.36s/it]                                                                                                                                 {'loss': 0.203, 'grad_norm': 0.29946303367614746, 'learning_rate': 0.00016235073310595058, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 294.26, 'epoch': 0.58}
 29%|████████████████████████                                                           | 5820/20117 [3:38:45<9:21:37,  2.36s/it] 29%|████████████████████████                                                           | 5821/20117 [3:38:47<9:16:44,  2.34s/it] 29%|████████████████████████                                                           | 5822/20117 [3:38:50<9:17:52,  2.34s/it] 29%|████████████████████████                                                           | 5823/20117 [3:38:52<9:17:24,  2.34s/it] 29%|████████████████████████                                                           | 5824/20117 [3:38:54<9:15:44,  2.33s/it] 29%|████████████████████████                                                           | 5825/20117 [3:38:57<9:06:53,  2.30s/it] 29%|████████████████████████                                                           | 5826/20117 [3:38:59<9:03:26,  2.28s/it] 29%|████████████████████████                                                           | 5827/20117 [3:39:01<9:03:39,  2.28s/it] 29%|████████████████████████                                                           | 5828/20117 [3:39:04<9:06:53,  2.30s/it] 29%|████████████████████████                                                           | 5829/20117 [3:39:06<9:11:21,  2.32s/it] 29%|████████████████████████                                                           | 5830/20117 [3:39:08<9:11:19,  2.32s/it]                                                                                                                                 {'loss': 0.2186, 'grad_norm': 0.41978803277015686, 'learning_rate': 0.0001622279530564144, 'memory/max_active (GiB)': 21.38, 'memory/max_allocated (GiB)': 21.38, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 332.7, 'epoch': 0.58}
 29%|████████████████████████                                                           | 5830/20117 [3:39:08<9:11:19,  2.32s/it] 29%|████████████████████████                                                           | 5831/20117 [3:39:10<9:10:22,  2.31s/it] 29%|████████████████████████                                                           | 5832/20117 [3:39:13<9:10:23,  2.31s/it] 29%|████████████████████████                                                           | 5833/20117 [3:39:15<9:06:14,  2.29s/it] 29%|████████████████████████                                                           | 5834/20117 [3:39:17<9:05:29,  2.29s/it] 29%|████████████████████████                                                           | 5835/20117 [3:39:20<9:01:14,  2.27s/it] 29%|████████████████████████                                                           | 5836/20117 [3:39:22<9:02:14,  2.28s/it] 29%|████████████████████████                                                           | 5837/20117 [3:39:24<9:04:32,  2.29s/it] 29%|████████████████████████                                                           | 5838/20117 [3:39:26<9:02:28,  2.28s/it] 29%|████████████████████████                                                           | 5839/20117 [3:39:29<9:03:04,  2.28s/it] 29%|████████████████████████                                                           | 5840/20117 [3:39:31<9:02:20,  2.28s/it]                                                                                                                                 {'loss': 0.2726, 'grad_norm': 0.5330750942230225, 'learning_rate': 0.00016210501972627764, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 372.14, 'epoch': 0.58}
 29%|████████████████████████                                                           | 5840/20117 [3:39:31<9:02:20,  2.28s/it] 29%|████████████████████████                                                           | 5841/20117 [3:39:33<9:04:19,  2.29s/it] 29%|████████████████████████                                                           | 5842/20117 [3:39:36<9:03:17,  2.28s/it] 29%|████████████████████████                                                           | 5843/20117 [3:39:38<9:06:04,  2.30s/it] 29%|████████████████████████                                                           | 5844/20117 [3:39:40<9:04:05,  2.29s/it] 29%|████████████████████████                                                           | 5845/20117 [3:39:42<9:05:21,  2.29s/it] 29%|████████████████████████                                                           | 5846/20117 [3:39:45<9:30:40,  2.40s/it] 29%|████████████████████████                                                           | 5847/20117 [3:39:47<9:17:43,  2.35s/it] 29%|████████████████████████▏                                                          | 5848/20117 [3:39:50<9:16:04,  2.34s/it] 29%|████████████████████████▏                                                          | 5849/20117 [3:39:52<9:14:20,  2.33s/it] 29%|████████████████████████▏                                                          | 5850/20117 [3:39:54<9:14:26,  2.33s/it]                                                                                                                                 {'loss': 0.2121, 'grad_norm': 0.4443431794643402, 'learning_rate': 0.0001619819334183511, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 329.51, 'epoch': 0.58}
 29%|████████████████████████▏                                                          | 5850/20117 [3:39:54<9:14:26,  2.33s/it] 29%|████████████████████████▏                                                          | 5851/20117 [3:39:57<9:11:16,  2.32s/it] 29%|████████████████████████▏                                                          | 5852/20117 [3:39:59<9:07:11,  2.30s/it] 29%|████████████████████████▏                                                          | 5853/20117 [3:40:01<9:07:37,  2.30s/it] 29%|████████████████████████▏                                                          | 5854/20117 [3:40:03<9:04:41,  2.29s/it] 29%|████████████████████████▏                                                          | 5855/20117 [3:40:06<9:03:18,  2.29s/it] 29%|████████████████████████▏                                                          | 5856/20117 [3:40:08<9:03:44,  2.29s/it] 29%|████████████████████████▏                                                          | 5857/20117 [3:40:10<9:01:35,  2.28s/it] 29%|████████████████████████▏                                                          | 5858/20117 [3:40:12<8:59:58,  2.27s/it] 29%|████████████████████████▏                                                          | 5859/20117 [3:40:15<9:03:37,  2.29s/it] 29%|████████████████████████▏                                                          | 5860/20117 [3:40:17<9:02:48,  2.28s/it]                                                                                                                                 {'loss': 0.2845, 'grad_norm': 0.4602350890636444, 'learning_rate': 0.00016185869443582237, 'memory/max_active (GiB)': 18.85, 'memory/max_allocated (GiB)': 18.85, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 334.35, 'epoch': 0.58}
 29%|████████████████████████▏                                                          | 5860/20117 [3:40:17<9:02:48,  2.28s/it] 29%|████████████████████████▏                                                          | 5861/20117 [3:40:19<9:02:42,  2.28s/it] 29%|████████████████████████▏                                                          | 5862/20117 [3:40:22<9:08:00,  2.31s/it] 29%|████████████████████████▏                                                          | 5863/20117 [3:40:24<9:06:43,  2.30s/it] 29%|████████████████████████▏                                                          | 5864/20117 [3:40:26<9:05:45,  2.30s/it] 29%|████████████████████████▏                                                          | 5865/20117 [3:40:29<9:08:36,  2.31s/it] 29%|████████████████████████▏                                                          | 5866/20117 [3:40:31<9:12:29,  2.33s/it] 29%|████████████████████████▏                                                          | 5867/20117 [3:40:33<9:08:30,  2.31s/it] 29%|████████████████████████▏                                                          | 5868/20117 [3:40:36<9:06:08,  2.30s/it] 29%|████████████████████████▏                                                          | 5869/20117 [3:40:38<9:10:38,  2.32s/it] 29%|████████████████████████▏                                                          | 5870/20117 [3:40:40<9:09:40,  2.31s/it]                                                                                                                                 {'loss': 0.1812, 'grad_norm': 0.2751925587654114, 'learning_rate': 0.00016173530308225513, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 397.13, 'epoch': 0.58}
 29%|████████████████████████▏                                                          | 5870/20117 [3:40:40<9:09:40,  2.31s/it] 29%|████████████████████████▏                                                          | 5871/20117 [3:40:42<9:03:35,  2.29s/it] 29%|████████████████████████▏                                                          | 5872/20117 [3:40:45<9:05:04,  2.30s/it] 29%|████████████████████████▏                                                          | 5873/20117 [3:40:47<9:03:17,  2.29s/it] 29%|████████████████████████▏                                                          | 5874/20117 [3:40:49<9:03:56,  2.29s/it] 29%|████████████████████████▏                                                          | 5875/20117 [3:40:52<9:01:07,  2.28s/it] 29%|████████████████████████▏                                                          | 5876/20117 [3:40:54<9:02:11,  2.28s/it] 29%|████████████████████████▏                                                          | 5877/20117 [3:40:56<9:01:59,  2.28s/it] 29%|████████████████████████▎                                                          | 5878/20117 [3:40:58<9:02:36,  2.29s/it] 29%|████████████████████████▎                                                          | 5879/20117 [3:41:01<9:06:58,  2.30s/it] 29%|████████████████████████▎                                                          | 5880/20117 [3:41:03<9:08:44,  2.31s/it]                                                                                                                                 {'loss': 0.2088, 'grad_norm': 0.5334510803222656, 'learning_rate': 0.00016161175966158834, 'memory/max_active (GiB)': 20.53, 'memory/max_allocated (GiB)': 20.53, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 297.42, 'epoch': 0.58}
 29%|████████████████████████▎                                                          | 5880/20117 [3:41:03<9:08:44,  2.31s/it] 29%|████████████████████████▎                                                          | 5881/20117 [3:41:05<9:03:15,  2.29s/it] 29%|████████████████████████▎                                                          | 5882/20117 [3:41:08<8:53:07,  2.25s/it] 29%|████████████████████████▎                                                          | 5883/20117 [3:41:10<8:48:17,  2.23s/it] 29%|████████████████████████▎                                                          | 5884/20117 [3:41:12<8:44:40,  2.21s/it] 29%|████████████████████████▎                                                          | 5885/20117 [3:41:14<8:48:52,  2.23s/it] 29%|████████████████████████▎                                                          | 5886/20117 [3:41:16<8:48:15,  2.23s/it] 29%|████████████████████████▎                                                          | 5887/20117 [3:41:19<8:51:45,  2.24s/it] 29%|████████████████████████▎                                                          | 5888/20117 [3:41:21<8:52:04,  2.24s/it] 29%|████████████████████████▎                                                          | 5889/20117 [3:41:23<8:55:19,  2.26s/it] 29%|████████████████████████▎                                                          | 5890/20117 [3:41:26<9:03:59,  2.29s/it]                                                                                                                                 {'loss': 0.2197, 'grad_norm': 0.36908817291259766, 'learning_rate': 0.00016148806447813553, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 294.32, 'epoch': 0.59}
 29%|████████████████████████▎                                                          | 5890/20117 [3:41:26<9:03:59,  2.29s/it] 29%|████████████████████████▎                                                          | 5891/20117 [3:41:28<9:04:20,  2.30s/it] 29%|████████████████████████▎                                                          | 5892/20117 [3:41:30<9:03:39,  2.29s/it] 29%|████████████████████████▎                                                          | 5893/20117 [3:41:32<9:03:10,  2.29s/it] 29%|████████████████████████▎                                                          | 5894/20117 [3:41:35<9:02:52,  2.29s/it] 29%|████████████████████████▎                                                          | 5895/20117 [3:41:37<9:02:11,  2.29s/it] 29%|████████████████████████▎                                                          | 5896/20117 [3:41:39<9:05:04,  2.30s/it] 29%|████████████████████████▎                                                          | 5897/20117 [3:41:42<9:01:51,  2.29s/it] 29%|████████████████████████▎                                                          | 5898/20117 [3:41:44<9:00:18,  2.28s/it] 29%|████████████████████████▎                                                          | 5899/20117 [3:41:46<8:57:39,  2.27s/it] 29%|████████████████████████▎                                                          | 5900/20117 [3:41:49<9:15:08,  2.34s/it]                                                                                                                                 {'loss': 0.2757, 'grad_norm': 0.24384719133377075, 'learning_rate': 0.00016136421783658416, 'memory/max_active (GiB)': 18.81, 'memory/max_allocated (GiB)': 18.81, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 315.54, 'epoch': 0.59}
 29%|████████████████████████▎                                                          | 5900/20117 [3:41:49<9:15:08,  2.34s/it] 29%|████████████████████████▎                                                          | 5901/20117 [3:41:51<9:11:00,  2.33s/it] 29%|████████████████████████▎                                                          | 5902/20117 [3:41:53<9:06:44,  2.31s/it] 29%|████████████████████████▎                                                          | 5903/20117 [3:41:55<9:06:27,  2.31s/it] 29%|████████████████████████▎                                                          | 5904/20117 [3:41:58<9:07:32,  2.31s/it] 29%|████████████████████████▎                                                          | 5905/20117 [3:42:00<9:03:08,  2.29s/it] 29%|████████████████████████▎                                                          | 5906/20117 [3:42:02<8:59:34,  2.28s/it] 29%|████████████████████████▎                                                          | 5907/20117 [3:42:05<8:59:28,  2.28s/it] 29%|████████████████████████▍                                                          | 5908/20117 [3:42:07<9:01:26,  2.29s/it] 29%|████████████████████████▍                                                          | 5909/20117 [3:42:09<9:03:41,  2.30s/it] 29%|████████████████████████▍                                                          | 5910/20117 [3:42:12<9:05:29,  2.30s/it]                                                                                                                                 {'loss': 0.2858, 'grad_norm': 0.41709259152412415, 'learning_rate': 0.0001612402200419946, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 385.92, 'epoch': 0.59}
 29%|████████████████████████▍                                                          | 5910/20117 [3:42:12<9:05:29,  2.30s/it] 29%|████████████████████████▍                                                          | 5911/20117 [3:42:14<9:02:37,  2.29s/it] 29%|████████████████████████▍                                                          | 5912/20117 [3:42:16<9:01:44,  2.29s/it] 29%|████████████████████████▍                                                          | 5913/20117 [3:42:18<9:03:03,  2.29s/it] 29%|████████████████████████▍                                                          | 5914/20117 [3:42:21<9:03:17,  2.30s/it] 29%|████████████████████████▍                                                          | 5915/20117 [3:42:23<9:05:42,  2.31s/it] 29%|████████████████████████▍                                                          | 5916/20117 [3:42:25<9:03:14,  2.30s/it] 29%|████████████████████████▍                                                          | 5917/20117 [3:42:28<9:02:50,  2.29s/it] 29%|████████████████████████▍                                                          | 5918/20117 [3:42:30<9:02:13,  2.29s/it] 29%|████████████████████████▍                                                          | 5919/20117 [3:42:32<9:03:06,  2.30s/it] 29%|████████████████████████▍                                                          | 5920/20117 [3:42:34<9:00:10,  2.28s/it]                                                                                                                                 {'loss': 0.2347, 'grad_norm': 0.5187587738037109, 'learning_rate': 0.00016111607139979967, 'memory/max_active (GiB)': 20.64, 'memory/max_allocated (GiB)': 20.64, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 377.8, 'epoch': 0.59}
 29%|████████████████████████▍                                                          | 5920/20117 [3:42:34<9:00:10,  2.28s/it] 29%|████████████████████████▍                                                          | 5921/20117 [3:42:37<9:00:11,  2.28s/it] 29%|████████████████████████▍                                                          | 5922/20117 [3:42:39<9:02:34,  2.29s/it] 29%|████████████████████████▍                                                          | 5923/20117 [3:42:41<9:01:58,  2.29s/it] 29%|████████████████████████▍                                                          | 5924/20117 [3:42:44<8:59:05,  2.28s/it] 29%|████████████████████████▍                                                          | 5925/20117 [3:42:46<8:58:06,  2.27s/it] 29%|████████████████████████▍                                                          | 5926/20117 [3:42:48<8:58:44,  2.28s/it] 29%|████████████████████████▍                                                          | 5927/20117 [3:42:50<9:02:39,  2.29s/it] 29%|████████████████████████▍                                                          | 5928/20117 [3:42:53<8:57:07,  2.27s/it] 29%|████████████████████████▍                                                          | 5929/20117 [3:42:55<8:54:52,  2.26s/it] 29%|████████████████████████▍                                                          | 5930/20117 [3:42:57<9:00:28,  2.29s/it]                                                                                                                                 {'loss': 0.2563, 'grad_norm': 0.29502683877944946, 'learning_rate': 0.00016099177221580373, 'memory/max_active (GiB)': 18.84, 'memory/max_allocated (GiB)': 18.84, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 335.51, 'epoch': 0.59}
 29%|████████████████████████▍                                                          | 5930/20117 [3:42:57<9:00:28,  2.29s/it] 29%|████████████████████████▍                                                          | 5931/20117 [3:43:00<9:01:48,  2.29s/it] 29%|████████████████████████▍                                                          | 5932/20117 [3:43:02<9:04:57,  2.31s/it] 29%|████████████████████████▍                                                          | 5933/20117 [3:43:04<9:06:24,  2.31s/it] 29%|████████████████████████▍                                                          | 5934/20117 [3:43:06<9:01:10,  2.29s/it] 30%|████████████████████████▍                                                          | 5935/20117 [3:43:09<9:06:45,  2.31s/it] 30%|████████████████████████▍                                                          | 5936/20117 [3:43:11<9:07:08,  2.31s/it] 30%|████████████████████████▍                                                          | 5937/20117 [3:43:13<9:01:10,  2.29s/it] 30%|████████████████████████▍                                                          | 5938/20117 [3:43:16<9:00:02,  2.29s/it] 30%|████████████████████████▌                                                          | 5939/20117 [3:43:18<8:56:58,  2.27s/it] 30%|████████████████████████▌                                                          | 5940/20117 [3:43:20<8:59:32,  2.28s/it]                                                                                                                                 {'loss': 0.1982, 'grad_norm': 0.360385000705719, 'learning_rate': 0.00016086732279618188, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 315.07, 'epoch': 0.59}
 30%|████████████████████████▌                                                          | 5940/20117 [3:43:20<8:59:32,  2.28s/it] 30%|████████████████████████▌                                                          | 5941/20117 [3:43:22<8:56:53,  2.27s/it] 30%|████████████████████████▌                                                          | 5942/20117 [3:43:25<8:56:47,  2.27s/it] 30%|████████████████████████▌                                                          | 5943/20117 [3:43:27<8:55:26,  2.27s/it] 30%|████████████████████████▌                                                          | 5944/20117 [3:43:29<8:52:16,  2.25s/it] 30%|████████████████████████▌                                                          | 5945/20117 [3:43:31<8:51:18,  2.25s/it] 30%|████████████████████████▌                                                          | 5946/20117 [3:43:34<8:54:35,  2.26s/it] 30%|████████████████████████▌                                                          | 5947/20117 [3:43:36<8:54:18,  2.26s/it] 30%|████████████████████████▌                                                          | 5948/20117 [3:43:38<8:58:22,  2.28s/it] 30%|████████████████████████▌                                                          | 5949/20117 [3:43:41<9:02:44,  2.30s/it] 30%|████████████████████████▌                                                          | 5950/20117 [3:43:43<8:59:56,  2.29s/it]                                                                                                                                 {'loss': 0.2535, 'grad_norm': 0.35591524839401245, 'learning_rate': 0.0001607427234474794, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 379.0, 'epoch': 0.59}
 30%|████████████████████████▌                                                          | 5950/20117 [3:43:43<8:59:56,  2.29s/it] 30%|████████████████████████▌                                                          | 5951/20117 [3:43:45<9:01:04,  2.29s/it] 30%|████████████████████████▌                                                          | 5952/20117 [3:43:47<8:57:39,  2.28s/it] 30%|████████████████████████▌                                                          | 5953/20117 [3:43:50<8:54:51,  2.27s/it] 30%|████████████████████████▌                                                          | 5954/20117 [3:43:52<9:20:38,  2.38s/it] 30%|████████████████████████▌                                                          | 5955/20117 [3:43:55<9:13:51,  2.35s/it] 30%|████████████████████████▌                                                          | 5956/20117 [3:43:57<9:10:11,  2.33s/it] 30%|████████████████████████▌                                                          | 5957/20117 [3:43:59<9:06:33,  2.32s/it] 30%|████████████████████████▌                                                          | 5958/20117 [3:44:01<9:00:50,  2.29s/it] 30%|████████████████████████▌                                                          | 5959/20117 [3:44:04<9:01:01,  2.29s/it] 30%|████████████████████████▌                                                          | 5960/20117 [3:44:06<8:57:54,  2.28s/it]                                                                                                                                 {'loss': 0.3105, 'grad_norm': 0.49772289395332336, 'learning_rate': 0.0001606179744766108, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 403.62, 'epoch': 0.59}
 30%|████████████████████████▌                                                          | 5960/20117 [3:44:06<8:57:54,  2.28s/it] 30%|████████████████████████▌                                                          | 5961/20117 [3:44:08<8:54:51,  2.27s/it] 30%|████████████████████████▌                                                          | 5962/20117 [3:44:10<8:58:25,  2.28s/it] 30%|████████████████████████▌                                                          | 5963/20117 [3:44:13<8:57:13,  2.28s/it] 30%|████████████████████████▌                                                          | 5964/20117 [3:44:15<8:50:58,  2.25s/it] 30%|████████████████████████▌                                                          | 5965/20117 [3:44:17<8:49:29,  2.24s/it] 30%|████████████████████████▌                                                          | 5966/20117 [3:44:19<8:48:33,  2.24s/it] 30%|████████████████████████▌                                                          | 5967/20117 [3:44:22<8:49:32,  2.25s/it] 30%|████████████████████████▌                                                          | 5968/20117 [3:44:24<8:51:42,  2.25s/it] 30%|████████████████████████▋                                                          | 5969/20117 [3:44:26<8:51:03,  2.25s/it] 30%|████████████████████████▋                                                          | 5970/20117 [3:44:28<8:54:15,  2.27s/it]                                                                                                                                 {'loss': 0.2011, 'grad_norm': 0.3001823425292969, 'learning_rate': 0.00016049307619085915, 'memory/max_active (GiB)': 20.44, 'memory/max_allocated (GiB)': 20.44, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 289.33, 'epoch': 0.59}
 30%|████████████████████████▋                                                          | 5970/20117 [3:44:28<8:54:15,  2.27s/it] 30%|████████████████████████▋                                                          | 5971/20117 [3:44:31<8:58:34,  2.28s/it] 30%|████████████████████████▋                                                          | 5972/20117 [3:44:33<8:58:12,  2.28s/it] 30%|████████████████████████▋                                                          | 5973/20117 [3:44:35<8:58:08,  2.28s/it] 30%|████████████████████████▋                                                          | 5974/20117 [3:44:38<9:00:52,  2.29s/it] 30%|████████████████████████▋                                                          | 5975/20117 [3:44:40<9:03:18,  2.31s/it] 30%|████████████████████████▋                                                          | 5976/20117 [3:44:42<9:03:04,  2.30s/it] 30%|████████████████████████▋                                                          | 5977/20117 [3:44:45<9:02:10,  2.30s/it] 30%|████████████████████████▋                                                          | 5978/20117 [3:44:47<8:58:07,  2.28s/it] 30%|████████████████████████▋                                                          | 5979/20117 [3:44:49<8:57:58,  2.28s/it] 30%|████████████████████████▋                                                          | 5980/20117 [3:44:51<8:56:46,  2.28s/it]                                                                                                                                 {'loss': 0.2728, 'grad_norm': 0.593662679195404, 'learning_rate': 0.00016036802889787536, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 397.95, 'epoch': 0.59}
 30%|████████████████████████▋                                                          | 5980/20117 [3:44:51<8:56:46,  2.28s/it] 30%|████████████████████████▋                                                          | 5981/20117 [3:44:54<8:56:49,  2.28s/it] 30%|████████████████████████▋                                                          | 5982/20117 [3:44:56<8:55:16,  2.27s/it] 30%|████████████████████████▋                                                          | 5983/20117 [3:44:58<8:58:43,  2.29s/it] 30%|████████████████████████▋                                                          | 5984/20117 [3:45:01<9:03:26,  2.31s/it] 30%|████████████████████████▋                                                          | 5985/20117 [3:45:03<9:01:43,  2.30s/it] 30%|████████████████████████▋                                                          | 5986/20117 [3:45:05<8:58:46,  2.29s/it] 30%|████████████████████████▋                                                          | 5987/20117 [3:45:07<8:55:48,  2.28s/it] 30%|████████████████████████▋                                                          | 5988/20117 [3:45:10<8:57:42,  2.28s/it] 30%|████████████████████████▋                                                          | 5989/20117 [3:45:12<8:57:38,  2.28s/it] 30%|████████████████████████▋                                                          | 5990/20117 [3:45:14<8:58:11,  2.29s/it]                                                                                                                                 {'loss': 0.2016, 'grad_norm': 0.38041049242019653, 'learning_rate': 0.00016024283290567732, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 310.64, 'epoch': 0.6}
 30%|████████████████████████▋                                                          | 5990/20117 [3:45:14<8:58:11,  2.29s/it] 30%|████████████████████████▋                                                          | 5991/20117 [3:45:17<8:56:22,  2.28s/it] 30%|████████████████████████▋                                                          | 5992/20117 [3:45:19<8:58:31,  2.29s/it] 30%|████████████████████████▋                                                          | 5993/20117 [3:45:21<8:55:12,  2.27s/it] 30%|████████████████████████▋                                                          | 5994/20117 [3:45:23<8:52:36,  2.26s/it] 30%|████████████████████████▋                                                          | 5995/20117 [3:45:26<8:57:12,  2.28s/it] 30%|████████████████████████▋                                                          | 5996/20117 [3:45:28<8:53:19,  2.27s/it] 30%|████████████████████████▋                                                          | 5997/20117 [3:45:30<8:51:56,  2.26s/it] 30%|████████████████████████▋                                                          | 5998/20117 [3:45:32<8:53:25,  2.27s/it] 30%|████████████████████████▊                                                          | 5999/20117 [3:45:35<8:53:03,  2.27s/it] 30%|████████████████████████▊                                                          | 6000/20117 [3:45:37<8:51:17,  2.26s/it]                                                                                                                                 {'loss': 0.2, 'grad_norm': 0.28381025791168213, 'learning_rate': 0.0001601174885226492, 'memory/max_active (GiB)': 21.38, 'memory/max_allocated (GiB)': 21.38, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 368.07, 'epoch': 0.6}
 30%|████████████████████████▊                                                          | 6000/20117 [3:45:37<8:51:17,  2.26s/it] 30%|████████████████████████▊                                                          | 6001/20117 [3:45:39<8:52:38,  2.26s/it] 30%|████████████████████████▊                                                          | 6002/20117 [3:45:41<8:52:27,  2.26s/it] 30%|████████████████████████▊                                                          | 6003/20117 [3:45:44<8:55:36,  2.28s/it] 30%|████████████████████████▊                                                          | 6004/20117 [3:45:46<8:56:44,  2.28s/it] 30%|████████████████████████▊                                                          | 6005/20117 [3:45:49<9:06:55,  2.33s/it] 30%|████████████████████████▊                                                          | 6006/20117 [3:45:51<9:25:05,  2.40s/it] 30%|████████████████████████▊                                                          | 6007/20117 [3:45:53<9:16:12,  2.37s/it] 30%|████████████████████████▊                                                          | 6008/20117 [3:45:56<9:18:53,  2.38s/it] 30%|████████████████████████▊                                                          | 6009/20117 [3:45:58<9:13:53,  2.36s/it] 30%|████████████████████████▊                                                          | 6010/20117 [3:46:00<9:04:19,  2.32s/it]                                                                                                                                 {'loss': 0.2653, 'grad_norm': 0.4913434088230133, 'learning_rate': 0.0001599919960575407, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 365.97, 'epoch': 0.6}
 30%|████████████████████████▊                                                          | 6010/20117 [3:46:00<9:04:19,  2.32s/it] 30%|████████████████████████▊                                                          | 6011/20117 [3:46:03<9:00:45,  2.30s/it] 30%|████████████████████████▊                                                          | 6012/20117 [3:46:05<9:00:34,  2.30s/it] 30%|████████████████████████▊                                                          | 6013/20117 [3:46:07<8:55:57,  2.28s/it] 30%|████████████████████████▊                                                          | 6014/20117 [3:46:09<8:57:51,  2.29s/it] 30%|████████████████████████▊                                                          | 6015/20117 [3:46:12<8:55:00,  2.28s/it] 30%|████████████████████████▊                                                          | 6016/20117 [3:46:14<8:52:51,  2.27s/it] 30%|████████████████████████▊                                                          | 6017/20117 [3:46:16<8:50:30,  2.26s/it] 30%|████████████████████████▊                                                          | 6018/20117 [3:46:18<8:46:43,  2.24s/it] 30%|████████████████████████▊                                                          | 6019/20117 [3:46:21<8:45:42,  2.24s/it] 30%|████████████████████████▊                                                          | 6020/20117 [3:46:23<8:42:55,  2.23s/it]                                                                                                                                 {'loss': 0.2687, 'grad_norm': 0.4141637086868286, 'learning_rate': 0.00015986635581946638, 'memory/max_active (GiB)': 19.22, 'memory/max_allocated (GiB)': 19.22, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 363.69, 'epoch': 0.6}
 30%|████████████████████████▊                                                          | 6020/20117 [3:46:23<8:42:55,  2.23s/it] 30%|████████████████████████▊                                                          | 6021/20117 [3:46:25<8:45:11,  2.24s/it] 30%|████████████████████████▊                                                          | 6022/20117 [3:46:27<8:50:19,  2.26s/it] 30%|████████████████████████▊                                                          | 6023/20117 [3:46:30<8:50:42,  2.26s/it] 30%|████████████████████████▊                                                          | 6024/20117 [3:46:32<8:51:17,  2.26s/it] 30%|████████████████████████▊                                                          | 6025/20117 [3:46:34<8:50:41,  2.26s/it] 30%|████████████████████████▊                                                          | 6026/20117 [3:46:36<8:57:32,  2.29s/it] 30%|████████████████████████▊                                                          | 6027/20117 [3:46:39<9:00:21,  2.30s/it] 30%|████████████████████████▊                                                          | 6028/20117 [3:46:41<8:57:47,  2.29s/it] 30%|████████████████████████▊                                                          | 6029/20117 [3:46:43<8:54:13,  2.28s/it] 30%|████████████████████████▉                                                          | 6030/20117 [3:46:46<8:52:12,  2.27s/it]                                                                                                                                 {'loss': 0.2625, 'grad_norm': 0.45988166332244873, 'learning_rate': 0.00015974056811790462, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 402.4, 'epoch': 0.6}
 30%|████████████████████████▉                                                          | 6030/20117 [3:46:46<8:52:12,  2.27s/it] 30%|████████████████████████▉                                                          | 6031/20117 [3:46:48<8:50:46,  2.26s/it] 30%|████████████████████████▉                                                          | 6032/20117 [3:46:50<8:53:29,  2.27s/it] 30%|████████████████████████▉                                                          | 6033/20117 [3:46:52<8:48:49,  2.25s/it] 30%|████████████████████████▉                                                          | 6034/20117 [3:46:55<8:47:40,  2.25s/it] 30%|████████████████████████▉                                                          | 6035/20117 [3:46:57<8:48:51,  2.25s/it] 30%|████████████████████████▉                                                          | 6036/20117 [3:46:59<8:53:58,  2.28s/it] 30%|████████████████████████▉                                                          | 6037/20117 [3:47:01<8:52:34,  2.27s/it] 30%|████████████████████████▉                                                          | 6038/20117 [3:47:04<8:58:53,  2.30s/it] 30%|████████████████████████▉                                                          | 6039/20117 [3:47:06<9:00:16,  2.30s/it] 30%|████████████████████████▉                                                          | 6040/20117 [3:47:08<8:53:50,  2.28s/it]                                                                                                                                 {'loss': 0.2462, 'grad_norm': 0.5327023267745972, 'learning_rate': 0.0001596146332626971, 'memory/max_active (GiB)': 20.53, 'memory/max_allocated (GiB)': 20.53, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 382.41, 'epoch': 0.6}
 30%|████████████████████████▉                                                          | 6040/20117 [3:47:08<8:53:50,  2.28s/it] 30%|████████████████████████▉                                                          | 6041/20117 [3:47:11<8:55:04,  2.28s/it] 30%|████████████████████████▉                                                          | 6042/20117 [3:47:13<8:52:03,  2.27s/it] 30%|████████████████████████▉                                                          | 6043/20117 [3:47:15<8:57:11,  2.29s/it] 30%|████████████████████████▉                                                          | 6044/20117 [3:47:17<8:52:25,  2.27s/it] 30%|████████████████████████▉                                                          | 6045/20117 [3:47:20<8:55:02,  2.28s/it] 30%|████████████████████████▉                                                          | 6046/20117 [3:47:22<8:54:20,  2.28s/it] 30%|████████████████████████▉                                                          | 6047/20117 [3:47:24<8:51:42,  2.27s/it] 30%|████████████████████████▉                                                          | 6048/20117 [3:47:26<8:50:10,  2.26s/it] 30%|████████████████████████▉                                                          | 6049/20117 [3:47:29<8:53:33,  2.28s/it] 30%|████████████████████████▉                                                          | 6050/20117 [3:47:31<8:54:21,  2.28s/it]                                                                                                                                 {'loss': 0.2171, 'grad_norm': 0.2608936131000519, 'learning_rate': 0.00015948855156404802, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 369.97, 'epoch': 0.6}
 30%|████████████████████████▉                                                          | 6050/20117 [3:47:31<8:54:21,  2.28s/it] 30%|████████████████████████▉                                                          | 6051/20117 [3:47:33<8:54:09,  2.28s/it] 30%|████████████████████████▉                                                          | 6052/20117 [3:47:36<8:51:56,  2.27s/it] 30%|████████████████████████▉                                                          | 6053/20117 [3:47:38<8:52:06,  2.27s/it] 30%|████████████████████████▉                                                          | 6054/20117 [3:47:40<8:52:35,  2.27s/it] 30%|████████████████████████▉                                                          | 6055/20117 [3:47:42<8:52:57,  2.27s/it] 30%|████████████████████████▉                                                          | 6056/20117 [3:47:45<8:51:11,  2.27s/it] 30%|████████████████████████▉                                                          | 6057/20117 [3:47:47<8:51:21,  2.27s/it] 30%|████████████████████████▉                                                          | 6058/20117 [3:47:49<9:13:16,  2.36s/it] 30%|████████████████████████▉                                                          | 6059/20117 [3:47:52<9:03:32,  2.32s/it] 30%|█████████████████████████                                                          | 6060/20117 [3:47:54<8:59:07,  2.30s/it]                                                                                                                                 {'loss': 0.3213, 'grad_norm': 0.5496955513954163, 'learning_rate': 0.00015936232333252327, 'memory/max_active (GiB)': 19.2, 'memory/max_allocated (GiB)': 19.2, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 365.39, 'epoch': 0.6}
 30%|█████████████████████████                                                          | 6060/20117 [3:47:54<8:59:07,  2.30s/it] 30%|█████████████████████████                                                          | 6061/20117 [3:47:56<8:59:40,  2.30s/it] 30%|█████████████████████████                                                          | 6062/20117 [3:47:59<8:58:34,  2.30s/it] 30%|█████████████████████████                                                          | 6063/20117 [3:48:01<8:58:03,  2.30s/it] 30%|█████████████████████████                                                          | 6064/20117 [3:48:03<8:54:42,  2.28s/it] 30%|█████████████████████████                                                          | 6065/20117 [3:48:05<8:50:12,  2.26s/it] 30%|█████████████████████████                                                          | 6066/20117 [3:48:08<8:48:54,  2.26s/it] 30%|█████████████████████████                                                          | 6067/20117 [3:48:10<8:51:44,  2.27s/it] 30%|█████████████████████████                                                          | 6068/20117 [3:48:12<8:49:49,  2.26s/it] 30%|█████████████████████████                                                          | 6069/20117 [3:48:14<8:50:02,  2.26s/it] 30%|█████████████████████████                                                          | 6070/20117 [3:48:17<8:47:53,  2.25s/it]                                                                                                                                 {'loss': 0.1949, 'grad_norm': 0.40054014325141907, 'learning_rate': 0.00015923594887904964, 'memory/max_active (GiB)': 18.83, 'memory/max_allocated (GiB)': 18.83, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 293.13, 'epoch': 0.6}
 30%|█████████████████████████                                                          | 6070/20117 [3:48:17<8:47:53,  2.25s/it] 30%|█████████████████████████                                                          | 6071/20117 [3:48:19<8:48:09,  2.26s/it] 30%|█████████████████████████                                                          | 6072/20117 [3:48:21<8:46:12,  2.25s/it] 30%|█████████████████████████                                                          | 6073/20117 [3:48:23<8:39:41,  2.22s/it] 30%|█████████████████████████                                                          | 6074/20117 [3:48:26<8:41:26,  2.23s/it] 30%|█████████████████████████                                                          | 6075/20117 [3:48:28<8:41:58,  2.23s/it] 30%|█████████████████████████                                                          | 6076/20117 [3:48:30<8:41:33,  2.23s/it] 30%|█████████████████████████                                                          | 6077/20117 [3:48:32<8:45:38,  2.25s/it] 30%|█████████████████████████                                                          | 6078/20117 [3:48:35<8:45:50,  2.25s/it] 30%|█████████████████████████                                                          | 6079/20117 [3:48:37<8:41:54,  2.23s/it] 30%|█████████████████████████                                                          | 6080/20117 [3:48:39<8:44:55,  2.24s/it]                                                                                                                                 {'loss': 0.2662, 'grad_norm': 0.7320500612258911, 'learning_rate': 0.0001591094285149141, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 336.97, 'epoch': 0.6}
 30%|█████████████████████████                                                          | 6080/20117 [3:48:39<8:44:55,  2.24s/it] 30%|█████████████████████████                                                          | 6081/20117 [3:48:41<8:48:00,  2.26s/it] 30%|█████████████████████████                                                          | 6082/20117 [3:48:44<8:47:12,  2.25s/it] 30%|█████████████████████████                                                          | 6083/20117 [3:48:46<8:48:18,  2.26s/it] 30%|█████████████████████████                                                          | 6084/20117 [3:48:48<8:53:20,  2.28s/it] 30%|█████████████████████████                                                          | 6085/20117 [3:48:50<8:57:36,  2.30s/it] 30%|█████████████████████████                                                          | 6086/20117 [3:48:53<9:04:50,  2.33s/it] 30%|█████████████████████████                                                          | 6087/20117 [3:48:55<8:58:42,  2.30s/it] 30%|█████████████████████████                                                          | 6088/20117 [3:48:57<8:55:01,  2.29s/it] 30%|█████████████████████████                                                          | 6089/20117 [3:49:00<8:53:26,  2.28s/it] 30%|█████████████████████████▏                                                         | 6090/20117 [3:49:02<8:52:09,  2.28s/it]                                                                                                                                 {'loss': 0.2487, 'grad_norm': 0.484331876039505, 'learning_rate': 0.00015898276255176303, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 359.19, 'epoch': 0.61}
 30%|█████████████████████████▏                                                         | 6090/20117 [3:49:02<8:52:09,  2.28s/it] 30%|█████████████████████████▏                                                         | 6091/20117 [3:49:04<8:46:27,  2.25s/it] 30%|█████████████████████████▏                                                         | 6092/20117 [3:49:06<8:40:49,  2.23s/it] 30%|█████████████████████████▏                                                         | 6093/20117 [3:49:09<8:42:59,  2.24s/it] 30%|█████████████████████████▏                                                         | 6094/20117 [3:49:11<8:44:17,  2.24s/it] 30%|█████████████████████████▏                                                         | 6095/20117 [3:49:13<8:47:20,  2.26s/it] 30%|█████████████████████████▏                                                         | 6096/20117 [3:49:15<8:46:03,  2.25s/it] 30%|█████████████████████████▏                                                         | 6097/20117 [3:49:18<8:47:51,  2.26s/it] 30%|█████████████████████████▏                                                         | 6098/20117 [3:49:20<8:52:00,  2.28s/it] 30%|█████████████████████████▏                                                         | 6099/20117 [3:49:22<8:49:49,  2.27s/it] 30%|█████████████████████████▏                                                         | 6100/20117 [3:49:24<8:47:49,  2.26s/it]                                                                                                                                 {'loss': 0.2546, 'grad_norm': 0.4304184019565582, 'learning_rate': 0.00015885595130160155, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 329.56, 'epoch': 0.61}
 30%|█████████████████████████▏                                                         | 6100/20117 [3:49:24<8:47:49,  2.26s/it] 30%|█████████████████████████▏                                                         | 6101/20117 [3:49:27<8:46:21,  2.25s/it] 30%|█████████████████████████▏                                                         | 6102/20117 [3:49:29<8:45:25,  2.25s/it] 30%|█████████████████████████▏                                                         | 6103/20117 [3:49:31<8:46:09,  2.25s/it] 30%|█████████████████████████▏                                                         | 6104/20117 [3:49:33<8:48:46,  2.26s/it] 30%|█████████████████████████▏                                                         | 6105/20117 [3:49:36<8:48:22,  2.26s/it] 30%|█████████████████████████▏                                                         | 6106/20117 [3:49:38<8:46:26,  2.25s/it] 30%|█████████████████████████▏                                                         | 6107/20117 [3:49:40<8:43:00,  2.24s/it] 30%|█████████████████████████▏                                                         | 6108/20117 [3:49:42<8:45:22,  2.25s/it] 30%|█████████████████████████▏                                                         | 6109/20117 [3:49:45<8:47:25,  2.26s/it] 30%|█████████████████████████▏                                                         | 6110/20117 [3:49:47<8:47:14,  2.26s/it]                                                                                                                                 {'loss': 0.2622, 'grad_norm': 0.49904513359069824, 'learning_rate': 0.00015872899507679252, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 381.18, 'epoch': 0.61}
 30%|█████████████████████████▏                                                         | 6110/20117 [3:49:47<8:47:14,  2.26s/it] 30%|█████████████████████████▏                                                         | 6111/20117 [3:49:50<9:13:20,  2.37s/it] 30%|█████████████████████████▏                                                         | 6112/20117 [3:49:52<9:02:37,  2.32s/it] 30%|█████████████████████████▏                                                         | 6113/20117 [3:49:54<8:57:51,  2.30s/it] 30%|█████████████████████████▏                                                         | 6114/20117 [3:49:56<8:52:12,  2.28s/it] 30%|█████████████████████████▏                                                         | 6115/20117 [3:49:58<8:48:42,  2.27s/it] 30%|█████████████████████████▏                                                         | 6116/20117 [3:50:01<8:50:04,  2.27s/it] 30%|█████████████████████████▏                                                         | 6117/20117 [3:50:03<8:47:57,  2.26s/it] 30%|█████████████████████████▏                                                         | 6118/20117 [3:50:05<8:48:28,  2.27s/it] 30%|█████████████████████████▏                                                         | 6119/20117 [3:50:08<8:46:36,  2.26s/it] 30%|█████████████████████████▎                                                         | 6120/20117 [3:50:10<8:50:10,  2.27s/it]                                                                                                                                 {'loss': 0.2424, 'grad_norm': 0.525607168674469, 'learning_rate': 0.00015860189419005595, 'memory/max_active (GiB)': 18.17, 'memory/max_allocated (GiB)': 18.17, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 331.0, 'epoch': 0.61}
 30%|█████████████████████████▎                                                         | 6120/20117 [3:50:10<8:50:10,  2.27s/it] 30%|█████████████████████████▎                                                         | 6121/20117 [3:50:12<8:48:05,  2.26s/it] 30%|█████████████████████████▎                                                         | 6122/20117 [3:50:14<8:48:00,  2.26s/it] 30%|█████████████████████████▎                                                         | 6123/20117 [3:50:17<8:51:41,  2.28s/it] 30%|█████████████████████████▎                                                         | 6124/20117 [3:50:19<8:45:38,  2.25s/it] 30%|█████████████████████████▎                                                         | 6125/20117 [3:50:21<8:50:14,  2.27s/it] 30%|█████████████████████████▎                                                         | 6126/20117 [3:50:23<8:50:18,  2.27s/it] 30%|█████████████████████████▎                                                         | 6127/20117 [3:50:26<8:53:58,  2.29s/it] 30%|█████████████████████████▎                                                         | 6128/20117 [3:50:28<8:53:15,  2.29s/it] 30%|█████████████████████████▎                                                         | 6129/20117 [3:50:30<8:50:32,  2.28s/it] 30%|█████████████████████████▎                                                         | 6130/20117 [3:50:33<8:53:09,  2.29s/it]                                                                                                                                 {'loss': 0.3099, 'grad_norm': 0.3618375360965729, 'learning_rate': 0.0001584746489544682, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 435.27, 'epoch': 0.61}
 30%|█████████████████████████▎                                                         | 6130/20117 [3:50:33<8:53:09,  2.29s/it] 30%|█████████████████████████▎                                                         | 6131/20117 [3:50:35<8:52:32,  2.28s/it] 30%|█████████████████████████▎                                                         | 6132/20117 [3:50:37<8:53:23,  2.29s/it] 30%|█████████████████████████▎                                                         | 6133/20117 [3:50:39<8:55:01,  2.30s/it] 30%|█████████████████████████▎                                                         | 6134/20117 [3:50:42<8:55:34,  2.30s/it] 30%|█████████████████████████▎                                                         | 6135/20117 [3:50:44<8:56:53,  2.30s/it] 31%|█████████████████████████▎                                                         | 6136/20117 [3:50:46<8:55:21,  2.30s/it] 31%|█████████████████████████▎                                                         | 6137/20117 [3:50:49<8:56:11,  2.30s/it] 31%|█████████████████████████▎                                                         | 6138/20117 [3:50:51<8:58:19,  2.31s/it] 31%|█████████████████████████▎                                                         | 6139/20117 [3:50:53<8:58:18,  2.31s/it] 31%|█████████████████████████▎                                                         | 6140/20117 [3:50:56<8:59:21,  2.32s/it]                                                                                                                                 {'loss': 0.2337, 'grad_norm': 0.41309383511543274, 'learning_rate': 0.00015834725968346116, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 381.0, 'epoch': 0.61}
 31%|█████████████████████████▎                                                         | 6140/20117 [3:50:56<8:59:21,  2.32s/it] 31%|█████████████████████████▎                                                         | 6141/20117 [3:50:58<8:59:04,  2.31s/it] 31%|█████████████████████████▎                                                         | 6142/20117 [3:51:00<9:00:26,  2.32s/it] 31%|█████████████████████████▎                                                         | 6143/20117 [3:51:03<8:55:16,  2.30s/it] 31%|█████████████████████████▎                                                         | 6144/20117 [3:51:05<8:54:20,  2.29s/it] 31%|█████████████████████████▎                                                         | 6145/20117 [3:51:07<8:54:25,  2.29s/it] 31%|█████████████████████████▎                                                         | 6146/20117 [3:51:09<8:54:37,  2.30s/it] 31%|█████████████████████████▎                                                         | 6147/20117 [3:51:12<8:51:30,  2.28s/it] 31%|█████████████████████████▎                                                         | 6148/20117 [3:51:14<8:51:27,  2.28s/it] 31%|█████████████████████████▎                                                         | 6149/20117 [3:51:16<8:53:11,  2.29s/it] 31%|█████████████████████████▎                                                         | 6150/20117 [3:51:19<8:50:05,  2.28s/it]                                                                                                                                 {'loss': 0.3318, 'grad_norm': 0.37885645031929016, 'learning_rate': 0.00015821972669082156, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 365.41, 'epoch': 0.61}
 31%|█████████████████████████▎                                                         | 6150/20117 [3:51:19<8:50:05,  2.28s/it] 31%|█████████████████████████▍                                                         | 6151/20117 [3:51:21<8:50:05,  2.28s/it] 31%|█████████████████████████▍                                                         | 6152/20117 [3:51:23<8:50:27,  2.28s/it] 31%|█████████████████████████▍                                                         | 6153/20117 [3:51:25<8:43:38,  2.25s/it] 31%|█████████████████████████▍                                                         | 6154/20117 [3:51:28<8:42:44,  2.25s/it] 31%|█████████████████████████▍                                                         | 6155/20117 [3:51:30<8:41:47,  2.24s/it] 31%|█████████████████████████▍                                                         | 6156/20117 [3:51:32<8:45:24,  2.26s/it] 31%|█████████████████████████▍                                                         | 6157/20117 [3:51:34<8:45:58,  2.26s/it] 31%|█████████████████████████▍                                                         | 6158/20117 [3:51:37<8:49:59,  2.28s/it] 31%|█████████████████████████▍                                                         | 6159/20117 [3:51:39<8:51:00,  2.28s/it] 31%|█████████████████████████▍                                                         | 6160/20117 [3:51:41<8:53:58,  2.30s/it]                                                                                                                                 {'loss': 0.2213, 'grad_norm': 0.4314253032207489, 'learning_rate': 0.0001580920502906901, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 343.04, 'epoch': 0.61}
 31%|█████████████████████████▍                                                         | 6160/20117 [3:51:41<8:53:58,  2.30s/it] 31%|█████████████████████████▍                                                         | 6161/20117 [3:51:44<8:57:48,  2.31s/it] 31%|█████████████████████████▍                                                         | 6162/20117 [3:51:46<8:53:56,  2.30s/it] 31%|█████████████████████████▍                                                         | 6163/20117 [3:51:48<8:54:16,  2.30s/it] 31%|█████████████████████████▍                                                         | 6164/20117 [3:51:50<8:53:23,  2.29s/it] 31%|█████████████████████████▍                                                         | 6165/20117 [3:51:53<9:12:37,  2.38s/it] 31%|█████████████████████████▍                                                         | 6166/20117 [3:51:55<9:07:16,  2.35s/it] 31%|█████████████████████████▍                                                         | 6167/20117 [3:51:58<9:07:09,  2.35s/it] 31%|█████████████████████████▍                                                         | 6168/20117 [3:52:00<9:07:22,  2.35s/it] 31%|█████████████████████████▍                                                         | 6169/20117 [3:52:02<9:02:26,  2.33s/it] 31%|█████████████████████████▍                                                         | 6170/20117 [3:52:05<8:58:14,  2.32s/it]                                                                                                                                 {'loss': 0.243, 'grad_norm': 0.5435411930084229, 'learning_rate': 0.00015796423079756074, 'memory/max_active (GiB)': 20.77, 'memory/max_allocated (GiB)': 20.77, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 365.75, 'epoch': 0.61}
 31%|█████████████████████████▍                                                         | 6170/20117 [3:52:05<8:58:14,  2.32s/it] 31%|█████████████████████████▍                                                         | 6171/20117 [3:52:07<8:53:50,  2.30s/it] 31%|█████████████████████████▍                                                         | 6172/20117 [3:52:09<8:51:10,  2.29s/it] 31%|█████████████████████████▍                                                         | 6173/20117 [3:52:11<8:54:12,  2.30s/it] 31%|█████████████████████████▍                                                         | 6174/20117 [3:52:14<8:49:43,  2.28s/it] 31%|█████████████████████████▍                                                         | 6175/20117 [3:52:16<8:46:51,  2.27s/it] 31%|█████████████████████████▍                                                         | 6176/20117 [3:52:18<8:45:47,  2.26s/it] 31%|█████████████████████████▍                                                         | 6177/20117 [3:52:20<8:49:41,  2.28s/it] 31%|█████████████████████████▍                                                         | 6178/20117 [3:52:23<8:49:06,  2.28s/it] 31%|█████████████████████████▍                                                         | 6179/20117 [3:52:25<8:47:21,  2.27s/it] 31%|█████████████████████████▍                                                         | 6180/20117 [3:52:27<8:47:24,  2.27s/it]                                                                                                                                 {'loss': 0.2484, 'grad_norm': 0.31716790795326233, 'learning_rate': 0.00015783626852627992, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 435.76, 'epoch': 0.61}
 31%|█████████████████████████▍                                                         | 6180/20117 [3:52:27<8:47:24,  2.27s/it] 31%|█████████████████████████▌                                                         | 6181/20117 [3:52:30<8:49:44,  2.28s/it] 31%|█████████████████████████▌                                                         | 6182/20117 [3:52:32<8:51:44,  2.29s/it] 31%|█████████████████████████▌                                                         | 6183/20117 [3:52:34<8:55:06,  2.30s/it] 31%|█████████████████████████▌                                                         | 6184/20117 [3:52:36<8:51:42,  2.29s/it] 31%|█████████████████████████▌                                                         | 6185/20117 [3:52:39<8:52:05,  2.29s/it] 31%|█████████████████████████▌                                                         | 6186/20117 [3:52:41<8:47:10,  2.27s/it] 31%|█████████████████████████▌                                                         | 6187/20117 [3:52:43<8:47:14,  2.27s/it] 31%|█████████████████████████▌                                                         | 6188/20117 [3:52:46<8:52:08,  2.29s/it] 31%|█████████████████████████▌                                                         | 6189/20117 [3:52:48<8:47:36,  2.27s/it] 31%|█████████████████████████▌                                                         | 6190/20117 [3:52:50<8:45:13,  2.26s/it]                                                                                                                                 {'loss': 0.223, 'grad_norm': 0.3328079879283905, 'learning_rate': 0.0001577081637920457, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 310.86, 'epoch': 0.62}
 31%|█████████████████████████▌                                                         | 6190/20117 [3:52:50<8:45:13,  2.26s/it] 31%|█████████████████████████▌                                                         | 6191/20117 [3:52:52<8:47:38,  2.27s/it] 31%|█████████████████████████▌                                                         | 6192/20117 [3:52:55<8:50:22,  2.29s/it] 31%|█████████████████████████▌                                                         | 6193/20117 [3:52:57<8:51:01,  2.29s/it] 31%|█████████████████████████▌                                                         | 6194/20117 [3:52:59<8:48:11,  2.28s/it] 31%|█████████████████████████▌                                                         | 6195/20117 [3:53:02<8:51:32,  2.29s/it] 31%|█████████████████████████▌                                                         | 6196/20117 [3:53:04<8:50:02,  2.28s/it] 31%|█████████████████████████▌                                                         | 6197/20117 [3:53:06<8:49:50,  2.28s/it] 31%|█████████████████████████▌                                                         | 6198/20117 [3:53:08<8:48:06,  2.28s/it] 31%|█████████████████████████▌                                                         | 6199/20117 [3:53:11<8:49:56,  2.28s/it] 31%|█████████████████████████▌                                                         | 6200/20117 [3:53:13<8:49:13,  2.28s/it]                                                                                                                                 {'loss': 0.2311, 'grad_norm': 0.3827805519104004, 'learning_rate': 0.00015757991691040722, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 333.74, 'epoch': 0.62}
 31%|█████████████████████████▌                                                         | 6200/20117 [3:53:13<8:49:13,  2.28s/it] 31%|█████████████████████████▌                                                         | 6201/20117 [3:53:15<8:49:54,  2.28s/it] 31%|█████████████████████████▌                                                         | 6202/20117 [3:53:17<8:42:50,  2.25s/it] 31%|█████████████████████████▌                                                         | 6203/20117 [3:53:20<8:44:49,  2.26s/it] 31%|█████████████████████████▌                                                         | 6204/20117 [3:53:22<8:43:21,  2.26s/it] 31%|█████████████████████████▌                                                         | 6205/20117 [3:53:24<8:43:20,  2.26s/it] 31%|█████████████████████████▌                                                         | 6206/20117 [3:53:26<8:43:14,  2.26s/it] 31%|█████████████████████████▌                                                         | 6207/20117 [3:53:29<8:42:13,  2.25s/it] 31%|█████████████████████████▌                                                         | 6208/20117 [3:53:31<8:47:36,  2.28s/it] 31%|█████████████████████████▌                                                         | 6209/20117 [3:53:33<8:45:24,  2.27s/it] 31%|█████████████████████████▌                                                         | 6210/20117 [3:53:36<8:43:30,  2.26s/it]                                                                                                                                 {'loss': 0.229, 'grad_norm': 0.3756648004055023, 'learning_rate': 0.00015745152819726356, 'memory/max_active (GiB)': 20.63, 'memory/max_allocated (GiB)': 20.63, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 352.29, 'epoch': 0.62}
 31%|█████████████████████████▌                                                         | 6210/20117 [3:53:36<8:43:30,  2.26s/it] 31%|█████████████████████████▋                                                         | 6211/20117 [3:53:38<8:45:20,  2.27s/it] 31%|█████████████████████████▋                                                         | 6212/20117 [3:53:40<8:44:48,  2.26s/it] 31%|█████████████████████████▋                                                         | 6213/20117 [3:53:42<8:45:26,  2.27s/it] 31%|█████████████████████████▋                                                         | 6214/20117 [3:53:45<8:47:32,  2.28s/it] 31%|█████████████████████████▋                                                         | 6215/20117 [3:53:47<8:49:15,  2.28s/it] 31%|█████████████████████████▋                                                         | 6216/20117 [3:53:49<8:48:12,  2.28s/it] 31%|█████████████████████████▋                                                         | 6217/20117 [3:53:52<9:13:32,  2.39s/it] 31%|█████████████████████████▋                                                         | 6218/20117 [3:53:54<9:07:49,  2.36s/it] 31%|█████████████████████████▋                                                         | 6219/20117 [3:53:56<9:02:58,  2.34s/it] 31%|█████████████████████████▋                                                         | 6220/20117 [3:53:59<8:58:53,  2.33s/it]                                                                                                                                 {'loss': 0.2694, 'grad_norm': 0.4345835745334625, 'learning_rate': 0.0001573229979688633, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 411.14, 'epoch': 0.62}
 31%|█████████████████████████▋                                                         | 6220/20117 [3:53:59<8:58:53,  2.33s/it] 31%|█████████████████████████▋                                                         | 6221/20117 [3:54:01<8:58:26,  2.32s/it] 31%|█████████████████████████▋                                                         | 6222/20117 [3:54:03<8:50:28,  2.29s/it] 31%|█████████████████████████▋                                                         | 6223/20117 [3:54:06<8:50:23,  2.29s/it] 31%|█████████████████████████▋                                                         | 6224/20117 [3:54:08<8:49:43,  2.29s/it] 31%|█████████████████████████▋                                                         | 6225/20117 [3:54:10<8:53:52,  2.31s/it] 31%|█████████████████████████▋                                                         | 6226/20117 [3:54:12<8:49:33,  2.29s/it] 31%|█████████████████████████▋                                                         | 6227/20117 [3:54:15<8:46:58,  2.28s/it] 31%|█████████████████████████▋                                                         | 6228/20117 [3:54:17<8:48:34,  2.28s/it] 31%|█████████████████████████▋                                                         | 6229/20117 [3:54:19<8:55:05,  2.31s/it] 31%|█████████████████████████▋                                                         | 6230/20117 [3:54:22<8:48:16,  2.28s/it]                                                                                                                                 {'loss': 0.1925, 'grad_norm': 0.23183055222034454, 'learning_rate': 0.00015719432654180357, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 363.23, 'epoch': 0.62}
 31%|█████████████████████████▋                                                         | 6230/20117 [3:54:22<8:48:16,  2.28s/it] 31%|█████████████████████████▋                                                         | 6231/20117 [3:54:24<8:50:00,  2.29s/it] 31%|█████████████████████████▋                                                         | 6232/20117 [3:54:26<8:51:22,  2.30s/it] 31%|█████████████████████████▋                                                         | 6233/20117 [3:54:28<8:48:55,  2.29s/it] 31%|█████████████████████████▋                                                         | 6234/20117 [3:54:31<8:50:42,  2.29s/it] 31%|█████████████████████████▋                                                         | 6235/20117 [3:54:33<8:46:22,  2.28s/it] 31%|█████████████████████████▋                                                         | 6236/20117 [3:54:35<8:41:56,  2.26s/it] 31%|█████████████████████████▋                                                         | 6237/20117 [3:54:37<8:42:46,  2.26s/it] 31%|█████████████████████████▋                                                         | 6238/20117 [3:54:40<8:44:37,  2.27s/it] 31%|█████████████████████████▋                                                         | 6239/20117 [3:54:42<8:49:11,  2.29s/it] 31%|█████████████████████████▋                                                         | 6240/20117 [3:54:44<8:46:54,  2.28s/it]                                                                                                                                 {'loss': 0.1506, 'grad_norm': 0.2867846190929413, 'learning_rate': 0.00015706551423302925, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 275.12, 'epoch': 0.62}
 31%|█████████████████████████▋                                                         | 6240/20117 [3:54:44<8:46:54,  2.28s/it] 31%|█████████████████████████▋                                                         | 6241/20117 [3:54:47<8:40:54,  2.25s/it] 31%|█████████████████████████▊                                                         | 6242/20117 [3:54:49<8:40:31,  2.25s/it] 31%|█████████████████████████▊                                                         | 6243/20117 [3:54:51<8:44:36,  2.27s/it] 31%|█████████████████████████▊                                                         | 6244/20117 [3:54:53<8:45:45,  2.27s/it] 31%|█████████████████████████▊                                                         | 6245/20117 [3:54:56<8:44:17,  2.27s/it] 31%|█████████████████████████▊                                                         | 6246/20117 [3:54:58<8:39:46,  2.25s/it] 31%|█████████████████████████▊                                                         | 6247/20117 [3:55:00<8:40:38,  2.25s/it] 31%|█████████████████████████▊                                                         | 6248/20117 [3:55:02<8:44:19,  2.27s/it] 31%|█████████████████████████▊                                                         | 6249/20117 [3:55:05<8:47:40,  2.28s/it] 31%|█████████████████████████▊                                                         | 6250/20117 [3:55:07<8:46:07,  2.28s/it]                                                                                                                                 {'loss': 0.2867, 'grad_norm': 0.42951786518096924, 'learning_rate': 0.00015693656135983233, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 383.43, 'epoch': 0.62}
 31%|█████████████████████████▊                                                         | 6250/20117 [3:55:07<8:46:07,  2.28s/it] 31%|█████████████████████████▊                                                         | 6251/20117 [3:55:09<8:46:45,  2.28s/it] 31%|█████████████████████████▊                                                         | 6252/20117 [3:55:12<8:43:15,  2.26s/it] 31%|█████████████████████████▊                                                         | 6253/20117 [3:55:14<8:44:14,  2.27s/it] 31%|█████████████████████████▊                                                         | 6254/20117 [3:55:16<8:40:49,  2.25s/it] 31%|█████████████████████████▊                                                         | 6255/20117 [3:55:18<8:45:49,  2.28s/it] 31%|█████████████████████████▊                                                         | 6256/20117 [3:55:21<8:44:17,  2.27s/it] 31%|█████████████████████████▊                                                         | 6257/20117 [3:55:23<8:48:13,  2.29s/it] 31%|█████████████████████████▊                                                         | 6258/20117 [3:55:25<8:46:43,  2.28s/it] 31%|█████████████████████████▊                                                         | 6259/20117 [3:55:27<8:43:04,  2.26s/it] 31%|█████████████████████████▊                                                         | 6260/20117 [3:55:30<8:39:57,  2.25s/it]                                                                                                                                 {'loss': 0.263, 'grad_norm': 0.5975165963172913, 'learning_rate': 0.00015680746823985094, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 304.9, 'epoch': 0.62}
 31%|█████████████████████████▊                                                         | 6260/20117 [3:55:30<8:39:57,  2.25s/it] 31%|█████████████████████████▊                                                         | 6261/20117 [3:55:32<8:48:38,  2.29s/it] 31%|█████████████████████████▊                                                         | 6262/20117 [3:55:34<8:49:07,  2.29s/it] 31%|█████████████████████████▊                                                         | 6263/20117 [3:55:37<8:43:16,  2.27s/it] 31%|█████████████████████████▊                                                         | 6264/20117 [3:55:39<8:35:46,  2.23s/it] 31%|█████████████████████████▊                                                         | 6265/20117 [3:55:41<8:32:35,  2.22s/it] 31%|█████████████████████████▊                                                         | 6266/20117 [3:55:43<8:28:26,  2.20s/it] 31%|█████████████████████████▊                                                         | 6267/20117 [3:55:45<8:25:02,  2.19s/it] 31%|█████████████████████████▊                                                         | 6268/20117 [3:55:47<8:23:31,  2.18s/it] 31%|█████████████████████████▊                                                         | 6269/20117 [3:55:50<8:46:08,  2.28s/it] 31%|█████████████████████████▊                                                         | 6270/20117 [3:55:52<8:42:01,  2.26s/it]                                                                                                                                 {'loss': 0.2079, 'grad_norm': 0.48659709095954895, 'learning_rate': 0.00015667823519106873, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 340.54, 'epoch': 0.62}
 31%|█████████████████████████▊                                                         | 6270/20117 [3:55:52<8:42:01,  2.26s/it] 31%|█████████████████████████▊                                                         | 6271/20117 [3:55:54<8:41:25,  2.26s/it] 31%|█████████████████████████▉                                                         | 6272/20117 [3:55:57<8:44:26,  2.27s/it] 31%|█████████████████████████▉                                                         | 6273/20117 [3:55:59<8:42:33,  2.26s/it] 31%|█████████████████████████▉                                                         | 6274/20117 [3:56:01<8:44:47,  2.27s/it] 31%|█████████████████████████▉                                                         | 6275/20117 [3:56:03<8:45:47,  2.28s/it] 31%|█████████████████████████▉                                                         | 6276/20117 [3:56:06<8:50:20,  2.30s/it] 31%|█████████████████████████▉                                                         | 6277/20117 [3:56:08<8:48:50,  2.29s/it] 31%|█████████████████████████▉                                                         | 6278/20117 [3:56:10<8:46:32,  2.28s/it] 31%|█████████████████████████▉                                                         | 6279/20117 [3:56:13<8:45:13,  2.28s/it] 31%|█████████████████████████▉                                                         | 6280/20117 [3:56:15<8:43:23,  2.27s/it]                                                                                                                                 {'loss': 0.2537, 'grad_norm': 0.461224764585495, 'learning_rate': 0.00015654886253181402, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 321.0, 'epoch': 0.62}
 31%|█████████████████████████▉                                                         | 6280/20117 [3:56:15<8:43:23,  2.27s/it] 31%|█████████████████████████▉                                                         | 6281/20117 [3:56:17<8:45:40,  2.28s/it] 31%|█████████████████████████▉                                                         | 6282/20117 [3:56:19<8:45:49,  2.28s/it] 31%|█████████████████████████▉                                                         | 6283/20117 [3:56:22<8:41:15,  2.26s/it] 31%|█████████████████████████▉                                                         | 6284/20117 [3:56:24<8:34:23,  2.23s/it] 31%|█████████████████████████▉                                                         | 6285/20117 [3:56:26<8:36:35,  2.24s/it] 31%|█████████████████████████▉                                                         | 6286/20117 [3:56:28<8:37:55,  2.25s/it] 31%|█████████████████████████▉                                                         | 6287/20117 [3:56:31<8:40:27,  2.26s/it] 31%|█████████████████████████▉                                                         | 6288/20117 [3:56:33<8:46:25,  2.28s/it] 31%|█████████████████████████▉                                                         | 6289/20117 [3:56:35<8:46:18,  2.28s/it] 31%|█████████████████████████▉                                                         | 6290/20117 [3:56:38<8:54:27,  2.32s/it]                                                                                                                                 {'loss': 0.2009, 'grad_norm': 0.36294886469841003, 'learning_rate': 0.00015641935058075904, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 308.81, 'epoch': 0.63}
 31%|█████████████████████████▉                                                         | 6290/20117 [3:56:38<8:54:27,  2.32s/it] 31%|█████████████████████████▉                                                         | 6291/20117 [3:56:40<8:51:02,  2.30s/it] 31%|█████████████████████████▉                                                         | 6292/20117 [3:56:42<8:49:04,  2.30s/it] 31%|█████████████████████████▉                                                         | 6293/20117 [3:56:44<8:46:25,  2.28s/it] 31%|█████████████████████████▉                                                         | 6294/20117 [3:56:47<8:49:57,  2.30s/it] 31%|█████████████████████████▉                                                         | 6295/20117 [3:56:49<8:51:57,  2.31s/it] 31%|█████████████████████████▉                                                         | 6296/20117 [3:56:52<8:58:16,  2.34s/it] 31%|█████████████████████████▉                                                         | 6297/20117 [3:56:54<8:55:51,  2.33s/it] 31%|█████████████████████████▉                                                         | 6298/20117 [3:56:56<8:55:50,  2.33s/it] 31%|█████████████████████████▉                                                         | 6299/20117 [3:56:58<8:55:16,  2.32s/it] 31%|█████████████████████████▉                                                         | 6300/20117 [3:57:01<8:58:11,  2.34s/it]                                                                                                                                 {'loss': 0.2343, 'grad_norm': 0.4274291396141052, 'learning_rate': 0.0001562896996569191, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 319.11, 'epoch': 0.63}
 31%|█████████████████████████▉                                                         | 6300/20117 [3:57:01<8:58:11,  2.34s/it] 31%|█████████████████████████▉                                                         | 6301/20117 [3:57:03<8:54:29,  2.32s/it] 31%|██████████████████████████                                                         | 6302/20117 [3:57:05<8:54:30,  2.32s/it] 31%|██████████████████████████                                                         | 6303/20117 [3:57:08<8:54:49,  2.32s/it] 31%|██████████████████████████                                                         | 6304/20117 [3:57:10<8:52:15,  2.31s/it] 31%|██████████████████████████                                                         | 6305/20117 [3:57:12<8:55:00,  2.32s/it] 31%|██████████████████████████                                                         | 6306/20117 [3:57:15<8:49:26,  2.30s/it] 31%|██████████████████████████                                                         | 6307/20117 [3:57:17<8:49:00,  2.30s/it] 31%|██████████████████████████                                                         | 6308/20117 [3:57:19<8:48:09,  2.29s/it] 31%|██████████████████████████                                                         | 6309/20117 [3:57:22<8:46:11,  2.29s/it] 31%|██████████████████████████                                                         | 6310/20117 [3:57:24<8:46:14,  2.29s/it]                                                                                                                                 {'loss': 0.211, 'grad_norm': 0.47336748242378235, 'learning_rate': 0.00015615991007965176, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 355.64, 'epoch': 0.63}
 31%|██████████████████████████                                                         | 6310/20117 [3:57:24<8:46:14,  2.29s/it] 31%|██████████████████████████                                                         | 6311/20117 [3:57:26<8:45:28,  2.28s/it] 31%|██████████████████████████                                                         | 6312/20117 [3:57:28<8:48:53,  2.30s/it] 31%|██████████████████████████                                                         | 6313/20117 [3:57:31<8:52:47,  2.32s/it] 31%|██████████████████████████                                                         | 6314/20117 [3:57:33<8:51:27,  2.31s/it] 31%|██████████████████████████                                                         | 6315/20117 [3:57:35<8:45:46,  2.29s/it] 31%|██████████████████████████                                                         | 6316/20117 [3:57:38<8:45:31,  2.28s/it] 31%|██████████████████████████                                                         | 6317/20117 [3:57:40<8:45:05,  2.28s/it] 31%|██████████████████████████                                                         | 6318/20117 [3:57:42<8:45:45,  2.29s/it] 31%|██████████████████████████                                                         | 6319/20117 [3:57:44<8:47:32,  2.29s/it] 31%|██████████████████████████                                                         | 6320/20117 [3:57:47<8:46:01,  2.29s/it]                                                                                                                                 {'loss': 0.2492, 'grad_norm': 0.4946894347667694, 'learning_rate': 0.00015602998216865624, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 360.15, 'epoch': 0.63}
 31%|██████████████████████████                                                         | 6320/20117 [3:57:47<8:46:01,  2.29s/it] 31%|██████████████████████████                                                         | 6321/20117 [3:57:49<8:44:06,  2.28s/it] 31%|██████████████████████████                                                         | 6322/20117 [3:57:52<9:08:23,  2.39s/it] 31%|██████████████████████████                                                         | 6323/20117 [3:57:54<8:58:35,  2.34s/it] 31%|██████████████████████████                                                         | 6324/20117 [3:57:56<8:55:50,  2.33s/it] 31%|██████████████████████████                                                         | 6325/20117 [3:57:58<8:53:19,  2.32s/it] 31%|██████████████████████████                                                         | 6326/20117 [3:58:01<8:48:38,  2.30s/it] 31%|██████████████████████████                                                         | 6327/20117 [3:58:03<8:43:30,  2.28s/it] 31%|██████████████████████████                                                         | 6328/20117 [3:58:05<8:44:31,  2.28s/it] 31%|██████████████████████████                                                         | 6329/20117 [3:58:08<8:46:03,  2.29s/it] 31%|██████████████████████████                                                         | 6330/20117 [3:58:10<8:43:10,  2.28s/it]                                                                                                                                 {'loss': 0.2308, 'grad_norm': 0.38327473402023315, 'learning_rate': 0.00015589991624397244, 'memory/max_active (GiB)': 20.61, 'memory/max_allocated (GiB)': 20.61, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 301.77, 'epoch': 0.63}
 31%|██████████████████████████                                                         | 6330/20117 [3:58:10<8:43:10,  2.28s/it] 31%|██████████████████████████                                                         | 6331/20117 [3:58:12<8:40:03,  2.26s/it] 31%|██████████████████████████                                                         | 6332/20117 [3:58:14<8:43:44,  2.28s/it] 31%|██████████████████████████▏                                                        | 6333/20117 [3:58:17<8:39:18,  2.26s/it] 31%|██████████████████████████▏                                                        | 6334/20117 [3:58:19<8:38:04,  2.26s/it] 31%|██████████████████████████▏                                                        | 6335/20117 [3:58:21<8:38:03,  2.26s/it] 31%|██████████████████████████▏                                                        | 6336/20117 [3:58:23<8:33:50,  2.24s/it] 32%|██████████████████████████▏                                                        | 6337/20117 [3:58:25<8:33:28,  2.24s/it] 32%|██████████████████████████▏                                                        | 6338/20117 [3:58:28<8:37:09,  2.25s/it] 32%|██████████████████████████▏                                                        | 6339/20117 [3:58:30<8:35:00,  2.24s/it] 32%|██████████████████████████▏                                                        | 6340/20117 [3:58:32<8:43:36,  2.28s/it]                                                                                                                                 {'loss': 0.2812, 'grad_norm': 0.46532171964645386, 'learning_rate': 0.00015576971262598024, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 344.05, 'epoch': 0.63}
 32%|██████████████████████████▏                                                        | 6340/20117 [3:58:32<8:43:36,  2.28s/it] 32%|██████████████████████████▏                                                        | 6341/20117 [3:58:35<8:39:42,  2.26s/it] 32%|██████████████████████████▏                                                        | 6342/20117 [3:58:37<8:37:53,  2.26s/it] 32%|██████████████████████████▏                                                        | 6343/20117 [3:58:39<8:40:16,  2.27s/it] 32%|██████████████████████████▏                                                        | 6344/20117 [3:58:41<8:38:31,  2.26s/it] 32%|██████████████████████████▏                                                        | 6345/20117 [3:58:44<8:40:18,  2.27s/it] 32%|██████████████████████████▏                                                        | 6346/20117 [3:58:46<8:44:00,  2.28s/it] 32%|██████████████████████████▏                                                        | 6347/20117 [3:58:48<8:44:13,  2.28s/it] 32%|██████████████████████████▏                                                        | 6348/20117 [3:58:51<8:46:03,  2.29s/it] 32%|██████████████████████████▏                                                        | 6349/20117 [3:58:53<8:42:01,  2.27s/it] 32%|██████████████████████████▏                                                        | 6350/20117 [3:58:55<8:43:41,  2.28s/it]                                                                                                                                 {'loss': 0.2415, 'grad_norm': 0.3749048709869385, 'learning_rate': 0.00015563937163539862, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 348.41, 'epoch': 0.63}
 32%|██████████████████████████▏                                                        | 6350/20117 [3:58:55<8:43:41,  2.28s/it] 32%|██████████████████████████▏                                                        | 6351/20117 [3:58:57<8:43:30,  2.28s/it] 32%|██████████████████████████▏                                                        | 6352/20117 [3:59:00<8:48:33,  2.30s/it] 32%|██████████████████████████▏                                                        | 6353/20117 [3:59:02<8:43:37,  2.28s/it] 32%|██████████████████████████▏                                                        | 6354/20117 [3:59:04<8:42:59,  2.28s/it] 32%|██████████████████████████▏                                                        | 6355/20117 [3:59:07<8:46:24,  2.30s/it] 32%|██████████████████████████▏                                                        | 6356/20117 [3:59:09<8:44:13,  2.29s/it] 32%|██████████████████████████▏                                                        | 6357/20117 [3:59:11<8:47:42,  2.30s/it] 32%|██████████████████████████▏                                                        | 6358/20117 [3:59:13<8:42:48,  2.28s/it] 32%|██████████████████████████▏                                                        | 6359/20117 [3:59:16<8:37:04,  2.26s/it] 32%|██████████████████████████▏                                                        | 6360/20117 [3:59:18<8:38:10,  2.26s/it]                                                                                                                                 {'loss': 0.2228, 'grad_norm': 0.2943509519100189, 'learning_rate': 0.000155508893593285, 'memory/max_active (GiB)': 20.77, 'memory/max_allocated (GiB)': 20.77, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 288.63, 'epoch': 0.63}
 32%|██████████████████████████▏                                                        | 6360/20117 [3:59:18<8:38:10,  2.26s/it] 32%|██████████████████████████▏                                                        | 6361/20117 [3:59:20<8:37:28,  2.26s/it] 32%|██████████████████████████▏                                                        | 6362/20117 [3:59:22<8:35:43,  2.25s/it] 32%|██████████████████████████▎                                                        | 6363/20117 [3:59:25<8:36:39,  2.25s/it] 32%|██████████████████████████▎                                                        | 6364/20117 [3:59:27<8:39:36,  2.27s/it] 32%|██████████████████████████▎                                                        | 6365/20117 [3:59:29<8:37:09,  2.26s/it] 32%|██████████████████████████▎                                                        | 6366/20117 [3:59:31<8:33:01,  2.24s/it] 32%|██████████████████████████▎                                                        | 6367/20117 [3:59:34<8:32:04,  2.23s/it] 32%|██████████████████████████▎                                                        | 6368/20117 [3:59:36<8:35:53,  2.25s/it] 32%|██████████████████████████▎                                                        | 6369/20117 [3:59:38<8:34:17,  2.24s/it] 32%|██████████████████████████▎                                                        | 6370/20117 [3:59:40<8:33:06,  2.24s/it]                                                                                                                                 {'loss': 0.2499, 'grad_norm': 0.5494127869606018, 'learning_rate': 0.00015537827882103442, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 356.07, 'epoch': 0.63}
 32%|██████████████████████████▎                                                        | 6370/20117 [3:59:40<8:33:06,  2.24s/it] 32%|██████████████████████████▎                                                        | 6371/20117 [3:59:43<8:38:59,  2.27s/it] 32%|██████████████████████████▎                                                        | 6372/20117 [3:59:45<8:37:12,  2.26s/it] 32%|██████████████████████████▎                                                        | 6373/20117 [3:59:47<8:37:03,  2.26s/it] 32%|██████████████████████████▎                                                        | 6374/20117 [3:59:49<8:35:25,  2.25s/it] 32%|██████████████████████████▎                                                        | 6375/20117 [3:59:52<8:39:13,  2.27s/it] 32%|██████████████████████████▎                                                        | 6376/20117 [3:59:54<9:07:38,  2.39s/it] 32%|██████████████████████████▎                                                        | 6377/20117 [3:59:57<8:58:07,  2.35s/it] 32%|██████████████████████████▎                                                        | 6378/20117 [3:59:59<8:56:57,  2.34s/it] 32%|██████████████████████████▎                                                        | 6379/20117 [4:00:01<8:52:12,  2.32s/it] 32%|██████████████████████████▎                                                        | 6380/20117 [4:00:03<8:47:52,  2.31s/it]                                                                                                                                 {'loss': 0.2142, 'grad_norm': 0.48988381028175354, 'learning_rate': 0.0001552475276403786, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 343.8, 'epoch': 0.63}
 32%|██████████████████████████▎                                                        | 6380/20117 [4:00:03<8:47:52,  2.31s/it] 32%|██████████████████████████▎                                                        | 6381/20117 [4:00:06<8:49:57,  2.31s/it] 32%|██████████████████████████▎                                                        | 6382/20117 [4:00:08<8:48:13,  2.31s/it] 32%|██████████████████████████▎                                                        | 6383/20117 [4:00:10<8:51:07,  2.32s/it] 32%|██████████████████████████▎                                                        | 6384/20117 [4:00:13<8:50:18,  2.32s/it] 32%|██████████████████████████▎                                                        | 6385/20117 [4:00:15<8:45:10,  2.29s/it] 32%|██████████████████████████▎                                                        | 6386/20117 [4:00:17<8:43:27,  2.29s/it] 32%|██████████████████████████▎                                                        | 6387/20117 [4:00:20<8:43:00,  2.29s/it] 32%|██████████████████████████▎                                                        | 6388/20117 [4:00:22<8:37:06,  2.26s/it] 32%|██████████████████████████▎                                                        | 6389/20117 [4:00:24<8:38:41,  2.27s/it] 32%|██████████████████████████▎                                                        | 6390/20117 [4:00:26<8:37:02,  2.26s/it]                                                                                                                                 {'loss': 0.2364, 'grad_norm': 0.3422715365886688, 'learning_rate': 0.00015511664037338538, 'memory/max_active (GiB)': 19.22, 'memory/max_allocated (GiB)': 19.22, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 409.18, 'epoch': 0.64}
 32%|██████████████████████████▎                                                        | 6390/20117 [4:00:26<8:37:02,  2.26s/it] 32%|██████████████████████████▎                                                        | 6391/20117 [4:00:29<8:35:43,  2.25s/it] 32%|██████████████████████████▎                                                        | 6392/20117 [4:00:31<8:40:07,  2.27s/it] 32%|██████████████████████████▍                                                        | 6393/20117 [4:00:33<8:39:23,  2.27s/it] 32%|██████████████████████████▍                                                        | 6394/20117 [4:00:35<8:39:57,  2.27s/it] 32%|██████████████████████████▍                                                        | 6395/20117 [4:00:38<8:42:46,  2.29s/it] 32%|██████████████████████████▍                                                        | 6396/20117 [4:00:40<8:43:14,  2.29s/it] 32%|██████████████████████████▍                                                        | 6397/20117 [4:00:42<8:39:08,  2.27s/it] 32%|██████████████████████████▍                                                        | 6398/20117 [4:00:45<8:39:33,  2.27s/it] 32%|██████████████████████████▍                                                        | 6399/20117 [4:00:47<8:39:26,  2.27s/it] 32%|██████████████████████████▍                                                        | 6400/20117 [4:00:49<8:45:12,  2.30s/it]                                                                                                                                 {'loss': 0.2392, 'grad_norm': 0.6021102070808411, 'learning_rate': 0.00015498561734245776, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 306.59, 'epoch': 0.64}
 32%|██████████████████████████▍                                                        | 6400/20117 [4:00:49<8:45:12,  2.30s/it] 32%|██████████████████████████▍                                                        | 6401/20117 [4:00:51<8:44:13,  2.29s/it] 32%|██████████████████████████▍                                                        | 6402/20117 [4:00:54<8:43:16,  2.29s/it] 32%|██████████████████████████▍                                                        | 6403/20117 [4:00:56<8:43:01,  2.29s/it] 32%|██████████████████████████▍                                                        | 6404/20117 [4:00:58<8:41:02,  2.28s/it] 32%|██████████████████████████▍                                                        | 6405/20117 [4:01:01<8:40:26,  2.28s/it] 32%|██████████████████████████▍                                                        | 6406/20117 [4:01:03<8:39:30,  2.27s/it] 32%|██████████████████████████▍                                                        | 6407/20117 [4:01:05<8:44:01,  2.29s/it] 32%|██████████████████████████▍                                                        | 6408/20117 [4:01:07<8:42:40,  2.29s/it] 32%|██████████████████████████▍                                                        | 6409/20117 [4:01:10<8:46:08,  2.30s/it] 32%|██████████████████████████▍                                                        | 6410/20117 [4:01:12<8:41:46,  2.28s/it]                                                                                                                                 {'loss': 0.2798, 'grad_norm': 0.6073122620582581, 'learning_rate': 0.00015485445887033317, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 342.93, 'epoch': 0.64}
 32%|██████████████████████████▍                                                        | 6410/20117 [4:01:12<8:41:46,  2.28s/it] 32%|██████████████████████████▍                                                        | 6411/20117 [4:01:14<8:43:29,  2.29s/it] 32%|██████████████████████████▍                                                        | 6412/20117 [4:01:17<8:44:17,  2.30s/it] 32%|██████████████████████████▍                                                        | 6413/20117 [4:01:19<8:46:09,  2.30s/it] 32%|██████████████████████████▍                                                        | 6414/20117 [4:01:21<8:41:16,  2.28s/it] 32%|██████████████████████████▍                                                        | 6415/20117 [4:01:23<8:41:17,  2.28s/it] 32%|██████████████████████████▍                                                        | 6416/20117 [4:01:26<8:43:06,  2.29s/it] 32%|██████████████████████████▍                                                        | 6417/20117 [4:01:28<8:41:40,  2.28s/it] 32%|██████████████████████████▍                                                        | 6418/20117 [4:01:30<8:43:17,  2.29s/it] 32%|██████████████████████████▍                                                        | 6419/20117 [4:01:33<8:40:47,  2.28s/it] 32%|██████████████████████████▍                                                        | 6420/20117 [4:01:35<8:42:24,  2.29s/it]                                                                                                                                 {'loss': 0.2477, 'grad_norm': 0.3407362401485443, 'learning_rate': 0.0001547231652800826, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 367.05, 'epoch': 0.64}
 32%|██████████████████████████▍                                                        | 6420/20117 [4:01:35<8:42:24,  2.29s/it] 32%|██████████████████████████▍                                                        | 6421/20117 [4:01:37<8:50:53,  2.33s/it] 32%|██████████████████████████▍                                                        | 6422/20117 [4:01:40<8:52:24,  2.33s/it] 32%|██████████████████████████▌                                                        | 6423/20117 [4:01:42<8:48:34,  2.32s/it] 32%|██████████████████████████▌                                                        | 6424/20117 [4:01:44<8:49:17,  2.32s/it] 32%|██████████████████████████▌                                                        | 6425/20117 [4:01:47<8:50:04,  2.32s/it] 32%|██████████████████████████▌                                                        | 6426/20117 [4:01:49<8:46:36,  2.31s/it] 32%|██████████████████████████▌                                                        | 6427/20117 [4:01:51<8:48:59,  2.32s/it] 32%|██████████████████████████▌                                                        | 6428/20117 [4:01:54<9:14:22,  2.43s/it] 32%|██████████████████████████▌                                                        | 6429/20117 [4:01:56<9:03:59,  2.38s/it] 32%|██████████████████████████▌                                                        | 6430/20117 [4:01:58<8:59:30,  2.37s/it]                                                                                                                                 {'loss': 0.2399, 'grad_norm': 0.506592869758606, 'learning_rate': 0.00015459173689510994, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 412.24, 'epoch': 0.64}
 32%|██████████████████████████▌                                                        | 6430/20117 [4:01:58<8:59:30,  2.37s/it] 32%|██████████████████████████▌                                                        | 6431/20117 [4:02:01<8:57:04,  2.35s/it] 32%|██████████████████████████▌                                                        | 6432/20117 [4:02:03<8:55:42,  2.35s/it] 32%|██████████████████████████▌                                                        | 6433/20117 [4:02:05<8:53:15,  2.34s/it] 32%|██████████████████████████▌                                                        | 6434/20117 [4:02:08<8:50:13,  2.33s/it] 32%|██████████████████████████▌                                                        | 6435/20117 [4:02:10<8:48:27,  2.32s/it] 32%|██████████████████████████▌                                                        | 6436/20117 [4:02:12<8:46:24,  2.31s/it] 32%|██████████████████████████▌                                                        | 6437/20117 [4:02:15<8:40:41,  2.28s/it] 32%|██████████████████████████▌                                                        | 6438/20117 [4:02:17<8:50:06,  2.33s/it] 32%|██████████████████████████▌                                                        | 6439/20117 [4:02:19<8:44:55,  2.30s/it] 32%|██████████████████████████▌                                                        | 6440/20117 [4:02:21<8:41:48,  2.29s/it]                                                                                                                                 {'loss': 0.1948, 'grad_norm': 0.5419439077377319, 'learning_rate': 0.0001544601740391511, 'memory/max_active (GiB)': 21.53, 'memory/max_allocated (GiB)': 21.53, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 329.23, 'epoch': 0.64}
 32%|██████████████████████████▌                                                        | 6440/20117 [4:02:21<8:41:48,  2.29s/it] 32%|██████████████████████████▌                                                        | 6441/20117 [4:02:24<8:46:55,  2.31s/it] 32%|██████████████████████████▌                                                        | 6442/20117 [4:02:26<8:46:39,  2.31s/it] 32%|██████████████████████████▌                                                        | 6443/20117 [4:02:28<8:44:34,  2.30s/it] 32%|██████████████████████████▌                                                        | 6444/20117 [4:02:31<8:44:13,  2.30s/it] 32%|██████████████████████████▌                                                        | 6445/20117 [4:02:33<8:39:12,  2.28s/it] 32%|██████████████████████████▌                                                        | 6446/20117 [4:02:35<8:40:35,  2.28s/it] 32%|██████████████████████████▌                                                        | 6447/20117 [4:02:38<8:38:21,  2.28s/it] 32%|██████████████████████████▌                                                        | 6448/20117 [4:02:40<8:34:58,  2.26s/it] 32%|██████████████████████████▌                                                        | 6449/20117 [4:02:42<8:36:57,  2.27s/it] 32%|██████████████████████████▌                                                        | 6450/20117 [4:02:44<8:37:06,  2.27s/it]                                                                                                                                 {'loss': 0.2146, 'grad_norm': 0.48251059651374817, 'learning_rate': 0.00015432847703627316, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 317.65, 'epoch': 0.64}
 32%|██████████████████████████▌                                                        | 6450/20117 [4:02:44<8:37:06,  2.27s/it] 32%|██████████████████████████▌                                                        | 6451/20117 [4:02:47<8:35:40,  2.26s/it] 32%|██████████████████████████▌                                                        | 6452/20117 [4:02:49<8:35:56,  2.27s/it] 32%|██████████████████████████▌                                                        | 6453/20117 [4:02:51<8:36:15,  2.27s/it] 32%|██████████████████████████▋                                                        | 6454/20117 [4:02:53<8:32:25,  2.25s/it] 32%|██████████████████████████▋                                                        | 6455/20117 [4:02:56<8:33:51,  2.26s/it] 32%|██████████████████████████▋                                                        | 6456/20117 [4:02:58<8:25:46,  2.22s/it] 32%|██████████████████████████▋                                                        | 6457/20117 [4:03:00<8:23:46,  2.21s/it] 32%|██████████████████████████▋                                                        | 6458/20117 [4:03:02<8:21:18,  2.20s/it] 32%|██████████████████████████▋                                                        | 6459/20117 [4:03:04<8:26:26,  2.22s/it] 32%|██████████████████████████▋                                                        | 6460/20117 [4:03:07<8:29:39,  2.24s/it]                                                                                                                                 {'loss': 0.2593, 'grad_norm': 0.22626249492168427, 'learning_rate': 0.0001541966462108737, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 331.91, 'epoch': 0.64}
 32%|██████████████████████████▋                                                        | 6460/20117 [4:03:07<8:29:39,  2.24s/it] 32%|██████████████████████████▋                                                        | 6461/20117 [4:03:09<8:33:38,  2.26s/it] 32%|██████████████████████████▋                                                        | 6462/20117 [4:03:11<8:39:07,  2.28s/it] 32%|██████████████████████████▋                                                        | 6463/20117 [4:03:14<8:40:16,  2.29s/it] 32%|██████████████████████████▋                                                        | 6464/20117 [4:03:16<8:43:21,  2.30s/it] 32%|██████████████████████████▋                                                        | 6465/20117 [4:03:18<8:40:50,  2.29s/it] 32%|██████████████████████████▋                                                        | 6466/20117 [4:03:20<8:40:20,  2.29s/it] 32%|██████████████████████████▋                                                        | 6467/20117 [4:03:23<8:41:12,  2.29s/it] 32%|██████████████████████████▋                                                        | 6468/20117 [4:03:25<8:43:28,  2.30s/it] 32%|██████████████████████████▋                                                        | 6469/20117 [4:03:27<8:43:25,  2.30s/it] 32%|██████████████████████████▋                                                        | 6470/20117 [4:03:30<8:42:21,  2.30s/it]                                                                                                                                 {'loss': 0.2162, 'grad_norm': 0.5113493204116821, 'learning_rate': 0.0001540646818876799, 'memory/max_active (GiB)': 20.72, 'memory/max_allocated (GiB)': 20.72, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 311.21, 'epoch': 0.64}
 32%|██████████████████████████▋                                                        | 6470/20117 [4:03:30<8:42:21,  2.30s/it] 32%|██████████████████████████▋                                                        | 6471/20117 [4:03:32<8:42:55,  2.30s/it] 32%|██████████████████████████▋                                                        | 6472/20117 [4:03:34<8:38:51,  2.28s/it] 32%|██████████████████████████▋                                                        | 6473/20117 [4:03:36<8:31:46,  2.25s/it] 32%|██████████████████████████▋                                                        | 6474/20117 [4:03:39<8:30:14,  2.24s/it] 32%|██████████████████████████▋                                                        | 6475/20117 [4:03:41<8:34:51,  2.26s/it] 32%|██████████████████████████▋                                                        | 6476/20117 [4:03:43<8:40:00,  2.29s/it] 32%|██████████████████████████▋                                                        | 6477/20117 [4:03:46<8:40:13,  2.29s/it] 32%|██████████████████████████▋                                                        | 6478/20117 [4:03:48<8:39:23,  2.28s/it] 32%|██████████████████████████▋                                                        | 6479/20117 [4:03:50<8:40:06,  2.29s/it] 32%|██████████████████████████▋                                                        | 6480/20117 [4:03:53<8:55:40,  2.36s/it]                                                                                                                                 {'loss': 0.1879, 'grad_norm': 0.3097884953022003, 'learning_rate': 0.0001539325843917478, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 320.21, 'epoch': 0.64}
 32%|██████████████████████████▋                                                        | 6480/20117 [4:03:53<8:55:40,  2.36s/it] 32%|██████████████████████████▋                                                        | 6481/20117 [4:03:55<8:48:30,  2.33s/it] 32%|██████████████████████████▋                                                        | 6482/20117 [4:03:57<8:44:24,  2.31s/it] 32%|██████████████████████████▋                                                        | 6483/20117 [4:03:59<8:43:15,  2.30s/it] 32%|██████████████████████████▊                                                        | 6484/20117 [4:04:02<8:38:36,  2.28s/it] 32%|██████████████████████████▊                                                        | 6485/20117 [4:04:04<8:37:39,  2.28s/it] 32%|██████████████████████████▊                                                        | 6486/20117 [4:04:06<8:33:45,  2.26s/it] 32%|██████████████████████████▊                                                        | 6487/20117 [4:04:08<8:31:06,  2.25s/it] 32%|██████████████████████████▊                                                        | 6488/20117 [4:04:11<8:30:03,  2.25s/it] 32%|██████████████████████████▊                                                        | 6489/20117 [4:04:13<8:31:16,  2.25s/it] 32%|██████████████████████████▊                                                        | 6490/20117 [4:04:15<8:31:05,  2.25s/it]                                                                                                                                 {'loss': 0.217, 'grad_norm': 0.32837173342704773, 'learning_rate': 0.0001538003540484614, 'memory/max_active (GiB)': 19.68, 'memory/max_allocated (GiB)': 19.68, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 351.68, 'epoch': 0.65}
 32%|██████████████████████████▊                                                        | 6490/20117 [4:04:15<8:31:05,  2.25s/it] 32%|██████████████████████████▊                                                        | 6491/20117 [4:04:17<8:26:18,  2.23s/it] 32%|██████████████████████████▊                                                        | 6492/20117 [4:04:20<8:25:36,  2.23s/it] 32%|██████████████████████████▊                                                        | 6493/20117 [4:04:22<8:24:42,  2.22s/it] 32%|██████████████████████████▊                                                        | 6494/20117 [4:04:24<8:27:02,  2.23s/it] 32%|██████████████████████████▊                                                        | 6495/20117 [4:04:26<8:28:39,  2.24s/it] 32%|██████████████████████████▊                                                        | 6496/20117 [4:04:29<8:49:55,  2.33s/it] 32%|██████████████████████████▊                                                        | 6497/20117 [4:04:31<8:43:40,  2.31s/it] 32%|██████████████████████████▊                                                        | 6498/20117 [4:04:33<8:43:09,  2.30s/it] 32%|██████████████████████████▊                                                        | 6499/20117 [4:04:36<8:39:48,  2.29s/it] 32%|██████████████████████████▊                                                        | 6500/20117 [4:04:38<8:34:01,  2.26s/it]                                                                                                                                 {'loss': 0.2531, 'grad_norm': 0.519063413143158, 'learning_rate': 0.00015366799118353202, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 379.3, 'epoch': 0.65}
 32%|██████████████████████████▊                                                        | 6500/20117 [4:04:38<8:34:01,  2.26s/it] 32%|██████████████████████████▊                                                        | 6501/20117 [4:04:40<8:34:47,  2.27s/it] 32%|██████████████████████████▊                                                        | 6502/20117 [4:04:42<8:30:05,  2.25s/it] 32%|██████████████████████████▊                                                        | 6503/20117 [4:04:45<8:36:19,  2.28s/it] 32%|██████████████████████████▊                                                        | 6504/20117 [4:04:47<8:43:20,  2.31s/it] 32%|██████████████████████████▊                                                        | 6505/20117 [4:04:49<8:37:49,  2.28s/it] 32%|██████████████████████████▊                                                        | 6506/20117 [4:04:52<8:40:22,  2.29s/it] 32%|██████████████████████████▊                                                        | 6507/20117 [4:04:54<8:40:12,  2.29s/it] 32%|██████████████████████████▊                                                        | 6508/20117 [4:04:56<8:37:48,  2.28s/it] 32%|██████████████████████████▊                                                        | 6509/20117 [4:04:58<8:37:08,  2.28s/it] 32%|██████████████████████████▊                                                        | 6510/20117 [4:05:01<8:35:59,  2.28s/it]                                                                                                                                 {'loss': 0.291, 'grad_norm': 0.3581913113594055, 'learning_rate': 0.0001535354961229974, 'memory/max_active (GiB)': 21.53, 'memory/max_allocated (GiB)': 21.53, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 389.02, 'epoch': 0.65}
 32%|██████████████████████████▊                                                        | 6510/20117 [4:05:01<8:35:59,  2.28s/it] 32%|██████████████████████████▊                                                        | 6511/20117 [4:05:03<8:36:05,  2.28s/it] 32%|██████████████████████████▊                                                        | 6512/20117 [4:05:05<8:34:27,  2.27s/it] 32%|██████████████████████████▊                                                        | 6513/20117 [4:05:07<8:32:22,  2.26s/it] 32%|██████████████████████████▉                                                        | 6514/20117 [4:05:10<8:30:45,  2.25s/it] 32%|██████████████████████████▉                                                        | 6515/20117 [4:05:12<8:35:04,  2.27s/it] 32%|██████████████████████████▉                                                        | 6516/20117 [4:05:14<8:32:18,  2.26s/it] 32%|██████████████████████████▉                                                        | 6517/20117 [4:05:16<8:29:32,  2.25s/it] 32%|██████████████████████████▉                                                        | 6518/20117 [4:05:19<8:30:58,  2.25s/it] 32%|██████████████████████████▉                                                        | 6519/20117 [4:05:21<8:28:38,  2.24s/it] 32%|██████████████████████████▉                                                        | 6520/20117 [4:05:23<8:35:44,  2.28s/it]                                                                                                                                 {'loss': 0.2409, 'grad_norm': 0.3630671799182892, 'learning_rate': 0.0001534028691932208, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 313.47, 'epoch': 0.65}
 32%|██████████████████████████▉                                                        | 6520/20117 [4:05:23<8:35:44,  2.28s/it] 32%|██████████████████████████▉                                                        | 6521/20117 [4:05:26<8:35:46,  2.28s/it] 32%|██████████████████████████▉                                                        | 6522/20117 [4:05:28<8:34:24,  2.27s/it] 32%|██████████████████████████▉                                                        | 6523/20117 [4:05:30<8:35:22,  2.27s/it] 32%|██████████████████████████▉                                                        | 6524/20117 [4:05:32<8:33:10,  2.27s/it] 32%|██████████████████████████▉                                                        | 6525/20117 [4:05:35<8:33:18,  2.27s/it] 32%|██████████████████████████▉                                                        | 6526/20117 [4:05:37<8:43:33,  2.31s/it] 32%|██████████████████████████▉                                                        | 6527/20117 [4:05:39<8:37:18,  2.28s/it] 32%|██████████████████████████▉                                                        | 6528/20117 [4:05:42<8:37:24,  2.28s/it] 32%|██████████████████████████▉                                                        | 6529/20117 [4:05:44<8:32:53,  2.26s/it] 32%|██████████████████████████▉                                                        | 6530/20117 [4:05:46<8:27:44,  2.24s/it]                                                                                                                                 {'loss': 0.2133, 'grad_norm': 0.3670892119407654, 'learning_rate': 0.00015327011072089044, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 290.75, 'epoch': 0.65}
 32%|██████████████████████████▉                                                        | 6530/20117 [4:05:46<8:27:44,  2.24s/it] 32%|██████████████████████████▉                                                        | 6531/20117 [4:05:48<8:26:04,  2.23s/it] 32%|██████████████████████████▉                                                        | 6532/20117 [4:05:50<8:23:57,  2.23s/it] 32%|██████████████████████████▉                                                        | 6533/20117 [4:05:53<8:56:08,  2.37s/it] 32%|██████████████████████████▉                                                        | 6534/20117 [4:05:55<8:56:56,  2.37s/it] 32%|██████████████████████████▉                                                        | 6535/20117 [4:05:58<8:46:02,  2.32s/it] 32%|██████████████████████████▉                                                        | 6536/20117 [4:06:00<8:44:16,  2.32s/it] 32%|██████████████████████████▉                                                        | 6537/20117 [4:06:02<8:40:21,  2.30s/it] 32%|██████████████████████████▉                                                        | 6538/20117 [4:06:04<8:36:39,  2.28s/it] 33%|██████████████████████████▉                                                        | 6539/20117 [4:06:07<8:36:18,  2.28s/it] 33%|██████████████████████████▉                                                        | 6540/20117 [4:06:09<8:38:43,  2.29s/it]                                                                                                                                 {'loss': 0.27, 'grad_norm': 0.40198561549186707, 'learning_rate': 0.00015313722103301852, 'memory/max_active (GiB)': 18.85, 'memory/max_allocated (GiB)': 18.85, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 429.99, 'epoch': 0.65}
 33%|██████████████████████████▉                                                        | 6540/20117 [4:06:09<8:38:43,  2.29s/it] 33%|██████████████████████████▉                                                        | 6541/20117 [4:06:11<8:39:30,  2.30s/it] 33%|██████████████████████████▉                                                        | 6542/20117 [4:06:14<8:43:03,  2.31s/it] 33%|██████████████████████████▉                                                        | 6543/20117 [4:06:16<8:35:59,  2.28s/it] 33%|██████████████████████████▉                                                        | 6544/20117 [4:06:18<8:35:13,  2.28s/it] 33%|███████████████████████████                                                        | 6545/20117 [4:06:20<8:34:40,  2.28s/it] 33%|███████████████████████████                                                        | 6546/20117 [4:06:23<8:30:56,  2.26s/it] 33%|███████████████████████████                                                        | 6547/20117 [4:06:25<8:30:14,  2.26s/it] 33%|███████████████████████████                                                        | 6548/20117 [4:06:27<8:30:05,  2.26s/it] 33%|███████████████████████████                                                        | 6549/20117 [4:06:29<8:26:42,  2.24s/it] 33%|███████████████████████████                                                        | 6550/20117 [4:06:32<8:27:41,  2.25s/it]                                                                                                                                 {'loss': 0.1676, 'grad_norm': 0.3494684398174286, 'learning_rate': 0.00015300420045694034, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 368.89, 'epoch': 0.65}
 33%|███████████████████████████                                                        | 6550/20117 [4:06:32<8:27:41,  2.25s/it] 33%|███████████████████████████                                                        | 6551/20117 [4:06:34<8:28:06,  2.25s/it] 33%|███████████████████████████                                                        | 6552/20117 [4:06:36<8:31:53,  2.26s/it] 33%|███████████████████████████                                                        | 6553/20117 [4:06:38<8:28:53,  2.25s/it] 33%|███████████████████████████                                                        | 6554/20117 [4:06:41<8:34:45,  2.28s/it] 33%|███████████████████████████                                                        | 6555/20117 [4:06:43<8:33:29,  2.27s/it] 33%|███████████████████████████                                                        | 6556/20117 [4:06:45<8:36:10,  2.28s/it] 33%|███████████████████████████                                                        | 6557/20117 [4:06:48<8:34:31,  2.28s/it] 33%|███████████████████████████                                                        | 6558/20117 [4:06:50<8:32:37,  2.27s/it] 33%|███████████████████████████                                                        | 6559/20117 [4:06:52<8:39:14,  2.30s/it] 33%|███████████████████████████                                                        | 6560/20117 [4:06:54<8:37:58,  2.29s/it]                                                                                                                                 {'loss': 0.2585, 'grad_norm': 0.42560404539108276, 'learning_rate': 0.00015287104932031374, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 317.83, 'epoch': 0.65}
 33%|███████████████████████████                                                        | 6560/20117 [4:06:54<8:37:58,  2.29s/it] 33%|███████████████████████████                                                        | 6561/20117 [4:06:57<8:34:44,  2.28s/it] 33%|███████████████████████████                                                        | 6562/20117 [4:06:59<8:36:26,  2.29s/it] 33%|███████████████████████████                                                        | 6563/20117 [4:07:01<8:37:59,  2.29s/it] 33%|███████████████████████████                                                        | 6564/20117 [4:07:04<8:37:06,  2.29s/it] 33%|███████████████████████████                                                        | 6565/20117 [4:07:06<8:34:07,  2.28s/it] 33%|███████████████████████████                                                        | 6566/20117 [4:07:08<8:32:14,  2.27s/it] 33%|███████████████████████████                                                        | 6567/20117 [4:07:11<8:40:13,  2.30s/it] 33%|███████████████████████████                                                        | 6568/20117 [4:07:13<8:37:33,  2.29s/it] 33%|███████████████████████████                                                        | 6569/20117 [4:07:15<8:33:44,  2.28s/it] 33%|███████████████████████████                                                        | 6570/20117 [4:07:17<8:36:24,  2.29s/it]                                                                                                                                 {'loss': 0.2129, 'grad_norm': 0.511513352394104, 'learning_rate': 0.00015273776795111813, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 327.26, 'epoch': 0.65}
 33%|███████████████████████████                                                        | 6570/20117 [4:07:17<8:36:24,  2.29s/it] 33%|███████████████████████████                                                        | 6571/20117 [4:07:20<8:32:47,  2.27s/it] 33%|███████████████████████████                                                        | 6572/20117 [4:07:22<8:30:51,  2.26s/it] 33%|███████████████████████████                                                        | 6573/20117 [4:07:24<8:30:11,  2.26s/it] 33%|███████████████████████████                                                        | 6574/20117 [4:07:26<8:28:23,  2.25s/it] 33%|███████████████████████████▏                                                       | 6575/20117 [4:07:29<8:27:59,  2.25s/it] 33%|███████████████████████████▏                                                       | 6576/20117 [4:07:31<8:27:33,  2.25s/it] 33%|███████████████████████████▏                                                       | 6577/20117 [4:07:33<8:32:30,  2.27s/it] 33%|███████████████████████████▏                                                       | 6578/20117 [4:07:35<8:28:30,  2.25s/it] 33%|███████████████████████████▏                                                       | 6579/20117 [4:07:38<8:25:52,  2.24s/it] 33%|███████████████████████████▏                                                       | 6580/20117 [4:07:40<8:21:40,  2.22s/it]                                                                                                                                 {'loss': 0.2674, 'grad_norm': 0.3022279441356659, 'learning_rate': 0.00015260435667765364, 'memory/max_active (GiB)': 19.2, 'memory/max_allocated (GiB)': 19.2, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 332.65, 'epoch': 0.65}
 33%|███████████████████████████▏                                                       | 6580/20117 [4:07:40<8:21:40,  2.22s/it] 33%|███████████████████████████▏                                                       | 6581/20117 [4:07:42<8:24:37,  2.24s/it] 33%|███████████████████████████▏                                                       | 6582/20117 [4:07:44<8:27:26,  2.25s/it] 33%|███████████████████████████▏                                                       | 6583/20117 [4:07:46<8:25:19,  2.24s/it] 33%|███████████████████████████▏                                                       | 6584/20117 [4:07:49<8:29:06,  2.26s/it] 33%|███████████████████████████▏                                                       | 6585/20117 [4:07:51<8:29:40,  2.26s/it] 33%|███████████████████████████▏                                                       | 6586/20117 [4:07:53<8:27:47,  2.25s/it] 33%|███████████████████████████▏                                                       | 6587/20117 [4:07:56<8:50:44,  2.35s/it] 33%|███████████████████████████▏                                                       | 6588/20117 [4:07:58<8:44:23,  2.33s/it] 33%|███████████████████████████▏                                                       | 6589/20117 [4:08:00<8:43:11,  2.32s/it] 33%|███████████████████████████▏                                                       | 6590/20117 [4:08:03<8:37:19,  2.29s/it]                                                                                                                                 {'loss': 0.2512, 'grad_norm': 0.3808051347732544, 'learning_rate': 0.00015247081582854053, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 379.8, 'epoch': 0.66}
 33%|███████████████████████████▏                                                       | 6590/20117 [4:08:03<8:37:19,  2.29s/it] 33%|███████████████████████████▏                                                       | 6591/20117 [4:08:05<8:39:40,  2.31s/it] 33%|███████████████████████████▏                                                       | 6592/20117 [4:08:07<8:42:45,  2.32s/it] 33%|███████████████████████████▏                                                       | 6593/20117 [4:08:10<8:45:46,  2.33s/it] 33%|███████████████████████████▏                                                       | 6594/20117 [4:08:12<8:43:57,  2.32s/it] 33%|███████████████████████████▏                                                       | 6595/20117 [4:08:14<8:41:07,  2.31s/it] 33%|███████████████████████████▏                                                       | 6596/20117 [4:08:17<8:36:24,  2.29s/it] 33%|███████████████████████████▏                                                       | 6597/20117 [4:08:19<8:34:39,  2.28s/it] 33%|███████████████████████████▏                                                       | 6598/20117 [4:08:21<8:32:51,  2.28s/it] 33%|███████████████████████████▏                                                       | 6599/20117 [4:08:23<8:36:31,  2.29s/it] 33%|███████████████████████████▏                                                       | 6600/20117 [4:08:26<8:35:20,  2.29s/it]                                                                                                                                 {'loss': 0.2376, 'grad_norm': 0.4839475154876709, 'learning_rate': 0.00015233714573271802, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 329.54, 'epoch': 0.66}
 33%|███████████████████████████▏                                                       | 6600/20117 [4:08:26<8:35:20,  2.29s/it] 33%|███████████████████████████▏                                                       | 6601/20117 [4:08:28<8:34:00,  2.28s/it] 33%|███████████████████████████▏                                                       | 6602/20117 [4:08:30<8:36:33,  2.29s/it] 33%|███████████████████████████▏                                                       | 6603/20117 [4:08:33<8:40:59,  2.31s/it] 33%|███████████████████████████▏                                                       | 6604/20117 [4:08:35<8:45:17,  2.33s/it] 33%|███████████████████████████▎                                                       | 6605/20117 [4:08:37<8:43:45,  2.33s/it] 33%|███████████████████████████▎                                                       | 6606/20117 [4:08:40<8:41:11,  2.31s/it] 33%|███████████████████████████▎                                                       | 6607/20117 [4:08:42<8:46:35,  2.34s/it] 33%|███████████████████████████▎                                                       | 6608/20117 [4:08:44<8:47:53,  2.34s/it] 33%|███████████████████████████▎                                                       | 6609/20117 [4:08:47<8:44:15,  2.33s/it] 33%|███████████████████████████▎                                                       | 6610/20117 [4:08:49<8:43:45,  2.33s/it]                                                                                                                                 {'loss': 0.2289, 'grad_norm': 0.7145663499832153, 'learning_rate': 0.0001522033467194439, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 327.02, 'epoch': 0.66}
 33%|███████████████████████████▎                                                       | 6610/20117 [4:08:49<8:43:45,  2.33s/it] 33%|███████████████████████████▎                                                       | 6611/20117 [4:08:51<8:42:41,  2.32s/it] 33%|███████████████████████████▎                                                       | 6612/20117 [4:08:54<8:39:57,  2.31s/it] 33%|███████████████████████████▎                                                       | 6613/20117 [4:08:56<8:41:41,  2.32s/it] 33%|███████████████████████████▎                                                       | 6614/20117 [4:08:58<8:37:08,  2.30s/it] 33%|███████████████████████████▎                                                       | 6615/20117 [4:09:00<8:33:27,  2.28s/it] 33%|███████████████████████████▎                                                       | 6616/20117 [4:09:03<8:39:21,  2.31s/it] 33%|███████████████████████████▎                                                       | 6617/20117 [4:09:05<8:38:05,  2.30s/it] 33%|███████████████████████████▎                                                       | 6618/20117 [4:09:07<8:35:08,  2.29s/it] 33%|███████████████████████████▎                                                       | 6619/20117 [4:09:10<8:35:22,  2.29s/it] 33%|███████████████████████████▎                                                       | 6620/20117 [4:09:12<8:36:04,  2.29s/it]                                                                                                                                 {'loss': 0.2619, 'grad_norm': 0.4483419358730316, 'learning_rate': 0.00015206941911829336, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 344.26, 'epoch': 0.66}
 33%|███████████████████████████▎                                                       | 6620/20117 [4:09:12<8:36:04,  2.29s/it] 33%|███████████████████████████▎                                                       | 6621/20117 [4:09:14<8:35:29,  2.29s/it] 33%|███████████████████████████▎                                                       | 6622/20117 [4:09:16<8:32:46,  2.28s/it] 33%|███████████████████████████▎                                                       | 6623/20117 [4:09:19<8:34:47,  2.29s/it] 33%|███████████████████████████▎                                                       | 6624/20117 [4:09:21<8:33:25,  2.28s/it] 33%|███████████████████████████▎                                                       | 6625/20117 [4:09:23<8:34:31,  2.29s/it] 33%|███████████████████████████▎                                                       | 6626/20117 [4:09:26<8:30:56,  2.27s/it] 33%|███████████████████████████▎                                                       | 6627/20117 [4:09:28<8:32:30,  2.28s/it] 33%|███████████████████████████▎                                                       | 6628/20117 [4:09:30<8:27:44,  2.26s/it] 33%|███████████████████████████▎                                                       | 6629/20117 [4:09:32<8:30:48,  2.27s/it] 33%|███████████████████████████▎                                                       | 6630/20117 [4:09:35<8:29:23,  2.27s/it]                                                                                                                                 {'loss': 0.3162, 'grad_norm': 0.7042835354804993, 'learning_rate': 0.00015193536325915842, 'memory/max_active (GiB)': 20.63, 'memory/max_allocated (GiB)': 20.63, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 325.39, 'epoch': 0.66}
 33%|███████████████████████████▎                                                       | 6630/20117 [4:09:35<8:29:23,  2.27s/it] 33%|███████████████████████████▎                                                       | 6631/20117 [4:09:37<8:31:09,  2.27s/it] 33%|███████████████████████████▎                                                       | 6632/20117 [4:09:39<8:29:09,  2.27s/it] 33%|███████████████████████████▎                                                       | 6633/20117 [4:09:41<8:29:02,  2.27s/it] 33%|███████████████████████████▎                                                       | 6634/20117 [4:09:44<8:30:01,  2.27s/it] 33%|███████████████████████████▍                                                       | 6635/20117 [4:09:46<8:33:14,  2.28s/it] 33%|███████████████████████████▍                                                       | 6636/20117 [4:09:48<8:28:56,  2.27s/it] 33%|███████████████████████████▍                                                       | 6637/20117 [4:09:50<8:27:52,  2.26s/it] 33%|███████████████████████████▍                                                       | 6638/20117 [4:09:53<8:28:51,  2.27s/it] 33%|███████████████████████████▍                                                       | 6639/20117 [4:09:55<8:52:44,  2.37s/it] 33%|███████████████████████████▍                                                       | 6640/20117 [4:09:58<8:44:17,  2.33s/it]                                                                                                                                 {'loss': 0.1955, 'grad_norm': 0.44085246324539185, 'learning_rate': 0.00015180117947224698, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 356.35, 'epoch': 0.66}
 33%|███████████████████████████▍                                                       | 6640/20117 [4:09:58<8:44:17,  2.33s/it] 33%|███████████████████████████▍                                                       | 6641/20117 [4:10:00<8:41:11,  2.32s/it] 33%|███████████████████████████▍                                                       | 6642/20117 [4:10:02<8:42:34,  2.33s/it] 33%|███████████████████████████▍                                                       | 6643/20117 [4:10:04<8:34:46,  2.29s/it] 33%|███████████████████████████▍                                                       | 6644/20117 [4:10:07<8:30:20,  2.27s/it] 33%|███████████████████████████▍                                                       | 6645/20117 [4:10:09<8:27:02,  2.26s/it] 33%|███████████████████████████▍                                                       | 6646/20117 [4:10:11<8:23:51,  2.24s/it] 33%|███████████████████████████▍                                                       | 6647/20117 [4:10:13<8:21:30,  2.23s/it] 33%|███████████████████████████▍                                                       | 6648/20117 [4:10:16<8:23:17,  2.24s/it] 33%|███████████████████████████▍                                                       | 6649/20117 [4:10:18<8:22:57,  2.24s/it] 33%|███████████████████████████▍                                                       | 6650/20117 [4:10:20<8:23:24,  2.24s/it]                                                                                                                                 {'loss': 0.2302, 'grad_norm': 0.32135269045829773, 'learning_rate': 0.00015166686808808208, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 401.75, 'epoch': 0.66}
 33%|███████████████████████████▍                                                       | 6650/20117 [4:10:20<8:23:24,  2.24s/it] 33%|███████████████████████████▍                                                       | 6651/20117 [4:10:22<8:24:40,  2.25s/it] 33%|███████████████████████████▍                                                       | 6652/20117 [4:10:25<8:26:15,  2.26s/it] 33%|███████████████████████████▍                                                       | 6653/20117 [4:10:27<8:27:28,  2.26s/it] 33%|███████████████████████████▍                                                       | 6654/20117 [4:10:29<8:28:56,  2.27s/it] 33%|███████████████████████████▍                                                       | 6655/20117 [4:10:32<8:34:11,  2.29s/it] 33%|███████████████████████████▍                                                       | 6656/20117 [4:10:34<8:31:37,  2.28s/it] 33%|███████████████████████████▍                                                       | 6657/20117 [4:10:36<8:31:58,  2.28s/it] 33%|███████████████████████████▍                                                       | 6658/20117 [4:10:38<8:34:04,  2.29s/it] 33%|███████████████████████████▍                                                       | 6659/20117 [4:10:41<8:33:36,  2.29s/it] 33%|███████████████████████████▍                                                       | 6660/20117 [4:10:43<8:36:11,  2.30s/it]                                                                                                                                 {'loss': 0.251, 'grad_norm': 0.5171180367469788, 'learning_rate': 0.00015153242943750103, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 328.4, 'epoch': 0.66}
 33%|███████████████████████████▍                                                       | 6660/20117 [4:10:43<8:36:11,  2.30s/it] 33%|███████████████████████████▍                                                       | 6661/20117 [4:10:45<8:36:54,  2.30s/it] 33%|███████████████████████████▍                                                       | 6662/20117 [4:10:48<8:34:25,  2.29s/it] 33%|███████████████████████████▍                                                       | 6663/20117 [4:10:50<8:28:42,  2.27s/it] 33%|███████████████████████████▍                                                       | 6664/20117 [4:10:52<8:30:07,  2.28s/it] 33%|███████████████████████████▍                                                       | 6665/20117 [4:10:54<8:30:31,  2.28s/it] 33%|███████████████████████████▌                                                       | 6666/20117 [4:10:57<8:36:40,  2.30s/it] 33%|███████████████████████████▌                                                       | 6667/20117 [4:10:59<8:39:20,  2.32s/it] 33%|███████████████████████████▌                                                       | 6668/20117 [4:11:01<8:35:05,  2.30s/it] 33%|███████████████████████████▌                                                       | 6669/20117 [4:11:04<8:33:44,  2.29s/it] 33%|███████████████████████████▌                                                       | 6670/20117 [4:11:06<8:33:17,  2.29s/it]                                                                                                                                 {'loss': 0.2186, 'grad_norm': 0.5205950140953064, 'learning_rate': 0.00015139786385165462, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 325.18, 'epoch': 0.66}
 33%|███████████████████████████▌                                                       | 6670/20117 [4:11:06<8:33:17,  2.29s/it] 33%|███████████████████████████▌                                                       | 6671/20117 [4:11:08<8:34:36,  2.30s/it] 33%|███████████████████████████▌                                                       | 6672/20117 [4:11:11<8:34:04,  2.29s/it] 33%|███████████████████████████▌                                                       | 6673/20117 [4:11:13<8:35:53,  2.30s/it] 33%|███████████████████████████▌                                                       | 6674/20117 [4:11:15<8:35:41,  2.30s/it] 33%|███████████████████████████▌                                                       | 6675/20117 [4:11:17<8:31:18,  2.28s/it] 33%|███████████████████████████▌                                                       | 6676/20117 [4:11:20<8:32:50,  2.29s/it] 33%|███████████████████████████▌                                                       | 6677/20117 [4:11:22<8:35:22,  2.30s/it] 33%|███████████████████████████▌                                                       | 6678/20117 [4:11:24<8:30:23,  2.28s/it] 33%|███████████████████████████▌                                                       | 6679/20117 [4:11:27<8:35:11,  2.30s/it] 33%|███████████████████████████▌                                                       | 6680/20117 [4:11:29<8:39:37,  2.32s/it]                                                                                                                                 {'loss': 0.1604, 'grad_norm': 0.31780245900154114, 'learning_rate': 0.0001512631716620064, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 319.84, 'epoch': 0.66}
 33%|███████████████████████████▌                                                       | 6680/20117 [4:11:29<8:39:37,  2.32s/it] 33%|███████████████████████████▌                                                       | 6681/20117 [4:11:31<8:38:28,  2.32s/it] 33%|███████████████████████████▌                                                       | 6682/20117 [4:11:34<8:35:44,  2.30s/it] 33%|███████████████████████████▌                                                       | 6683/20117 [4:11:36<8:32:33,  2.29s/it] 33%|███████████████████████████▌                                                       | 6684/20117 [4:11:38<8:30:40,  2.28s/it] 33%|███████████████████████████▌                                                       | 6685/20117 [4:11:40<8:28:58,  2.27s/it] 33%|███████████████████████████▌                                                       | 6686/20117 [4:11:43<8:29:23,  2.28s/it] 33%|███████████████████████████▌                                                       | 6687/20117 [4:11:45<8:34:04,  2.30s/it] 33%|███████████████████████████▌                                                       | 6688/20117 [4:11:47<8:28:37,  2.27s/it] 33%|███████████████████████████▌                                                       | 6689/20117 [4:11:49<8:26:36,  2.26s/it] 33%|███████████████████████████▌                                                       | 6690/20117 [4:11:52<8:30:35,  2.28s/it]                                                                                                                                 {'loss': 0.266, 'grad_norm': 0.29278233647346497, 'learning_rate': 0.00015112835320033163, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 408.19, 'epoch': 0.67}
 33%|███████████████████████████▌                                                       | 6690/20117 [4:11:52<8:30:35,  2.28s/it] 33%|███████████████████████████▌                                                       | 6691/20117 [4:11:54<8:48:23,  2.36s/it] 33%|███████████████████████████▌                                                       | 6692/20117 [4:11:57<8:45:47,  2.35s/it] 33%|███████████████████████████▌                                                       | 6693/20117 [4:11:59<8:36:40,  2.31s/it] 33%|███████████████████████████▌                                                       | 6694/20117 [4:12:01<8:34:49,  2.30s/it] 33%|███████████████████████████▌                                                       | 6695/20117 [4:12:03<8:36:47,  2.31s/it] 33%|███████████████████████████▋                                                       | 6696/20117 [4:12:06<8:30:30,  2.28s/it] 33%|███████████████████████████▋                                                       | 6697/20117 [4:12:08<8:29:22,  2.28s/it] 33%|███████████████████████████▋                                                       | 6698/20117 [4:12:10<8:32:23,  2.29s/it] 33%|███████████████████████████▋                                                       | 6699/20117 [4:12:13<8:34:37,  2.30s/it] 33%|███████████████████████████▋                                                       | 6700/20117 [4:12:15<8:35:41,  2.31s/it]                                                                                                                                 {'loss': 0.1933, 'grad_norm': 0.47382065653800964, 'learning_rate': 0.00015099340879871668, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 302.45, 'epoch': 0.67}
 33%|███████████████████████████▋                                                       | 6700/20117 [4:12:15<8:35:41,  2.31s/it] 33%|███████████████████████████▋                                                       | 6701/20117 [4:12:17<8:32:35,  2.29s/it] 33%|███████████████████████████▋                                                       | 6702/20117 [4:12:19<8:27:54,  2.27s/it] 33%|███████████████████████████▋                                                       | 6703/20117 [4:12:22<8:28:43,  2.28s/it] 33%|███████████████████████████▋                                                       | 6704/20117 [4:12:24<8:31:49,  2.29s/it] 33%|███████████████████████████▋                                                       | 6705/20117 [4:12:26<8:29:54,  2.28s/it] 33%|███████████████████████████▋                                                       | 6706/20117 [4:12:29<8:32:02,  2.29s/it] 33%|███████████████████████████▋                                                       | 6707/20117 [4:12:31<8:27:54,  2.27s/it] 33%|███████████████████████████▋                                                       | 6708/20117 [4:12:33<8:31:24,  2.29s/it] 33%|███████████████████████████▋                                                       | 6709/20117 [4:12:35<8:33:40,  2.30s/it] 33%|███████████████████████████▋                                                       | 6710/20117 [4:12:38<8:26:50,  2.27s/it]                                                                                                                                 {'loss': 0.2225, 'grad_norm': 0.3947311043739319, 'learning_rate': 0.00015085833878955823, 'memory/max_active (GiB)': 21.37, 'memory/max_allocated (GiB)': 21.37, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 418.2, 'epoch': 0.67}
 33%|███████████████████████████▋                                                       | 6710/20117 [4:12:38<8:26:50,  2.27s/it] 33%|███████████████████████████▋                                                       | 6711/20117 [4:12:40<8:32:11,  2.29s/it] 33%|███████████████████████████▋                                                       | 6712/20117 [4:12:42<8:29:50,  2.28s/it] 33%|███████████████████████████▋                                                       | 6713/20117 [4:12:44<8:29:45,  2.28s/it] 33%|███████████████████████████▋                                                       | 6714/20117 [4:12:47<8:29:05,  2.28s/it] 33%|███████████████████████████▋                                                       | 6715/20117 [4:12:49<8:34:34,  2.30s/it] 33%|███████████████████████████▋                                                       | 6716/20117 [4:12:51<8:34:25,  2.30s/it] 33%|███████████████████████████▋                                                       | 6717/20117 [4:12:54<8:32:46,  2.30s/it] 33%|███████████████████████████▋                                                       | 6718/20117 [4:12:56<8:26:49,  2.27s/it] 33%|███████████████████████████▋                                                       | 6719/20117 [4:12:58<8:25:58,  2.27s/it] 33%|███████████████████████████▋                                                       | 6720/20117 [4:13:00<8:24:38,  2.26s/it]                                                                                                                                 {'loss': 0.2056, 'grad_norm': 0.5490260720252991, 'learning_rate': 0.00015072314350556213, 'memory/max_active (GiB)': 21.38, 'memory/max_allocated (GiB)': 21.38, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 319.35, 'epoch': 0.67}
 33%|███████████████████████████▋                                                       | 6720/20117 [4:13:00<8:24:38,  2.26s/it] 33%|███████████████████████████▋                                                       | 6721/20117 [4:13:03<8:21:47,  2.25s/it] 33%|███████████████████████████▋                                                       | 6722/20117 [4:13:05<8:27:02,  2.27s/it] 33%|███████████████████████████▋                                                       | 6723/20117 [4:13:07<8:26:11,  2.27s/it] 33%|███████████████████████████▋                                                       | 6724/20117 [4:13:09<8:27:10,  2.27s/it] 33%|███████████████████████████▋                                                       | 6725/20117 [4:13:12<8:28:40,  2.28s/it] 33%|███████████████████████████▊                                                       | 6726/20117 [4:13:14<8:30:35,  2.29s/it] 33%|███████████████████████████▊                                                       | 6727/20117 [4:13:16<8:29:40,  2.28s/it] 33%|███████████████████████████▊                                                       | 6728/20117 [4:13:19<8:31:25,  2.29s/it] 33%|███████████████████████████▊                                                       | 6729/20117 [4:13:21<8:29:03,  2.28s/it] 33%|███████████████████████████▊                                                       | 6730/20117 [4:13:23<8:25:33,  2.27s/it]                                                                                                                                 {'loss': 0.2377, 'grad_norm': 0.412194162607193, 'learning_rate': 0.000150587823279743, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 304.73, 'epoch': 0.67}
 33%|███████████████████████████▊                                                       | 6730/20117 [4:13:23<8:25:33,  2.27s/it] 33%|███████████████████████████▊                                                       | 6731/20117 [4:13:26<8:30:25,  2.29s/it] 33%|███████████████████████████▊                                                       | 6732/20117 [4:13:28<8:35:19,  2.31s/it] 33%|███████████████████████████▊                                                       | 6733/20117 [4:13:30<8:36:05,  2.31s/it] 33%|███████████████████████████▊                                                       | 6734/20117 [4:13:32<8:33:28,  2.30s/it] 33%|███████████████████████████▊                                                       | 6735/20117 [4:13:35<8:35:34,  2.31s/it] 33%|███████████████████████████▊                                                       | 6736/20117 [4:13:37<8:33:07,  2.30s/it] 33%|███████████████████████████▊                                                       | 6737/20117 [4:13:39<8:29:27,  2.28s/it] 33%|███████████████████████████▊                                                       | 6738/20117 [4:13:42<8:27:02,  2.27s/it] 33%|███████████████████████████▊                                                       | 6739/20117 [4:13:44<8:25:19,  2.27s/it] 34%|███████████████████████████▊                                                       | 6740/20117 [4:13:46<8:30:25,  2.29s/it]                                                                                                                                 {'loss': 0.2622, 'grad_norm': 0.40393805503845215, 'learning_rate': 0.00015045237844542317, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 335.77, 'epoch': 0.67}
 34%|███████████████████████████▊                                                       | 6740/20117 [4:13:46<8:30:25,  2.29s/it] 34%|███████████████████████████▊                                                       | 6741/20117 [4:13:48<8:28:30,  2.28s/it] 34%|███████████████████████████▊                                                       | 6742/20117 [4:13:51<8:22:58,  2.26s/it] 34%|███████████████████████████▊                                                       | 6743/20117 [4:13:53<8:23:30,  2.26s/it] 34%|███████████████████████████▊                                                       | 6744/20117 [4:13:56<8:48:18,  2.37s/it] 34%|███████████████████████████▊                                                       | 6745/20117 [4:13:58<8:38:37,  2.33s/it] 34%|███████████████████████████▊                                                       | 6746/20117 [4:14:00<8:34:25,  2.31s/it] 34%|███████████████████████████▊                                                       | 6747/20117 [4:14:03<8:57:52,  2.41s/it] 34%|███████████████████████████▊                                                       | 6748/20117 [4:14:05<9:01:09,  2.43s/it] 34%|███████████████████████████▊                                                       | 6749/20117 [4:14:08<9:09:50,  2.47s/it] 34%|███████████████████████████▊                                                       | 6750/20117 [4:14:10<9:08:06,  2.46s/it]                                                                                                                                 {'loss': 0.3129, 'grad_norm': 0.5896100401878357, 'learning_rate': 0.00015031680933623188, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 304.37, 'epoch': 0.67}
 34%|███████████████████████████▊                                                       | 6750/20117 [4:14:10<9:08:06,  2.46s/it] 34%|███████████████████████████▊                                                       | 6751/20117 [4:14:12<8:58:16,  2.42s/it] 34%|███████████████████████████▊                                                       | 6752/20117 [4:14:15<8:54:19,  2.40s/it] 34%|███████████████████████████▊                                                       | 6753/20117 [4:14:17<8:48:12,  2.37s/it] 34%|███████████████████████████▊                                                       | 6754/20117 [4:14:19<8:47:12,  2.37s/it] 34%|███████████████████████████▊                                                       | 6755/20117 [4:14:22<8:40:32,  2.34s/it] 34%|███████████████████████████▊                                                       | 6756/20117 [4:14:24<8:36:44,  2.32s/it] 34%|███████████████████████████▉                                                       | 6757/20117 [4:14:26<8:38:11,  2.33s/it] 34%|███████████████████████████▉                                                       | 6758/20117 [4:14:29<8:35:59,  2.32s/it] 34%|███████████████████████████▉                                                       | 6759/20117 [4:14:31<8:35:18,  2.31s/it] 34%|███████████████████████████▉                                                       | 6760/20117 [4:14:33<8:39:23,  2.33s/it]                                                                                                                                 {'loss': 0.2704, 'grad_norm': 0.5198945999145508, 'learning_rate': 0.00015018111628610446, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 380.02, 'epoch': 0.67}
 34%|███████████████████████████▉                                                       | 6760/20117 [4:14:33<8:39:23,  2.33s/it] 34%|███████████████████████████▉                                                       | 6761/20117 [4:14:36<8:33:40,  2.31s/it] 34%|███████████████████████████▉                                                       | 6762/20117 [4:14:38<8:36:34,  2.32s/it] 34%|███████████████████████████▉                                                       | 6763/20117 [4:14:40<8:35:41,  2.32s/it] 34%|███████████████████████████▉                                                       | 6764/20117 [4:14:43<8:36:26,  2.32s/it] 34%|███████████████████████████▉                                                       | 6765/20117 [4:14:45<8:35:23,  2.32s/it] 34%|███████████████████████████▉                                                       | 6766/20117 [4:14:47<8:37:21,  2.33s/it] 34%|███████████████████████████▉                                                       | 6767/20117 [4:14:50<8:37:37,  2.33s/it] 34%|███████████████████████████▉                                                       | 6768/20117 [4:14:52<8:35:18,  2.32s/it] 34%|███████████████████████████▉                                                       | 6769/20117 [4:14:54<8:29:41,  2.29s/it] 34%|███████████████████████████▉                                                       | 6770/20117 [4:14:56<8:30:55,  2.30s/it]                                                                                                                                 {'loss': 0.2495, 'grad_norm': 0.32067760825157166, 'learning_rate': 0.00015004529962928164, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 369.15, 'epoch': 0.67}
 34%|███████████████████████████▉                                                       | 6770/20117 [4:14:56<8:30:55,  2.30s/it] 34%|███████████████████████████▉                                                       | 6771/20117 [4:14:59<8:26:52,  2.28s/it] 34%|███████████████████████████▉                                                       | 6772/20117 [4:15:01<8:30:04,  2.29s/it] 34%|███████████████████████████▉                                                       | 6773/20117 [4:15:03<8:32:23,  2.30s/it] 34%|███████████████████████████▉                                                       | 6774/20117 [4:15:06<8:30:25,  2.30s/it] 34%|███████████████████████████▉                                                       | 6775/20117 [4:15:08<8:33:18,  2.31s/it] 34%|███████████████████████████▉                                                       | 6776/20117 [4:15:10<8:35:16,  2.32s/it] 34%|███████████████████████████▉                                                       | 6777/20117 [4:15:13<8:34:04,  2.31s/it] 34%|███████████████████████████▉                                                       | 6778/20117 [4:15:15<8:33:24,  2.31s/it] 34%|███████████████████████████▉                                                       | 6779/20117 [4:15:17<8:33:10,  2.31s/it] 34%|███████████████████████████▉                                                       | 6780/20117 [4:15:19<8:32:48,  2.31s/it]                                                                                                                                 {'loss': 0.2095, 'grad_norm': 0.49704423546791077, 'learning_rate': 0.0001499093597003085, 'memory/max_active (GiB)': 18.17, 'memory/max_allocated (GiB)': 18.17, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 324.06, 'epoch': 0.67}
 34%|███████████████████████████▉                                                       | 6780/20117 [4:15:19<8:32:48,  2.31s/it] 34%|███████████████████████████▉                                                       | 6781/20117 [4:15:22<8:33:22,  2.31s/it] 34%|███████████████████████████▉                                                       | 6782/20117 [4:15:24<8:32:48,  2.31s/it] 34%|███████████████████████████▉                                                       | 6783/20117 [4:15:26<8:37:54,  2.33s/it] 34%|███████████████████████████▉                                                       | 6784/20117 [4:15:29<8:34:03,  2.31s/it] 34%|███████████████████████████▉                                                       | 6785/20117 [4:15:31<8:30:05,  2.30s/it] 34%|███████████████████████████▉                                                       | 6786/20117 [4:15:33<8:30:29,  2.30s/it] 34%|████████████████████████████                                                       | 6787/20117 [4:15:36<8:31:57,  2.30s/it] 34%|████████████████████████████                                                       | 6788/20117 [4:15:38<8:28:41,  2.29s/it] 34%|████████████████████████████                                                       | 6789/20117 [4:15:40<8:28:35,  2.29s/it] 34%|████████████████████████████                                                       | 6790/20117 [4:15:42<8:27:13,  2.28s/it]                                                                                                                                 {'loss': 0.1743, 'grad_norm': 0.42155733704566956, 'learning_rate': 0.00014977329683403385, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 309.09, 'epoch': 0.68}
 34%|████████████████████████████                                                       | 6790/20117 [4:15:42<8:27:13,  2.28s/it] 34%|████████████████████████████                                                       | 6791/20117 [4:15:45<8:32:08,  2.31s/it] 34%|████████████████████████████                                                       | 6792/20117 [4:15:47<8:36:11,  2.32s/it] 34%|████████████████████████████                                                       | 6793/20117 [4:15:50<8:40:58,  2.35s/it] 34%|████████████████████████████                                                       | 6794/20117 [4:15:52<8:53:29,  2.40s/it] 34%|████████████████████████████                                                       | 6795/20117 [4:15:54<8:45:56,  2.37s/it] 34%|████████████████████████████                                                       | 6796/20117 [4:15:57<8:41:34,  2.35s/it] 34%|████████████████████████████                                                       | 6797/20117 [4:15:59<8:43:13,  2.36s/it] 34%|████████████████████████████                                                       | 6798/20117 [4:16:02<9:07:28,  2.47s/it] 34%|████████████████████████████                                                       | 6799/20117 [4:16:04<8:58:57,  2.43s/it] 34%|████████████████████████████                                                       | 6800/20117 [4:16:06<8:51:52,  2.40s/it]                                                                                                                                 {'loss': 0.3424, 'grad_norm': 0.5538848638534546, 'learning_rate': 0.00014963711136560924, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 347.86, 'epoch': 0.68}
 34%|████████████████████████████                                                       | 6800/20117 [4:16:06<8:51:52,  2.40s/it] 34%|████████████████████████████                                                       | 6801/20117 [4:16:09<8:45:06,  2.37s/it] 34%|████████████████████████████                                                       | 6802/20117 [4:16:11<8:44:04,  2.36s/it] 34%|████████████████████████████                                                       | 6803/20117 [4:16:13<8:39:24,  2.34s/it] 34%|████████████████████████████                                                       | 6804/20117 [4:16:16<8:38:45,  2.34s/it] 34%|████████████████████████████                                                       | 6805/20117 [4:16:18<8:40:13,  2.34s/it] 34%|████████████████████████████                                                       | 6806/20117 [4:16:20<8:40:14,  2.35s/it] 34%|████████████████████████████                                                       | 6807/20117 [4:16:23<8:38:06,  2.34s/it] 34%|████████████████████████████                                                       | 6808/20117 [4:16:25<8:39:53,  2.34s/it] 34%|████████████████████████████                                                       | 6809/20117 [4:16:27<8:39:22,  2.34s/it] 34%|████████████████████████████                                                       | 6810/20117 [4:16:30<8:38:09,  2.34s/it]                                                                                                                                 {'loss': 0.2047, 'grad_norm': 0.3429434299468994, 'learning_rate': 0.00014950080363048833, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 341.24, 'epoch': 0.68}
 34%|████████████████████████████                                                       | 6810/20117 [4:16:30<8:38:09,  2.34s/it] 34%|████████████████████████████                                                       | 6811/20117 [4:16:32<8:46:35,  2.37s/it] 34%|████████████████████████████                                                       | 6812/20117 [4:16:35<8:46:14,  2.37s/it] 34%|████████████████████████████                                                       | 6813/20117 [4:16:37<8:52:01,  2.40s/it] 34%|████████████████████████████                                                       | 6814/20117 [4:16:39<8:47:37,  2.38s/it] 34%|████████████████████████████                                                       | 6815/20117 [4:16:42<8:48:25,  2.38s/it] 34%|████████████████████████████                                                       | 6816/20117 [4:16:44<8:47:22,  2.38s/it] 34%|████████████████████████████▏                                                      | 6817/20117 [4:16:46<8:41:19,  2.35s/it] 34%|████████████████████████████▏                                                      | 6818/20117 [4:16:49<8:39:30,  2.34s/it] 34%|████████████████████████████▏                                                      | 6819/20117 [4:16:51<8:37:13,  2.33s/it] 34%|████████████████████████████▏                                                      | 6820/20117 [4:16:53<8:38:02,  2.34s/it]                                                                                                                                 {'loss': 0.187, 'grad_norm': 0.39259403944015503, 'learning_rate': 0.0001493643739644258, 'memory/max_active (GiB)': 19.2, 'memory/max_allocated (GiB)': 19.2, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 367.18, 'epoch': 0.68}
 34%|████████████████████████████▏                                                      | 6820/20117 [4:16:53<8:38:02,  2.34s/it] 34%|████████████████████████████▏                                                      | 6821/20117 [4:16:56<8:35:19,  2.33s/it] 34%|████████████████████████████▏                                                      | 6822/20117 [4:16:58<8:43:46,  2.36s/it] 34%|████████████████████████████▏                                                      | 6823/20117 [4:17:01<8:43:32,  2.36s/it] 34%|████████████████████████████▏                                                      | 6824/20117 [4:17:03<8:42:39,  2.36s/it] 34%|████████████████████████████▏                                                      | 6825/20117 [4:17:05<8:38:31,  2.34s/it] 34%|████████████████████████████▏                                                      | 6826/20117 [4:17:07<8:35:28,  2.33s/it] 34%|████████████████████████████▏                                                      | 6827/20117 [4:17:10<8:35:06,  2.33s/it] 34%|████████████████████████████▏                                                      | 6828/20117 [4:17:12<8:35:00,  2.33s/it] 34%|████████████████████████████▏                                                      | 6829/20117 [4:17:14<8:36:20,  2.33s/it] 34%|████████████████████████████▏                                                      | 6830/20117 [4:17:17<8:33:32,  2.32s/it]                                                                                                                                 {'loss': 0.236, 'grad_norm': 0.37642526626586914, 'learning_rate': 0.00014922782270347686, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 324.51, 'epoch': 0.68}
 34%|████████████████████████████▏                                                      | 6830/20117 [4:17:17<8:33:32,  2.32s/it] 34%|████████████████████████████▏                                                      | 6831/20117 [4:17:19<8:30:18,  2.30s/it] 34%|████████████████████████████▏                                                      | 6832/20117 [4:17:21<8:24:25,  2.28s/it] 34%|████████████████████████████▏                                                      | 6833/20117 [4:17:23<8:19:55,  2.26s/it] 34%|████████████████████████████▏                                                      | 6834/20117 [4:17:26<8:18:08,  2.25s/it] 34%|████████████████████████████▏                                                      | 6835/20117 [4:17:28<8:15:18,  2.24s/it] 34%|████████████████████████████▏                                                      | 6836/20117 [4:17:30<8:18:19,  2.25s/it] 34%|████████████████████████████▏                                                      | 6837/20117 [4:17:32<8:20:18,  2.26s/it] 34%|████████████████████████████▏                                                      | 6838/20117 [4:17:35<8:24:14,  2.28s/it] 34%|████████████████████████████▏                                                      | 6839/20117 [4:17:37<8:32:25,  2.32s/it] 34%|████████████████████████████▏                                                      | 6840/20117 [4:17:40<8:37:27,  2.34s/it]                                                                                                                                 {'loss': 0.2494, 'grad_norm': 0.5826324820518494, 'learning_rate': 0.00014909115018399603, 'memory/max_active (GiB)': 19.68, 'memory/max_allocated (GiB)': 19.68, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 343.26, 'epoch': 0.68}
 34%|████████████████████████████▏                                                      | 6840/20117 [4:17:40<8:37:27,  2.34s/it] 34%|████████████████████████████▏                                                      | 6841/20117 [4:17:42<8:37:07,  2.34s/it] 34%|████████████████████████████▏                                                      | 6842/20117 [4:17:44<8:38:56,  2.35s/it] 34%|████████████████████████████▏                                                      | 6843/20117 [4:17:47<8:37:57,  2.34s/it] 34%|████████████████████████████▏                                                      | 6844/20117 [4:17:49<8:38:36,  2.34s/it] 34%|████████████████████████████▏                                                      | 6845/20117 [4:17:51<8:30:55,  2.31s/it] 34%|████████████████████████████▏                                                      | 6846/20117 [4:17:53<8:26:48,  2.29s/it] 34%|████████████████████████████▏                                                      | 6847/20117 [4:17:56<8:17:42,  2.25s/it] 34%|████████████████████████████▎                                                      | 6848/20117 [4:17:58<8:31:12,  2.31s/it] 34%|████████████████████████████▎                                                      | 6849/20117 [4:18:00<8:28:52,  2.30s/it] 34%|████████████████████████████▎                                                      | 6850/20117 [4:18:03<9:09:35,  2.49s/it]                                                                                                                                 {'loss': 0.2522, 'grad_norm': 0.39206662774086, 'learning_rate': 0.00014895435674263662, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 330.16, 'epoch': 0.68}
 34%|████████████████████████████▎                                                      | 6850/20117 [4:18:03<9:09:35,  2.49s/it] 34%|████████████████████████████▎                                                      | 6851/20117 [4:18:06<9:07:45,  2.48s/it] 34%|████████████████████████████▎                                                      | 6852/20117 [4:18:08<9:01:56,  2.45s/it] 34%|████████████████████████████▎                                                      | 6853/20117 [4:18:10<8:58:09,  2.43s/it] 34%|████████████████████████████▎                                                      | 6854/20117 [4:18:13<8:59:07,  2.44s/it] 34%|████████████████████████████▎                                                      | 6855/20117 [4:18:15<8:54:26,  2.42s/it] 34%|████████████████████████████▎                                                      | 6856/20117 [4:18:18<8:48:54,  2.39s/it] 34%|████████████████████████████▎                                                      | 6857/20117 [4:18:20<8:43:26,  2.37s/it] 34%|████████████████████████████▎                                                      | 6858/20117 [4:18:22<8:47:50,  2.39s/it] 34%|████████████████████████████▎                                                      | 6859/20117 [4:18:25<8:51:39,  2.41s/it] 34%|████████████████████████████▎                                                      | 6860/20117 [4:18:27<8:49:35,  2.40s/it]                                                                                                                                 {'loss': 0.2534, 'grad_norm': 0.21635837852954865, 'learning_rate': 0.00014881744271634986, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 319.9, 'epoch': 0.68}
 34%|████████████████████████████▎                                                      | 6860/20117 [4:18:27<8:49:35,  2.40s/it] 34%|████████████████████████████▎                                                      | 6861/20117 [4:18:30<8:49:47,  2.40s/it] 34%|████████████████████████████▎                                                      | 6862/20117 [4:18:32<8:51:42,  2.41s/it] 34%|████████████████████████████▎                                                      | 6863/20117 [4:18:34<8:49:58,  2.40s/it] 34%|████████████████████████████▎                                                      | 6864/20117 [4:18:37<8:49:03,  2.40s/it] 34%|████████████████████████████▎                                                      | 6865/20117 [4:18:39<8:43:46,  2.37s/it] 34%|████████████████████████████▎                                                      | 6866/20117 [4:18:41<8:39:55,  2.35s/it] 34%|████████████████████████████▎                                                      | 6867/20117 [4:18:44<8:36:34,  2.34s/it] 34%|████████████████████████████▎                                                      | 6868/20117 [4:18:46<8:33:54,  2.33s/it] 34%|████████████████████████████▎                                                      | 6869/20117 [4:18:48<8:37:11,  2.34s/it] 34%|████████████████████████████▎                                                      | 6870/20117 [4:18:51<8:39:04,  2.35s/it]                                                                                                                                 {'loss': 0.2255, 'grad_norm': 0.25813058018684387, 'learning_rate': 0.00014868040844238386, 'memory/max_active (GiB)': 21.39, 'memory/max_allocated (GiB)': 21.39, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 352.96, 'epoch': 0.68}
 34%|████████████████████████████▎                                                      | 6870/20117 [4:18:51<8:39:04,  2.35s/it] 34%|████████████████████████████▎                                                      | 6871/20117 [4:18:53<8:39:05,  2.35s/it] 34%|████████████████████████████▎                                                      | 6872/20117 [4:18:55<8:40:24,  2.36s/it] 34%|████████████████████████████▎                                                      | 6873/20117 [4:18:58<8:37:48,  2.35s/it] 34%|████████████████████████████▎                                                      | 6874/20117 [4:19:00<8:33:12,  2.33s/it] 34%|████████████████████████████▎                                                      | 6875/20117 [4:19:02<8:28:08,  2.30s/it] 34%|████████████████████████████▎                                                      | 6876/20117 [4:19:05<8:35:27,  2.34s/it] 34%|████████████████████████████▎                                                      | 6877/20117 [4:19:07<8:36:48,  2.34s/it] 34%|████████████████████████████▍                                                      | 6878/20117 [4:19:09<8:32:56,  2.32s/it] 34%|████████████████████████████▍                                                      | 6879/20117 [4:19:12<8:36:58,  2.34s/it] 34%|████████████████████████████▍                                                      | 6880/20117 [4:19:14<8:38:30,  2.35s/it]                                                                                                                                 {'loss': 0.2135, 'grad_norm': 0.46098119020462036, 'learning_rate': 0.00014854325425828305, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 278.19, 'epoch': 0.68}
 34%|████████████████████████████▍                                                      | 6880/20117 [4:19:14<8:38:30,  2.35s/it] 34%|████████████████████████████▍                                                      | 6881/20117 [4:19:16<8:36:39,  2.34s/it] 34%|████████████████████████████▍                                                      | 6882/20117 [4:19:19<8:36:45,  2.34s/it] 34%|████████████████████████████▍                                                      | 6883/20117 [4:19:21<8:41:44,  2.37s/it] 34%|████████████████████████████▍                                                      | 6884/20117 [4:19:24<8:43:01,  2.37s/it] 34%|████████████████████████████▍                                                      | 6885/20117 [4:19:26<8:41:25,  2.36s/it] 34%|████████████████████████████▍                                                      | 6886/20117 [4:19:28<8:37:55,  2.35s/it] 34%|████████████████████████████▍                                                      | 6887/20117 [4:19:31<8:38:28,  2.35s/it] 34%|████████████████████████████▍                                                      | 6888/20117 [4:19:33<8:41:36,  2.37s/it] 34%|████████████████████████████▍                                                      | 6889/20117 [4:19:35<8:42:04,  2.37s/it] 34%|████████████████████████████▍                                                      | 6890/20117 [4:19:38<8:41:56,  2.37s/it]                                                                                                                                 {'loss': 0.2283, 'grad_norm': 0.45799604058265686, 'learning_rate': 0.00014840598050188715, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 358.38, 'epoch': 0.68}
 34%|████████████████████████████▍                                                      | 6890/20117 [4:19:38<8:41:56,  2.37s/it] 34%|████████████████████████████▍                                                      | 6891/20117 [4:19:40<8:41:22,  2.37s/it] 34%|████████████████████████████▍                                                      | 6892/20117 [4:19:42<8:39:08,  2.36s/it] 34%|████████████████████████████▍                                                      | 6893/20117 [4:19:45<8:36:55,  2.35s/it] 34%|████████████████████████████▍                                                      | 6894/20117 [4:19:47<8:41:37,  2.37s/it] 34%|████████████████████████████▍                                                      | 6895/20117 [4:19:50<8:38:36,  2.35s/it] 34%|████████████████████████████▍                                                      | 6896/20117 [4:19:52<8:35:54,  2.34s/it] 34%|████████████████████████████▍                                                      | 6897/20117 [4:19:54<8:35:39,  2.34s/it] 34%|████████████████████████████▍                                                      | 6898/20117 [4:19:56<8:34:01,  2.33s/it] 34%|████████████████████████████▍                                                      | 6899/20117 [4:19:59<8:29:41,  2.31s/it] 34%|████████████████████████████▍                                                      | 6900/20117 [4:20:01<8:29:01,  2.31s/it]                                                                                                                                 {'loss': 0.2261, 'grad_norm': 0.6016408205032349, 'learning_rate': 0.00014826858751133042, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 368.61, 'epoch': 0.69}
 34%|████████████████████████████▍                                                      | 6900/20117 [4:20:01<8:29:01,  2.31s/it] 34%|████████████████████████████▍                                                      | 6901/20117 [4:20:03<8:28:33,  2.31s/it] 34%|████████████████████████████▍                                                      | 6902/20117 [4:20:06<8:31:47,  2.32s/it] 34%|████████████████████████████▍                                                      | 6903/20117 [4:20:08<8:57:30,  2.44s/it] 34%|████████████████████████████▍                                                      | 6904/20117 [4:20:11<8:52:33,  2.42s/it] 34%|████████████████████████████▍                                                      | 6905/20117 [4:20:13<9:08:55,  2.49s/it] 34%|████████████████████████████▍                                                      | 6906/20117 [4:20:16<9:15:38,  2.52s/it] 34%|████████████████████████████▍                                                      | 6907/20117 [4:20:18<9:03:15,  2.47s/it] 34%|████████████████████████████▌                                                      | 6908/20117 [4:20:21<9:01:13,  2.46s/it] 34%|████████████████████████████▌                                                      | 6909/20117 [4:20:23<8:51:51,  2.42s/it] 34%|████████████████████████████▌                                                      | 6910/20117 [4:20:26<8:47:44,  2.40s/it]                                                                                                                                 {'loss': 0.2799, 'grad_norm': 0.488506555557251, 'learning_rate': 0.00014813107562504084, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 367.4, 'epoch': 0.69}
 34%|████████████████████████████▌                                                      | 6910/20117 [4:20:26<8:47:44,  2.40s/it] 34%|████████████████████████████▌                                                      | 6911/20117 [4:20:28<8:42:47,  2.38s/it] 34%|████████████████████████████▌                                                      | 6912/20117 [4:20:30<8:40:33,  2.37s/it] 34%|████████████████████████████▌                                                      | 6913/20117 [4:20:33<8:41:44,  2.37s/it] 34%|████████████████████████████▌                                                      | 6914/20117 [4:20:35<8:39:11,  2.36s/it] 34%|████████████████████████████▌                                                      | 6915/20117 [4:20:37<8:32:27,  2.33s/it] 34%|████████████████████████████▌                                                      | 6916/20117 [4:20:39<8:30:49,  2.32s/it] 34%|████████████████████████████▌                                                      | 6917/20117 [4:20:42<8:36:22,  2.35s/it] 34%|████████████████████████████▌                                                      | 6918/20117 [4:20:44<8:36:07,  2.35s/it] 34%|████████████████████████████▌                                                      | 6919/20117 [4:20:46<8:30:38,  2.32s/it] 34%|████████████████████████████▌                                                      | 6920/20117 [4:20:49<8:26:24,  2.30s/it]                                                                                                                                 {'loss': 0.1868, 'grad_norm': 0.6327788829803467, 'learning_rate': 0.00014799344518173928, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 300.97, 'epoch': 0.69}
 34%|████████████████████████████▌                                                      | 6920/20117 [4:20:49<8:26:24,  2.30s/it] 34%|████████████████████████████▌                                                      | 6921/20117 [4:20:51<8:27:40,  2.31s/it] 34%|████████████████████████████▌                                                      | 6922/20117 [4:20:53<8:29:58,  2.32s/it] 34%|████████████████████████████▌                                                      | 6923/20117 [4:20:56<8:33:05,  2.33s/it] 34%|████████████████████████████▌                                                      | 6924/20117 [4:20:58<8:40:05,  2.37s/it] 34%|████████████████████████████▌                                                      | 6925/20117 [4:21:01<8:40:14,  2.37s/it] 34%|████████████████████████████▌                                                      | 6926/20117 [4:21:03<8:41:27,  2.37s/it] 34%|████████████████████████████▌                                                      | 6927/20117 [4:21:05<8:39:27,  2.36s/it] 34%|████████████████████████████▌                                                      | 6928/20117 [4:21:08<8:34:39,  2.34s/it] 34%|████████████████████████████▌                                                      | 6929/20117 [4:21:10<8:32:33,  2.33s/it] 34%|████████████████████████████▌                                                      | 6930/20117 [4:21:12<8:31:20,  2.33s/it]                                                                                                                                 {'loss': 0.2496, 'grad_norm': 0.4955579340457916, 'learning_rate': 0.00014785569652043856, 'memory/max_active (GiB)': 19.22, 'memory/max_allocated (GiB)': 19.22, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 343.13, 'epoch': 0.69}
 34%|████████████████████████████▌                                                      | 6930/20117 [4:21:12<8:31:20,  2.33s/it] 34%|████████████████████████████▌                                                      | 6931/20117 [4:21:15<8:30:18,  2.32s/it] 34%|████████████████████████████▌                                                      | 6932/20117 [4:21:17<8:30:49,  2.32s/it] 34%|████████████████████████████▌                                                      | 6933/20117 [4:21:19<8:30:13,  2.32s/it] 34%|████████████████████████████▌                                                      | 6934/20117 [4:21:21<8:30:12,  2.32s/it] 34%|████████████████████████████▌                                                      | 6935/20117 [4:21:24<8:30:36,  2.32s/it] 34%|████████████████████████████▌                                                      | 6936/20117 [4:21:26<8:28:30,  2.31s/it] 34%|████████████████████████████▌                                                      | 6937/20117 [4:21:28<8:26:45,  2.31s/it] 34%|████████████████████████████▋                                                      | 6938/20117 [4:21:31<8:25:07,  2.30s/it] 34%|████████████████████████████▋                                                      | 6939/20117 [4:21:33<8:27:00,  2.31s/it] 34%|████████████████████████████▋                                                      | 6940/20117 [4:21:35<8:26:27,  2.31s/it]                                                                                                                                 {'loss': 0.2611, 'grad_norm': 0.5724585652351379, 'learning_rate': 0.0001477178299804428, 'memory/max_active (GiB)': 18.82, 'memory/max_allocated (GiB)': 18.82, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 326.77, 'epoch': 0.69}
 34%|████████████████████████████▋                                                      | 6940/20117 [4:21:35<8:26:27,  2.31s/it] 35%|████████████████████████████▋                                                      | 6941/20117 [4:21:38<8:29:49,  2.32s/it] 35%|████████████████████████████▋                                                      | 6942/20117 [4:21:40<8:32:44,  2.34s/it] 35%|████████████████████████████▋                                                      | 6943/20117 [4:21:42<8:27:44,  2.31s/it] 35%|████████████████████████████▋                                                      | 6944/20117 [4:21:45<8:27:40,  2.31s/it] 35%|████████████████████████████▋                                                      | 6945/20117 [4:21:47<8:28:51,  2.32s/it] 35%|████████████████████████████▋                                                      | 6946/20117 [4:21:49<8:30:48,  2.33s/it] 35%|████████████████████████████▋                                                      | 6947/20117 [4:21:52<8:25:35,  2.30s/it] 35%|████████████████████████████▋                                                      | 6948/20117 [4:21:54<8:25:57,  2.31s/it] 35%|████████████████████████████▋                                                      | 6949/20117 [4:21:56<8:27:03,  2.31s/it] 35%|████████████████████████████▋                                                      | 6950/20117 [4:21:59<8:30:06,  2.32s/it]                                                                                                                                 {'loss': 0.1107, 'grad_norm': 0.2057613730430603, 'learning_rate': 0.00014757984590134642, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 293.99, 'epoch': 0.69}
 35%|████████████████████████████▋                                                      | 6950/20117 [4:21:59<8:30:06,  2.32s/it] 35%|████████████████████████████▋                                                      | 6951/20117 [4:22:01<8:32:24,  2.34s/it] 35%|████████████████████████████▋                                                      | 6952/20117 [4:22:03<8:31:23,  2.33s/it] 35%|████████████████████████████▋                                                      | 6953/20117 [4:22:06<8:31:59,  2.33s/it] 35%|████████████████████████████▋                                                      | 6954/20117 [4:22:08<8:31:18,  2.33s/it] 35%|████████████████████████████▋                                                      | 6955/20117 [4:22:10<8:33:57,  2.34s/it] 35%|████████████████████████████▋                                                      | 6956/20117 [4:22:13<9:00:23,  2.46s/it] 35%|████████████████████████████▋                                                      | 6957/20117 [4:22:15<8:51:38,  2.42s/it] 35%|████████████████████████████▋                                                      | 6958/20117 [4:22:18<8:46:54,  2.40s/it] 35%|████████████████████████████▋                                                      | 6959/20117 [4:22:20<8:42:22,  2.38s/it] 35%|████████████████████████████▋                                                      | 6960/20117 [4:22:22<8:36:23,  2.35s/it]                                                                                                                                 {'loss': 0.2379, 'grad_norm': 0.3206622004508972, 'learning_rate': 0.00014744174462303334, 'memory/max_active (GiB)': 20.72, 'memory/max_allocated (GiB)': 20.72, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 371.29, 'epoch': 0.69}
 35%|████████████████████████████▋                                                      | 6960/20117 [4:22:22<8:36:23,  2.35s/it] 35%|████████████████████████████▋                                                      | 6961/20117 [4:22:25<8:39:59,  2.37s/it] 35%|████████████████████████████▋                                                      | 6962/20117 [4:22:27<8:38:05,  2.36s/it] 35%|████████████████████████████▋                                                      | 6963/20117 [4:22:29<8:34:57,  2.35s/it] 35%|████████████████████████████▋                                                      | 6964/20117 [4:22:32<8:36:34,  2.36s/it] 35%|████████████████████████████▋                                                      | 6965/20117 [4:22:34<8:39:46,  2.37s/it] 35%|████████████████████████████▋                                                      | 6966/20117 [4:22:36<8:35:27,  2.35s/it] 35%|████████████████████████████▋                                                      | 6967/20117 [4:22:39<8:31:28,  2.33s/it] 35%|████████████████████████████▋                                                      | 6968/20117 [4:22:41<8:28:35,  2.32s/it] 35%|████████████████████████████▊                                                      | 6969/20117 [4:22:43<8:22:28,  2.29s/it] 35%|████████████████████████████▊                                                      | 6970/20117 [4:22:46<8:26:01,  2.31s/it]                                                                                                                                 {'loss': 0.2558, 'grad_norm': 0.3926986753940582, 'learning_rate': 0.00014730352648567623, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 377.71, 'epoch': 0.69}
 35%|████████████████████████████▊                                                      | 6970/20117 [4:22:46<8:26:01,  2.31s/it] 35%|████████████████████████████▊                                                      | 6971/20117 [4:22:48<8:25:59,  2.31s/it] 35%|████████████████████████████▊                                                      | 6972/20117 [4:22:50<8:24:50,  2.30s/it] 35%|████████████████████████████▊                                                      | 6973/20117 [4:22:52<8:23:03,  2.30s/it] 35%|████████████████████████████▊                                                      | 6974/20117 [4:22:55<8:27:56,  2.32s/it] 35%|████████████████████████████▊                                                      | 6975/20117 [4:22:57<8:39:30,  2.37s/it] 35%|████████████████████████████▊                                                      | 6976/20117 [4:23:00<8:37:54,  2.36s/it] 35%|████████████████████████████▊                                                      | 6977/20117 [4:23:02<8:36:05,  2.36s/it] 35%|████████████████████████████▊                                                      | 6978/20117 [4:23:04<8:34:35,  2.35s/it] 35%|████████████████████████████▊                                                      | 6979/20117 [4:23:07<8:33:19,  2.34s/it] 35%|████████████████████████████▊                                                      | 6980/20117 [4:23:09<8:33:23,  2.34s/it]                                                                                                                                 {'loss': 0.2601, 'grad_norm': 0.34591636061668396, 'learning_rate': 0.00014716519182973552, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 352.12, 'epoch': 0.69}
 35%|████████████████████████████▊                                                      | 6980/20117 [4:23:09<8:33:23,  2.34s/it] 35%|████████████████████████████▊                                                      | 6981/20117 [4:23:11<8:35:02,  2.35s/it] 35%|████████████████████████████▊                                                      | 6982/20117 [4:23:14<8:42:35,  2.39s/it] 35%|████████████████████████████▊                                                      | 6983/20117 [4:23:16<8:44:06,  2.39s/it] 35%|████████████████████████████▊                                                      | 6984/20117 [4:23:19<8:39:05,  2.37s/it] 35%|████████████████████████████▊                                                      | 6985/20117 [4:23:21<8:35:46,  2.36s/it] 35%|████████████████████████████▊                                                      | 6986/20117 [4:23:23<8:36:38,  2.36s/it] 35%|████████████████████████████▊                                                      | 6987/20117 [4:23:26<8:33:18,  2.35s/it] 35%|████████████████████████████▊                                                      | 6988/20117 [4:23:28<8:32:47,  2.34s/it] 35%|████████████████████████████▊                                                      | 6989/20117 [4:23:30<8:32:02,  2.34s/it] 35%|████████████████████████████▊                                                      | 6990/20117 [4:23:33<8:32:09,  2.34s/it]                                                                                                                                 {'loss': 0.2027, 'grad_norm': 0.5908513069152832, 'learning_rate': 0.00014702674099595876, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 345.62, 'epoch': 0.69}
 35%|████████████████████████████▊                                                      | 6990/20117 [4:23:33<8:32:09,  2.34s/it] 35%|████████████████████████████▊                                                      | 6991/20117 [4:23:35<8:26:01,  2.31s/it] 35%|████████████████████████████▊                                                      | 6992/20117 [4:23:37<8:25:52,  2.31s/it] 35%|████████████████████████████▊                                                      | 6993/20117 [4:23:40<8:28:42,  2.33s/it] 35%|████████████████████████████▊                                                      | 6994/20117 [4:23:42<8:24:43,  2.31s/it] 35%|████████████████████████████▊                                                      | 6995/20117 [4:23:44<8:21:19,  2.29s/it] 35%|████████████████████████████▊                                                      | 6996/20117 [4:23:46<8:27:37,  2.32s/it] 35%|████████████████████████████▊                                                      | 6997/20117 [4:23:49<8:24:26,  2.31s/it] 35%|████████████████████████████▊                                                      | 6998/20117 [4:23:51<8:25:36,  2.31s/it] 35%|████████████████████████████▉                                                      | 6999/20117 [4:23:53<8:29:14,  2.33s/it] 35%|████████████████████████████▉                                                      | 7000/20117 [4:23:56<8:29:17,  2.33s/it]                                                                                                                                 {'loss': 0.1987, 'grad_norm': 0.3830493986606598, 'learning_rate': 0.00014688817432537962, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 359.97, 'epoch': 0.7}
 35%|████████████████████████████▉                                                      | 7000/20117 [4:23:56<8:29:17,  2.33s/it] 35%|████████████████████████████▉                                                      | 7001/20117 [4:23:58<8:27:16,  2.32s/it] 35%|████████████████████████████▉                                                      | 7002/20117 [4:24:00<8:24:17,  2.31s/it] 35%|████████████████████████████▉                                                      | 7003/20117 [4:24:03<8:24:31,  2.31s/it] 35%|████████████████████████████▉                                                      | 7004/20117 [4:24:05<8:26:26,  2.32s/it] 35%|████████████████████████████▉                                                      | 7005/20117 [4:24:07<8:31:51,  2.34s/it] 35%|████████████████████████████▉                                                      | 7006/20117 [4:24:10<8:30:48,  2.34s/it] 35%|████████████████████████████▉                                                      | 7007/20117 [4:24:12<8:32:12,  2.34s/it] 35%|████████████████████████████▉                                                      | 7008/20117 [4:24:14<8:32:12,  2.34s/it] 35%|████████████████████████████▉                                                      | 7009/20117 [4:24:17<8:33:15,  2.35s/it] 35%|████████████████████████████▉                                                      | 7010/20117 [4:24:19<8:51:49,  2.43s/it]                                                                                                                                 {'loss': 0.2059, 'grad_norm': 0.444762647151947, 'learning_rate': 0.00014674949215931707, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 289.86, 'epoch': 0.7}
 35%|████████████████████████████▉                                                      | 7010/20117 [4:24:19<8:51:49,  2.43s/it] 35%|████████████████████████████▉                                                      | 7011/20117 [4:24:22<8:45:44,  2.41s/it] 35%|████████████████████████████▉                                                      | 7012/20117 [4:24:24<8:35:48,  2.36s/it] 35%|████████████████████████████▉                                                      | 7013/20117 [4:24:26<8:23:14,  2.30s/it] 35%|████████████████████████████▉                                                      | 7014/20117 [4:24:28<8:15:06,  2.27s/it] 35%|████████████████████████████▉                                                      | 7015/20117 [4:24:31<8:12:22,  2.25s/it] 35%|████████████████████████████▉                                                      | 7016/20117 [4:24:33<8:08:31,  2.24s/it] 35%|████████████████████████████▉                                                      | 7017/20117 [4:24:35<8:10:20,  2.25s/it] 35%|████████████████████████████▉                                                      | 7018/20117 [4:24:37<8:11:57,  2.25s/it] 35%|████████████████████████████▉                                                      | 7019/20117 [4:24:40<8:16:32,  2.27s/it] 35%|████████████████████████████▉                                                      | 7020/20117 [4:24:42<8:22:31,  2.30s/it]                                                                                                                                 {'loss': 0.2115, 'grad_norm': 0.31576088070869446, 'learning_rate': 0.00014661069483937458, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 339.22, 'epoch': 0.7}
 35%|████████████████████████████▉                                                      | 7020/20117 [4:24:42<8:22:31,  2.30s/it] 35%|████████████████████████████▉                                                      | 7021/20117 [4:24:44<8:22:43,  2.30s/it] 35%|████████████████████████████▉                                                      | 7022/20117 [4:24:47<8:28:59,  2.33s/it] 35%|████████████████████████████▉                                                      | 7023/20117 [4:24:49<8:31:47,  2.35s/it] 35%|████████████████████████████▉                                                      | 7024/20117 [4:24:51<8:31:39,  2.34s/it] 35%|████████████████████████████▉                                                      | 7025/20117 [4:24:54<8:34:23,  2.36s/it] 35%|████████████████████████████▉                                                      | 7026/20117 [4:24:56<8:27:29,  2.33s/it] 35%|████████████████████████████▉                                                      | 7027/20117 [4:24:58<8:21:08,  2.30s/it] 35%|████████████████████████████▉                                                      | 7028/20117 [4:25:00<8:14:29,  2.27s/it] 35%|█████████████████████████████                                                      | 7029/20117 [4:25:03<8:08:10,  2.24s/it] 35%|█████████████████████████████                                                      | 7030/20117 [4:25:05<8:11:30,  2.25s/it]                                                                                                                                 {'loss': 0.265, 'grad_norm': 0.4755282700061798, 'learning_rate': 0.00014647178270743932, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 329.94, 'epoch': 0.7}
 35%|█████████████████████████████                                                      | 7030/20117 [4:25:05<8:11:30,  2.25s/it] 35%|█████████████████████████████                                                      | 7031/20117 [4:25:07<8:20:23,  2.29s/it] 35%|█████████████████████████████                                                      | 7032/20117 [4:25:10<8:27:37,  2.33s/it] 35%|█████████████████████████████                                                      | 7033/20117 [4:25:12<8:29:11,  2.34s/it] 35%|█████████████████████████████                                                      | 7034/20117 [4:25:14<8:29:48,  2.34s/it] 35%|█████████████████████████████                                                      | 7035/20117 [4:25:17<8:28:04,  2.33s/it] 35%|█████████████████████████████                                                      | 7036/20117 [4:25:19<8:28:43,  2.33s/it] 35%|█████████████████████████████                                                      | 7037/20117 [4:25:21<8:27:16,  2.33s/it] 35%|█████████████████████████████                                                      | 7038/20117 [4:25:24<8:21:48,  2.30s/it] 35%|█████████████████████████████                                                      | 7039/20117 [4:25:26<8:21:35,  2.30s/it] 35%|█████████████████████████████                                                      | 7040/20117 [4:25:28<8:22:11,  2.30s/it]                                                                                                                                 {'loss': 0.2492, 'grad_norm': 0.4698229134082794, 'learning_rate': 0.00014633275610568123, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 315.54, 'epoch': 0.7}
 35%|█████████████████████████████                                                      | 7040/20117 [4:25:28<8:22:11,  2.30s/it] 35%|█████████████████████████████                                                      | 7041/20117 [4:25:31<8:23:48,  2.31s/it] 35%|█████████████████████████████                                                      | 7042/20117 [4:25:33<8:22:11,  2.30s/it] 35%|█████████████████████████████                                                      | 7043/20117 [4:25:35<8:24:01,  2.31s/it] 35%|█████████████████████████████                                                      | 7044/20117 [4:25:38<8:24:35,  2.32s/it] 35%|█████████████████████████████                                                      | 7045/20117 [4:25:40<8:23:18,  2.31s/it] 35%|█████████████████████████████                                                      | 7046/20117 [4:25:42<8:21:50,  2.30s/it] 35%|█████████████████████████████                                                      | 7047/20117 [4:25:44<8:22:14,  2.31s/it] 35%|█████████████████████████████                                                      | 7048/20117 [4:25:47<8:20:55,  2.30s/it] 35%|█████████████████████████████                                                      | 7049/20117 [4:25:49<8:21:30,  2.30s/it] 35%|█████████████████████████████                                                      | 7050/20117 [4:25:51<8:25:02,  2.32s/it]                                                                                                                                 {'loss': 0.2412, 'grad_norm': 0.3248315453529358, 'learning_rate': 0.00014619361537655215, 'memory/max_active (GiB)': 21.4, 'memory/max_allocated (GiB)': 21.4, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 294.7, 'epoch': 0.7}
 35%|█████████████████████████████                                                      | 7050/20117 [4:25:51<8:25:02,  2.32s/it] 35%|█████████████████████████████                                                      | 7051/20117 [4:25:54<8:22:35,  2.31s/it] 35%|█████████████████████████████                                                      | 7052/20117 [4:25:56<8:19:42,  2.29s/it] 35%|█████████████████████████████                                                      | 7053/20117 [4:25:58<8:18:43,  2.29s/it] 35%|█████████████████████████████                                                      | 7054/20117 [4:26:00<8:17:45,  2.29s/it] 35%|█████████████████████████████                                                      | 7055/20117 [4:26:03<8:16:55,  2.28s/it] 35%|█████████████████████████████                                                      | 7056/20117 [4:26:05<8:18:03,  2.29s/it] 35%|█████████████████████████████                                                      | 7057/20117 [4:26:07<8:21:15,  2.30s/it] 35%|█████████████████████████████                                                      | 7058/20117 [4:26:10<8:21:33,  2.30s/it] 35%|█████████████████████████████                                                      | 7059/20117 [4:26:12<8:22:34,  2.31s/it] 35%|█████████████████████████████▏                                                     | 7060/20117 [4:26:14<8:21:59,  2.31s/it]                                                                                                                                 {'loss': 0.2356, 'grad_norm': 0.48639553785324097, 'learning_rate': 0.0001460543608627852, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 280.17, 'epoch': 0.7}
 35%|█████████████████████████████▏                                                     | 7060/20117 [4:26:14<8:21:59,  2.31s/it] 35%|█████████████████████████████▏                                                     | 7061/20117 [4:26:17<8:20:15,  2.30s/it] 35%|█████████████████████████████▏                                                     | 7062/20117 [4:26:19<8:20:05,  2.30s/it] 35%|█████████████████████████████▏                                                     | 7063/20117 [4:26:21<8:23:13,  2.31s/it] 35%|█████████████████████████████▏                                                     | 7064/20117 [4:26:24<8:51:17,  2.44s/it] 35%|█████████████████████████████▏                                                     | 7065/20117 [4:26:26<8:41:23,  2.40s/it] 35%|█████████████████████████████▏                                                     | 7066/20117 [4:26:29<8:37:41,  2.38s/it] 35%|█████████████████████████████▏                                                     | 7067/20117 [4:26:31<8:30:49,  2.35s/it] 35%|█████████████████████████████▏                                                     | 7068/20117 [4:26:33<8:28:55,  2.34s/it] 35%|█████████████████████████████▏                                                     | 7069/20117 [4:26:36<8:27:12,  2.33s/it] 35%|█████████████████████████████▏                                                     | 7070/20117 [4:26:38<8:22:54,  2.31s/it]                                                                                                                                 {'loss': 0.1679, 'grad_norm': 0.5937051773071289, 'learning_rate': 0.00014591499290739362, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 336.7, 'epoch': 0.7}
 35%|█████████████████████████████▏                                                     | 7070/20117 [4:26:38<8:22:54,  2.31s/it] 35%|█████████████████████████████▏                                                     | 7071/20117 [4:26:40<8:25:38,  2.33s/it] 35%|█████████████████████████████▏                                                     | 7072/20117 [4:26:43<8:29:45,  2.34s/it] 35%|█████████████████████████████▏                                                     | 7073/20117 [4:26:45<8:23:45,  2.32s/it] 35%|█████████████████████████████▏                                                     | 7074/20117 [4:26:47<8:27:39,  2.34s/it] 35%|█████████████████████████████▏                                                     | 7075/20117 [4:26:50<8:30:04,  2.35s/it] 35%|█████████████████████████████▏                                                     | 7076/20117 [4:26:52<8:26:21,  2.33s/it] 35%|█████████████████████████████▏                                                     | 7077/20117 [4:26:54<8:24:15,  2.32s/it] 35%|█████████████████████████████▏                                                     | 7078/20117 [4:26:56<8:24:43,  2.32s/it] 35%|█████████████████████████████▏                                                     | 7079/20117 [4:26:59<8:24:43,  2.32s/it] 35%|█████████████████████████████▏                                                     | 7080/20117 [4:27:01<8:25:32,  2.33s/it]                                                                                                                                 {'loss': 0.2474, 'grad_norm': 0.3488394021987915, 'learning_rate': 0.00014577551185367013, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 400.34, 'epoch': 0.7}
 35%|█████████████████████████████▏                                                     | 7080/20117 [4:27:01<8:25:32,  2.33s/it] 35%|█████████████████████████████▏                                                     | 7081/20117 [4:27:03<8:26:32,  2.33s/it] 35%|█████████████████████████████▏                                                     | 7082/20117 [4:27:06<8:30:18,  2.35s/it] 35%|█████████████████████████████▏                                                     | 7083/20117 [4:27:08<8:30:27,  2.35s/it] 35%|█████████████████████████████▏                                                     | 7084/20117 [4:27:10<8:24:47,  2.32s/it] 35%|█████████████████████████████▏                                                     | 7085/20117 [4:27:13<8:28:21,  2.34s/it] 35%|█████████████████████████████▏                                                     | 7086/20117 [4:27:15<8:26:28,  2.33s/it] 35%|█████████████████████████████▏                                                     | 7087/20117 [4:27:18<8:29:52,  2.35s/it] 35%|█████████████████████████████▏                                                     | 7088/20117 [4:27:20<8:29:07,  2.34s/it] 35%|█████████████████████████████▏                                                     | 7089/20117 [4:27:22<8:30:59,  2.35s/it] 35%|█████████████████████████████▎                                                     | 7090/20117 [4:27:25<8:29:03,  2.34s/it]                                                                                                                                 {'loss': 0.2709, 'grad_norm': 0.4485851526260376, 'learning_rate': 0.0001456359180451861, 'memory/max_active (GiB)': 19.8, 'memory/max_allocated (GiB)': 19.8, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 392.29, 'epoch': 0.7}
 35%|█████████████████████████████▎                                                     | 7090/20117 [4:27:25<8:29:03,  2.34s/it] 35%|█████████████████████████████▎                                                     | 7091/20117 [4:27:27<8:29:24,  2.35s/it] 35%|█████████████████████████████▎                                                     | 7092/20117 [4:27:29<8:31:30,  2.36s/it] 35%|█████████████████████████████▎                                                     | 7093/20117 [4:27:32<8:38:32,  2.39s/it] 35%|█████████████████████████████▎                                                     | 7094/20117 [4:27:34<8:37:04,  2.38s/it] 35%|█████████████████████████████▎                                                     | 7095/20117 [4:27:37<8:40:29,  2.40s/it] 35%|█████████████████████████████▎                                                     | 7096/20117 [4:27:39<8:36:06,  2.38s/it] 35%|█████████████████████████████▎                                                     | 7097/20117 [4:27:41<8:28:55,  2.35s/it] 35%|█████████████████████████████▎                                                     | 7098/20117 [4:27:44<8:28:52,  2.35s/it] 35%|█████████████████████████████▎                                                     | 7099/20117 [4:27:46<8:26:57,  2.34s/it] 35%|█████████████████████████████▎                                                     | 7100/20117 [4:27:48<8:27:05,  2.34s/it]                                                                                                                                 {'loss': 0.2353, 'grad_norm': 0.4746951758861542, 'learning_rate': 0.00014549621182579055, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 330.73, 'epoch': 0.71}
 35%|█████████████████████████████▎                                                     | 7100/20117 [4:27:48<8:27:05,  2.34s/it] 35%|█████████████████████████████▎                                                     | 7101/20117 [4:27:51<8:26:53,  2.34s/it] 35%|█████████████████████████████▎                                                     | 7102/20117 [4:27:53<8:30:54,  2.36s/it] 35%|█████████████████████████████▎                                                     | 7103/20117 [4:27:55<8:29:06,  2.35s/it] 35%|█████████████████████████████▎                                                     | 7104/20117 [4:27:58<8:29:42,  2.35s/it] 35%|█████████████████████████████▎                                                     | 7105/20117 [4:28:00<8:26:32,  2.34s/it] 35%|█████████████████████████████▎                                                     | 7106/20117 [4:28:02<8:26:30,  2.34s/it] 35%|█████████████████████████████▎                                                     | 7107/20117 [4:28:05<8:26:30,  2.34s/it] 35%|█████████████████████████████▎                                                     | 7108/20117 [4:28:07<8:24:03,  2.32s/it] 35%|█████████████████████████████▎                                                     | 7109/20117 [4:28:09<8:23:58,  2.32s/it] 35%|█████████████████████████████▎                                                     | 7110/20117 [4:28:12<8:26:23,  2.34s/it]                                                                                                                                 {'loss': 0.2576, 'grad_norm': 0.5027205944061279, 'learning_rate': 0.00014535639353960942, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 379.65, 'epoch': 0.71}
 35%|█████████████████████████████▎                                                     | 7110/20117 [4:28:12<8:26:23,  2.34s/it] 35%|█████████████████████████████▎                                                     | 7111/20117 [4:28:14<8:25:48,  2.33s/it] 35%|█████████████████████████████▎                                                     | 7112/20117 [4:28:16<8:28:10,  2.34s/it] 35%|█████████████████████████████▎                                                     | 7113/20117 [4:28:19<8:25:06,  2.33s/it] 35%|█████████████████████████████▎                                                     | 7114/20117 [4:28:21<8:27:20,  2.34s/it] 35%|█████████████████████████████▎                                                     | 7115/20117 [4:28:23<8:27:16,  2.34s/it] 35%|█████████████████████████████▎                                                     | 7116/20117 [4:28:26<8:48:23,  2.44s/it] 35%|█████████████████████████████▎                                                     | 7117/20117 [4:28:28<8:41:23,  2.41s/it] 35%|█████████████████████████████▎                                                     | 7118/20117 [4:28:31<8:34:50,  2.38s/it] 35%|█████████████████████████████▎                                                     | 7119/20117 [4:28:33<8:38:07,  2.39s/it] 35%|█████████████████████████████▍                                                     | 7120/20117 [4:28:35<8:32:31,  2.37s/it]                                                                                                                                 {'loss': 0.2186, 'grad_norm': 0.449788361787796, 'learning_rate': 0.00014521646353104472, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 269.74, 'epoch': 0.71}
 35%|█████████████████████████████▍                                                     | 7120/20117 [4:28:35<8:32:31,  2.37s/it] 35%|█████████████████████████████▍                                                     | 7121/20117 [4:28:38<8:25:22,  2.33s/it] 35%|█████████████████████████████▍                                                     | 7122/20117 [4:28:40<8:25:04,  2.33s/it] 35%|█████████████████████████████▍                                                     | 7123/20117 [4:28:42<8:24:28,  2.33s/it] 35%|█████████████████████████████▍                                                     | 7124/20117 [4:28:45<8:21:43,  2.32s/it] 35%|█████████████████████████████▍                                                     | 7125/20117 [4:28:47<8:23:17,  2.32s/it] 35%|█████████████████████████████▍                                                     | 7126/20117 [4:28:49<8:22:13,  2.32s/it] 35%|█████████████████████████████▍                                                     | 7127/20117 [4:28:51<8:22:06,  2.32s/it] 35%|█████████████████████████████▍                                                     | 7128/20117 [4:28:54<8:24:01,  2.33s/it] 35%|█████████████████████████████▍                                                     | 7129/20117 [4:28:56<8:21:08,  2.32s/it] 35%|█████████████████████████████▍                                                     | 7130/20117 [4:28:58<8:24:17,  2.33s/it]                                                                                                                                 {'loss': 0.2481, 'grad_norm': 0.31661751866340637, 'learning_rate': 0.00014507642214477362, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 302.22, 'epoch': 0.71}
 35%|█████████████████████████████▍                                                     | 7130/20117 [4:28:58<8:24:17,  2.33s/it] 35%|█████████████████████████████▍                                                     | 7131/20117 [4:29:01<8:27:12,  2.34s/it] 35%|█████████████████████████████▍                                                     | 7132/20117 [4:29:03<8:27:14,  2.34s/it] 35%|█████████████████████████████▍                                                     | 7133/20117 [4:29:06<8:26:20,  2.34s/it] 35%|█████████████████████████████▍                                                     | 7134/20117 [4:29:08<8:25:35,  2.34s/it] 35%|█████████████████████████████▍                                                     | 7135/20117 [4:29:10<8:24:27,  2.33s/it] 35%|█████████████████████████████▍                                                     | 7136/20117 [4:29:13<8:24:27,  2.33s/it] 35%|█████████████████████████████▍                                                     | 7137/20117 [4:29:15<8:23:57,  2.33s/it] 35%|█████████████████████████████▍                                                     | 7138/20117 [4:29:17<8:24:42,  2.33s/it] 35%|█████████████████████████████▍                                                     | 7139/20117 [4:29:20<8:24:41,  2.33s/it] 35%|█████████████████████████████▍                                                     | 7140/20117 [4:29:22<8:23:14,  2.33s/it]                                                                                                                                 {'loss': 0.2284, 'grad_norm': 0.3295125663280487, 'learning_rate': 0.00014493626972574765, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 350.96, 'epoch': 0.71}
 35%|█████████████████████████████▍                                                     | 7140/20117 [4:29:22<8:23:14,  2.33s/it] 35%|█████████████████████████████▍                                                     | 7141/20117 [4:29:24<8:29:31,  2.36s/it] 36%|█████████████████████████████▍                                                     | 7142/20117 [4:29:27<8:31:16,  2.36s/it] 36%|█████████████████████████████▍                                                     | 7143/20117 [4:29:29<8:27:30,  2.35s/it] 36%|█████████████████████████████▍                                                     | 7144/20117 [4:29:31<8:27:39,  2.35s/it] 36%|█████████████████████████████▍                                                     | 7145/20117 [4:29:34<8:20:32,  2.32s/it] 36%|█████████████████████████████▍                                                     | 7146/20117 [4:29:36<8:18:49,  2.31s/it] 36%|█████████████████████████████▍                                                     | 7147/20117 [4:29:38<8:24:36,  2.33s/it] 36%|█████████████████████████████▍                                                     | 7148/20117 [4:29:41<8:27:12,  2.35s/it] 36%|█████████████████████████████▍                                                     | 7149/20117 [4:29:43<8:27:42,  2.35s/it] 36%|█████████████████████████████▍                                                     | 7150/20117 [4:29:45<8:23:48,  2.33s/it]                                                                                                                                 {'loss': 0.2427, 'grad_norm': 0.5383651256561279, 'learning_rate': 0.0001447960066191919, 'memory/max_active (GiB)': 20.63, 'memory/max_allocated (GiB)': 20.63, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 345.67, 'epoch': 0.71}
 36%|█████████████████████████████▍                                                     | 7150/20117 [4:29:45<8:23:48,  2.33s/it] 36%|█████████████████████████████▌                                                     | 7151/20117 [4:29:48<8:21:04,  2.32s/it] 36%|█████████████████████████████▌                                                     | 7152/20117 [4:29:50<8:25:09,  2.34s/it] 36%|█████████████████████████████▌                                                     | 7153/20117 [4:29:52<8:22:51,  2.33s/it] 36%|█████████████████████████████▌                                                     | 7154/20117 [4:29:55<8:29:02,  2.36s/it] 36%|█████████████████████████████▌                                                     | 7155/20117 [4:29:57<8:29:50,  2.36s/it] 36%|█████████████████████████████▌                                                     | 7156/20117 [4:29:59<8:27:38,  2.35s/it] 36%|█████████████████████████████▌                                                     | 7157/20117 [4:30:02<8:27:40,  2.35s/it] 36%|█████████████████████████████▌                                                     | 7158/20117 [4:30:04<8:26:55,  2.35s/it] 36%|█████████████████████████████▌                                                     | 7159/20117 [4:30:06<8:27:49,  2.35s/it] 36%|█████████████████████████████▌                                                     | 7160/20117 [4:30:09<8:24:46,  2.34s/it]                                                                                                                                 {'loss': 0.2434, 'grad_norm': 0.3970474898815155, 'learning_rate': 0.00014465563317060394, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 374.3, 'epoch': 0.71}
 36%|█████████████████████████████▌                                                     | 7160/20117 [4:30:09<8:24:46,  2.34s/it] 36%|█████████████████████████████▌                                                     | 7161/20117 [4:30:11<8:24:45,  2.34s/it] 36%|█████████████████████████████▌                                                     | 7162/20117 [4:30:13<8:22:31,  2.33s/it] 36%|█████████████████████████████▌                                                     | 7163/20117 [4:30:16<8:24:50,  2.34s/it] 36%|█████████████████████████████▌                                                     | 7164/20117 [4:30:18<8:22:47,  2.33s/it] 36%|█████████████████████████████▌                                                     | 7165/20117 [4:30:20<8:25:20,  2.34s/it] 36%|█████████████████████████████▌                                                     | 7166/20117 [4:30:23<8:20:16,  2.32s/it] 36%|█████████████████████████████▌                                                     | 7167/20117 [4:30:25<8:47:03,  2.44s/it] 36%|█████████████████████████████▌                                                     | 7168/20117 [4:30:28<8:36:34,  2.39s/it] 36%|█████████████████████████████▌                                                     | 7169/20117 [4:30:30<8:28:44,  2.36s/it] 36%|█████████████████████████████▌                                                     | 7170/20117 [4:30:32<8:32:02,  2.37s/it]                                                                                                                                 {'loss': 0.1649, 'grad_norm': 0.16766348481178284, 'learning_rate': 0.00014451514972575332, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 304.14, 'epoch': 0.71}
 36%|█████████████████████████████▌                                                     | 7170/20117 [4:30:32<8:32:02,  2.37s/it] 36%|█████████████████████████████▌                                                     | 7171/20117 [4:30:35<8:26:06,  2.35s/it] 36%|█████████████████████████████▌                                                     | 7172/20117 [4:30:37<8:24:24,  2.34s/it] 36%|█████████████████████████████▌                                                     | 7173/20117 [4:30:39<8:23:27,  2.33s/it] 36%|█████████████████████████████▌                                                     | 7174/20117 [4:30:42<8:18:31,  2.31s/it] 36%|█████████████████████████████▌                                                     | 7175/20117 [4:30:44<8:16:05,  2.30s/it] 36%|█████████████████████████████▌                                                     | 7176/20117 [4:30:46<8:17:51,  2.31s/it] 36%|█████████████████████████████▌                                                     | 7177/20117 [4:30:48<8:18:37,  2.31s/it] 36%|█████████████████████████████▌                                                     | 7178/20117 [4:30:51<8:18:55,  2.31s/it] 36%|█████████████████████████████▌                                                     | 7179/20117 [4:30:53<8:22:27,  2.33s/it] 36%|█████████████████████████████▌                                                     | 7180/20117 [4:30:55<8:20:29,  2.32s/it]                                                                                                                                 {'loss': 0.2633, 'grad_norm': 0.4426742196083069, 'learning_rate': 0.00014437455663068042, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 374.18, 'epoch': 0.71}
 36%|█████████████████████████████▌                                                     | 7180/20117 [4:30:55<8:20:29,  2.32s/it] 36%|█████████████████████████████▋                                                     | 7181/20117 [4:30:58<8:23:42,  2.34s/it] 36%|█████████████████████████████▋                                                     | 7182/20117 [4:31:00<8:23:32,  2.34s/it] 36%|█████████████████████████████▋                                                     | 7183/20117 [4:31:02<8:19:58,  2.32s/it] 36%|█████████████████████████████▋                                                     | 7184/20117 [4:31:05<8:21:15,  2.33s/it] 36%|█████████████████████████████▋                                                     | 7185/20117 [4:31:07<8:20:09,  2.32s/it] 36%|█████████████████████████████▋                                                     | 7186/20117 [4:31:09<8:27:16,  2.35s/it] 36%|█████████████████████████████▋                                                     | 7187/20117 [4:31:12<8:20:46,  2.32s/it] 36%|█████████████████████████████▋                                                     | 7188/20117 [4:31:14<8:16:11,  2.30s/it] 36%|█████████████████████████████▋                                                     | 7189/20117 [4:31:16<8:19:56,  2.32s/it] 36%|█████████████████████████████▋                                                     | 7190/20117 [4:31:19<8:17:05,  2.31s/it]                                                                                                                                 {'loss': 0.2584, 'grad_norm': 0.4757481515407562, 'learning_rate': 0.00014423385423169575, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 302.88, 'epoch': 0.71}
 36%|█████████████████████████████▋                                                     | 7190/20117 [4:31:19<8:17:05,  2.31s/it] 36%|█████████████████████████████▋                                                     | 7191/20117 [4:31:21<8:18:11,  2.31s/it] 36%|█████████████████████████████▋                                                     | 7192/20117 [4:31:23<8:22:15,  2.33s/it] 36%|█████████████████████████████▋                                                     | 7193/20117 [4:31:26<8:25:36,  2.35s/it] 36%|█████████████████████████████▋                                                     | 7194/20117 [4:31:28<8:30:48,  2.37s/it] 36%|█████████████████████████████▋                                                     | 7195/20117 [4:31:30<8:24:54,  2.34s/it] 36%|█████████████████████████████▋                                                     | 7196/20117 [4:31:33<8:13:25,  2.29s/it] 36%|█████████████████████████████▋                                                     | 7197/20117 [4:31:35<8:06:54,  2.26s/it] 36%|█████████████████████████████▋                                                     | 7198/20117 [4:31:37<8:05:02,  2.25s/it] 36%|█████████████████████████████▋                                                     | 7199/20117 [4:31:39<7:59:03,  2.23s/it] 36%|█████████████████████████████▋                                                     | 7200/20117 [4:31:41<8:01:57,  2.24s/it]                                                                                                                                 {'loss': 0.2386, 'grad_norm': 0.4964188039302826, 'learning_rate': 0.00014409304287537906, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 325.69, 'epoch': 0.72}
 36%|█████████████████████████████▋                                                     | 7200/20117 [4:31:41<8:01:57,  2.24s/it] 36%|█████████████████████████████▋                                                     | 7201/20117 [4:31:44<8:02:43,  2.24s/it] 36%|█████████████████████████████▋                                                     | 7202/20117 [4:31:46<8:11:44,  2.28s/it] 36%|█████████████████████████████▋                                                     | 7203/20117 [4:31:48<8:16:42,  2.31s/it] 36%|█████████████████████████████▋                                                     | 7204/20117 [4:31:51<8:17:42,  2.31s/it] 36%|█████████████████████████████▋                                                     | 7205/20117 [4:31:53<8:17:54,  2.31s/it] 36%|█████████████████████████████▋                                                     | 7206/20117 [4:31:55<8:19:34,  2.32s/it] 36%|█████████████████████████████▋                                                     | 7207/20117 [4:31:58<8:20:54,  2.33s/it] 36%|█████████████████████████████▋                                                     | 7208/20117 [4:32:00<8:21:12,  2.33s/it] 36%|█████████████████████████████▋                                                     | 7209/20117 [4:32:02<8:16:22,  2.31s/it] 36%|█████████████████████████████▋                                                     | 7210/20117 [4:32:05<8:12:22,  2.29s/it]                                                                                                                                 {'loss': 0.2161, 'grad_norm': 0.5026222467422485, 'learning_rate': 0.0001439521229085785, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 332.92, 'epoch': 0.72}
 36%|█████████████████████████████▋                                                     | 7210/20117 [4:32:05<8:12:22,  2.29s/it] 36%|█████████████████████████████▊                                                     | 7211/20117 [4:32:07<8:05:15,  2.26s/it] 36%|█████████████████████████████▊                                                     | 7212/20117 [4:32:09<8:03:58,  2.25s/it] 36%|█████████████████████████████▊                                                     | 7213/20117 [4:32:11<8:05:14,  2.26s/it] 36%|█████████████████████████████▊                                                     | 7214/20117 [4:32:14<8:13:18,  2.29s/it] 36%|█████████████████████████████▊                                                     | 7215/20117 [4:32:16<8:18:32,  2.32s/it] 36%|█████████████████████████████▊                                                     | 7216/20117 [4:32:18<8:18:40,  2.32s/it] 36%|█████████████████████████████▊                                                     | 7217/20117 [4:32:21<8:18:10,  2.32s/it] 36%|█████████████████████████████▊                                                     | 7218/20117 [4:32:23<8:17:31,  2.31s/it] 36%|█████████████████████████████▊                                                     | 7219/20117 [4:32:25<8:17:13,  2.31s/it] 36%|█████████████████████████████▊                                                     | 7220/20117 [4:32:28<8:24:14,  2.35s/it]                                                                                                                                 {'loss': 0.2157, 'grad_norm': 0.41850724816322327, 'learning_rate': 0.00014381109467840976, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 328.35, 'epoch': 0.72}
 36%|█████████████████████████████▊                                                     | 7220/20117 [4:32:28<8:24:14,  2.35s/it] 36%|█████████████████████████████▊                                                     | 7221/20117 [4:32:30<8:47:46,  2.46s/it] 36%|█████████████████████████████▊                                                     | 7222/20117 [4:32:33<8:39:57,  2.42s/it] 36%|█████████████████████████████▊                                                     | 7223/20117 [4:32:35<8:35:59,  2.40s/it] 36%|█████████████████████████████▊                                                     | 7224/20117 [4:32:37<8:31:46,  2.38s/it] 36%|█████████████████████████████▊                                                     | 7225/20117 [4:32:40<8:25:17,  2.35s/it] 36%|█████████████████████████████▊                                                     | 7226/20117 [4:32:42<8:20:56,  2.33s/it] 36%|█████████████████████████████▊                                                     | 7227/20117 [4:32:44<8:19:09,  2.32s/it] 36%|█████████████████████████████▊                                                     | 7228/20117 [4:32:47<8:18:24,  2.32s/it] 36%|█████████████████████████████▊                                                     | 7229/20117 [4:32:49<8:16:41,  2.31s/it] 36%|█████████████████████████████▊                                                     | 7230/20117 [4:32:51<8:15:14,  2.31s/it]                                                                                                                                 {'loss': 0.2112, 'grad_norm': 0.3922070264816284, 'learning_rate': 0.00014366995853225514, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 263.36, 'epoch': 0.72}
 36%|█████████████████████████████▊                                                     | 7230/20117 [4:32:51<8:15:14,  2.31s/it] 36%|█████████████████████████████▊                                                     | 7231/20117 [4:32:54<8:13:39,  2.30s/it] 36%|█████████████████████████████▊                                                     | 7232/20117 [4:32:56<8:15:58,  2.31s/it] 36%|█████████████████████████████▊                                                     | 7233/20117 [4:32:58<8:11:15,  2.29s/it] 36%|█████████████████████████████▊                                                     | 7234/20117 [4:33:00<8:14:05,  2.30s/it] 36%|█████████████████████████████▊                                                     | 7235/20117 [4:33:03<8:17:56,  2.32s/it] 36%|█████████████████████████████▊                                                     | 7236/20117 [4:33:05<8:14:42,  2.30s/it] 36%|█████████████████████████████▊                                                     | 7237/20117 [4:33:07<8:17:40,  2.32s/it] 36%|█████████████████████████████▊                                                     | 7238/20117 [4:33:10<8:17:46,  2.32s/it] 36%|█████████████████████████████▊                                                     | 7239/20117 [4:33:12<8:13:20,  2.30s/it] 36%|█████████████████████████████▊                                                     | 7240/20117 [4:33:14<8:19:07,  2.33s/it]                                                                                                                                 {'loss': 0.2715, 'grad_norm': 0.5679214000701904, 'learning_rate': 0.0001435287148177628, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 434.11, 'epoch': 0.72}
 36%|█████████████████████████████▊                                                     | 7240/20117 [4:33:14<8:19:07,  2.33s/it] 36%|█████████████████████████████▉                                                     | 7241/20117 [4:33:17<8:17:40,  2.32s/it] 36%|█████████████████████████████▉                                                     | 7242/20117 [4:33:19<8:22:12,  2.34s/it] 36%|█████████████████████████████▉                                                     | 7243/20117 [4:33:21<8:25:25,  2.36s/it] 36%|█████████████████████████████▉                                                     | 7244/20117 [4:33:24<8:28:58,  2.37s/it] 36%|█████████████████████████████▉                                                     | 7245/20117 [4:33:26<8:22:34,  2.34s/it] 36%|█████████████████████████████▉                                                     | 7246/20117 [4:33:28<8:24:59,  2.35s/it] 36%|█████████████████████████████▉                                                     | 7247/20117 [4:33:31<8:23:00,  2.35s/it] 36%|█████████████████████████████▉                                                     | 7248/20117 [4:33:33<8:20:32,  2.33s/it] 36%|█████████████████████████████▉                                                     | 7249/20117 [4:33:35<8:15:26,  2.31s/it] 36%|█████████████████████████████▉                                                     | 7250/20117 [4:33:38<8:22:04,  2.34s/it]                                                                                                                                 {'loss': 0.252, 'grad_norm': 0.5302831530570984, 'learning_rate': 0.0001433873638828458, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 358.75, 'epoch': 0.72}
 36%|█████████████████████████████▉                                                     | 7250/20117 [4:33:38<8:22:04,  2.34s/it] 36%|█████████████████████████████▉                                                     | 7251/20117 [4:33:40<8:20:36,  2.33s/it] 36%|█████████████████████████████▉                                                     | 7252/20117 [4:33:42<8:16:18,  2.31s/it] 36%|█████████████████████████████▉                                                     | 7253/20117 [4:33:45<8:19:21,  2.33s/it] 36%|█████████████████████████████▉                                                     | 7254/20117 [4:33:47<8:17:37,  2.32s/it] 36%|█████████████████████████████▉                                                     | 7255/20117 [4:33:49<8:19:46,  2.33s/it] 36%|█████████████████████████████▉                                                     | 7256/20117 [4:33:52<8:19:03,  2.33s/it] 36%|█████████████████████████████▉                                                     | 7257/20117 [4:33:54<8:15:49,  2.31s/it] 36%|█████████████████████████████▉                                                     | 7258/20117 [4:33:56<8:16:54,  2.32s/it] 36%|█████████████████████████████▉                                                     | 7259/20117 [4:33:59<8:19:38,  2.33s/it] 36%|█████████████████████████████▉                                                     | 7260/20117 [4:34:01<8:18:04,  2.32s/it]                                                                                                                                 {'loss': 0.2613, 'grad_norm': 0.49475687742233276, 'learning_rate': 0.00014324590607568149, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 377.3, 'epoch': 0.72}
 36%|█████████████████████████████▉                                                     | 7260/20117 [4:34:01<8:18:04,  2.32s/it] 36%|█████████████████████████████▉                                                     | 7261/20117 [4:34:03<8:24:14,  2.35s/it] 36%|█████████████████████████████▉                                                     | 7262/20117 [4:34:06<8:23:13,  2.35s/it] 36%|█████████████████████████████▉                                                     | 7263/20117 [4:34:08<8:20:10,  2.33s/it] 36%|█████████████████████████████▉                                                     | 7264/20117 [4:34:10<8:21:05,  2.34s/it] 36%|█████████████████████████████▉                                                     | 7265/20117 [4:34:13<8:15:13,  2.31s/it] 36%|█████████████████████████████▉                                                     | 7266/20117 [4:34:15<8:14:22,  2.31s/it] 36%|█████████████████████████████▉                                                     | 7267/20117 [4:34:17<8:18:29,  2.33s/it] 36%|█████████████████████████████▉                                                     | 7268/20117 [4:34:20<8:15:27,  2.31s/it] 36%|█████████████████████████████▉                                                     | 7269/20117 [4:34:22<8:16:03,  2.32s/it] 36%|█████████████████████████████▉                                                     | 7270/20117 [4:34:24<8:17:19,  2.32s/it]                                                                                                                                 {'loss': 0.288, 'grad_norm': 0.4263441264629364, 'learning_rate': 0.00014310434174471024, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 378.27, 'epoch': 0.72}
 36%|█████████████████████████████▉                                                     | 7270/20117 [4:34:24<8:17:19,  2.32s/it] 36%|█████████████████████████████▉                                                     | 7271/20117 [4:34:27<8:23:01,  2.35s/it] 36%|██████████████████████████████                                                     | 7272/20117 [4:34:29<8:26:20,  2.37s/it] 36%|██████████████████████████████                                                     | 7273/20117 [4:34:32<8:51:10,  2.48s/it] 36%|██████████████████████████████                                                     | 7274/20117 [4:34:34<8:45:49,  2.46s/it] 36%|██████████████████████████████                                                     | 7275/20117 [4:34:37<8:40:10,  2.43s/it] 36%|██████████████████████████████                                                     | 7276/20117 [4:34:39<8:36:08,  2.41s/it] 36%|██████████████████████████████                                                     | 7277/20117 [4:34:41<8:36:37,  2.41s/it] 36%|██████████████████████████████                                                     | 7278/20117 [4:34:44<8:32:26,  2.39s/it] 36%|██████████████████████████████                                                     | 7279/20117 [4:34:46<8:27:53,  2.37s/it] 36%|██████████████████████████████                                                     | 7280/20117 [4:34:48<8:27:05,  2.37s/it]                                                                                                                                 {'loss': 0.2572, 'grad_norm': 0.4663153886795044, 'learning_rate': 0.000142962671238635, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 390.5, 'epoch': 0.72}
 36%|██████████████████████████████                                                     | 7280/20117 [4:34:48<8:27:05,  2.37s/it] 36%|██████████████████████████████                                                     | 7281/20117 [4:34:51<8:24:42,  2.36s/it] 36%|██████████████████████████████                                                     | 7282/20117 [4:34:53<8:23:40,  2.35s/it] 36%|██████████████████████████████                                                     | 7283/20117 [4:34:55<8:25:03,  2.36s/it] 36%|██████████████████████████████                                                     | 7284/20117 [4:34:58<8:20:08,  2.34s/it] 36%|██████████████████████████████                                                     | 7285/20117 [4:35:00<8:19:27,  2.34s/it] 36%|██████████████████████████████                                                     | 7286/20117 [4:35:02<8:16:11,  2.32s/it] 36%|██████████████████████████████                                                     | 7287/20117 [4:35:05<8:11:36,  2.30s/it] 36%|██████████████████████████████                                                     | 7288/20117 [4:35:07<8:12:02,  2.30s/it] 36%|██████████████████████████████                                                     | 7289/20117 [4:35:09<8:08:16,  2.28s/it] 36%|██████████████████████████████                                                     | 7290/20117 [4:35:12<8:11:56,  2.30s/it]                                                                                                                                 {'loss': 0.2024, 'grad_norm': 0.3563691973686218, 'learning_rate': 0.0001428208949064201, 'memory/max_active (GiB)': 21.38, 'memory/max_allocated (GiB)': 21.38, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 330.99, 'epoch': 0.72}
 36%|██████████████████████████████                                                     | 7290/20117 [4:35:12<8:11:56,  2.30s/it] 36%|██████████████████████████████                                                     | 7291/20117 [4:35:14<8:08:47,  2.29s/it] 36%|██████████████████████████████                                                     | 7292/20117 [4:35:16<8:06:46,  2.28s/it] 36%|██████████████████████████████                                                     | 7293/20117 [4:35:18<8:08:22,  2.28s/it] 36%|██████████████████████████████                                                     | 7294/20117 [4:35:21<8:09:29,  2.29s/it] 36%|██████████████████████████████                                                     | 7295/20117 [4:35:23<8:07:07,  2.28s/it] 36%|██████████████████████████████                                                     | 7296/20117 [4:35:25<8:14:58,  2.32s/it] 36%|██████████████████████████████                                                     | 7297/20117 [4:35:28<8:08:29,  2.29s/it] 36%|██████████████████████████████                                                     | 7298/20117 [4:35:30<8:10:09,  2.29s/it] 36%|██████████████████████████████                                                     | 7299/20117 [4:35:32<8:07:32,  2.28s/it] 36%|██████████████████████████████                                                     | 7300/20117 [4:35:34<8:10:38,  2.30s/it]                                                                                                                                 {'loss': 0.2371, 'grad_norm': 0.2805791199207306, 'learning_rate': 0.00014267901309729066, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 410.4, 'epoch': 0.73}
 36%|██████████████████████████████                                                     | 7300/20117 [4:35:34<8:10:38,  2.30s/it] 36%|██████████████████████████████                                                     | 7301/20117 [4:35:37<8:11:26,  2.30s/it] 36%|██████████████████████████████▏                                                    | 7302/20117 [4:35:39<8:10:15,  2.30s/it] 36%|██████████████████████████████▏                                                    | 7303/20117 [4:35:41<8:05:05,  2.27s/it] 36%|██████████████████████████████▏                                                    | 7304/20117 [4:35:43<8:05:10,  2.27s/it] 36%|██████████████████████████████▏                                                    | 7305/20117 [4:35:46<8:02:04,  2.26s/it] 36%|██████████████████████████████▏                                                    | 7306/20117 [4:35:48<8:04:29,  2.27s/it] 36%|██████████████████████████████▏                                                    | 7307/20117 [4:35:50<8:11:29,  2.30s/it] 36%|██████████████████████████████▏                                                    | 7308/20117 [4:35:53<8:04:33,  2.27s/it] 36%|██████████████████████████████▏                                                    | 7309/20117 [4:35:55<8:06:33,  2.28s/it] 36%|██████████████████████████████▏                                                    | 7310/20117 [4:35:57<8:07:21,  2.28s/it]                                                                                                                                 {'loss': 0.231, 'grad_norm': 0.30967897176742554, 'learning_rate': 0.00014253702616073155, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 302.94, 'epoch': 0.73}
 36%|██████████████████████████████▏                                                    | 7310/20117 [4:35:57<8:07:21,  2.28s/it] 36%|██████████████████████████████▏                                                    | 7311/20117 [4:35:59<8:00:46,  2.25s/it] 36%|██████████████████████████████▏                                                    | 7312/20117 [4:36:02<8:08:53,  2.29s/it] 36%|██████████████████████████████▏                                                    | 7313/20117 [4:36:04<8:05:30,  2.28s/it] 36%|██████████████████████████████▏                                                    | 7314/20117 [4:36:06<8:02:57,  2.26s/it] 36%|██████████████████████████████▏                                                    | 7315/20117 [4:36:09<8:04:22,  2.27s/it] 36%|██████████████████████████████▏                                                    | 7316/20117 [4:36:11<8:01:21,  2.26s/it] 36%|██████████████████████████████▏                                                    | 7317/20117 [4:36:13<8:01:13,  2.26s/it] 36%|██████████████████████████████▏                                                    | 7318/20117 [4:36:15<8:03:03,  2.26s/it] 36%|██████████████████████████████▏                                                    | 7319/20117 [4:36:18<8:03:20,  2.27s/it] 36%|██████████████████████████████▏                                                    | 7320/20117 [4:36:20<8:07:30,  2.29s/it]                                                                                                                                 {'loss': 0.1885, 'grad_norm': 0.353834867477417, 'learning_rate': 0.00014239493444648658, 'memory/max_active (GiB)': 20.63, 'memory/max_allocated (GiB)': 20.63, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 296.75, 'epoch': 0.73}
 36%|██████████████████████████████▏                                                    | 7320/20117 [4:36:20<8:07:30,  2.29s/it] 36%|██████████████████████████████▏                                                    | 7321/20117 [4:36:22<8:09:22,  2.29s/it] 36%|██████████████████████████████▏                                                    | 7322/20117 [4:36:24<8:10:01,  2.30s/it] 36%|██████████████████████████████▏                                                    | 7323/20117 [4:36:27<8:12:12,  2.31s/it] 36%|██████████████████████████████▏                                                    | 7324/20117 [4:36:29<8:11:39,  2.31s/it] 36%|██████████████████████████████▏                                                    | 7325/20117 [4:36:31<8:11:10,  2.30s/it] 36%|██████████████████████████████▏                                                    | 7326/20117 [4:36:34<8:33:41,  2.41s/it] 36%|██████████████████████████████▏                                                    | 7327/20117 [4:36:36<8:20:00,  2.35s/it] 36%|██████████████████████████████▏                                                    | 7328/20117 [4:36:39<8:20:12,  2.35s/it] 36%|██████████████████████████████▏                                                    | 7329/20117 [4:36:41<8:15:30,  2.32s/it] 36%|██████████████████████████████▏                                                    | 7330/20117 [4:36:43<8:06:23,  2.28s/it]                                                                                                                                 {'loss': 0.2713, 'grad_norm': 0.32480210065841675, 'learning_rate': 0.00014225273830455773, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 389.69, 'epoch': 0.73}
 36%|██████████████████████████████▏                                                    | 7330/20117 [4:36:43<8:06:23,  2.28s/it] 36%|██████████████████████████████▏                                                    | 7331/20117 [4:36:45<8:08:15,  2.29s/it] 36%|██████████████████████████████▎                                                    | 7332/20117 [4:36:48<8:05:32,  2.28s/it] 36%|██████████████████████████████▎                                                    | 7333/20117 [4:36:50<8:08:46,  2.29s/it] 36%|██████████████████████████████▎                                                    | 7334/20117 [4:36:52<8:08:51,  2.29s/it] 36%|██████████████████████████████▎                                                    | 7335/20117 [4:36:55<8:07:24,  2.29s/it] 36%|██████████████████████████████▎                                                    | 7336/20117 [4:36:57<8:06:03,  2.28s/it] 36%|██████████████████████████████▎                                                    | 7337/20117 [4:36:59<8:07:53,  2.29s/it] 36%|██████████████████████████████▎                                                    | 7338/20117 [4:37:01<8:02:50,  2.27s/it] 36%|██████████████████████████████▎                                                    | 7339/20117 [4:37:04<8:05:41,  2.28s/it] 36%|██████████████████████████████▎                                                    | 7340/20117 [4:37:06<8:03:28,  2.27s/it]                                                                                                                                 {'loss': 0.3248, 'grad_norm': 0.6818671226501465, 'learning_rate': 0.00014211043808520405, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 355.32, 'epoch': 0.73}
 36%|██████████████████████████████▎                                                    | 7340/20117 [4:37:06<8:03:28,  2.27s/it] 36%|██████████████████████████████▎                                                    | 7341/20117 [4:37:08<8:03:06,  2.27s/it] 36%|██████████████████████████████▎                                                    | 7342/20117 [4:37:10<8:06:14,  2.28s/it] 37%|██████████████████████████████▎                                                    | 7343/20117 [4:37:13<8:06:50,  2.29s/it] 37%|██████████████████████████████▎                                                    | 7344/20117 [4:37:15<8:04:41,  2.28s/it] 37%|██████████████████████████████▎                                                    | 7345/20117 [4:37:17<8:04:23,  2.28s/it] 37%|██████████████████████████████▎                                                    | 7346/20117 [4:37:19<7:58:50,  2.25s/it] 37%|██████████████████████████████▎                                                    | 7347/20117 [4:37:22<8:02:32,  2.27s/it] 37%|██████████████████████████████▎                                                    | 7348/20117 [4:37:24<8:03:30,  2.27s/it] 37%|██████████████████████████████▎                                                    | 7349/20117 [4:37:26<8:02:02,  2.27s/it] 37%|██████████████████████████████▎                                                    | 7350/20117 [4:37:29<8:03:14,  2.27s/it]                                                                                                                                 {'loss': 0.2262, 'grad_norm': 0.5786187648773193, 'learning_rate': 0.0001419680341389412, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 321.29, 'epoch': 0.73}
 37%|██████████████████████████████▎                                                    | 7350/20117 [4:37:29<8:03:14,  2.27s/it] 37%|██████████████████████████████▎                                                    | 7351/20117 [4:37:31<8:01:38,  2.26s/it] 37%|██████████████████████████████▎                                                    | 7352/20117 [4:37:33<8:01:59,  2.27s/it] 37%|██████████████████████████████▎                                                    | 7353/20117 [4:37:35<8:06:06,  2.29s/it] 37%|██████████████████████████████▎                                                    | 7354/20117 [4:37:38<8:01:37,  2.26s/it] 37%|██████████████████████████████▎                                                    | 7355/20117 [4:37:40<8:02:53,  2.27s/it] 37%|██████████████████████████████▎                                                    | 7356/20117 [4:37:42<8:05:26,  2.28s/it] 37%|██████████████████████████████▎                                                    | 7357/20117 [4:37:45<8:06:36,  2.29s/it] 37%|██████████████████████████████▎                                                    | 7358/20117 [4:37:47<8:04:44,  2.28s/it] 37%|██████████████████████████████▎                                                    | 7359/20117 [4:37:49<8:02:21,  2.27s/it] 37%|██████████████████████████████▎                                                    | 7360/20117 [4:37:51<8:00:54,  2.26s/it]                                                                                                                                 {'loss': 0.2653, 'grad_norm': 0.5133084058761597, 'learning_rate': 0.0001418255268165401, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 348.43, 'epoch': 0.73}
 37%|██████████████████████████████▎                                                    | 7360/20117 [4:37:51<8:00:54,  2.26s/it] 37%|██████████████████████████████▎                                                    | 7361/20117 [4:37:54<8:04:28,  2.28s/it] 37%|██████████████████████████████▎                                                    | 7362/20117 [4:37:56<8:01:32,  2.27s/it] 37%|██████████████████████████████▍                                                    | 7363/20117 [4:37:58<7:59:48,  2.26s/it] 37%|██████████████████████████████▍                                                    | 7364/20117 [4:38:00<8:02:46,  2.27s/it] 37%|██████████████████████████████▍                                                    | 7365/20117 [4:38:03<7:58:26,  2.25s/it] 37%|██████████████████████████████▍                                                    | 7366/20117 [4:38:05<8:06:42,  2.29s/it] 37%|██████████████████████████████▍                                                    | 7367/20117 [4:38:07<8:09:17,  2.30s/it] 37%|██████████████████████████████▍                                                    | 7368/20117 [4:38:10<8:06:13,  2.29s/it] 37%|██████████████████████████████▍                                                    | 7369/20117 [4:38:12<8:06:03,  2.29s/it] 37%|██████████████████████████████▍                                                    | 7370/20117 [4:38:14<8:09:53,  2.31s/it]                                                                                                                                 {'loss': 0.2312, 'grad_norm': 0.4247760474681854, 'learning_rate': 0.0001416829164690264, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 360.14, 'epoch': 0.73}
 37%|██████████████████████████████▍                                                    | 7370/20117 [4:38:14<8:09:53,  2.31s/it] 37%|██████████████████████████████▍                                                    | 7371/20117 [4:38:16<8:06:58,  2.29s/it] 37%|██████████████████████████████▍                                                    | 7372/20117 [4:38:19<8:11:21,  2.31s/it] 37%|██████████████████████████████▍                                                    | 7373/20117 [4:38:21<8:10:43,  2.31s/it] 37%|██████████████████████████████▍                                                    | 7374/20117 [4:38:23<8:07:42,  2.30s/it] 37%|██████████████████████████████▍                                                    | 7375/20117 [4:38:26<8:09:14,  2.30s/it] 37%|██████████████████████████████▍                                                    | 7376/20117 [4:38:28<8:06:18,  2.29s/it] 37%|██████████████████████████████▍                                                    | 7377/20117 [4:38:30<8:09:58,  2.31s/it] 37%|██████████████████████████████▍                                                    | 7378/20117 [4:38:33<8:09:47,  2.31s/it] 37%|██████████████████████████████▍                                                    | 7379/20117 [4:38:35<8:03:58,  2.28s/it] 37%|██████████████████████████████▍                                                    | 7380/20117 [4:38:37<8:17:33,  2.34s/it]                                                                                                                                 {'loss': 0.2825, 'grad_norm': 0.32232165336608887, 'learning_rate': 0.00014154020344767955, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 379.26, 'epoch': 0.73}
 37%|██████████████████████████████▍                                                    | 7380/20117 [4:38:37<8:17:33,  2.34s/it] 37%|██████████████████████████████▍                                                    | 7381/20117 [4:38:40<8:10:52,  2.31s/it] 37%|██████████████████████████████▍                                                    | 7382/20117 [4:38:42<8:19:09,  2.35s/it] 37%|██████████████████████████████▍                                                    | 7383/20117 [4:38:44<8:18:18,  2.35s/it] 37%|██████████████████████████████▍                                                    | 7384/20117 [4:38:47<8:14:10,  2.33s/it] 37%|██████████████████████████████▍                                                    | 7385/20117 [4:38:49<8:23:53,  2.37s/it] 37%|██████████████████████████████▍                                                    | 7386/20117 [4:38:52<8:26:40,  2.39s/it] 37%|██████████████████████████████▍                                                    | 7387/20117 [4:38:54<8:26:13,  2.39s/it] 37%|██████████████████████████████▍                                                    | 7388/20117 [4:38:56<8:26:35,  2.39s/it] 37%|██████████████████████████████▍                                                    | 7389/20117 [4:38:59<8:23:51,  2.38s/it] 37%|██████████████████████████████▍                                                    | 7390/20117 [4:39:01<8:19:58,  2.36s/it]                                                                                                                                 {'loss': 0.2205, 'grad_norm': 0.4452918767929077, 'learning_rate': 0.0001413973881040319, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 359.25, 'epoch': 0.73}
 37%|██████████████████████████████▍                                                    | 7390/20117 [4:39:01<8:19:58,  2.36s/it] 37%|██████████████████████████████▍                                                    | 7391/20117 [4:39:03<8:12:03,  2.32s/it] 37%|██████████████████████████████▍                                                    | 7392/20117 [4:39:05<8:08:42,  2.30s/it] 37%|██████████████████████████████▌                                                    | 7393/20117 [4:39:08<8:00:07,  2.26s/it] 37%|██████████████████████████████▌                                                    | 7394/20117 [4:39:10<7:55:22,  2.24s/it] 37%|██████████████████████████████▌                                                    | 7395/20117 [4:39:12<7:51:47,  2.23s/it] 37%|██████████████████████████████▌                                                    | 7396/20117 [4:39:14<7:58:04,  2.25s/it] 37%|██████████████████████████████▌                                                    | 7397/20117 [4:39:17<8:04:54,  2.29s/it] 37%|██████████████████████████████▌                                                    | 7398/20117 [4:39:19<8:08:44,  2.31s/it] 37%|██████████████████████████████▌                                                    | 7399/20117 [4:39:21<8:10:31,  2.31s/it] 37%|██████████████████████████████▌                                                    | 7400/20117 [4:39:24<8:07:02,  2.30s/it]                                                                                                                                 {'loss': 0.2868, 'grad_norm': 0.3855791985988617, 'learning_rate': 0.0001412544707898678, 'memory/max_active (GiB)': 19.8, 'memory/max_allocated (GiB)': 19.8, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 410.81, 'epoch': 0.74}
 37%|██████████████████████████████▌                                                    | 7400/20117 [4:39:24<8:07:02,  2.30s/it] 37%|██████████████████████████████▌                                                    | 7401/20117 [4:39:26<8:06:30,  2.30s/it] 37%|██████████████████████████████▌                                                    | 7402/20117 [4:39:28<8:05:15,  2.29s/it] 37%|██████████████████████████████▌                                                    | 7403/20117 [4:39:31<8:03:59,  2.28s/it] 37%|██████████████████████████████▌                                                    | 7404/20117 [4:39:33<8:06:04,  2.29s/it] 37%|██████████████████████████████▌                                                    | 7405/20117 [4:39:35<8:05:32,  2.29s/it] 37%|██████████████████████████████▌                                                    | 7406/20117 [4:39:37<8:04:51,  2.29s/it] 37%|██████████████████████████████▌                                                    | 7407/20117 [4:39:40<8:04:29,  2.29s/it] 37%|██████████████████████████████▌                                                    | 7408/20117 [4:39:42<8:05:39,  2.29s/it] 37%|██████████████████████████████▌                                                    | 7409/20117 [4:39:44<8:03:39,  2.28s/it] 37%|██████████████████████████████▌                                                    | 7410/20117 [4:39:46<8:01:57,  2.28s/it]                                                                                                                                 {'loss': 0.2523, 'grad_norm': 0.42609113454818726, 'learning_rate': 0.00014111145185722283, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 331.0, 'epoch': 0.74}
 37%|██████████████████████████████▌                                                    | 7410/20117 [4:39:47<8:01:57,  2.28s/it] 37%|██████████████████████████████▌                                                    | 7411/20117 [4:39:49<8:00:37,  2.27s/it] 37%|██████████████████████████████▌                                                    | 7412/20117 [4:39:51<8:02:47,  2.28s/it] 37%|██████████████████████████████▌                                                    | 7413/20117 [4:39:53<8:02:22,  2.28s/it] 37%|██████████████████████████████▌                                                    | 7414/20117 [4:39:56<8:12:48,  2.33s/it] 37%|██████████████████████████████▌                                                    | 7415/20117 [4:39:58<8:12:48,  2.33s/it] 37%|██████████████████████████████▌                                                    | 7416/20117 [4:40:00<8:11:44,  2.32s/it] 37%|██████████████████████████████▌                                                    | 7417/20117 [4:40:03<8:07:54,  2.31s/it] 37%|██████████████████████████████▌                                                    | 7418/20117 [4:40:05<8:08:17,  2.31s/it] 37%|██████████████████████████████▌                                                    | 7419/20117 [4:40:07<8:06:05,  2.30s/it] 37%|██████████████████████████████▌                                                    | 7420/20117 [4:40:10<8:06:47,  2.30s/it]                                                                                                                                 {'loss': 0.2962, 'grad_norm': 0.47836732864379883, 'learning_rate': 0.00014096833165838283, 'memory/max_active (GiB)': 21.37, 'memory/max_allocated (GiB)': 21.37, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 344.4, 'epoch': 0.74}
 37%|██████████████████████████████▌                                                    | 7420/20117 [4:40:10<8:06:47,  2.30s/it] 37%|██████████████████████████████▌                                                    | 7421/20117 [4:40:12<8:11:09,  2.32s/it] 37%|██████████████████████████████▌                                                    | 7422/20117 [4:40:14<8:09:54,  2.32s/it] 37%|██████████████████████████████▋                                                    | 7423/20117 [4:40:17<8:09:40,  2.31s/it] 37%|██████████████████████████████▋                                                    | 7424/20117 [4:40:19<8:14:15,  2.34s/it] 37%|██████████████████████████████▋                                                    | 7425/20117 [4:40:21<8:12:33,  2.33s/it] 37%|██████████████████████████████▋                                                    | 7426/20117 [4:40:24<8:12:46,  2.33s/it] 37%|██████████████████████████████▋                                                    | 7427/20117 [4:40:26<8:09:09,  2.31s/it] 37%|██████████████████████████████▋                                                    | 7428/20117 [4:40:28<8:10:14,  2.32s/it] 37%|██████████████████████████████▋                                                    | 7429/20117 [4:40:31<8:10:46,  2.32s/it] 37%|██████████████████████████████▋                                                    | 7430/20117 [4:40:33<8:10:04,  2.32s/it]                                                                                                                                 {'loss': 0.3254, 'grad_norm': 0.508818507194519, 'learning_rate': 0.0001408251105458831, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 389.2, 'epoch': 0.74}
 37%|██████████████████████████████▋                                                    | 7430/20117 [4:40:33<8:10:04,  2.32s/it] 37%|██████████████████████████████▋                                                    | 7431/20117 [4:40:35<8:15:22,  2.34s/it] 37%|██████████████████████████████▋                                                    | 7432/20117 [4:40:38<8:35:01,  2.44s/it] 37%|██████████████████████████████▋                                                    | 7433/20117 [4:40:40<8:26:08,  2.39s/it] 37%|██████████████████████████████▋                                                    | 7434/20117 [4:40:42<8:21:22,  2.37s/it] 37%|██████████████████████████████▋                                                    | 7435/20117 [4:40:45<8:14:59,  2.34s/it] 37%|██████████████████████████████▋                                                    | 7436/20117 [4:40:47<8:13:05,  2.33s/it] 37%|██████████████████████████████▋                                                    | 7437/20117 [4:40:49<8:12:30,  2.33s/it] 37%|██████████████████████████████▋                                                    | 7438/20117 [4:40:52<8:08:47,  2.31s/it] 37%|██████████████████████████████▋                                                    | 7439/20117 [4:40:54<8:06:42,  2.30s/it] 37%|██████████████████████████████▋                                                    | 7440/20117 [4:40:56<8:03:50,  2.29s/it]                                                                                                                                 {'loss': 0.2353, 'grad_norm': 0.3887844681739807, 'learning_rate': 0.00014068178887250752, 'memory/max_active (GiB)': 19.8, 'memory/max_allocated (GiB)': 19.8, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 380.02, 'epoch': 0.74}
 37%|██████████████████████████████▋                                                    | 7440/20117 [4:40:56<8:03:50,  2.29s/it] 37%|██████████████████████████████▋                                                    | 7441/20117 [4:40:58<8:01:55,  2.28s/it] 37%|██████████████████████████████▋                                                    | 7442/20117 [4:41:01<8:04:18,  2.29s/it] 37%|██████████████████████████████▋                                                    | 7443/20117 [4:41:03<8:06:35,  2.30s/it] 37%|██████████████████████████████▋                                                    | 7444/20117 [4:41:05<8:10:26,  2.32s/it] 37%|██████████████████████████████▋                                                    | 7445/20117 [4:41:08<8:09:26,  2.32s/it] 37%|██████████████████████████████▋                                                    | 7446/20117 [4:41:10<8:07:51,  2.31s/it] 37%|██████████████████████████████▋                                                    | 7447/20117 [4:41:12<8:09:15,  2.32s/it] 37%|██████████████████████████████▋                                                    | 7448/20117 [4:41:15<8:02:48,  2.29s/it] 37%|██████████████████████████████▋                                                    | 7449/20117 [4:41:17<7:56:48,  2.26s/it] 37%|██████████████████████████████▋                                                    | 7450/20117 [4:41:19<8:02:04,  2.28s/it]                                                                                                                                 {'loss': 0.2424, 'grad_norm': 0.41547468304634094, 'learning_rate': 0.00014053836699128765, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 376.26, 'epoch': 0.74}
 37%|██████████████████████████████▋                                                    | 7450/20117 [4:41:19<8:02:04,  2.28s/it] 37%|██████████████████████████████▋                                                    | 7451/20117 [4:41:21<7:59:57,  2.27s/it] 37%|██████████████████████████████▋                                                    | 7452/20117 [4:41:24<8:03:35,  2.29s/it] 37%|██████████████████████████████▊                                                    | 7453/20117 [4:41:26<8:07:37,  2.31s/it] 37%|██████████████████████████████▊                                                    | 7454/20117 [4:41:28<8:02:18,  2.29s/it] 37%|██████████████████████████████▊                                                    | 7455/20117 [4:41:31<8:03:01,  2.29s/it] 37%|██████████████████████████████▊                                                    | 7456/20117 [4:41:33<8:05:10,  2.30s/it] 37%|██████████████████████████████▊                                                    | 7457/20117 [4:41:35<8:02:30,  2.29s/it] 37%|██████████████████████████████▊                                                    | 7458/20117 [4:41:38<8:03:37,  2.29s/it] 37%|██████████████████████████████▊                                                    | 7459/20117 [4:41:40<8:02:04,  2.29s/it] 37%|██████████████████████████████▊                                                    | 7460/20117 [4:41:42<7:58:51,  2.27s/it]                                                                                                                                 {'loss': 0.2329, 'grad_norm': 0.5015019178390503, 'learning_rate': 0.00014039484525550186, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 351.38, 'epoch': 0.74}
 37%|██████████████████████████████▊                                                    | 7460/20117 [4:41:42<7:58:51,  2.27s/it] 37%|██████████████████████████████▊                                                    | 7461/20117 [4:41:44<8:00:39,  2.28s/it] 37%|██████████████████████████████▊                                                    | 7462/20117 [4:41:47<7:59:32,  2.27s/it] 37%|██████████████████████████████▊                                                    | 7463/20117 [4:41:49<8:01:21,  2.28s/it] 37%|██████████████████████████████▊                                                    | 7464/20117 [4:41:51<8:03:07,  2.29s/it] 37%|██████████████████████████████▊                                                    | 7465/20117 [4:41:53<7:58:23,  2.27s/it] 37%|██████████████████████████████▊                                                    | 7466/20117 [4:41:56<8:00:30,  2.28s/it] 37%|██████████████████████████████▊                                                    | 7467/20117 [4:41:58<8:01:47,  2.29s/it] 37%|██████████████████████████████▊                                                    | 7468/20117 [4:42:00<7:58:42,  2.27s/it] 37%|██████████████████████████████▊                                                    | 7469/20117 [4:42:03<7:57:56,  2.27s/it] 37%|██████████████████████████████▊                                                    | 7470/20117 [4:42:05<8:00:44,  2.28s/it]                                                                                                                                 {'loss': 0.2186, 'grad_norm': 0.43546929955482483, 'learning_rate': 0.0001402512240186746, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 303.33, 'epoch': 0.74}
 37%|██████████████████████████████▊                                                    | 7470/20117 [4:42:05<8:00:44,  2.28s/it] 37%|██████████████████████████████▊                                                    | 7471/20117 [4:42:07<8:01:14,  2.28s/it] 37%|██████████████████████████████▊                                                    | 7472/20117 [4:42:09<8:03:14,  2.29s/it] 37%|██████████████████████████████▊                                                    | 7473/20117 [4:42:12<8:06:25,  2.31s/it] 37%|██████████████████████████████▊                                                    | 7474/20117 [4:42:14<8:06:28,  2.31s/it] 37%|██████████████████████████████▊                                                    | 7475/20117 [4:42:16<8:10:37,  2.33s/it] 37%|██████████████████████████████▊                                                    | 7476/20117 [4:42:19<8:09:24,  2.32s/it] 37%|██████████████████████████████▊                                                    | 7477/20117 [4:42:21<8:13:44,  2.34s/it] 37%|██████████████████████████████▊                                                    | 7478/20117 [4:42:23<8:12:50,  2.34s/it] 37%|██████████████████████████████▊                                                    | 7479/20117 [4:42:26<8:13:50,  2.34s/it] 37%|██████████████████████████████▊                                                    | 7480/20117 [4:42:28<8:15:46,  2.35s/it]                                                                                                                                 {'loss': 0.2439, 'grad_norm': 0.5051418542861938, 'learning_rate': 0.0001401075036345753, 'memory/max_active (GiB)': 19.19, 'memory/max_allocated (GiB)': 19.19, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 375.31, 'epoch': 0.74}
 37%|██████████████████████████████▊                                                    | 7480/20117 [4:42:28<8:15:46,  2.35s/it] 37%|██████████████████████████████▊                                                    | 7481/20117 [4:42:31<8:17:34,  2.36s/it] 37%|██████████████████████████████▊                                                    | 7482/20117 [4:42:33<8:19:32,  2.37s/it] 37%|██████████████████████████████▊                                                    | 7483/20117 [4:42:35<8:18:45,  2.37s/it] 37%|██████████████████████████████▉                                                    | 7484/20117 [4:42:38<8:15:36,  2.35s/it] 37%|██████████████████████████████▉                                                    | 7485/20117 [4:42:40<8:35:01,  2.45s/it] 37%|██████████████████████████████▉                                                    | 7486/20117 [4:42:43<8:25:42,  2.40s/it] 37%|██████████████████████████████▉                                                    | 7487/20117 [4:42:45<8:18:00,  2.37s/it] 37%|██████████████████████████████▉                                                    | 7488/20117 [4:42:47<8:12:36,  2.34s/it] 37%|██████████████████████████████▉                                                    | 7489/20117 [4:42:49<8:05:21,  2.31s/it] 37%|██████████████████████████████▉                                                    | 7490/20117 [4:42:52<8:04:58,  2.30s/it]                                                                                                                                 {'loss': 0.277, 'grad_norm': 0.35766085982322693, 'learning_rate': 0.0001399636844572176, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 343.63, 'epoch': 0.74}
 37%|██████████████████████████████▉                                                    | 7490/20117 [4:42:52<8:04:58,  2.30s/it] 37%|██████████████████████████████▉                                                    | 7491/20117 [4:42:54<8:03:41,  2.30s/it] 37%|██████████████████████████████▉                                                    | 7492/20117 [4:42:56<8:03:41,  2.30s/it] 37%|██████████████████████████████▉                                                    | 7493/20117 [4:42:59<8:06:48,  2.31s/it] 37%|██████████████████████████████▉                                                    | 7494/20117 [4:43:01<8:04:32,  2.30s/it] 37%|██████████████████████████████▉                                                    | 7495/20117 [4:43:03<8:00:54,  2.29s/it] 37%|██████████████████████████████▉                                                    | 7496/20117 [4:43:06<8:03:25,  2.30s/it] 37%|██████████████████████████████▉                                                    | 7497/20117 [4:43:08<8:02:04,  2.29s/it] 37%|██████████████████████████████▉                                                    | 7498/20117 [4:43:10<8:09:05,  2.33s/it] 37%|██████████████████████████████▉                                                    | 7499/20117 [4:43:13<8:10:55,  2.33s/it] 37%|██████████████████████████████▉                                                    | 7500/20117 [4:43:15<8:10:04,  2.33s/it]                                                                                                                                 {'loss': 0.2474, 'grad_norm': 0.5930467247962952, 'learning_rate': 0.0001398197668408586, 'memory/max_active (GiB)': 20.72, 'memory/max_allocated (GiB)': 20.72, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 396.18, 'epoch': 0.75}
 37%|██████████████████████████████▉                                                    | 7500/20117 [4:43:15<8:10:04,  2.33s/it] 37%|██████████████████████████████▉                                                    | 7501/20117 [4:43:17<8:14:13,  2.35s/it] 37%|██████████████████████████████▉                                                    | 7502/20117 [4:43:20<8:12:28,  2.34s/it] 37%|██████████████████████████████▉                                                    | 7503/20117 [4:43:22<8:16:54,  2.36s/it] 37%|██████████████████████████████▉                                                    | 7504/20117 [4:43:24<8:15:01,  2.35s/it] 37%|██████████████████████████████▉                                                    | 7505/20117 [4:43:27<8:18:22,  2.37s/it] 37%|██████████████████████████████▉                                                    | 7506/20117 [4:43:29<8:12:50,  2.34s/it] 37%|██████████████████████████████▉                                                    | 7507/20117 [4:43:31<8:06:54,  2.32s/it] 37%|██████████████████████████████▉                                                    | 7508/20117 [4:43:34<8:08:19,  2.32s/it] 37%|██████████████████████████████▉                                                    | 7509/20117 [4:43:36<8:07:46,  2.32s/it] 37%|██████████████████████████████▉                                                    | 7510/20117 [4:43:38<8:06:26,  2.32s/it]                                                                                                                                 {'loss': 0.2408, 'grad_norm': 0.4920576810836792, 'learning_rate': 0.00013967575113999777, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 382.95, 'epoch': 0.75}
 37%|██████████████████████████████▉                                                    | 7510/20117 [4:43:38<8:06:26,  2.32s/it] 37%|██████████████████████████████▉                                                    | 7511/20117 [4:43:41<8:04:19,  2.31s/it] 37%|██████████████████████████████▉                                                    | 7512/20117 [4:43:43<8:08:49,  2.33s/it] 37%|██████████████████████████████▉                                                    | 7513/20117 [4:43:45<8:05:30,  2.31s/it] 37%|███████████████████████████████                                                    | 7514/20117 [4:43:47<8:04:40,  2.31s/it] 37%|███████████████████████████████                                                    | 7515/20117 [4:43:50<8:08:51,  2.33s/it] 37%|███████████████████████████████                                                    | 7516/20117 [4:43:52<8:06:11,  2.32s/it] 37%|███████████████████████████████                                                    | 7517/20117 [4:43:55<8:11:09,  2.34s/it] 37%|███████████████████████████████                                                    | 7518/20117 [4:43:57<8:12:52,  2.35s/it] 37%|███████████████████████████████                                                    | 7519/20117 [4:43:59<8:08:24,  2.33s/it] 37%|███████████████████████████████                                                    | 7520/20117 [4:44:02<8:13:16,  2.35s/it]                                                                                                                                 {'loss': 0.2249, 'grad_norm': 0.44312262535095215, 'learning_rate': 0.0001395316377093762, 'memory/max_active (GiB)': 21.38, 'memory/max_allocated (GiB)': 21.38, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 387.85, 'epoch': 0.75}
 37%|███████████████████████████████                                                    | 7520/20117 [4:44:02<8:13:16,  2.35s/it] 37%|███████████████████████████████                                                    | 7521/20117 [4:44:04<8:06:48,  2.32s/it] 37%|███████████████████████████████                                                    | 7522/20117 [4:44:06<8:07:38,  2.32s/it] 37%|███████████████████████████████                                                    | 7523/20117 [4:44:08<8:07:17,  2.32s/it] 37%|███████████████████████████████                                                    | 7524/20117 [4:44:11<8:09:09,  2.33s/it] 37%|███████████████████████████████                                                    | 7525/20117 [4:44:13<8:12:48,  2.35s/it] 37%|███████████████████████████████                                                    | 7526/20117 [4:44:16<8:08:54,  2.33s/it] 37%|███████████████████████████████                                                    | 7527/20117 [4:44:18<8:07:17,  2.32s/it] 37%|███████████████████████████████                                                    | 7528/20117 [4:44:20<8:04:51,  2.31s/it] 37%|███████████████████████████████                                                    | 7529/20117 [4:44:22<8:00:41,  2.29s/it] 37%|███████████████████████████████                                                    | 7530/20117 [4:44:25<7:59:48,  2.29s/it]                                                                                                                                 {'loss': 0.2141, 'grad_norm': 0.4043440818786621, 'learning_rate': 0.00013938742690397575, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 360.87, 'epoch': 0.75}
 37%|███████████████████████████████                                                    | 7530/20117 [4:44:25<7:59:48,  2.29s/it] 37%|███████████████████████████████                                                    | 7531/20117 [4:44:27<8:00:02,  2.29s/it] 37%|███████████████████████████████                                                    | 7532/20117 [4:44:29<8:03:09,  2.30s/it] 37%|███████████████████████████████                                                    | 7533/20117 [4:44:32<8:06:08,  2.32s/it] 37%|███████████████████████████████                                                    | 7534/20117 [4:44:34<8:09:10,  2.33s/it] 37%|███████████████████████████████                                                    | 7535/20117 [4:44:36<8:08:58,  2.33s/it] 37%|███████████████████████████████                                                    | 7536/20117 [4:44:39<8:03:56,  2.31s/it] 37%|███████████████████████████████                                                    | 7537/20117 [4:44:41<8:01:14,  2.30s/it] 37%|███████████████████████████████                                                    | 7538/20117 [4:44:43<8:21:17,  2.39s/it] 37%|███████████████████████████████                                                    | 7539/20117 [4:44:46<8:12:17,  2.35s/it] 37%|███████████████████████████████                                                    | 7540/20117 [4:44:48<8:10:53,  2.34s/it]                                                                                                                                 {'loss': 0.1528, 'grad_norm': 0.3910767138004303, 'learning_rate': 0.00013924311907901813, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 308.96, 'epoch': 0.75}
 37%|███████████████████████████████                                                    | 7540/20117 [4:44:48<8:10:53,  2.34s/it] 37%|███████████████████████████████                                                    | 7541/20117 [4:44:50<8:09:13,  2.33s/it] 37%|███████████████████████████████                                                    | 7542/20117 [4:44:53<8:06:26,  2.32s/it] 37%|███████████████████████████████                                                    | 7543/20117 [4:44:55<8:04:07,  2.31s/it] 38%|███████████████████████████████▏                                                   | 7544/20117 [4:44:57<8:02:05,  2.30s/it] 38%|███████████████████████████████▏                                                   | 7545/20117 [4:44:59<8:00:27,  2.29s/it] 38%|███████████████████████████████▏                                                   | 7546/20117 [4:45:02<7:58:27,  2.28s/it] 38%|███████████████████████████████▏                                                   | 7547/20117 [4:45:04<8:01:45,  2.30s/it] 38%|███████████████████████████████▏                                                   | 7548/20117 [4:45:06<7:59:54,  2.29s/it] 38%|███████████████████████████████▏                                                   | 7549/20117 [4:45:09<8:00:40,  2.29s/it] 38%|███████████████████████████████▏                                                   | 7550/20117 [4:45:11<8:00:04,  2.29s/it]                                                                                                                                 {'loss': 0.2192, 'grad_norm': 0.3407839238643646, 'learning_rate': 0.00013909871458996399, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 316.74, 'epoch': 0.75}
 38%|███████████████████████████████▏                                                   | 7550/20117 [4:45:11<8:00:04,  2.29s/it] 38%|███████████████████████████████▏                                                   | 7551/20117 [4:45:13<8:03:00,  2.31s/it] 38%|███████████████████████████████▏                                                   | 7552/20117 [4:45:16<8:02:49,  2.31s/it] 38%|███████████████████████████████▏                                                   | 7553/20117 [4:45:18<8:03:12,  2.31s/it] 38%|███████████████████████████████▏                                                   | 7554/20117 [4:45:20<8:03:07,  2.31s/it] 38%|███████████████████████████████▏                                                   | 7555/20117 [4:45:22<8:02:47,  2.31s/it] 38%|███████████████████████████████▏                                                   | 7556/20117 [4:45:25<7:56:59,  2.28s/it] 38%|███████████████████████████████▏                                                   | 7557/20117 [4:45:27<7:57:19,  2.28s/it] 38%|███████████████████████████████▏                                                   | 7558/20117 [4:45:29<8:03:39,  2.31s/it] 38%|███████████████████████████████▏                                                   | 7559/20117 [4:45:32<8:02:34,  2.31s/it] 38%|███████████████████████████████▏                                                   | 7560/20117 [4:45:34<8:04:26,  2.31s/it]                                                                                                                                 {'loss': 0.2317, 'grad_norm': 0.316240519285202, 'learning_rate': 0.00013895421379251207, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 332.89, 'epoch': 0.75}
 38%|███████████████████████████████▏                                                   | 7560/20117 [4:45:34<8:04:26,  2.31s/it] 38%|███████████████████████████████▏                                                   | 7561/20117 [4:45:36<8:05:07,  2.32s/it] 38%|███████████████████████████████▏                                                   | 7562/20117 [4:45:39<8:00:37,  2.30s/it] 38%|███████████████████████████████▏                                                   | 7563/20117 [4:45:41<7:55:25,  2.27s/it] 38%|███████████████████████████████▏                                                   | 7564/20117 [4:45:43<7:53:23,  2.26s/it] 38%|███████████████████████████████▏                                                   | 7565/20117 [4:45:45<7:54:08,  2.27s/it] 38%|███████████████████████████████▏                                                   | 7566/20117 [4:45:48<7:54:52,  2.27s/it] 38%|███████████████████████████████▏                                                   | 7567/20117 [4:45:50<7:55:02,  2.27s/it] 38%|███████████████████████████████▏                                                   | 7568/20117 [4:45:52<8:05:30,  2.32s/it] 38%|███████████████████████████████▏                                                   | 7569/20117 [4:45:55<8:11:15,  2.35s/it] 38%|███████████████████████████████▏                                                   | 7570/20117 [4:45:57<8:16:10,  2.37s/it]                                                                                                                                 {'loss': 0.2413, 'grad_norm': 0.49255135655403137, 'learning_rate': 0.00013880961704259846, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 352.94, 'epoch': 0.75}
 38%|███████████████████████████████▏                                                   | 7570/20117 [4:45:57<8:16:10,  2.37s/it] 38%|███████████████████████████████▏                                                   | 7571/20117 [4:45:59<8:14:42,  2.37s/it] 38%|███████████████████████████████▏                                                   | 7572/20117 [4:46:02<8:12:38,  2.36s/it] 38%|███████████████████████████████▏                                                   | 7573/20117 [4:46:04<8:11:25,  2.35s/it] 38%|███████████████████████████████▏                                                   | 7574/20117 [4:46:06<8:08:38,  2.34s/it] 38%|███████████████████████████████▎                                                   | 7575/20117 [4:46:09<8:05:10,  2.32s/it] 38%|███████████████████████████████▎                                                   | 7576/20117 [4:46:11<8:05:29,  2.32s/it] 38%|███████████████████████████████▎                                                   | 7577/20117 [4:46:13<7:57:14,  2.28s/it] 38%|███████████████████████████████▎                                                   | 7578/20117 [4:46:15<7:53:06,  2.26s/it] 38%|███████████████████████████████▎                                                   | 7579/20117 [4:46:18<7:54:23,  2.27s/it] 38%|███████████████████████████████▎                                                   | 7580/20117 [4:46:20<8:00:23,  2.30s/it]                                                                                                                                 {'loss': 0.2434, 'grad_norm': 0.4979618489742279, 'learning_rate': 0.0001386649246963955, 'memory/max_active (GiB)': 20.57, 'memory/max_allocated (GiB)': 20.57, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 307.59, 'epoch': 0.75}
 38%|███████████████████████████████▎                                                   | 7580/20117 [4:46:20<8:00:23,  2.30s/it] 38%|███████████████████████████████▎                                                   | 7581/20117 [4:46:23<8:10:46,  2.35s/it] 38%|███████████████████████████████▎                                                   | 7582/20117 [4:46:25<8:10:02,  2.35s/it] 38%|███████████████████████████████▎                                                   | 7583/20117 [4:46:27<8:14:19,  2.37s/it] 38%|███████████████████████████████▎                                                   | 7584/20117 [4:46:30<8:09:03,  2.34s/it] 38%|███████████████████████████████▎                                                   | 7585/20117 [4:46:32<8:14:53,  2.37s/it] 38%|███████████████████████████████▎                                                   | 7586/20117 [4:46:34<8:18:10,  2.39s/it] 38%|███████████████████████████████▎                                                   | 7587/20117 [4:46:37<8:15:06,  2.37s/it] 38%|███████████████████████████████▎                                                   | 7588/20117 [4:46:39<8:12:12,  2.36s/it] 38%|███████████████████████████████▎                                                   | 7589/20117 [4:46:42<8:27:16,  2.43s/it] 38%|███████████████████████████████▎                                                   | 7590/20117 [4:46:44<8:20:58,  2.40s/it]                                                                                                                                 {'loss': 0.2112, 'grad_norm': 0.2949107885360718, 'learning_rate': 0.00013852013711031095, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 363.41, 'epoch': 0.75}
 38%|███████████████████████████████▎                                                   | 7590/20117 [4:46:44<8:20:58,  2.40s/it] 38%|███████████████████████████████▎                                                   | 7591/20117 [4:46:46<8:15:36,  2.37s/it] 38%|███████████████████████████████▎                                                   | 7592/20117 [4:46:49<8:12:37,  2.36s/it] 38%|███████████████████████████████▎                                                   | 7593/20117 [4:46:51<8:09:26,  2.34s/it] 38%|███████████████████████████████▎                                                   | 7594/20117 [4:46:53<8:05:17,  2.33s/it] 38%|███████████████████████████████▎                                                   | 7595/20117 [4:46:56<8:07:37,  2.34s/it] 38%|███████████████████████████████▎                                                   | 7596/20117 [4:46:58<8:07:12,  2.33s/it] 38%|███████████████████████████████▎                                                   | 7597/20117 [4:47:00<8:08:27,  2.34s/it] 38%|███████████████████████████████▎                                                   | 7598/20117 [4:47:03<8:09:10,  2.34s/it] 38%|███████████████████████████████▎                                                   | 7599/20117 [4:47:05<8:07:09,  2.33s/it] 38%|███████████████████████████████▎                                                   | 7600/20117 [4:47:07<8:09:03,  2.34s/it]                                                                                                                                 {'loss': 0.2232, 'grad_norm': 0.3708727955818176, 'learning_rate': 0.0001383752546409873, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 386.01, 'epoch': 0.76}
 38%|███████████████████████████████▎                                                   | 7600/20117 [4:47:07<8:09:03,  2.34s/it] 38%|███████████████████████████████▎                                                   | 7601/20117 [4:47:10<8:07:45,  2.34s/it] 38%|███████████████████████████████▎                                                   | 7602/20117 [4:47:12<8:08:46,  2.34s/it] 38%|███████████████████████████████▎                                                   | 7603/20117 [4:47:14<8:12:18,  2.36s/it] 38%|███████████████████████████████▎                                                   | 7604/20117 [4:47:17<8:12:27,  2.36s/it] 38%|███████████████████████████████▍                                                   | 7605/20117 [4:47:19<8:11:01,  2.35s/it] 38%|███████████████████████████████▍                                                   | 7606/20117 [4:47:21<8:09:13,  2.35s/it] 38%|███████████████████████████████▍                                                   | 7607/20117 [4:47:24<8:14:40,  2.37s/it] 38%|███████████████████████████████▍                                                   | 7608/20117 [4:47:26<8:13:24,  2.37s/it] 38%|███████████████████████████████▍                                                   | 7609/20117 [4:47:29<8:12:40,  2.36s/it] 38%|███████████████████████████████▍                                                   | 7610/20117 [4:47:31<8:20:04,  2.40s/it]                                                                                                                                 {'loss': 0.2707, 'grad_norm': 0.6432907581329346, 'learning_rate': 0.00013823027764530067, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 313.62, 'epoch': 0.76}
 38%|███████████████████████████████▍                                                   | 7610/20117 [4:47:31<8:20:04,  2.40s/it] 38%|███████████████████████████████▍                                                   | 7611/20117 [4:47:33<8:17:09,  2.39s/it] 38%|███████████████████████████████▍                                                   | 7612/20117 [4:47:36<8:14:20,  2.37s/it] 38%|███████████████████████████████▍                                                   | 7613/20117 [4:47:38<8:13:47,  2.37s/it] 38%|███████████████████████████████▍                                                   | 7614/20117 [4:47:41<8:13:42,  2.37s/it] 38%|███████████████████████████████▍                                                   | 7615/20117 [4:47:43<8:24:15,  2.42s/it] 38%|███████████████████████████████▍                                                   | 7616/20117 [4:47:46<8:33:07,  2.46s/it] 38%|███████████████████████████████▍                                                   | 7617/20117 [4:47:48<8:35:17,  2.47s/it] 38%|███████████████████████████████▍                                                   | 7618/20117 [4:47:50<8:26:45,  2.43s/it] 38%|███████████████████████████████▍                                                   | 7619/20117 [4:47:53<8:20:11,  2.40s/it] 38%|███████████████████████████████▍                                                   | 7620/20117 [4:47:55<8:10:19,  2.35s/it]                                                                                                                                 {'loss': 0.2353, 'grad_norm': 0.3710288405418396, 'learning_rate': 0.00013808520648036005, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 432.88, 'epoch': 0.76}
 38%|███████████████████████████████▍                                                   | 7620/20117 [4:47:55<8:10:19,  2.35s/it] 38%|███████████████████████████████▍                                                   | 7621/20117 [4:47:57<8:08:21,  2.34s/it] 38%|███████████████████████████████▍                                                   | 7622/20117 [4:48:00<8:07:20,  2.34s/it] 38%|███████████████████████████████▍                                                   | 7623/20117 [4:48:02<8:06:49,  2.34s/it] 38%|███████████████████████████████▍                                                   | 7624/20117 [4:48:04<8:11:52,  2.36s/it] 38%|███████████████████████████████▍                                                   | 7625/20117 [4:48:07<8:06:57,  2.34s/it] 38%|███████████████████████████████▍                                                   | 7626/20117 [4:48:09<8:03:46,  2.32s/it] 38%|███████████████████████████████▍                                                   | 7627/20117 [4:48:11<8:05:42,  2.33s/it] 38%|███████████████████████████████▍                                                   | 7628/20117 [4:48:14<8:05:33,  2.33s/it] 38%|███████████████████████████████▍                                                   | 7629/20117 [4:48:16<8:09:45,  2.35s/it] 38%|███████████████████████████████▍                                                   | 7630/20117 [4:48:18<8:08:07,  2.35s/it]                                                                                                                                 {'loss': 0.212, 'grad_norm': 0.3577297031879425, 'learning_rate': 0.00013794004150350636, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 323.27, 'epoch': 0.76}
 38%|███████████████████████████████▍                                                   | 7630/20117 [4:48:18<8:08:07,  2.35s/it] 38%|███████████████████████████████▍                                                   | 7631/20117 [4:48:21<8:05:20,  2.33s/it] 38%|███████████████████████████████▍                                                   | 7632/20117 [4:48:23<8:10:25,  2.36s/it] 38%|███████████████████████████████▍                                                   | 7633/20117 [4:48:25<8:06:30,  2.34s/it] 38%|███████████████████████████████▍                                                   | 7634/20117 [4:48:28<8:06:39,  2.34s/it] 38%|███████████████████████████████▌                                                   | 7635/20117 [4:48:30<8:09:32,  2.35s/it] 38%|███████████████████████████████▌                                                   | 7636/20117 [4:48:33<8:11:36,  2.36s/it] 38%|███████████████████████████████▌                                                   | 7637/20117 [4:48:35<8:14:27,  2.38s/it] 38%|███████████████████████████████▌                                                   | 7638/20117 [4:48:37<8:17:39,  2.39s/it] 38%|███████████████████████████████▌                                                   | 7639/20117 [4:48:40<8:15:31,  2.38s/it] 38%|███████████████████████████████▌                                                   | 7640/20117 [4:48:42<8:36:12,  2.48s/it]                                                                                                                                 {'loss': 0.2747, 'grad_norm': 0.4883553385734558, 'learning_rate': 0.00013779478307231164, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 292.24, 'epoch': 0.76}
 38%|███████████████████████████████▌                                                   | 7640/20117 [4:48:42<8:36:12,  2.48s/it] 38%|███████████████████████████████▌                                                   | 7641/20117 [4:48:45<8:27:13,  2.44s/it] 38%|███████████████████████████████▌                                                   | 7642/20117 [4:48:47<8:19:13,  2.40s/it] 38%|███████████████████████████████▌                                                   | 7643/20117 [4:48:49<8:14:49,  2.38s/it] 38%|███████████████████████████████▌                                                   | 7644/20117 [4:48:52<8:07:07,  2.34s/it] 38%|███████████████████████████████▌                                                   | 7645/20117 [4:48:54<8:06:24,  2.34s/it] 38%|███████████████████████████████▌                                                   | 7646/20117 [4:48:56<8:07:32,  2.35s/it] 38%|███████████████████████████████▌                                                   | 7647/20117 [4:48:59<8:05:08,  2.33s/it] 38%|███████████████████████████████▌                                                   | 7648/20117 [4:49:01<8:05:58,  2.34s/it] 38%|███████████████████████████████▌                                                   | 7649/20117 [4:49:03<8:05:35,  2.34s/it] 38%|███████████████████████████████▌                                                   | 7650/20117 [4:49:06<8:10:45,  2.36s/it]                                                                                                                                 {'loss': 0.233, 'grad_norm': 0.19372917711734772, 'learning_rate': 0.00013764943154457812, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 375.4, 'epoch': 0.76}
 38%|███████████████████████████████▌                                                   | 7650/20117 [4:49:06<8:10:45,  2.36s/it] 38%|███████████████████████████████▌                                                   | 7651/20117 [4:49:08<8:09:09,  2.35s/it] 38%|███████████████████████████████▌                                                   | 7652/20117 [4:49:10<8:04:52,  2.33s/it] 38%|███████████████████████████████▌                                                   | 7653/20117 [4:49:13<8:07:31,  2.35s/it] 38%|███████████████████████████████▌                                                   | 7654/20117 [4:49:15<8:03:18,  2.33s/it] 38%|███████████████████████████████▌                                                   | 7655/20117 [4:49:17<8:03:38,  2.33s/it] 38%|███████████████████████████████▌                                                   | 7656/20117 [4:49:20<8:03:49,  2.33s/it] 38%|███████████████████████████████▌                                                   | 7657/20117 [4:49:22<8:04:02,  2.33s/it] 38%|███████████████████████████████▌                                                   | 7658/20117 [4:49:24<8:04:52,  2.34s/it] 38%|███████████████████████████████▌                                                   | 7659/20117 [4:49:27<8:05:36,  2.34s/it] 38%|███████████████████████████████▌                                                   | 7660/20117 [4:49:29<8:01:14,  2.32s/it]                                                                                                                                 {'loss': 0.219, 'grad_norm': 0.46450668573379517, 'learning_rate': 0.00013750398727833735, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 351.49, 'epoch': 0.76}
 38%|███████████████████████████████▌                                                   | 7660/20117 [4:49:29<8:01:14,  2.32s/it] 38%|███████████████████████████████▌                                                   | 7661/20117 [4:49:31<8:03:04,  2.33s/it] 38%|███████████████████████████████▌                                                   | 7662/20117 [4:49:34<8:04:33,  2.33s/it] 38%|███████████████████████████████▌                                                   | 7663/20117 [4:49:36<8:02:42,  2.33s/it] 38%|███████████████████████████████▌                                                   | 7664/20117 [4:49:38<8:03:17,  2.33s/it] 38%|███████████████████████████████▌                                                   | 7665/20117 [4:49:41<8:02:10,  2.32s/it] 38%|███████████████████████████████▋                                                   | 7666/20117 [4:49:43<8:00:49,  2.32s/it] 38%|███████████████████████████████▋                                                   | 7667/20117 [4:49:45<8:02:29,  2.33s/it] 38%|███████████████████████████████▋                                                   | 7668/20117 [4:49:48<8:00:32,  2.32s/it] 38%|███████████████████████████████▋                                                   | 7669/20117 [4:49:50<8:03:28,  2.33s/it] 38%|███████████████████████████████▋                                                   | 7670/20117 [4:49:52<8:03:31,  2.33s/it]                                                                                                                                 {'loss': 0.2376, 'grad_norm': 0.3964915871620178, 'learning_rate': 0.00013735845063184921, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 309.02, 'epoch': 0.76}
 38%|███████████████████████████████▋                                                   | 7670/20117 [4:49:52<8:03:31,  2.33s/it] 38%|███████████████████████████████▋                                                   | 7671/20117 [4:49:55<8:00:19,  2.32s/it] 38%|███████████████████████████████▋                                                   | 7672/20117 [4:49:57<8:08:12,  2.35s/it] 38%|███████████████████████████████▋                                                   | 7673/20117 [4:49:59<8:07:28,  2.35s/it] 38%|███████████████████████████████▋                                                   | 7674/20117 [4:50:02<8:05:51,  2.34s/it] 38%|███████████████████████████████▋                                                   | 7675/20117 [4:50:04<8:03:30,  2.33s/it] 38%|███████████████████████████████▋                                                   | 7676/20117 [4:50:06<8:03:55,  2.33s/it] 38%|███████████████████████████████▋                                                   | 7677/20117 [4:50:09<8:03:52,  2.33s/it] 38%|███████████████████████████████▋                                                   | 7678/20117 [4:50:11<8:02:14,  2.33s/it] 38%|███████████████████████████████▋                                                   | 7679/20117 [4:50:13<8:00:11,  2.32s/it] 38%|███████████████████████████████▋                                                   | 7680/20117 [4:50:16<7:59:19,  2.31s/it]                                                                                                                                 {'loss': 0.2547, 'grad_norm': 0.6207079887390137, 'learning_rate': 0.00013721282196360127, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 338.05, 'epoch': 0.76}
 38%|███████████████████████████████▋                                                   | 7680/20117 [4:50:16<7:59:19,  2.31s/it] 38%|███████████████████████████████▋                                                   | 7681/20117 [4:50:18<7:57:30,  2.30s/it] 38%|███████████████████████████████▋                                                   | 7682/20117 [4:50:20<8:00:50,  2.32s/it] 38%|███████████████████████████████▋                                                   | 7683/20117 [4:50:23<8:00:05,  2.32s/it] 38%|███████████████████████████████▋                                                   | 7684/20117 [4:50:25<7:57:45,  2.31s/it] 38%|███████████████████████████████▋                                                   | 7685/20117 [4:50:27<8:00:25,  2.32s/it] 38%|███████████████████████████████▋                                                   | 7686/20117 [4:50:29<7:58:16,  2.31s/it] 38%|███████████████████████████████▋                                                   | 7687/20117 [4:50:32<7:57:40,  2.31s/it] 38%|███████████████████████████████▋                                                   | 7688/20117 [4:50:34<7:58:11,  2.31s/it] 38%|███████████████████████████████▋                                                   | 7689/20117 [4:50:36<7:55:46,  2.30s/it] 38%|███████████████████████████████▋                                                   | 7690/20117 [4:50:39<7:59:34,  2.32s/it]                                                                                                                                 {'loss': 0.2504, 'grad_norm': 0.2084685117006302, 'learning_rate': 0.00013706710163230773, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 294.32, 'epoch': 0.76}
 38%|███████████████████████████████▋                                                   | 7690/20117 [4:50:39<7:59:34,  2.32s/it] 38%|███████████████████████████████▋                                                   | 7691/20117 [4:50:41<8:25:02,  2.44s/it] 38%|███████████████████████████████▋                                                   | 7692/20117 [4:50:44<8:19:01,  2.41s/it] 38%|███████████████████████████████▋                                                   | 7693/20117 [4:50:46<8:14:16,  2.39s/it] 38%|███████████████████████████████▋                                                   | 7694/20117 [4:50:48<8:07:33,  2.35s/it] 38%|███████████████████████████████▋                                                   | 7695/20117 [4:50:51<8:04:04,  2.34s/it] 38%|███████████████████████████████▊                                                   | 7696/20117 [4:50:53<8:04:59,  2.34s/it] 38%|███████████████████████████████▊                                                   | 7697/20117 [4:50:55<8:02:06,  2.33s/it] 38%|███████████████████████████████▊                                                   | 7698/20117 [4:50:58<8:05:04,  2.34s/it] 38%|███████████████████████████████▊                                                   | 7699/20117 [4:51:00<8:00:49,  2.32s/it] 38%|███████████████████████████████▊                                                   | 7700/20117 [4:51:02<7:56:34,  2.30s/it]                                                                                                                                 {'loss': 0.1809, 'grad_norm': 0.4136933386325836, 'learning_rate': 0.0001369212899969086, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 314.54, 'epoch': 0.77}
 38%|███████████████████████████████▊                                                   | 7700/20117 [4:51:02<7:56:34,  2.30s/it] 38%|███████████████████████████████▊                                                   | 7701/20117 [4:51:05<8:01:21,  2.33s/it] 38%|███████████████████████████████▊                                                   | 7702/20117 [4:51:07<7:59:11,  2.32s/it] 38%|███████████████████████████████▊                                                   | 7703/20117 [4:51:09<7:56:14,  2.30s/it] 38%|███████████████████████████████▊                                                   | 7704/20117 [4:51:12<8:04:29,  2.34s/it] 38%|███████████████████████████████▊                                                   | 7705/20117 [4:51:14<8:05:03,  2.34s/it] 38%|███████████████████████████████▊                                                   | 7706/20117 [4:51:16<8:04:04,  2.34s/it] 38%|███████████████████████████████▊                                                   | 7707/20117 [4:51:19<8:07:24,  2.36s/it] 38%|███████████████████████████████▊                                                   | 7708/20117 [4:51:21<8:06:22,  2.35s/it] 38%|███████████████████████████████▊                                                   | 7709/20117 [4:51:23<8:08:57,  2.36s/it] 38%|███████████████████████████████▊                                                   | 7710/20117 [4:51:26<8:05:24,  2.35s/it]                                                                                                                                 {'loss': 0.255, 'grad_norm': 0.529629111289978, 'learning_rate': 0.0001367753874165687, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 357.08, 'epoch': 0.77}
 38%|███████████████████████████████▊                                                   | 7710/20117 [4:51:26<8:05:24,  2.35s/it] 38%|███████████████████████████████▊                                                   | 7711/20117 [4:51:28<8:06:44,  2.35s/it] 38%|███████████████████████████████▊                                                   | 7712/20117 [4:51:30<8:04:29,  2.34s/it] 38%|███████████████████████████████▊                                                   | 7713/20117 [4:51:33<8:01:39,  2.33s/it] 38%|███████████████████████████████▊                                                   | 7714/20117 [4:51:35<8:02:00,  2.33s/it] 38%|███████████████████████████████▊                                                   | 7715/20117 [4:51:37<8:02:30,  2.33s/it] 38%|███████████████████████████████▊                                                   | 7716/20117 [4:51:40<7:56:59,  2.31s/it] 38%|███████████████████████████████▊                                                   | 7717/20117 [4:51:42<7:59:33,  2.32s/it] 38%|███████████████████████████████▊                                                   | 7718/20117 [4:51:44<8:01:10,  2.33s/it] 38%|███████████████████████████████▊                                                   | 7719/20117 [4:51:47<8:01:39,  2.33s/it] 38%|███████████████████████████████▊                                                   | 7720/20117 [4:51:49<8:00:11,  2.32s/it]                                                                                                                                 {'loss': 0.2128, 'grad_norm': 0.36684682965278625, 'learning_rate': 0.0001366293942506769, 'memory/max_active (GiB)': 19.2, 'memory/max_allocated (GiB)': 19.2, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 318.04, 'epoch': 0.77}
 38%|███████████████████████████████▊                                                   | 7720/20117 [4:51:49<8:00:11,  2.32s/it] 38%|███████████████████████████████▊                                                   | 7721/20117 [4:51:51<7:58:51,  2.32s/it] 38%|███████████████████████████████▊                                                   | 7722/20117 [4:51:54<7:59:39,  2.32s/it] 38%|███████████████████████████████▊                                                   | 7723/20117 [4:51:56<8:02:17,  2.33s/it] 38%|███████████████████████████████▊                                                   | 7724/20117 [4:51:58<8:01:27,  2.33s/it] 38%|███████████████████████████████▊                                                   | 7725/20117 [4:52:01<8:00:17,  2.33s/it] 38%|███████████████████████████████▉                                                   | 7726/20117 [4:52:03<8:00:12,  2.33s/it] 38%|███████████████████████████████▉                                                   | 7727/20117 [4:52:05<7:58:11,  2.32s/it] 38%|███████████████████████████████▉                                                   | 7728/20117 [4:52:08<8:03:48,  2.34s/it] 38%|███████████████████████████████▉                                                   | 7729/20117 [4:52:10<8:05:33,  2.35s/it] 38%|███████████████████████████████▉                                                   | 7730/20117 [4:52:12<8:04:20,  2.35s/it]                                                                                                                                 {'loss': 0.2159, 'grad_norm': 0.40612316131591797, 'learning_rate': 0.00013648331085884527, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 364.09, 'epoch': 0.77}
 38%|███████████████████████████████▉                                                   | 7730/20117 [4:52:12<8:04:20,  2.35s/it] 38%|███████████████████████████████▉                                                   | 7731/20117 [4:52:15<8:00:14,  2.33s/it] 38%|███████████████████████████████▉                                                   | 7732/20117 [4:52:17<7:57:07,  2.31s/it] 38%|███████████████████████████████▉                                                   | 7733/20117 [4:52:19<7:59:42,  2.32s/it] 38%|███████████████████████████████▉                                                   | 7734/20117 [4:52:22<7:56:32,  2.31s/it] 38%|███████████████████████████████▉                                                   | 7735/20117 [4:52:24<7:56:05,  2.31s/it] 38%|███████████████████████████████▉                                                   | 7736/20117 [4:52:26<7:59:06,  2.32s/it] 38%|███████████████████████████████▉                                                   | 7737/20117 [4:52:29<7:59:58,  2.33s/it] 38%|███████████████████████████████▉                                                   | 7738/20117 [4:52:31<8:01:22,  2.33s/it] 38%|███████████████████████████████▉                                                   | 7739/20117 [4:52:33<7:57:26,  2.31s/it] 38%|███████████████████████████████▉                                                   | 7740/20117 [4:52:36<7:58:29,  2.32s/it]                                                                                                                                 {'loss': 0.2255, 'grad_norm': 0.13119497895240784, 'learning_rate': 0.0001363371376009081, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 336.24, 'epoch': 0.77}
 38%|███████████████████████████████▉                                                   | 7740/20117 [4:52:36<7:58:29,  2.32s/it] 38%|███████████████████████████████▉                                                   | 7741/20117 [4:52:38<8:03:55,  2.35s/it] 38%|███████████████████████████████▉                                                   | 7742/20117 [4:52:40<8:01:53,  2.34s/it] 38%|███████████████████████████████▉                                                   | 7743/20117 [4:52:43<7:59:01,  2.32s/it] 38%|███████████████████████████████▉                                                   | 7744/20117 [4:52:45<8:11:07,  2.38s/it] 38%|███████████████████████████████▉                                                   | 7745/20117 [4:52:47<7:58:54,  2.32s/it] 39%|███████████████████████████████▉                                                   | 7746/20117 [4:52:49<7:51:57,  2.29s/it] 39%|███████████████████████████████▉                                                   | 7747/20117 [4:52:52<7:56:11,  2.31s/it] 39%|███████████████████████████████▉                                                   | 7748/20117 [4:52:54<7:57:00,  2.31s/it] 39%|███████████████████████████████▉                                                   | 7749/20117 [4:52:57<8:10:27,  2.38s/it] 39%|███████████████████████████████▉                                                   | 7750/20117 [4:52:59<8:15:44,  2.41s/it]                                                                                                                                 {'loss': 0.2595, 'grad_norm': 0.5006715655326843, 'learning_rate': 0.00013619087483692099, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 338.82, 'epoch': 0.77}
 39%|███████████████████████████████▉                                                   | 7750/20117 [4:52:59<8:15:44,  2.41s/it] 39%|███████████████████████████████▉                                                   | 7751/20117 [4:53:02<8:16:07,  2.41s/it] 39%|███████████████████████████████▉                                                   | 7752/20117 [4:53:04<8:16:22,  2.41s/it] 39%|███████████████████████████████▉                                                   | 7753/20117 [4:53:06<8:12:39,  2.39s/it] 39%|███████████████████████████████▉                                                   | 7754/20117 [4:53:09<8:11:58,  2.39s/it] 39%|███████████████████████████████▉                                                   | 7755/20117 [4:53:11<8:14:06,  2.40s/it] 39%|████████████████████████████████                                                   | 7756/20117 [4:53:13<8:04:50,  2.35s/it] 39%|████████████████████████████████                                                   | 7757/20117 [4:53:16<8:06:02,  2.36s/it] 39%|████████████████████████████████                                                   | 7758/20117 [4:53:18<7:59:19,  2.33s/it] 39%|████████████████████████████████                                                   | 7759/20117 [4:53:20<8:08:24,  2.37s/it] 39%|████████████████████████████████                                                   | 7760/20117 [4:53:23<8:19:11,  2.42s/it]                                                                                                                                 {'loss': 0.203, 'grad_norm': 0.3994678258895874, 'learning_rate': 0.00013604452292716003, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 327.9, 'epoch': 0.77}
 39%|████████████████████████████████                                                   | 7760/20117 [4:53:23<8:19:11,  2.42s/it] 39%|████████████████████████████████                                                   | 7761/20117 [4:53:26<8:27:11,  2.46s/it] 39%|████████████████████████████████                                                   | 7762/20117 [4:53:28<8:22:29,  2.44s/it] 39%|████████████████████████████████                                                   | 7763/20117 [4:53:30<8:12:28,  2.39s/it] 39%|████████████████████████████████                                                   | 7764/20117 [4:53:33<8:13:39,  2.40s/it] 39%|████████████████████████████████                                                   | 7765/20117 [4:53:35<8:10:12,  2.38s/it] 39%|████████████████████████████████                                                   | 7766/20117 [4:53:37<8:05:16,  2.36s/it] 39%|████████████████████████████████                                                   | 7767/20117 [4:53:40<8:03:06,  2.35s/it] 39%|████████████████████████████████                                                   | 7768/20117 [4:53:42<8:01:46,  2.34s/it] 39%|████████████████████████████████                                                   | 7769/20117 [4:53:44<8:00:41,  2.34s/it] 39%|████████████████████████████████                                                   | 7770/20117 [4:53:47<7:57:05,  2.32s/it]                                                                                                                                 {'loss': 0.2537, 'grad_norm': 0.17447052896022797, 'learning_rate': 0.00013589808223212087, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 376.09, 'epoch': 0.77}
 39%|████████████████████████████████                                                   | 7770/20117 [4:53:47<7:57:05,  2.32s/it] 39%|████████████████████████████████                                                   | 7771/20117 [4:53:49<7:54:17,  2.30s/it] 39%|████████████████████████████████                                                   | 7772/20117 [4:53:51<7:56:01,  2.31s/it] 39%|████████████████████████████████                                                   | 7773/20117 [4:53:54<8:11:37,  2.39s/it] 39%|████████████████████████████████                                                   | 7774/20117 [4:53:56<8:18:19,  2.42s/it] 39%|████████████████████████████████                                                   | 7775/20117 [4:53:59<8:14:57,  2.41s/it] 39%|████████████████████████████████                                                   | 7776/20117 [4:54:01<8:08:20,  2.37s/it] 39%|████████████████████████████████                                                   | 7777/20117 [4:54:03<8:04:11,  2.35s/it] 39%|████████████████████████████████                                                   | 7778/20117 [4:54:06<8:04:13,  2.35s/it] 39%|████████████████████████████████                                                   | 7779/20117 [4:54:08<7:59:20,  2.33s/it] 39%|████████████████████████████████                                                   | 7780/20117 [4:54:10<7:59:04,  2.33s/it]                                                                                                                                 {'loss': 0.2112, 'grad_norm': 0.5262983441352844, 'learning_rate': 0.000135751553112518, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 338.13, 'epoch': 0.77}
 39%|████████████████████████████████                                                   | 7780/20117 [4:54:10<7:59:04,  2.33s/it] 39%|████████████████████████████████                                                   | 7781/20117 [4:54:12<7:59:11,  2.33s/it] 39%|████████████████████████████████                                                   | 7782/20117 [4:54:15<7:53:33,  2.30s/it] 39%|████████████████████████████████                                                   | 7783/20117 [4:54:17<7:53:12,  2.30s/it] 39%|████████████████████████████████                                                   | 7784/20117 [4:54:19<7:49:10,  2.28s/it] 39%|████████████████████████████████                                                   | 7785/20117 [4:54:21<7:45:37,  2.27s/it] 39%|████████████████████████████████                                                   | 7786/20117 [4:54:24<7:46:28,  2.27s/it] 39%|████████████████████████████████▏                                                  | 7787/20117 [4:54:26<7:46:30,  2.27s/it] 39%|████████████████████████████████▏                                                  | 7788/20117 [4:54:28<7:46:53,  2.27s/it] 39%|████████████████████████████████▏                                                  | 7789/20117 [4:54:31<7:46:31,  2.27s/it] 39%|████████████████████████████████▏                                                  | 7790/20117 [4:54:33<7:49:29,  2.29s/it]                                                                                                                                 {'loss': 0.2235, 'grad_norm': 0.32633262872695923, 'learning_rate': 0.00013560493592928356, 'memory/max_active (GiB)': 19.2, 'memory/max_allocated (GiB)': 19.2, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 331.86, 'epoch': 0.77}
 39%|████████████████████████████████▏                                                  | 7790/20117 [4:54:33<7:49:29,  2.29s/it] 39%|████████████████████████████████▏                                                  | 7791/20117 [4:54:35<7:52:26,  2.30s/it] 39%|████████████████████████████████▏                                                  | 7792/20117 [4:54:37<7:51:35,  2.30s/it] 39%|████████████████████████████████▏                                                  | 7793/20117 [4:54:40<7:49:28,  2.29s/it] 39%|████████████████████████████████▏                                                  | 7794/20117 [4:54:42<7:47:25,  2.28s/it] 39%|████████████████████████████████▏                                                  | 7795/20117 [4:54:44<7:50:48,  2.29s/it] 39%|████████████████████████████████▏                                                  | 7796/20117 [4:54:47<7:53:48,  2.31s/it] 39%|████████████████████████████████▏                                                  | 7797/20117 [4:54:49<7:57:03,  2.32s/it] 39%|████████████████████████████████▏                                                  | 7798/20117 [4:54:52<8:18:28,  2.43s/it] 39%|████████████████████████████████▏                                                  | 7799/20117 [4:54:54<8:14:08,  2.41s/it] 39%|████████████████████████████████▏                                                  | 7800/20117 [4:54:56<8:14:26,  2.41s/it]                                                                                                                                 {'loss': 0.297, 'grad_norm': 0.4296337068080902, 'learning_rate': 0.00013545823104356663, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 370.8, 'epoch': 0.78}
 39%|████████████████████████████████▏                                                  | 7800/20117 [4:54:56<8:14:26,  2.41s/it] 39%|████████████████████████████████▏                                                  | 7801/20117 [4:54:59<8:07:19,  2.37s/it] 39%|████████████████████████████████▏                                                  | 7802/20117 [4:55:01<8:05:15,  2.36s/it] 39%|████████████████████████████████▏                                                  | 7803/20117 [4:55:03<7:56:10,  2.32s/it] 39%|████████████████████████████████▏                                                  | 7804/20117 [4:55:06<7:59:30,  2.34s/it] 39%|████████████████████████████████▏                                                  | 7805/20117 [4:55:08<8:05:24,  2.37s/it] 39%|████████████████████████████████▏                                                  | 7806/20117 [4:55:10<7:59:04,  2.33s/it] 39%|████████████████████████████████▏                                                  | 7807/20117 [4:55:13<7:58:36,  2.33s/it] 39%|████████████████████████████████▏                                                  | 7808/20117 [4:55:15<7:52:52,  2.31s/it] 39%|████████████████████████████████▏                                                  | 7809/20117 [4:55:17<7:51:50,  2.30s/it] 39%|████████████████████████████████▏                                                  | 7810/20117 [4:55:20<7:52:55,  2.31s/it]                                                                                                                                 {'loss': 0.1952, 'grad_norm': 0.5057851672172546, 'learning_rate': 0.00013531143881673237, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 321.52, 'epoch': 0.78}
 39%|████████████████████████████████▏                                                  | 7810/20117 [4:55:20<7:52:55,  2.31s/it] 39%|████████████████████████████████▏                                                  | 7811/20117 [4:55:22<7:50:11,  2.29s/it] 39%|████████████████████████████████▏                                                  | 7812/20117 [4:55:24<7:48:18,  2.28s/it] 39%|████████████████████████████████▏                                                  | 7813/20117 [4:55:26<7:49:44,  2.29s/it] 39%|████████████████████████████████▏                                                  | 7814/20117 [4:55:29<7:46:52,  2.28s/it] 39%|████████████████████████████████▏                                                  | 7815/20117 [4:55:31<7:46:44,  2.28s/it] 39%|████████████████████████████████▏                                                  | 7816/20117 [4:55:33<7:46:36,  2.28s/it] 39%|████████████████████████████████▎                                                  | 7817/20117 [4:55:36<7:51:14,  2.30s/it] 39%|████████████████████████████████▎                                                  | 7818/20117 [4:55:38<7:52:35,  2.31s/it] 39%|████████████████████████████████▎                                                  | 7819/20117 [4:55:40<7:51:57,  2.30s/it] 39%|████████████████████████████████▎                                                  | 7820/20117 [4:55:43<7:54:02,  2.31s/it]                                                                                                                                 {'loss': 0.2589, 'grad_norm': 0.49617013335227966, 'learning_rate': 0.00013516455961036104, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 299.57, 'epoch': 0.78}
 39%|████████████████████████████████▎                                                  | 7820/20117 [4:55:43<7:54:02,  2.31s/it] 39%|████████████████████████████████▎                                                  | 7821/20117 [4:55:45<7:54:43,  2.32s/it] 39%|████████████████████████████████▎                                                  | 7822/20117 [4:55:47<7:51:42,  2.30s/it] 39%|████████████████████████████████▎                                                  | 7823/20117 [4:55:49<7:54:05,  2.31s/it] 39%|████████████████████████████████▎                                                  | 7824/20117 [4:55:52<7:51:54,  2.30s/it] 39%|████████████████████████████████▎                                                  | 7825/20117 [4:55:54<7:50:29,  2.30s/it] 39%|████████████████████████████████▎                                                  | 7826/20117 [4:55:56<7:47:22,  2.28s/it] 39%|████████████████████████████████▎                                                  | 7827/20117 [4:55:59<7:49:32,  2.29s/it] 39%|████████████████████████████████▎                                                  | 7828/20117 [4:56:01<7:53:00,  2.31s/it] 39%|████████████████████████████████▎                                                  | 7829/20117 [4:56:03<7:49:33,  2.29s/it] 39%|████████████████████████████████▎                                                  | 7830/20117 [4:56:05<7:49:12,  2.29s/it]                                                                                                                                 {'loss': 0.2328, 'grad_norm': 0.3173094689846039, 'learning_rate': 0.00013501759378624722, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 349.07, 'epoch': 0.78}
 39%|████████████████████████████████▎                                                  | 7830/20117 [4:56:05<7:49:12,  2.29s/it] 39%|████████████████████████████████▎                                                  | 7831/20117 [4:56:08<7:50:16,  2.30s/it] 39%|████████████████████████████████▎                                                  | 7832/20117 [4:56:10<7:50:45,  2.30s/it] 39%|████████████████████████████████▎                                                  | 7833/20117 [4:56:12<7:49:38,  2.29s/it] 39%|████████████████████████████████▎                                                  | 7834/20117 [4:56:15<7:51:34,  2.30s/it] 39%|████████████████████████████████▎                                                  | 7835/20117 [4:56:17<7:51:39,  2.30s/it] 39%|████████████████████████████████▎                                                  | 7836/20117 [4:56:19<7:48:32,  2.29s/it] 39%|████████████████████████████████▎                                                  | 7837/20117 [4:56:22<7:51:18,  2.30s/it] 39%|████████████████████████████████▎                                                  | 7838/20117 [4:56:24<7:50:28,  2.30s/it] 39%|████████████████████████████████▎                                                  | 7839/20117 [4:56:26<7:54:11,  2.32s/it] 39%|████████████████████████████████▎                                                  | 7840/20117 [4:56:29<7:52:56,  2.31s/it]                                                                                                                                 {'loss': 0.2472, 'grad_norm': 0.4631012976169586, 'learning_rate': 0.00013487054170639877, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 361.13, 'epoch': 0.78}
 39%|████████████████████████████████▎                                                  | 7840/20117 [4:56:29<7:52:56,  2.31s/it] 39%|████████████████████████████████▎                                                  | 7841/20117 [4:56:31<7:52:40,  2.31s/it] 39%|████████████████████████████████▎                                                  | 7842/20117 [4:56:33<7:53:58,  2.32s/it] 39%|████████████████████████████████▎                                                  | 7843/20117 [4:56:35<7:54:29,  2.32s/it] 39%|████████████████████████████████▎                                                  | 7844/20117 [4:56:38<7:52:43,  2.31s/it] 39%|████████████████████████████████▎                                                  | 7845/20117 [4:56:40<7:55:49,  2.33s/it] 39%|████████████████████████████████▎                                                  | 7846/20117 [4:56:42<7:55:46,  2.33s/it] 39%|████████████████████████████████▍                                                  | 7847/20117 [4:56:45<7:52:23,  2.31s/it] 39%|████████████████████████████████▍                                                  | 7848/20117 [4:56:47<8:02:46,  2.36s/it] 39%|████████████████████████████████▍                                                  | 7849/20117 [4:56:50<8:20:37,  2.45s/it] 39%|████████████████████████████████▍                                                  | 7850/20117 [4:56:52<8:15:34,  2.42s/it]                                                                                                                                 {'loss': 0.2183, 'grad_norm': 0.3672430217266083, 'learning_rate': 0.000134723403733036, 'memory/max_active (GiB)': 18.83, 'memory/max_allocated (GiB)': 18.83, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 324.51, 'epoch': 0.78}
 39%|████████████████████████████████▍                                                  | 7850/20117 [4:56:52<8:15:34,  2.42s/it] 39%|████████████████████████████████▍                                                  | 7851/20117 [4:56:55<8:09:10,  2.39s/it] 39%|████████████████████████████████▍                                                  | 7852/20117 [4:56:57<8:06:18,  2.38s/it] 39%|████████████████████████████████▍                                                  | 7853/20117 [4:56:59<8:01:37,  2.36s/it] 39%|████████████████████████████████▍                                                  | 7854/20117 [4:57:02<7:58:18,  2.34s/it] 39%|████████████████████████████████▍                                                  | 7855/20117 [4:57:04<7:57:50,  2.34s/it] 39%|████████████████████████████████▍                                                  | 7856/20117 [4:57:06<7:54:19,  2.32s/it] 39%|████████████████████████████████▍                                                  | 7857/20117 [4:57:08<7:53:48,  2.32s/it] 39%|████████████████████████████████▍                                                  | 7858/20117 [4:57:11<7:52:46,  2.31s/it] 39%|████████████████████████████████▍                                                  | 7859/20117 [4:57:13<7:52:37,  2.31s/it] 39%|████████████████████████████████▍                                                  | 7860/20117 [4:57:15<7:51:56,  2.31s/it]                                                                                                                                 {'loss': 0.3104, 'grad_norm': 0.5141401886940002, 'learning_rate': 0.00013457618022859092, 'memory/max_active (GiB)': 21.47, 'memory/max_allocated (GiB)': 21.47, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 384.61, 'epoch': 0.78}
 39%|████████████████████████████████▍                                                  | 7860/20117 [4:57:15<7:51:56,  2.31s/it] 39%|████████████████████████████████▍                                                  | 7861/20117 [4:57:18<7:57:10,  2.34s/it] 39%|████████████████████████████████▍                                                  | 7862/20117 [4:57:20<7:59:51,  2.35s/it] 39%|████████████████████████████████▍                                                  | 7863/20117 [4:57:23<8:05:27,  2.38s/it] 39%|████████████████████████████████▍                                                  | 7864/20117 [4:57:25<8:00:08,  2.35s/it] 39%|████████████████████████████████▍                                                  | 7865/20117 [4:57:27<8:01:56,  2.36s/it] 39%|████████████████████████████████▍                                                  | 7866/20117 [4:57:30<8:03:31,  2.37s/it] 39%|████████████████████████████████▍                                                  | 7867/20117 [4:57:32<8:00:56,  2.36s/it] 39%|████████████████████████████████▍                                                  | 7868/20117 [4:57:34<8:01:54,  2.36s/it] 39%|████████████████████████████████▍                                                  | 7869/20117 [4:57:37<8:05:23,  2.38s/it] 39%|████████████████████████████████▍                                                  | 7870/20117 [4:57:39<8:00:22,  2.35s/it]                                                                                                                                 {'loss': 0.2228, 'grad_norm': 0.43661215901374817, 'learning_rate': 0.00013442887155570607, 'memory/max_active (GiB)': 20.53, 'memory/max_allocated (GiB)': 20.53, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 359.28, 'epoch': 0.78}
 39%|████████████████████████████████▍                                                  | 7870/20117 [4:57:39<8:00:22,  2.35s/it] 39%|████████████████████████████████▍                                                  | 7871/20117 [4:57:41<7:58:21,  2.34s/it] 39%|████████████████████████████████▍                                                  | 7872/20117 [4:57:44<8:03:25,  2.37s/it] 39%|████████████████████████████████▍                                                  | 7873/20117 [4:57:46<8:03:15,  2.37s/it] 39%|████████████████████████████████▍                                                  | 7874/20117 [4:57:49<8:01:36,  2.36s/it] 39%|████████████████████████████████▍                                                  | 7875/20117 [4:57:51<7:59:07,  2.35s/it] 39%|████████████████████████████████▍                                                  | 7876/20117 [4:57:53<7:58:31,  2.35s/it] 39%|████████████████████████████████▍                                                  | 7877/20117 [4:57:56<8:01:20,  2.36s/it] 39%|████████████████████████████████▌                                                  | 7878/20117 [4:57:58<8:04:01,  2.37s/it] 39%|████████████████████████████████▌                                                  | 7879/20117 [4:58:00<8:09:06,  2.40s/it] 39%|████████████████████████████████▌                                                  | 7880/20117 [4:58:03<8:11:52,  2.41s/it]                                                                                                                                 {'loss': 0.2215, 'grad_norm': 0.375987708568573, 'learning_rate': 0.00013428147807723387, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 327.47, 'epoch': 0.78}
 39%|████████████████████████████████▌                                                  | 7880/20117 [4:58:03<8:11:52,  2.41s/it] 39%|████████████████████████████████▌                                                  | 7881/20117 [4:58:05<8:05:25,  2.38s/it] 39%|████████████████████████████████▌                                                  | 7882/20117 [4:58:08<8:06:26,  2.39s/it] 39%|████████████████████████████████▌                                                  | 7883/20117 [4:58:10<8:05:45,  2.38s/it] 39%|████████████████████████████████▌                                                  | 7884/20117 [4:58:12<8:06:29,  2.39s/it] 39%|████████████████████████████████▌                                                  | 7885/20117 [4:58:15<8:09:09,  2.40s/it] 39%|████████████████████████████████▌                                                  | 7886/20117 [4:58:17<8:03:32,  2.37s/it] 39%|████████████████████████████████▌                                                  | 7887/20117 [4:58:19<8:02:47,  2.37s/it] 39%|████████████████████████████████▌                                                  | 7888/20117 [4:58:22<8:04:47,  2.38s/it] 39%|████████████████████████████████▌                                                  | 7889/20117 [4:58:24<8:04:32,  2.38s/it] 39%|████████████████████████████████▌                                                  | 7890/20117 [4:58:27<8:08:19,  2.40s/it]                                                                                                                                 {'loss': 0.263, 'grad_norm': 0.2800423204898834, 'learning_rate': 0.00013413400015623562, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 367.65, 'epoch': 0.78}
 39%|████████████████████████████████▌                                                  | 7890/20117 [4:58:27<8:08:19,  2.40s/it] 39%|████████████████████████████████▌                                                  | 7891/20117 [4:58:29<8:04:13,  2.38s/it] 39%|████████████████████████████████▌                                                  | 7892/20117 [4:58:31<8:01:17,  2.36s/it] 39%|████████████████████████████████▌                                                  | 7893/20117 [4:58:34<8:02:07,  2.37s/it] 39%|████████████████████████████████▌                                                  | 7894/20117 [4:58:36<7:57:55,  2.35s/it] 39%|████████████████████████████████▌                                                  | 7895/20117 [4:58:38<8:00:26,  2.36s/it] 39%|████████████████████████████████▌                                                  | 7896/20117 [4:58:41<8:04:10,  2.38s/it] 39%|████████████████████████████████▌                                                  | 7897/20117 [4:58:43<8:00:52,  2.36s/it] 39%|████████████████████████████████▌                                                  | 7898/20117 [4:58:45<7:59:26,  2.35s/it] 39%|████████████████████████████████▌                                                  | 7899/20117 [4:58:48<7:53:28,  2.33s/it] 39%|████████████████████████████████▌                                                  | 7900/20117 [4:58:50<7:50:29,  2.31s/it]                                                                                                                                 {'loss': 0.2533, 'grad_norm': 0.44610151648521423, 'learning_rate': 0.00013398643815598063, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 364.29, 'epoch': 0.79}
 39%|████████████████████████████████▌                                                  | 7900/20117 [4:58:50<7:50:29,  2.31s/it] 39%|████████████████████████████████▌                                                  | 7901/20117 [4:58:53<8:15:27,  2.43s/it] 39%|████████████████████████████████▌                                                  | 7902/20117 [4:58:55<8:06:07,  2.39s/it] 39%|████████████████████████████████▌                                                  | 7903/20117 [4:58:57<7:59:38,  2.36s/it] 39%|████████████████████████████████▌                                                  | 7904/20117 [4:59:00<7:57:01,  2.34s/it] 39%|████████████████████████████████▌                                                  | 7905/20117 [4:59:02<7:55:13,  2.33s/it] 39%|████████████████████████████████▌                                                  | 7906/20117 [4:59:04<7:54:07,  2.33s/it] 39%|████████████████████████████████▌                                                  | 7907/20117 [4:59:07<7:52:10,  2.32s/it] 39%|████████████████████████████████▋                                                  | 7908/20117 [4:59:09<7:53:52,  2.33s/it] 39%|████████████████████████████████▋                                                  | 7909/20117 [4:59:11<7:55:03,  2.33s/it] 39%|████████████████████████████████▋                                                  | 7910/20117 [4:59:14<7:53:09,  2.33s/it]                                                                                                                                 {'loss': 0.2273, 'grad_norm': 0.6232859492301941, 'learning_rate': 0.0001338387924399452, 'memory/max_active (GiB)': 19.2, 'memory/max_allocated (GiB)': 19.2, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 335.13, 'epoch': 0.79}
 39%|████████████████████████████████▋                                                  | 7910/20117 [4:59:14<7:53:09,  2.33s/it] 39%|████████████████████████████████▋                                                  | 7911/20117 [4:59:16<7:52:08,  2.32s/it] 39%|████████████████████████████████▋                                                  | 7912/20117 [4:59:18<7:46:54,  2.30s/it] 39%|████████████████████████████████▋                                                  | 7913/20117 [4:59:20<7:44:07,  2.28s/it] 39%|████████████████████████████████▋                                                  | 7914/20117 [4:59:23<7:51:52,  2.32s/it] 39%|████████████████████████████████▋                                                  | 7915/20117 [4:59:25<7:51:20,  2.32s/it] 39%|████████████████████████████████▋                                                  | 7916/20117 [4:59:27<7:48:36,  2.30s/it] 39%|████████████████████████████████▋                                                  | 7917/20117 [4:59:30<7:51:11,  2.32s/it] 39%|████████████████████████████████▋                                                  | 7918/20117 [4:59:32<7:47:09,  2.30s/it] 39%|████████████████████████████████▋                                                  | 7919/20117 [4:59:34<7:47:31,  2.30s/it] 39%|████████████████████████████████▋                                                  | 7920/20117 [4:59:37<7:51:13,  2.32s/it]                                                                                                                                 {'loss': 0.2007, 'grad_norm': 0.3155955374240875, 'learning_rate': 0.00013369106337181202, 'memory/max_active (GiB)': 20.72, 'memory/max_allocated (GiB)': 20.72, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 357.92, 'epoch': 0.79}
 39%|████████████████████████████████▋                                                  | 7920/20117 [4:59:37<7:51:13,  2.32s/it] 39%|████████████████████████████████▋                                                  | 7921/20117 [4:59:39<7:53:53,  2.33s/it] 39%|████████████████████████████████▋                                                  | 7922/20117 [4:59:41<7:55:02,  2.34s/it] 39%|████████████████████████████████▋                                                  | 7923/20117 [4:59:44<7:50:36,  2.32s/it] 39%|████████████████████████████████▋                                                  | 7924/20117 [4:59:46<7:43:20,  2.28s/it] 39%|████████████████████████████████▋                                                  | 7925/20117 [4:59:48<7:35:19,  2.24s/it] 39%|████████████████████████████████▋                                                  | 7926/20117 [4:59:50<7:30:00,  2.21s/it] 39%|████████████████████████████████▋                                                  | 7927/20117 [4:59:52<7:28:45,  2.21s/it] 39%|████████████████████████████████▋                                                  | 7928/20117 [4:59:55<7:33:52,  2.23s/it] 39%|████████████████████████████████▋                                                  | 7929/20117 [4:59:57<7:34:22,  2.24s/it] 39%|████████████████████████████████▋                                                  | 7930/20117 [4:59:59<7:47:01,  2.30s/it]                                                                                                                                 {'loss': 0.1722, 'grad_norm': 0.47753843665122986, 'learning_rate': 0.00013354325131546902, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 350.28, 'epoch': 0.79}
 39%|████████████████████████████████▋                                                  | 7930/20117 [4:59:59<7:47:01,  2.30s/it] 39%|████████████████████████████████▋                                                  | 7931/20117 [5:00:02<7:52:07,  2.32s/it] 39%|████████████████████████████████▋                                                  | 7932/20117 [5:00:04<7:56:26,  2.35s/it] 39%|████████████████████████████████▋                                                  | 7933/20117 [5:00:06<7:54:59,  2.34s/it] 39%|████████████████████████████████▋                                                  | 7934/20117 [5:00:09<7:56:40,  2.35s/it] 39%|████████████████████████████████▋                                                  | 7935/20117 [5:00:11<7:59:12,  2.36s/it] 39%|████████████████████████████████▋                                                  | 7936/20117 [5:00:13<7:57:25,  2.35s/it] 39%|████████████████████████████████▋                                                  | 7937/20117 [5:00:16<7:54:32,  2.34s/it] 39%|████████████████████████████████▊                                                  | 7938/20117 [5:00:18<7:42:00,  2.28s/it] 39%|████████████████████████████████▊                                                  | 7939/20117 [5:00:20<7:34:07,  2.24s/it] 39%|████████████████████████████████▊                                                  | 7940/20117 [5:00:22<7:27:45,  2.21s/it]                                                                                                                                 {'loss': 0.2172, 'grad_norm': 0.6098092198371887, 'learning_rate': 0.0001333953566350085, 'memory/max_active (GiB)': 21.38, 'memory/max_allocated (GiB)': 21.38, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 328.34, 'epoch': 0.79}
 39%|████████████████████████████████▊                                                  | 7940/20117 [5:00:22<7:27:45,  2.21s/it] 39%|████████████████████████████████▊                                                  | 7941/20117 [5:00:25<7:37:32,  2.25s/it] 39%|████████████████████████████████▊                                                  | 7942/20117 [5:00:27<7:44:40,  2.29s/it] 39%|████████████████████████████████▊                                                  | 7943/20117 [5:00:29<7:52:33,  2.33s/it] 39%|████████████████████████████████▊                                                  | 7944/20117 [5:00:32<7:51:46,  2.33s/it] 39%|████████████████████████████████▊                                                  | 7945/20117 [5:00:34<7:50:40,  2.32s/it] 39%|████████████████████████████████▊                                                  | 7946/20117 [5:00:36<7:48:01,  2.31s/it] 40%|████████████████████████████████▊                                                  | 7947/20117 [5:00:39<7:48:48,  2.31s/it] 40%|████████████████████████████████▊                                                  | 7948/20117 [5:00:41<7:45:36,  2.30s/it] 40%|████████████████████████████████▊                                                  | 7949/20117 [5:00:43<7:44:59,  2.29s/it] 40%|████████████████████████████████▊                                                  | 7950/20117 [5:00:45<7:47:48,  2.31s/it]                                                                                                                                 {'loss': 0.2365, 'grad_norm': 0.40892454981803894, 'learning_rate': 0.00013324737969472628, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 376.03, 'epoch': 0.79}
 40%|████████████████████████████████▊                                                  | 7950/20117 [5:00:45<7:47:48,  2.31s/it] 40%|████████████████████████████████▊                                                  | 7951/20117 [5:00:48<7:49:05,  2.31s/it] 40%|████████████████████████████████▊                                                  | 7952/20117 [5:00:50<7:49:28,  2.32s/it] 40%|████████████████████████████████▊                                                  | 7953/20117 [5:00:53<8:14:00,  2.44s/it] 40%|████████████████████████████████▊                                                  | 7954/20117 [5:00:55<8:10:22,  2.42s/it] 40%|████████████████████████████████▊                                                  | 7955/20117 [5:00:58<8:08:12,  2.41s/it] 40%|████████████████████████████████▊                                                  | 7956/20117 [5:01:00<8:05:11,  2.39s/it] 40%|████████████████████████████████▊                                                  | 7957/20117 [5:01:02<8:00:34,  2.37s/it] 40%|████████████████████████████████▊                                                  | 7958/20117 [5:01:05<8:00:31,  2.37s/it] 40%|████████████████████████████████▊                                                  | 7959/20117 [5:01:07<7:58:26,  2.36s/it] 40%|████████████████████████████████▊                                                  | 7960/20117 [5:01:09<7:53:41,  2.34s/it]                                                                                                                                 {'loss': 0.265, 'grad_norm': 0.6622501015663147, 'learning_rate': 0.00013309932085912092, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 401.88, 'epoch': 0.79}
 40%|████████████████████████████████▊                                                  | 7960/20117 [5:01:09<7:53:41,  2.34s/it] 40%|████████████████████████████████▊                                                  | 7961/20117 [5:01:11<7:48:09,  2.31s/it] 40%|████████████████████████████████▊                                                  | 7962/20117 [5:01:14<7:51:59,  2.33s/it] 40%|████████████████████████████████▊                                                  | 7963/20117 [5:01:16<7:50:37,  2.32s/it] 40%|████████████████████████████████▊                                                  | 7964/20117 [5:01:18<7:45:36,  2.30s/it] 40%|████████████████████████████████▊                                                  | 7965/20117 [5:01:21<7:43:57,  2.29s/it] 40%|████████████████████████████████▊                                                  | 7966/20117 [5:01:23<7:43:42,  2.29s/it] 40%|████████████████████████████████▊                                                  | 7967/20117 [5:01:25<7:42:41,  2.28s/it] 40%|████████████████████████████████▊                                                  | 7968/20117 [5:01:28<7:46:28,  2.30s/it] 40%|████████████████████████████████▉                                                  | 7969/20117 [5:01:30<7:40:10,  2.27s/it] 40%|████████████████████████████████▉                                                  | 7970/20117 [5:01:32<7:37:47,  2.26s/it]                                                                                                                                 {'loss': 0.2164, 'grad_norm': 0.5111701488494873, 'learning_rate': 0.00013295118049289255, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 359.69, 'epoch': 0.79}
 40%|████████████████████████████████▉                                                  | 7970/20117 [5:01:32<7:37:47,  2.26s/it] 40%|████████████████████████████████▉                                                  | 7971/20117 [5:01:34<7:40:40,  2.28s/it] 40%|████████████████████████████████▉                                                  | 7972/20117 [5:01:37<7:38:46,  2.27s/it] 40%|████████████████████████████████▉                                                  | 7973/20117 [5:01:39<7:37:35,  2.26s/it] 40%|████████████████████████████████▉                                                  | 7974/20117 [5:01:41<7:40:41,  2.28s/it] 40%|████████████████████████████████▉                                                  | 7975/20117 [5:01:43<7:36:35,  2.26s/it] 40%|████████████████████████████████▉                                                  | 7976/20117 [5:01:46<7:37:29,  2.26s/it] 40%|████████████████████████████████▉                                                  | 7977/20117 [5:01:48<7:40:39,  2.28s/it] 40%|████████████████████████████████▉                                                  | 7978/20117 [5:01:50<7:43:53,  2.29s/it] 40%|████████████████████████████████▉                                                  | 7979/20117 [5:01:53<7:47:44,  2.31s/it] 40%|████████████████████████████████▉                                                  | 7980/20117 [5:01:55<7:46:37,  2.31s/it]                                                                                                                                 {'loss': 0.2567, 'grad_norm': 0.5144445300102234, 'learning_rate': 0.00013280295896094224, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 301.89, 'epoch': 0.79}
 40%|████████████████████████████████▉                                                  | 7980/20117 [5:01:55<7:46:37,  2.31s/it] 40%|████████████████████████████████▉                                                  | 7981/20117 [5:01:57<7:46:48,  2.31s/it] 40%|████████████████████████████████▉                                                  | 7982/20117 [5:01:59<7:41:25,  2.28s/it] 40%|████████████████████████████████▉                                                  | 7983/20117 [5:02:02<7:39:59,  2.27s/it] 40%|████████████████████████████████▉                                                  | 7984/20117 [5:02:04<7:44:47,  2.30s/it] 40%|████████████████████████████████▉                                                  | 7985/20117 [5:02:06<7:47:26,  2.31s/it] 40%|████████████████████████████████▉                                                  | 7986/20117 [5:02:09<7:46:25,  2.31s/it] 40%|████████████████████████████████▉                                                  | 7987/20117 [5:02:11<7:49:16,  2.32s/it] 40%|████████████████████████████████▉                                                  | 7988/20117 [5:02:13<7:47:11,  2.31s/it] 40%|████████████████████████████████▉                                                  | 7989/20117 [5:02:16<7:45:59,  2.31s/it] 40%|████████████████████████████████▉                                                  | 7990/20117 [5:02:18<7:47:24,  2.31s/it]                                                                                                                                 {'loss': 0.1934, 'grad_norm': 0.5570478439331055, 'learning_rate': 0.00013265465662837093, 'memory/max_active (GiB)': 20.72, 'memory/max_allocated (GiB)': 20.72, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 336.55, 'epoch': 0.79}
 40%|████████████████████████████████▉                                                  | 7990/20117 [5:02:18<7:47:24,  2.31s/it] 40%|████████████████████████████████▉                                                  | 7991/20117 [5:02:20<7:45:48,  2.30s/it] 40%|████████████████████████████████▉                                                  | 7992/20117 [5:02:22<7:44:14,  2.30s/it] 40%|████████████████████████████████▉                                                  | 7993/20117 [5:02:25<7:46:38,  2.31s/it] 40%|████████████████████████████████▉                                                  | 7994/20117 [5:02:27<7:47:19,  2.31s/it] 40%|████████████████████████████████▉                                                  | 7995/20117 [5:02:30<7:49:30,  2.32s/it] 40%|████████████████████████████████▉                                                  | 7996/20117 [5:02:32<7:48:11,  2.32s/it] 40%|████████████████████████████████▉                                                  | 7997/20117 [5:02:34<7:44:44,  2.30s/it] 40%|████████████████████████████████▉                                                  | 7998/20117 [5:02:36<7:47:09,  2.31s/it] 40%|█████████████████████████████████                                                  | 7999/20117 [5:02:39<7:43:23,  2.29s/it] 40%|█████████████████████████████████                                                  | 8000/20117 [5:02:41<7:44:39,  2.30s/it]                                                                                                                                 {'loss': 0.2247, 'grad_norm': 0.33518052101135254, 'learning_rate': 0.00013250627386047866, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 335.15, 'epoch': 0.8}
 40%|█████████████████████████████████                                                  | 8000/20117 [5:02:41<7:44:39,  2.30s/it] 40%|█████████████████████████████████                                                  | 8001/20117 [5:02:43<7:49:06,  2.32s/it] 40%|█████████████████████████████████                                                  | 8002/20117 [5:02:46<7:42:28,  2.29s/it] 40%|█████████████████████████████████                                                  | 8003/20117 [5:02:48<7:44:58,  2.30s/it] 40%|█████████████████████████████████                                                  | 8004/20117 [5:02:51<8:03:25,  2.39s/it] 40%|█████████████████████████████████                                                  | 8005/20117 [5:02:53<7:56:48,  2.36s/it] 40%|█████████████████████████████████                                                  | 8006/20117 [5:02:55<7:58:12,  2.37s/it] 40%|█████████████████████████████████                                                  | 8007/20117 [5:02:57<7:52:20,  2.34s/it] 40%|█████████████████████████████████                                                  | 8008/20117 [5:03:00<7:50:46,  2.33s/it] 40%|█████████████████████████████████                                                  | 8009/20117 [5:03:02<7:47:37,  2.32s/it] 40%|█████████████████████████████████                                                  | 8010/20117 [5:03:04<7:41:53,  2.29s/it]                                                                                                                                 {'loss': 0.1587, 'grad_norm': 0.508229672908783, 'learning_rate': 0.0001323578110227635, 'memory/max_active (GiB)': 19.8, 'memory/max_allocated (GiB)': 19.8, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 351.3, 'epoch': 0.8}
 40%|█████████████████████████████████                                                  | 8010/20117 [5:03:04<7:41:53,  2.29s/it] 40%|█████████████████████████████████                                                  | 8011/20117 [5:03:07<7:43:59,  2.30s/it] 40%|█████████████████████████████████                                                  | 8012/20117 [5:03:09<7:43:02,  2.30s/it] 40%|█████████████████████████████████                                                  | 8013/20117 [5:03:11<7:41:54,  2.29s/it] 40%|█████████████████████████████████                                                  | 8014/20117 [5:03:13<7:41:52,  2.29s/it] 40%|█████████████████████████████████                                                  | 8015/20117 [5:03:16<7:37:04,  2.27s/it] 40%|█████████████████████████████████                                                  | 8016/20117 [5:03:18<7:37:53,  2.27s/it] 40%|█████████████████████████████████                                                  | 8017/20117 [5:03:20<7:40:36,  2.28s/it] 40%|█████████████████████████████████                                                  | 8018/20117 [5:03:23<7:41:08,  2.29s/it] 40%|█████████████████████████████████                                                  | 8019/20117 [5:03:25<7:41:09,  2.29s/it] 40%|█████████████████████████████████                                                  | 8020/20117 [5:03:27<7:39:29,  2.28s/it]                                                                                                                                 {'loss': 0.1929, 'grad_norm': 0.3688840866088867, 'learning_rate': 0.0001322092684809208, 'memory/max_active (GiB)': 18.83, 'memory/max_allocated (GiB)': 18.83, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 289.68, 'epoch': 0.8}
 40%|█████████████████████████████████                                                  | 8020/20117 [5:03:27<7:39:29,  2.28s/it] 40%|█████████████████████████████████                                                  | 8021/20117 [5:03:29<7:40:12,  2.28s/it] 40%|█████████████████████████████████                                                  | 8022/20117 [5:03:32<7:39:42,  2.28s/it] 40%|█████████████████████████████████                                                  | 8023/20117 [5:03:34<7:40:38,  2.29s/it] 40%|█████████████████████████████████                                                  | 8024/20117 [5:03:36<7:41:53,  2.29s/it] 40%|█████████████████████████████████                                                  | 8025/20117 [5:03:39<7:40:23,  2.28s/it] 40%|█████████████████████████████████                                                  | 8026/20117 [5:03:41<7:37:43,  2.27s/it] 40%|█████████████████████████████████                                                  | 8027/20117 [5:03:43<7:44:46,  2.31s/it] 40%|█████████████████████████████████                                                  | 8028/20117 [5:03:45<7:41:30,  2.29s/it] 40%|█████████████████████████████████▏                                                 | 8029/20117 [5:03:48<7:40:37,  2.29s/it] 40%|█████████████████████████████████▏                                                 | 8030/20117 [5:03:50<7:40:57,  2.29s/it]                                                                                                                                 {'loss': 0.2318, 'grad_norm': 0.31160444021224976, 'learning_rate': 0.00013206064660084227, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 351.62, 'epoch': 0.8}
 40%|█████████████████████████████████▏                                                 | 8030/20117 [5:03:50<7:40:57,  2.29s/it] 40%|█████████████████████████████████▏                                                 | 8031/20117 [5:03:52<7:47:16,  2.32s/it] 40%|█████████████████████████████████▏                                                 | 8032/20117 [5:03:55<7:43:13,  2.30s/it] 40%|█████████████████████████████████▏                                                 | 8033/20117 [5:03:57<7:44:57,  2.31s/it] 40%|█████████████████████████████████▏                                                 | 8034/20117 [5:03:59<7:41:16,  2.29s/it] 40%|█████████████████████████████████▏                                                 | 8035/20117 [5:04:01<7:38:56,  2.28s/it] 40%|█████████████████████████████████▏                                                 | 8036/20117 [5:04:04<7:45:14,  2.31s/it] 40%|█████████████████████████████████▏                                                 | 8037/20117 [5:04:06<7:41:56,  2.29s/it] 40%|█████████████████████████████████▏                                                 | 8038/20117 [5:04:08<7:41:51,  2.29s/it] 40%|█████████████████████████████████▏                                                 | 8039/20117 [5:04:11<7:42:43,  2.30s/it] 40%|█████████████████████████████████▏                                                 | 8040/20117 [5:04:13<7:43:40,  2.30s/it]                                                                                                                                 {'loss': 0.2808, 'grad_norm': 0.46203359961509705, 'learning_rate': 0.000131911945748615, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 360.06, 'epoch': 0.8}
 40%|█████████████████████████████████▏                                                 | 8040/20117 [5:04:13<7:43:40,  2.30s/it] 40%|█████████████████████████████████▏                                                 | 8041/20117 [5:04:15<7:43:38,  2.30s/it] 40%|█████████████████████████████████▏                                                 | 8042/20117 [5:04:18<7:41:22,  2.29s/it] 40%|█████████████████████████████████▏                                                 | 8043/20117 [5:04:20<7:41:05,  2.29s/it] 40%|█████████████████████████████████▏                                                 | 8044/20117 [5:04:22<7:43:17,  2.30s/it] 40%|█████████████████████████████████▏                                                 | 8045/20117 [5:04:24<7:41:59,  2.30s/it] 40%|█████████████████████████████████▏                                                 | 8046/20117 [5:04:27<7:44:12,  2.31s/it] 40%|█████████████████████████████████▏                                                 | 8047/20117 [5:04:29<7:46:08,  2.32s/it] 40%|█████████████████████████████████▏                                                 | 8048/20117 [5:04:31<7:41:31,  2.29s/it] 40%|█████████████████████████████████▏                                                 | 8049/20117 [5:04:34<7:41:49,  2.30s/it] 40%|█████████████████████████████████▏                                                 | 8050/20117 [5:04:36<7:40:55,  2.29s/it]                                                                                                                                 {'loss': 0.2065, 'grad_norm': 0.4416126608848572, 'learning_rate': 0.00013176316629052054, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 312.23, 'epoch': 0.8}
 40%|█████████████████████████████████▏                                                 | 8050/20117 [5:04:36<7:40:55,  2.29s/it] 40%|█████████████████████████████████▏                                                 | 8051/20117 [5:04:38<7:47:07,  2.32s/it] 40%|█████████████████████████████████▏                                                 | 8052/20117 [5:04:41<7:45:53,  2.32s/it] 40%|█████████████████████████████████▏                                                 | 8053/20117 [5:04:43<7:44:21,  2.31s/it] 40%|█████████████████████████████████▏                                                 | 8054/20117 [5:04:46<8:00:32,  2.39s/it] 40%|█████████████████████████████████▏                                                 | 8055/20117 [5:04:48<7:56:19,  2.37s/it] 40%|█████████████████████████████████▏                                                 | 8056/20117 [5:04:50<7:48:07,  2.33s/it] 40%|█████████████████████████████████▏                                                 | 8057/20117 [5:04:52<7:50:25,  2.34s/it] 40%|█████████████████████████████████▏                                                 | 8058/20117 [5:04:55<7:45:39,  2.32s/it] 40%|█████████████████████████████████▎                                                 | 8059/20117 [5:04:57<7:44:36,  2.31s/it] 40%|█████████████████████████████████▎                                                 | 8060/20117 [5:04:59<7:45:24,  2.32s/it]                                                                                                                                 {'loss': 0.1713, 'grad_norm': 0.16777446866035461, 'learning_rate': 0.00013161430859303427, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 347.86, 'epoch': 0.8}
 40%|█████████████████████████████████▎                                                 | 8060/20117 [5:04:59<7:45:24,  2.32s/it] 40%|█████████████████████████████████▎                                                 | 8061/20117 [5:05:02<7:41:41,  2.30s/it] 40%|█████████████████████████████████▎                                                 | 8062/20117 [5:05:04<7:38:44,  2.28s/it] 40%|█████████████████████████████████▎                                                 | 8063/20117 [5:05:06<7:41:11,  2.30s/it] 40%|█████████████████████████████████▎                                                 | 8064/20117 [5:05:08<7:39:01,  2.29s/it] 40%|█████████████████████████████████▎                                                 | 8065/20117 [5:05:11<7:37:55,  2.28s/it] 40%|█████████████████████████████████▎                                                 | 8066/20117 [5:05:13<7:39:12,  2.29s/it] 40%|█████████████████████████████████▎                                                 | 8067/20117 [5:05:15<7:37:29,  2.28s/it] 40%|█████████████████████████████████▎                                                 | 8068/20117 [5:05:18<7:41:34,  2.30s/it] 40%|█████████████████████████████████▎                                                 | 8069/20117 [5:05:20<7:37:38,  2.28s/it] 40%|█████████████████████████████████▎                                                 | 8070/20117 [5:05:22<7:37:34,  2.28s/it]                                                                                                                                 {'loss': 0.1954, 'grad_norm': 0.3447447121143341, 'learning_rate': 0.0001314653730228241, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 372.63, 'epoch': 0.8}
 40%|█████████████████████████████████▎                                                 | 8070/20117 [5:05:22<7:37:34,  2.28s/it] 40%|█████████████████████████████████▎                                                 | 8071/20117 [5:05:24<7:38:57,  2.29s/it] 40%|█████████████████████████████████▎                                                 | 8072/20117 [5:05:27<7:40:24,  2.29s/it] 40%|█████████████████████████████████▎                                                 | 8073/20117 [5:05:29<7:35:55,  2.27s/it] 40%|█████████████████████████████████▎                                                 | 8074/20117 [5:05:31<7:35:47,  2.27s/it] 40%|█████████████████████████████████▎                                                 | 8075/20117 [5:05:33<7:34:31,  2.26s/it] 40%|█████████████████████████████████▎                                                 | 8076/20117 [5:05:36<7:36:12,  2.27s/it] 40%|█████████████████████████████████▎                                                 | 8077/20117 [5:05:38<7:37:40,  2.28s/it] 40%|█████████████████████████████████▎                                                 | 8078/20117 [5:05:40<7:36:00,  2.27s/it] 40%|█████████████████████████████████▎                                                 | 8079/20117 [5:05:43<7:39:51,  2.29s/it] 40%|█████████████████████████████████▎                                                 | 8080/20117 [5:05:45<7:41:51,  2.30s/it]                                                                                                                                 {'loss': 0.1929, 'grad_norm': 0.4045270085334778, 'learning_rate': 0.0001313163599467498, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 355.89, 'epoch': 0.8}
 40%|█████████████████████████████████▎                                                 | 8080/20117 [5:05:45<7:41:51,  2.30s/it] 40%|█████████████████████████████████▎                                                 | 8081/20117 [5:05:47<7:40:05,  2.29s/it] 40%|█████████████████████████████████▎                                                 | 8082/20117 [5:05:50<7:42:20,  2.30s/it] 40%|█████████████████████████████████▎                                                 | 8083/20117 [5:05:52<7:41:43,  2.30s/it] 40%|█████████████████████████████████▎                                                 | 8084/20117 [5:05:54<7:38:08,  2.28s/it] 40%|█████████████████████████████████▎                                                 | 8085/20117 [5:05:57<7:42:20,  2.31s/it] 40%|█████████████████████████████████▎                                                 | 8086/20117 [5:05:59<7:40:26,  2.30s/it] 40%|█████████████████████████████████▎                                                 | 8087/20117 [5:06:01<7:41:38,  2.30s/it] 40%|█████████████████████████████████▎                                                 | 8088/20117 [5:06:03<7:38:46,  2.29s/it] 40%|█████████████████████████████████▎                                                 | 8089/20117 [5:06:06<7:37:22,  2.28s/it] 40%|█████████████████████████████████▍                                                 | 8090/20117 [5:06:08<7:40:13,  2.30s/it]                                                                                                                                 {'loss': 0.2551, 'grad_norm': 0.462365984916687, 'learning_rate': 0.00013116726973186208, 'memory/max_active (GiB)': 20.45, 'memory/max_allocated (GiB)': 20.45, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 361.83, 'epoch': 0.8}
 40%|█████████████████████████████████▍                                                 | 8090/20117 [5:06:08<7:40:13,  2.30s/it] 40%|█████████████████████████████████▍                                                 | 8091/20117 [5:06:10<7:38:46,  2.29s/it] 40%|█████████████████████████████████▍                                                 | 8092/20117 [5:06:13<7:38:35,  2.29s/it] 40%|█████████████████████████████████▍                                                 | 8093/20117 [5:06:15<7:40:14,  2.30s/it] 40%|█████████████████████████████████▍                                                 | 8094/20117 [5:06:17<7:41:02,  2.30s/it] 40%|█████████████████████████████████▍                                                 | 8095/20117 [5:06:19<7:40:00,  2.30s/it] 40%|█████████████████████████████████▍                                                 | 8096/20117 [5:06:22<7:39:42,  2.29s/it] 40%|█████████████████████████████████▍                                                 | 8097/20117 [5:06:24<7:39:58,  2.30s/it] 40%|█████████████████████████████████▍                                                 | 8098/20117 [5:06:26<7:41:54,  2.31s/it] 40%|█████████████████████████████████▍                                                 | 8099/20117 [5:06:29<7:40:04,  2.30s/it] 40%|█████████████████████████████████▍                                                 | 8100/20117 [5:06:31<7:41:43,  2.31s/it]                                                                                                                                 {'loss': 0.2499, 'grad_norm': 0.5786636471748352, 'learning_rate': 0.00013101810274540168, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 300.27, 'epoch': 0.81}
 40%|█████████████████████████████████▍                                                 | 8100/20117 [5:06:31<7:41:43,  2.31s/it] 40%|█████████████████████████████████▍                                                 | 8101/20117 [5:06:33<7:45:14,  2.32s/it] 40%|█████████████████████████████████▍                                                 | 8102/20117 [5:06:36<7:43:18,  2.31s/it] 40%|█████████████████████████████████▍                                                 | 8103/20117 [5:06:38<7:41:50,  2.31s/it] 40%|█████████████████████████████████▍                                                 | 8104/20117 [5:06:40<7:42:54,  2.31s/it] 40%|█████████████████████████████████▍                                                 | 8105/20117 [5:06:42<7:40:16,  2.30s/it] 40%|█████████████████████████████████▍                                                 | 8106/20117 [5:06:45<7:42:38,  2.31s/it] 40%|█████████████████████████████████▍                                                 | 8107/20117 [5:06:47<7:52:40,  2.36s/it] 40%|█████████████████████████████████▍                                                 | 8108/20117 [5:06:50<7:44:37,  2.32s/it] 40%|█████████████████████████████████▍                                                 | 8109/20117 [5:06:52<7:35:01,  2.27s/it] 40%|█████████████████████████████████▍                                                 | 8110/20117 [5:06:54<7:26:18,  2.23s/it]                                                                                                                                 {'loss': 0.2186, 'grad_norm': 0.3487481474876404, 'learning_rate': 0.0001308688593547984, 'memory/max_active (GiB)': 17.11, 'memory/max_allocated (GiB)': 17.11, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 312.52, 'epoch': 0.81}
 40%|█████████████████████████████████▍                                                 | 8110/20117 [5:06:54<7:26:18,  2.23s/it] 40%|█████████████████████████████████▍                                                 | 8111/20117 [5:06:56<7:23:38,  2.22s/it] 40%|█████████████████████████████████▍                                                 | 8112/20117 [5:06:58<7:28:24,  2.24s/it] 40%|█████████████████████████████████▍                                                 | 8113/20117 [5:07:01<7:29:41,  2.25s/it] 40%|█████████████████████████████████▍                                                 | 8114/20117 [5:07:03<7:39:37,  2.30s/it] 40%|█████████████████████████████████▍                                                 | 8115/20117 [5:07:05<7:45:35,  2.33s/it] 40%|█████████████████████████████████▍                                                 | 8116/20117 [5:07:08<7:44:59,  2.32s/it] 40%|█████████████████████████████████▍                                                 | 8117/20117 [5:07:10<7:40:26,  2.30s/it] 40%|█████████████████████████████████▍                                                 | 8118/20117 [5:07:12<7:42:05,  2.31s/it] 40%|█████████████████████████████████▍                                                 | 8119/20117 [5:07:15<7:41:23,  2.31s/it] 40%|█████████████████████████████████▌                                                 | 8120/20117 [5:07:17<7:46:23,  2.33s/it]                                                                                                                                 {'loss': 0.2167, 'grad_norm': 0.3610248863697052, 'learning_rate': 0.00013071953992767015, 'memory/max_active (GiB)': 19.2, 'memory/max_allocated (GiB)': 19.2, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 328.39, 'epoch': 0.81}
 40%|█████████████████████████████████▌                                                 | 8120/20117 [5:07:17<7:46:23,  2.33s/it] 40%|█████████████████████████████████▌                                                 | 8121/20117 [5:07:19<7:43:17,  2.32s/it] 40%|█████████████████████████████████▌                                                 | 8122/20117 [5:07:21<7:34:54,  2.28s/it] 40%|█████████████████████████████████▌                                                 | 8123/20117 [5:07:24<7:27:56,  2.24s/it] 40%|█████████████████████████████████▌                                                 | 8124/20117 [5:07:26<7:24:03,  2.22s/it] 40%|█████████████████████████████████▌                                                 | 8125/20117 [5:07:28<7:26:49,  2.24s/it] 40%|█████████████████████████████████▌                                                 | 8126/20117 [5:07:30<7:31:44,  2.26s/it] 40%|█████████████████████████████████▌                                                 | 8127/20117 [5:07:33<7:40:35,  2.30s/it] 40%|█████████████████████████████████▌                                                 | 8128/20117 [5:07:35<7:40:59,  2.31s/it] 40%|█████████████████████████████████▌                                                 | 8129/20117 [5:07:37<7:40:29,  2.30s/it] 40%|█████████████████████████████████▌                                                 | 8130/20117 [5:07:40<7:38:22,  2.29s/it]                                                                                                                                 {'loss': 0.241, 'grad_norm': 0.37153443694114685, 'learning_rate': 0.00013057014483182242, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 344.6, 'epoch': 0.81}
 40%|█████████████████████████████████▌                                                 | 8130/20117 [5:07:40<7:38:22,  2.29s/it] 40%|█████████████████████████████████▌                                                 | 8131/20117 [5:07:42<7:35:43,  2.28s/it] 40%|█████████████████████████████████▌                                                 | 8132/20117 [5:07:44<7:36:57,  2.29s/it] 40%|█████████████████████████████████▌                                                 | 8133/20117 [5:07:46<7:33:28,  2.27s/it] 40%|█████████████████████████████████▌                                                 | 8134/20117 [5:07:49<7:34:09,  2.27s/it] 40%|█████████████████████████████████▌                                                 | 8135/20117 [5:07:51<7:33:38,  2.27s/it] 40%|█████████████████████████████████▌                                                 | 8136/20117 [5:07:53<7:32:54,  2.27s/it] 40%|█████████████████████████████████▌                                                 | 8137/20117 [5:07:56<7:36:13,  2.28s/it] 40%|█████████████████████████████████▌                                                 | 8138/20117 [5:07:58<7:34:43,  2.28s/it] 40%|█████████████████████████████████▌                                                 | 8139/20117 [5:08:00<7:30:50,  2.26s/it] 40%|█████████████████████████████████▌                                                 | 8140/20117 [5:08:02<7:31:43,  2.26s/it]                                                                                                                                 {'loss': 0.2749, 'grad_norm': 0.3705120086669922, 'learning_rate': 0.00013042067443524681, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 332.87, 'epoch': 0.81}
 40%|█████████████████████████████████▌                                                 | 8140/20117 [5:08:02<7:31:43,  2.26s/it] 40%|█████████████████████████████████▌                                                 | 8141/20117 [5:08:05<7:35:25,  2.28s/it] 40%|█████████████████████████████████▌                                                 | 8142/20117 [5:08:07<7:34:43,  2.28s/it] 40%|█████████████████████████████████▌                                                 | 8143/20117 [5:08:09<7:33:48,  2.27s/it] 40%|█████████████████████████████████▌                                                 | 8144/20117 [5:08:11<7:34:49,  2.28s/it] 40%|█████████████████████████████████▌                                                 | 8145/20117 [5:08:14<7:38:34,  2.30s/it] 40%|█████████████████████████████████▌                                                 | 8146/20117 [5:08:16<7:36:28,  2.29s/it] 40%|█████████████████████████████████▌                                                 | 8147/20117 [5:08:18<7:32:20,  2.27s/it] 41%|█████████████████████████████████▌                                                 | 8148/20117 [5:08:21<7:34:54,  2.28s/it] 41%|█████████████████████████████████▌                                                 | 8149/20117 [5:08:23<7:36:26,  2.29s/it] 41%|█████████████████████████████████▋                                                 | 8150/20117 [5:08:25<7:34:59,  2.28s/it]                                                                                                                                 {'loss': 0.1438, 'grad_norm': 0.3014324903488159, 'learning_rate': 0.00013027112910612052, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 308.05, 'epoch': 0.81}
 41%|█████████████████████████████████▋                                                 | 8150/20117 [5:08:25<7:34:59,  2.28s/it] 41%|█████████████████████████████████▋                                                 | 8151/20117 [5:08:27<7:38:08,  2.30s/it] 41%|█████████████████████████████████▋                                                 | 8152/20117 [5:08:30<7:37:16,  2.29s/it] 41%|█████████████████████████████████▋                                                 | 8153/20117 [5:08:32<7:36:56,  2.29s/it] 41%|█████████████████████████████████▋                                                 | 8154/20117 [5:08:34<7:39:20,  2.30s/it] 41%|█████████████████████████████████▋                                                 | 8155/20117 [5:08:37<7:35:41,  2.29s/it] 41%|█████████████████████████████████▋                                                 | 8156/20117 [5:08:39<7:33:55,  2.28s/it] 41%|█████████████████████████████████▋                                                 | 8157/20117 [5:08:41<7:32:32,  2.27s/it] 41%|█████████████████████████████████▋                                                 | 8158/20117 [5:08:43<7:31:40,  2.27s/it] 41%|█████████████████████████████████▋                                                 | 8159/20117 [5:08:46<7:32:58,  2.27s/it] 41%|█████████████████████████████████▋                                                 | 8160/20117 [5:08:48<7:54:05,  2.38s/it]                                                                                                                                 {'loss': 0.2032, 'grad_norm': 0.5011573433876038, 'learning_rate': 0.00013012150921280527, 'memory/max_active (GiB)': 19.2, 'memory/max_allocated (GiB)': 19.2, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 296.89, 'epoch': 0.81}
 41%|█████████████████████████████████▋                                                 | 8160/20117 [5:08:48<7:54:05,  2.38s/it] 41%|█████████████████████████████████▋                                                 | 8161/20117 [5:08:51<7:53:04,  2.37s/it] 41%|█████████████████████████████████▋                                                 | 8162/20117 [5:08:53<7:45:46,  2.34s/it] 41%|█████████████████████████████████▋                                                 | 8163/20117 [5:08:55<7:46:00,  2.34s/it] 41%|█████████████████████████████████▋                                                 | 8164/20117 [5:08:58<7:42:08,  2.32s/it] 41%|█████████████████████████████████▋                                                 | 8165/20117 [5:09:00<7:37:58,  2.30s/it] 41%|█████████████████████████████████▋                                                 | 8166/20117 [5:09:02<7:37:11,  2.30s/it] 41%|█████████████████████████████████▋                                                 | 8167/20117 [5:09:04<7:40:33,  2.31s/it] 41%|█████████████████████████████████▋                                                 | 8168/20117 [5:09:07<7:41:19,  2.32s/it] 41%|█████████████████████████████████▋                                                 | 8169/20117 [5:09:09<7:41:26,  2.32s/it] 41%|█████████████████████████████████▋                                                 | 8170/20117 [5:09:11<7:42:08,  2.32s/it]                                                                                                                                 {'loss': 0.2055, 'grad_norm': 0.3287215530872345, 'learning_rate': 0.00012997181512384653, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 405.75, 'epoch': 0.81}
 41%|█████████████████████████████████▋                                                 | 8170/20117 [5:09:11<7:42:08,  2.32s/it] 41%|█████████████████████████████████▋                                                 | 8171/20117 [5:09:14<7:39:36,  2.31s/it] 41%|█████████████████████████████████▋                                                 | 8172/20117 [5:09:16<7:37:13,  2.30s/it] 41%|█████████████████████████████████▋                                                 | 8173/20117 [5:09:18<7:38:23,  2.30s/it] 41%|█████████████████████████████████▋                                                 | 8174/20117 [5:09:21<7:40:54,  2.32s/it] 41%|█████████████████████████████████▋                                                 | 8175/20117 [5:09:23<7:43:04,  2.33s/it] 41%|█████████████████████████████████▋                                                 | 8176/20117 [5:09:25<7:38:07,  2.30s/it] 41%|█████████████████████████████████▋                                                 | 8177/20117 [5:09:27<7:34:10,  2.28s/it] 41%|█████████████████████████████████▋                                                 | 8178/20117 [5:09:30<7:37:07,  2.30s/it] 41%|█████████████████████████████████▋                                                 | 8179/20117 [5:09:32<7:37:54,  2.30s/it] 41%|█████████████████████████████████▋                                                 | 8180/20117 [5:09:34<7:35:51,  2.29s/it]                                                                                                                                 {'loss': 0.2805, 'grad_norm': 0.7308348417282104, 'learning_rate': 0.00012982204720797245, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 443.15, 'epoch': 0.81}
 41%|█████████████████████████████████▋                                                 | 8180/20117 [5:09:34<7:35:51,  2.29s/it] 41%|█████████████████████████████████▊                                                 | 8181/20117 [5:09:37<7:37:37,  2.30s/it] 41%|█████████████████████████████████▊                                                 | 8182/20117 [5:09:39<7:33:56,  2.28s/it] 41%|█████████████████████████████████▊                                                 | 8183/20117 [5:09:41<7:41:59,  2.32s/it] 41%|█████████████████████████████████▊                                                 | 8184/20117 [5:09:44<7:38:35,  2.31s/it] 41%|█████████████████████████████████▊                                                 | 8185/20117 [5:09:46<7:32:52,  2.28s/it] 41%|█████████████████████████████████▊                                                 | 8186/20117 [5:09:48<7:32:38,  2.28s/it] 41%|█████████████████████████████████▊                                                 | 8187/20117 [5:09:50<7:30:00,  2.26s/it] 41%|█████████████████████████████████▊                                                 | 8188/20117 [5:09:53<7:30:30,  2.27s/it] 41%|█████████████████████████████████▊                                                 | 8189/20117 [5:09:55<7:36:11,  2.29s/it] 41%|█████████████████████████████████▊                                                 | 8190/20117 [5:09:57<7:35:11,  2.29s/it]                                                                                                                                 {'loss': 0.2066, 'grad_norm': 0.5808447599411011, 'learning_rate': 0.00012967220583409304, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 401.43, 'epoch': 0.81}
 41%|█████████████████████████████████▊                                                 | 8190/20117 [5:09:57<7:35:11,  2.29s/it] 41%|█████████████████████████████████▊                                                 | 8191/20117 [5:10:00<7:36:23,  2.30s/it] 41%|█████████████████████████████████▊                                                 | 8192/20117 [5:10:02<7:35:12,  2.29s/it] 41%|█████████████████████████████████▊                                                 | 8193/20117 [5:10:04<7:38:27,  2.31s/it] 41%|█████████████████████████████████▊                                                 | 8194/20117 [5:10:06<7:36:41,  2.30s/it] 41%|█████████████████████████████████▊                                                 | 8195/20117 [5:10:09<7:33:28,  2.28s/it] 41%|█████████████████████████████████▊                                                 | 8196/20117 [5:10:11<7:33:41,  2.28s/it] 41%|█████████████████████████████████▊                                                 | 8197/20117 [5:10:13<7:36:24,  2.30s/it] 41%|█████████████████████████████████▊                                                 | 8198/20117 [5:10:16<7:33:49,  2.28s/it] 41%|█████████████████████████████████▊                                                 | 8199/20117 [5:10:18<7:33:09,  2.28s/it] 41%|█████████████████████████████████▊                                                 | 8200/20117 [5:10:20<7:38:01,  2.31s/it]                                                                                                                                 {'loss': 0.2464, 'grad_norm': 0.31127744913101196, 'learning_rate': 0.0001295222913712993, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 343.59, 'epoch': 0.82}
 41%|█████████████████████████████████▊                                                 | 8200/20117 [5:10:20<7:38:01,  2.31s/it] 41%|█████████████████████████████████▊                                                 | 8201/20117 [5:10:22<7:35:23,  2.29s/it] 41%|█████████████████████████████████▊                                                 | 8202/20117 [5:10:25<7:37:32,  2.30s/it] 41%|█████████████████████████████████▊                                                 | 8203/20117 [5:10:27<7:37:04,  2.30s/it] 41%|█████████████████████████████████▊                                                 | 8204/20117 [5:10:29<7:36:04,  2.30s/it] 41%|█████████████████████████████████▊                                                 | 8205/20117 [5:10:32<7:39:02,  2.31s/it] 41%|█████████████████████████████████▊                                                 | 8206/20117 [5:10:34<7:37:54,  2.31s/it] 41%|█████████████████████████████████▊                                                 | 8207/20117 [5:10:36<7:36:47,  2.30s/it] 41%|█████████████████████████████████▊                                                 | 8208/20117 [5:10:39<7:35:37,  2.30s/it] 41%|█████████████████████████████████▊                                                 | 8209/20117 [5:10:41<7:38:13,  2.31s/it] 41%|█████████████████████████████████▊                                                 | 8210/20117 [5:10:43<7:35:11,  2.29s/it]                                                                                                                                 {'loss': 0.1986, 'grad_norm': 0.2803351581096649, 'learning_rate': 0.00012937230418886224, 'memory/max_active (GiB)': 21.38, 'memory/max_allocated (GiB)': 21.38, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 401.58, 'epoch': 0.82}
 41%|█████████████████████████████████▊                                                 | 8210/20117 [5:10:43<7:35:11,  2.29s/it] 41%|█████████████████████████████████▉                                                 | 8211/20117 [5:10:45<7:32:07,  2.28s/it] 41%|█████████████████████████████████▉                                                 | 8212/20117 [5:10:48<7:31:35,  2.28s/it] 41%|█████████████████████████████████▉                                                 | 8213/20117 [5:10:50<7:32:42,  2.28s/it] 41%|█████████████████████████████████▉                                                 | 8214/20117 [5:10:53<7:52:59,  2.38s/it] 41%|█████████████████████████████████▉                                                 | 8215/20117 [5:10:55<7:44:11,  2.34s/it] 41%|█████████████████████████████████▉                                                 | 8216/20117 [5:10:57<7:42:33,  2.33s/it] 41%|█████████████████████████████████▉                                                 | 8217/20117 [5:10:59<7:36:50,  2.30s/it] 41%|█████████████████████████████████▉                                                 | 8218/20117 [5:11:02<7:34:49,  2.29s/it] 41%|█████████████████████████████████▉                                                 | 8219/20117 [5:11:04<7:38:13,  2.31s/it] 41%|█████████████████████████████████▉                                                 | 8220/20117 [5:11:06<7:33:03,  2.28s/it]                                                                                                                                 {'loss': 0.1904, 'grad_norm': 0.45312055945396423, 'learning_rate': 0.000129222244656232, 'memory/max_active (GiB)': 21.37, 'memory/max_allocated (GiB)': 21.37, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 290.19, 'epoch': 0.82}
 41%|█████████████████████████████████▉                                                 | 8230/20117 [5:11:29<7:28:41,  2.26s/it]
{'loss': 0.2074, 'grad_norm': 0.4169121980667114, 'learning_rate': 0.0001290721131430369, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 320.72, 'epoch': 0.82}
 41%|█████████████████████████████████▉                                                 | 8240/20117 [5:11:52<7:28:44,  2.27s/it]
{'loss': 0.2809, 'grad_norm': 0.5725305080413818, 'learning_rate': 0.0001289219100190826, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 334.56, 'epoch': 0.82}
 41%|██████████████████████████████████                                                 | 8250/20117 [5:12:14<7:24:54,  2.25s/it]
{'loss': 0.1873, 'grad_norm': 0.4698241055011749, 'learning_rate': 0.00012877163565435114, 'memory/max_active (GiB)': 19.66, 'memory/max_allocated (GiB)': 19.66, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 319.13, 'epoch': 0.82}
 41%|██████████████████████████████████                                                 | 8260/20117 [5:12:37<7:38:16,  2.32s/it]
{'loss': 0.2172, 'grad_norm': 0.33453720808029175, 'learning_rate': 0.000128621290419, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 306.65, 'epoch': 0.82}
 41%|██████████████████████████████████                                                 | 8270/20117 [5:13:00<7:36:50,  2.31s/it]
{'loss': 0.2102, 'grad_norm': 0.20304298400878906, 'learning_rate': 0.00012847087468336135, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 311.33, 'epoch': 0.82}
 41%|██████████████████████████████████▏                                                | 8280/20117 [5:13:24<7:41:14,  2.34s/it]
{'loss': 0.2437, 'grad_norm': 0.27367445826530457, 'learning_rate': 0.00012832038881794086, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 417.4, 'epoch': 0.82}
 41%|██████████████████████████████████▏                                                | 8290/20117 [5:13:47<7:34:36,  2.31s/it]
{'loss': 0.2692, 'grad_norm': 0.3914351463317871, 'learning_rate': 0.00012816983319341712, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 342.88, 'epoch': 0.82}
 41%|██████████████████████████████████▏                                                | 8300/20117 [5:14:09<7:38:22,  2.33s/it]
{'loss': 0.2341, 'grad_norm': 0.5103800296783447, 'learning_rate': 0.00012801920818064034, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 362.66, 'epoch': 0.83}
 41%|██████████████████████████████████▎                                                | 8310/20117 [5:14:32<7:24:26,  2.26s/it]
{'loss': 0.2141, 'grad_norm': 0.1988827884197235, 'learning_rate': 0.00012786851415063185, 'memory/max_active (GiB)': 21.37, 'memory/max_allocated (GiB)': 21.37, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 404.61, 'epoch': 0.83}
 41%|██████████████████████████████████▎                                                | 8320/20117 [5:14:56<7:43:48,  2.36s/it]
{'loss': 0.2341, 'grad_norm': 0.482526570558548, 'learning_rate': 0.00012771775147458288, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 366.94, 'epoch': 0.83}
 41%|██████████████████████████████████▎                                                | 8330/20117 [5:15:18<7:30:58,  2.30s/it]
{'loss': 0.2458, 'grad_norm': 0.5179364085197449, 'learning_rate': 0.0001275669205238537, 'memory/max_active (GiB)': 19.2, 'memory/max_allocated (GiB)': 19.2, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 308.79, 'epoch': 0.83}
 41%|██████████████████████████████████▍                                                | 8340/20117 [5:15:41<7:31:36,  2.30s/it]
{'loss': 0.2324, 'grad_norm': 0.4961225390434265, 'learning_rate': 0.00012741602166997288, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 312.63, 'epoch': 0.83}
 42%|██████████████████████████████████▍                                                | 8350/20117 [5:16:04<7:27:30,  2.28s/it]
{'loss': 0.2808, 'grad_norm': 0.6281317472457886, 'learning_rate': 0.0001272650552846362, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 365.38, 'epoch': 0.83}
 42%|██████████████████████████████████▍                                                | 8360/20117 [5:16:27<7:31:06,  2.30s/it]
{'loss': 0.2125, 'grad_norm': 0.3268338739871979, 'learning_rate': 0.00012711402173970574, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 381.45, 'epoch': 0.83}
 42%|██████████████████████████████████▌                                                | 8370/20117 [5:16:50<7:28:22,  2.29s/it]
{'loss': 0.3039, 'grad_norm': 0.5050214529037476, 'learning_rate': 0.00012696292140720907, 'memory/max_active (GiB)': 19.67, 'memory/max_allocated (GiB)': 19.67, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 394.88, 'epoch': 0.83}
 42%|██████████████████████████████████▌                                                | 8380/20117 [5:17:13<7:23:22,  2.27s/it]
{'loss': 0.1876, 'grad_norm': 0.3680170178413391, 'learning_rate': 0.00012681175465933822, 'memory/max_active (GiB)': 18.83, 'memory/max_allocated (GiB)': 18.83, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 310.52, 'epoch': 0.83}
 42%|██████████████████████████████████▌                                                | 8390/20117 [5:17:36<7:28:20,  2.29s/it]
{'loss': 0.2137, 'grad_norm': 0.6084402799606323, 'learning_rate': 0.00012666052186844883, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 308.84, 'epoch': 0.83}
 42%|██████████████████████████████████▋                                                | 8400/20117 [5:17:59<7:26:39,  2.29s/it]
{'loss': 0.2423, 'grad_norm': 0.41174978017807007, 'learning_rate': 0.00012650922340705925, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 389.04, 'epoch': 0.84}
 42%|██████████████████████████████████▋                                                | 8400/20117 [5:17:59<7:26:39,  2.29s/it] 42%|██████████████████████████████████▋                                                | 8401/20117 [5:18:01<7:23:58,  2.27s/it] 42%|██████████████████████████████████▋                                                | 8402/20117 [5:18:04<7:22:38,  2.27s/it] 42%|██████████████████████████████████▋                                                | 8403/20117 [5:18:06<7:25:16,  2.28s/it] 42%|██████████████████████████████████▋                                                | 8404/20117 [5:18:08<7:24:27,  2.28s/it] 42%|██████████████████████████████████▋                                                | 8405/20117 [5:18:11<7:23:30,  2.27s/it] 42%|██████████████████████████████████▋                                                | 8406/20117 [5:18:13<7:28:51,  2.30s/it] 42%|██████████████████████████████████▋                                                | 8407/20117 [5:18:15<7:27:28,  2.29s/it] 42%|██████████████████████████████████▋                                                | 8408/20117 [5:18:18<7:27:13,  2.29s/it] 42%|██████████████████████████████████▋                                                | 8409/20117 [5:18:20<7:30:19,  2.31s/it] 42%|██████████████████████████████████▋                                                | 8410/20117 [5:18:22<7:26:21,  2.29s/it]                                                                                                                                 {'loss': 0.2144, 'grad_norm': 0.3895050883293152, 'learning_rate': 0.0001263578596478496, 'memory/max_active (GiB)': 19.99, 'memory/max_allocated (GiB)': 19.99, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 325.56, 'epoch': 0.84}
 42%|██████████████████████████████████▋                                                | 8410/20117 [5:18:22<7:26:21,  2.29s/it] 42%|██████████████████████████████████▋                                                | 8411/20117 [5:18:24<7:25:05,  2.28s/it] 42%|██████████████████████████████████▋                                                | 8412/20117 [5:18:27<7:23:49,  2.28s/it] 42%|██████████████████████████████████▋                                                | 8413/20117 [5:18:29<7:19:37,  2.25s/it] 42%|██████████████████████████████████▋                                                | 8414/20117 [5:18:31<7:26:29,  2.29s/it] 42%|██████████████████████████████████▋                                                | 8415/20117 [5:18:34<7:26:52,  2.29s/it] 42%|██████████████████████████████████▋                                                | 8416/20117 [5:18:36<7:28:42,  2.30s/it] 42%|██████████████████████████████████▋                                                | 8417/20117 [5:18:38<7:32:28,  2.32s/it] 42%|██████████████████████████████████▋                                                | 8418/20117 [5:18:40<7:29:35,  2.31s/it] 42%|██████████████████████████████████▋                                                | 8419/20117 [5:18:43<7:26:17,  2.29s/it] 42%|██████████████████████████████████▋                                                | 8420/20117 [5:18:45<7:27:11,  2.29s/it]                                                                                                                                 {'loss': 0.2292, 'grad_norm': 0.2492019683122635, 'learning_rate': 0.00012620643096366077, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 312.76, 'epoch': 0.84}
 42%|██████████████████████████████████▋                                                | 8420/20117 [5:18:45<7:27:11,  2.29s/it] 42%|██████████████████████████████████▋                                                | 8421/20117 [5:18:47<7:25:44,  2.29s/it] 42%|██████████████████████████████████▋                                                | 8422/20117 [5:18:50<7:27:43,  2.30s/it] 42%|██████████████████████████████████▊                                                | 8423/20117 [5:18:52<7:22:43,  2.27s/it] 42%|██████████████████████████████████▊                                                | 8424/20117 [5:18:54<7:22:49,  2.27s/it] 42%|██████████████████████████████████▊                                                | 8425/20117 [5:18:57<7:47:55,  2.40s/it] 42%|██████████████████████████████████▊                                                | 8426/20117 [5:18:59<7:43:33,  2.38s/it] 42%|██████████████████████████████████▊                                                | 8427/20117 [5:19:01<7:42:05,  2.37s/it] 42%|██████████████████████████████████▊                                                | 8428/20117 [5:19:04<7:37:11,  2.35s/it] 42%|██████████████████████████████████▊                                                | 8429/20117 [5:19:06<7:31:46,  2.32s/it] 42%|██████████████████████████████████▊                                                | 8430/20117 [5:19:08<7:28:39,  2.30s/it]                                                                                                                                 {'loss': 0.2566, 'grad_norm': 0.5276951789855957, 'learning_rate': 0.0001260549377274936, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 341.75, 'epoch': 0.84}
 42%|██████████████████████████████████▊                                                | 8430/20117 [5:19:08<7:28:39,  2.30s/it] 42%|██████████████████████████████████▊                                                | 8431/20117 [5:19:11<7:25:11,  2.29s/it] 42%|██████████████████████████████████▊                                                | 8432/20117 [5:19:13<7:23:06,  2.28s/it] 42%|██████████████████████████████████▊                                                | 8433/20117 [5:19:15<7:28:10,  2.30s/it] 42%|██████████████████████████████████▊                                                | 8434/20117 [5:19:17<7:22:52,  2.27s/it] 42%|██████████████████████████████████▊                                                | 8435/20117 [5:19:20<7:20:44,  2.26s/it] 42%|██████████████████████████████████▊                                                | 8436/20117 [5:19:22<7:21:35,  2.27s/it] 42%|██████████████████████████████████▊                                                | 8437/20117 [5:19:24<7:20:15,  2.26s/it] 42%|██████████████████████████████████▊                                                | 8438/20117 [5:19:26<7:20:11,  2.26s/it] 42%|██████████████████████████████████▊                                                | 8439/20117 [5:19:29<7:19:16,  2.26s/it] 42%|██████████████████████████████████▊                                                | 8440/20117 [5:19:31<7:19:36,  2.26s/it]                                                                                                                                 {'loss': 0.2108, 'grad_norm': 0.3949599266052246, 'learning_rate': 0.00012590338031250796, 'memory/max_active (GiB)': 19.2, 'memory/max_allocated (GiB)': 19.2, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 370.88, 'epoch': 0.84}
 42%|██████████████████████████████████▊                                                | 8440/20117 [5:19:31<7:19:36,  2.26s/it] 42%|██████████████████████████████████▊                                                | 8441/20117 [5:19:33<7:20:25,  2.26s/it] 42%|██████████████████████████████████▊                                                | 8442/20117 [5:19:36<7:24:36,  2.28s/it] 42%|██████████████████████████████████▊                                                | 8443/20117 [5:19:38<7:24:02,  2.28s/it] 42%|██████████████████████████████████▊                                                | 8444/20117 [5:19:40<7:26:36,  2.30s/it] 42%|██████████████████████████████████▊                                                | 8445/20117 [5:19:42<7:23:12,  2.28s/it] 42%|██████████████████████████████████▊                                                | 8446/20117 [5:19:45<7:26:30,  2.30s/it] 42%|██████████████████████████████████▊                                                | 8447/20117 [5:19:47<7:25:28,  2.29s/it] 42%|██████████████████████████████████▊                                                | 8448/20117 [5:19:49<7:22:54,  2.28s/it] 42%|██████████████████████████████████▊                                                | 8449/20117 [5:19:52<7:27:51,  2.30s/it] 42%|██████████████████████████████████▊                                                | 8450/20117 [5:19:54<7:25:19,  2.29s/it]                                                                                                                                 {'loss': 0.1811, 'grad_norm': 0.3476475179195404, 'learning_rate': 0.00012575175909202186, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 336.59, 'epoch': 0.84}
 42%|██████████████████████████████████▊                                                | 8450/20117 [5:19:54<7:25:19,  2.29s/it] 42%|██████████████████████████████████▊                                                | 8451/20117 [5:19:56<7:22:02,  2.27s/it] 42%|██████████████████████████████████▊                                                | 8452/20117 [5:19:58<7:23:21,  2.28s/it] 42%|██████████████████████████████████▉                                                | 8453/20117 [5:20:01<7:20:49,  2.27s/it] 42%|██████████████████████████████████▉                                                | 8454/20117 [5:20:03<7:22:25,  2.28s/it] 42%|██████████████████████████████████▉                                                | 8455/20117 [5:20:05<7:19:41,  2.26s/it] 42%|██████████████████████████████████▉                                                | 8456/20117 [5:20:07<7:16:29,  2.25s/it] 42%|██████████████████████████████████▉                                                | 8457/20117 [5:20:10<7:17:45,  2.25s/it] 42%|██████████████████████████████████▉                                                | 8458/20117 [5:20:12<7:15:16,  2.24s/it] 42%|██████████████████████████████████▉                                                | 8459/20117 [5:20:14<7:15:36,  2.24s/it] 42%|██████████████████████████████████▉                                                | 8460/20117 [5:20:16<7:16:35,  2.25s/it]                                                                                                                                 {'loss': 0.2144, 'grad_norm': 0.26099368929862976, 'learning_rate': 0.00012560007443951032, 'memory/max_active (GiB)': 19.2, 'memory/max_allocated (GiB)': 19.2, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 353.23, 'epoch': 0.84}
 42%|██████████████████████████████████▉                                                | 8460/20117 [5:20:16<7:16:35,  2.25s/it] 42%|██████████████████████████████████▉                                                | 8461/20117 [5:20:19<7:19:01,  2.26s/it] 42%|██████████████████████████████████▉                                                | 8462/20117 [5:20:21<7:19:37,  2.26s/it] 42%|██████████████████████████████████▉                                                | 8463/20117 [5:20:23<7:18:36,  2.26s/it] 42%|██████████████████████████████████▉                                                | 8464/20117 [5:20:25<7:22:39,  2.28s/it] 42%|██████████████████████████████████▉                                                | 8465/20117 [5:20:28<7:26:54,  2.30s/it] 42%|██████████████████████████████████▉                                                | 8466/20117 [5:20:30<7:31:23,  2.32s/it] 42%|██████████████████████████████████▉                                                | 8467/20117 [5:20:33<7:34:20,  2.34s/it] 42%|██████████████████████████████████▉                                                | 8468/20117 [5:20:35<7:36:33,  2.35s/it] 42%|██████████████████████████████████▉                                                | 8469/20117 [5:20:37<7:33:11,  2.33s/it] 42%|██████████████████████████████████▉                                                | 8470/20117 [5:20:39<7:28:22,  2.31s/it]                                                                                                                                 {'loss': 0.1781, 'grad_norm': 0.4799298346042633, 'learning_rate': 0.00012544832672860474, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 360.11, 'epoch': 0.84}
 42%|██████████████████████████████████▉                                                | 8470/20117 [5:20:39<7:28:22,  2.31s/it] 42%|██████████████████████████████████▉                                                | 8471/20117 [5:20:42<7:26:57,  2.30s/it] 42%|██████████████████████████████████▉                                                | 8472/20117 [5:20:44<7:26:55,  2.30s/it] 42%|██████████████████████████████████▉                                                | 8473/20117 [5:20:46<7:25:46,  2.30s/it] 42%|██████████████████████████████████▉                                                | 8474/20117 [5:20:49<7:22:29,  2.28s/it] 42%|██████████████████████████████████▉                                                | 8475/20117 [5:20:51<7:27:23,  2.31s/it] 42%|██████████████████████████████████▉                                                | 8476/20117 [5:20:53<7:28:14,  2.31s/it] 42%|██████████████████████████████████▉                                                | 8477/20117 [5:20:56<7:47:33,  2.41s/it] 42%|██████████████████████████████████▉                                                | 8478/20117 [5:20:58<7:36:05,  2.35s/it] 42%|██████████████████████████████████▉                                                | 8479/20117 [5:21:00<7:27:39,  2.31s/it] 42%|██████████████████████████████████▉                                                | 8480/20117 [5:21:02<7:18:06,  2.26s/it]                                                                                                                                 {'loss': 0.209, 'grad_norm': 0.46986958384513855, 'learning_rate': 0.0001252965163330918, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 393.35, 'epoch': 0.84}
 42%|██████████████████████████████████▉                                                | 8480/20117 [5:21:02<7:18:06,  2.26s/it] 42%|██████████████████████████████████▉                                                | 8481/20117 [5:21:05<7:15:35,  2.25s/it] 42%|██████████████████████████████████▉                                                | 8482/20117 [5:21:07<7:12:35,  2.23s/it] 42%|██████████████████████████████████▉                                                | 8483/20117 [5:21:09<7:15:27,  2.25s/it] 42%|███████████████████████████████████                                                | 8484/20117 [5:21:12<7:24:15,  2.29s/it] 42%|███████████████████████████████████                                                | 8485/20117 [5:21:14<7:28:21,  2.31s/it] 42%|███████████████████████████████████                                                | 8486/20117 [5:21:16<7:31:10,  2.33s/it] 42%|███████████████████████████████████                                                | 8487/20117 [5:21:19<7:40:14,  2.37s/it] 42%|███████████████████████████████████                                                | 8488/20117 [5:21:21<7:39:32,  2.37s/it] 42%|███████████████████████████████████                                                | 8489/20117 [5:21:23<7:34:19,  2.34s/it] 42%|███████████████████████████████████                                                | 8490/20117 [5:21:26<7:34:21,  2.34s/it]                                                                                                                                 {'loss': 0.2061, 'grad_norm': 1.179408311843872, 'learning_rate': 0.00012514464362691258, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 377.29, 'epoch': 0.84}
 42%|███████████████████████████████████                                                | 8490/20117 [5:21:26<7:34:21,  2.34s/it] 42%|███████████████████████████████████                                                | 8491/20117 [5:21:28<7:32:49,  2.34s/it] 42%|███████████████████████████████████                                                | 8492/20117 [5:21:30<7:22:18,  2.28s/it] 42%|███████████████████████████████████                                                | 8493/20117 [5:21:32<7:16:10,  2.25s/it] 42%|███████████████████████████████████                                                | 8494/20117 [5:21:35<7:09:05,  2.22s/it] 42%|███████████████████████████████████                                                | 8495/20117 [5:21:37<7:08:13,  2.21s/it] 42%|███████████████████████████████████                                                | 8496/20117 [5:21:39<7:19:27,  2.27s/it] 42%|███████████████████████████████████                                                | 8497/20117 [5:21:42<7:28:15,  2.31s/it] 42%|███████████████████████████████████                                                | 8498/20117 [5:21:44<7:26:39,  2.31s/it] 42%|███████████████████████████████████                                                | 8499/20117 [5:21:46<7:24:28,  2.30s/it] 42%|███████████████████████████████████                                                | 8500/20117 [5:21:48<7:22:10,  2.28s/it]                                                                                                                                 {'loss': 0.2643, 'grad_norm': 0.2782179117202759, 'learning_rate': 0.0001249927089841617, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 352.65, 'epoch': 0.85}
 42%|███████████████████████████████████                                                | 8500/20117 [5:21:48<7:22:10,  2.28s/it] 42%|███████████████████████████████████                                                | 8501/20117 [5:21:51<7:21:45,  2.28s/it] 42%|███████████████████████████████████                                                | 8502/20117 [5:21:53<7:20:24,  2.28s/it] 42%|███████████████████████████████████                                                | 8503/20117 [5:21:55<7:17:52,  2.26s/it] 42%|███████████████████████████████████                                                | 8504/20117 [5:21:57<7:18:08,  2.26s/it] 42%|███████████████████████████████████                                                | 8505/20117 [5:22:00<7:14:55,  2.25s/it] 42%|███████████████████████████████████                                                | 8506/20117 [5:22:02<7:15:03,  2.25s/it] 42%|███████████████████████████████████                                                | 8507/20117 [5:22:04<7:13:12,  2.24s/it] 42%|███████████████████████████████████                                                | 8508/20117 [5:22:06<7:10:49,  2.23s/it] 42%|███████████████████████████████████                                                | 8509/20117 [5:22:09<7:15:06,  2.25s/it] 42%|███████████████████████████████████                                                | 8510/20117 [5:22:11<7:13:37,  2.24s/it]                                                                                                                                 {'loss': 0.2086, 'grad_norm': 0.5103908181190491, 'learning_rate': 0.00012484071277908622, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 313.7, 'epoch': 0.85}
 42%|███████████████████████████████████                                                | 8510/20117 [5:22:11<7:13:37,  2.24s/it] 42%|███████████████████████████████████                                                | 8511/20117 [5:22:13<7:14:04,  2.24s/it] 42%|███████████████████████████████████                                                | 8512/20117 [5:22:15<7:15:00,  2.25s/it] 42%|███████████████████████████████████                                                | 8513/20117 [5:22:18<7:13:05,  2.24s/it] 42%|███████████████████████████████████▏                                               | 8514/20117 [5:22:20<7:15:00,  2.25s/it] 42%|███████████████████████████████████▏                                               | 8515/20117 [5:22:22<7:16:19,  2.26s/it] 42%|███████████████████████████████████▏                                               | 8516/20117 [5:22:24<7:16:17,  2.26s/it] 42%|███████████████████████████████████▏                                               | 8517/20117 [5:22:27<7:17:22,  2.26s/it] 42%|███████████████████████████████████▏                                               | 8518/20117 [5:22:29<7:15:27,  2.25s/it] 42%|███████████████████████████████████▏                                               | 8519/20117 [5:22:31<7:17:57,  2.27s/it] 42%|███████████████████████████████████▏                                               | 8520/20117 [5:22:33<7:19:07,  2.27s/it]                                                                                                                                 {'loss': 0.2545, 'grad_norm': 0.34362590312957764, 'learning_rate': 0.000124688655386085, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 357.23, 'epoch': 0.85}
 42%|███████████████████████████████████▏                                               | 8520/20117 [5:22:33<7:19:07,  2.27s/it] 42%|███████████████████████████████████▏                                               | 8521/20117 [5:22:36<7:15:58,  2.26s/it] 42%|███████████████████████████████████▏                                               | 8522/20117 [5:22:38<7:13:24,  2.24s/it] 42%|███████████████████████████████████▏                                               | 8523/20117 [5:22:40<7:14:56,  2.25s/it] 42%|███████████████████████████████████▏                                               | 8524/20117 [5:22:42<7:17:06,  2.26s/it] 42%|███████████████████████████████████▏                                               | 8525/20117 [5:22:45<7:16:25,  2.26s/it] 42%|███████████████████████████████████▏                                               | 8526/20117 [5:22:47<7:17:50,  2.27s/it] 42%|███████████████████████████████████▏                                               | 8527/20117 [5:22:49<7:13:02,  2.24s/it] 42%|███████████████████████████████████▏                                               | 8528/20117 [5:22:51<7:14:36,  2.25s/it] 42%|███████████████████████████████████▏                                               | 8529/20117 [5:22:54<7:16:08,  2.26s/it] 42%|███████████████████████████████████▏                                               | 8530/20117 [5:22:56<7:35:36,  2.36s/it]                                                                                                                                 {'loss': 0.2191, 'grad_norm': 0.5372098088264465, 'learning_rate': 0.00012453653717970747, 'memory/max_active (GiB)': 20.62, 'memory/max_allocated (GiB)': 20.62, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 275.82, 'epoch': 0.85}
 42%|███████████████████████████████████▏                                               | 8530/20117 [5:22:56<7:35:36,  2.36s/it] 42%|███████████████████████████████████▏                                               | 8531/20117 [5:22:59<7:28:51,  2.32s/it] 42%|███████████████████████████████████▏                                               | 8532/20117 [5:23:01<7:26:09,  2.31s/it] 42%|███████████████████████████████████▏                                               | 8533/20117 [5:23:03<7:24:56,  2.30s/it] 42%|███████████████████████████████████▏                                               | 8534/20117 [5:23:05<7:24:44,  2.30s/it] 42%|███████████████████████████████████▏                                               | 8535/20117 [5:23:08<7:20:40,  2.28s/it] 42%|███████████████████████████████████▏                                               | 8536/20117 [5:23:10<7:18:27,  2.27s/it] 42%|███████████████████████████████████▏                                               | 8537/20117 [5:23:12<7:16:47,  2.26s/it] 42%|███████████████████████████████████▏                                               | 8538/20117 [5:23:14<7:13:40,  2.25s/it] 42%|███████████████████████████████████▏                                               | 8539/20117 [5:23:17<7:12:58,  2.24s/it] 42%|███████████████████████████████████▏                                               | 8540/20117 [5:23:19<7:10:39,  2.23s/it]                                                                                                                                 {'loss': 0.2291, 'grad_norm': 0.21182872354984283, 'learning_rate': 0.00012438435853465296, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 394.59, 'epoch': 0.85}
 42%|███████████████████████████████████▏                                               | 8540/20117 [5:23:19<7:10:39,  2.23s/it] 42%|███████████████████████████████████▏                                               | 8541/20117 [5:23:21<7:08:59,  2.22s/it] 42%|███████████████████████████████████▏                                               | 8542/20117 [5:23:23<7:08:30,  2.22s/it] 42%|███████████████████████████████████▏                                               | 8543/20117 [5:23:25<7:10:12,  2.23s/it] 42%|███████████████████████████████████▎                                               | 8544/20117 [5:23:28<7:13:08,  2.25s/it] 42%|███████████████████████████████████▎                                               | 8545/20117 [5:23:30<7:20:28,  2.28s/it] 42%|███████████████████████████████████▎                                               | 8546/20117 [5:23:32<7:15:40,  2.26s/it] 42%|███████████████████████████████████▎                                               | 8547/20117 [5:23:35<7:13:31,  2.25s/it] 42%|███████████████████████████████████▎                                               | 8548/20117 [5:23:37<7:14:32,  2.25s/it] 42%|███████████████████████████████████▎                                               | 8549/20117 [5:23:39<7:12:17,  2.24s/it] 43%|███████████████████████████████████▎                                               | 8550/20117 [5:23:41<7:13:50,  2.25s/it]                                                                                                                                 {'loss': 0.2142, 'grad_norm': 0.4921363890171051, 'learning_rate': 0.0001242321198257696, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 342.56, 'epoch': 0.85}
 43%|███████████████████████████████████▎                                               | 8550/20117 [5:23:41<7:13:50,  2.25s/it] 43%|███████████████████████████████████▎                                               | 8551/20117 [5:23:44<7:13:27,  2.25s/it] 43%|███████████████████████████████████▎                                               | 8552/20117 [5:23:46<7:09:42,  2.23s/it] 43%|███████████████████████████████████▎                                               | 8553/20117 [5:23:48<7:10:35,  2.23s/it] 43%|███████████████████████████████████▎                                               | 8554/20117 [5:23:50<7:12:50,  2.25s/it] 43%|███████████████████████████████████▎                                               | 8555/20117 [5:23:52<7:10:34,  2.23s/it] 43%|███████████████████████████████████▎                                               | 8556/20117 [5:23:55<7:10:20,  2.23s/it] 43%|███████████████████████████████████▎                                               | 8557/20117 [5:23:57<7:14:43,  2.26s/it] 43%|███████████████████████████████████▎                                               | 8558/20117 [5:23:59<7:16:17,  2.26s/it] 43%|███████████████████████████████████▎                                               | 8559/20117 [5:24:02<7:14:07,  2.25s/it] 43%|███████████████████████████████████▎                                               | 8560/20117 [5:24:04<7:13:29,  2.25s/it]                                                                                                                                 {'loss': 0.2034, 'grad_norm': 0.3340144157409668, 'learning_rate': 0.00012407982142805356, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 381.5, 'epoch': 0.85}
 43%|███████████████████████████████████▎                                               | 8560/20117 [5:24:04<7:13:29,  2.25s/it] 43%|███████████████████████████████████▎                                               | 8561/20117 [5:24:06<7:14:17,  2.25s/it] 43%|███████████████████████████████████▎                                               | 8562/20117 [5:24:08<7:14:28,  2.26s/it] 43%|███████████████████████████████████▎                                               | 8563/20117 [5:24:11<7:17:18,  2.27s/it] 43%|███████████████████████████████████▎                                               | 8564/20117 [5:24:13<7:16:09,  2.27s/it] 43%|███████████████████████████████████▎                                               | 8565/20117 [5:24:15<7:12:29,  2.25s/it] 43%|███████████████████████████████████▎                                               | 8566/20117 [5:24:17<7:10:58,  2.24s/it] 43%|███████████████████████████████████▎                                               | 8567/20117 [5:24:20<7:13:38,  2.25s/it] 43%|███████████████████████████████████▎                                               | 8568/20117 [5:24:22<7:13:50,  2.25s/it] 43%|███████████████████████████████████▎                                               | 8569/20117 [5:24:24<7:13:53,  2.25s/it] 43%|███████████████████████████████████▎                                               | 8570/20117 [5:24:26<7:14:20,  2.26s/it]                                                                                                                                 {'loss': 0.2031, 'grad_norm': 0.45434701442718506, 'learning_rate': 0.00012392746371664797, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 337.22, 'epoch': 0.85}
 43%|███████████████████████████████████▎                                               | 8570/20117 [5:24:26<7:14:20,  2.26s/it] 43%|███████████████████████████████████▎                                               | 8571/20117 [5:24:29<7:14:50,  2.26s/it] 43%|███████████████████████████████████▎                                               | 8572/20117 [5:24:31<7:16:22,  2.27s/it] 43%|███████████████████████████████████▎                                               | 8573/20117 [5:24:33<7:19:38,  2.29s/it] 43%|███████████████████████████████████▍                                               | 8574/20117 [5:24:35<7:20:35,  2.29s/it] 43%|███████████████████████████████████▍                                               | 8575/20117 [5:24:38<7:21:02,  2.29s/it] 43%|███████████████████████████████████▍                                               | 8576/20117 [5:24:40<7:22:58,  2.30s/it] 43%|███████████████████████████████████▍                                               | 8577/20117 [5:24:42<7:20:26,  2.29s/it] 43%|███████████████████████████████████▍                                               | 8578/20117 [5:24:45<7:17:49,  2.28s/it] 43%|███████████████████████████████████▍                                               | 8579/20117 [5:24:47<7:15:14,  2.26s/it] 43%|███████████████████████████████████▍                                               | 8580/20117 [5:24:49<7:15:52,  2.27s/it]                                                                                                                                 {'loss': 0.1807, 'grad_norm': 0.2646021544933319, 'learning_rate': 0.00012377504706684206, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 369.87, 'epoch': 0.85}
 43%|███████████████████████████████████▍                                               | 8580/20117 [5:24:49<7:15:52,  2.27s/it] 43%|███████████████████████████████████▍                                               | 8581/20117 [5:24:51<7:19:17,  2.28s/it] 43%|███████████████████████████████████▍                                               | 8582/20117 [5:24:54<7:18:41,  2.28s/it] 43%|███████████████████████████████████▍                                               | 8583/20117 [5:24:56<7:19:37,  2.29s/it] 43%|███████████████████████████████████▍                                               | 8584/20117 [5:24:59<7:38:34,  2.39s/it] 43%|███████████████████████████████████▍                                               | 8585/20117 [5:25:01<7:30:04,  2.34s/it] 43%|███████████████████████████████████▍                                               | 8586/20117 [5:25:03<7:25:42,  2.32s/it] 43%|███████████████████████████████████▍                                               | 8587/20117 [5:25:05<7:23:59,  2.31s/it] 43%|███████████████████████████████████▍                                               | 8588/20117 [5:25:08<7:21:47,  2.30s/it] 43%|███████████████████████████████████▍                                               | 8589/20117 [5:25:10<7:26:23,  2.32s/it] 43%|███████████████████████████████████▍                                               | 8590/20117 [5:25:12<7:22:13,  2.30s/it]                                                                                                                                 {'loss': 0.2258, 'grad_norm': 0.4125272333621979, 'learning_rate': 0.00012362257185407022, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 374.23, 'epoch': 0.85}
 43%|███████████████████████████████████▍                                               | 8590/20117 [5:25:12<7:22:13,  2.30s/it] 43%|███████████████████████████████████▍                                               | 8591/20117 [5:25:15<7:26:43,  2.33s/it] 43%|███████████████████████████████████▍                                               | 8592/20117 [5:25:17<7:24:46,  2.32s/it] 43%|███████████████████████████████████▍                                               | 8593/20117 [5:25:19<7:23:24,  2.31s/it] 43%|███████████████████████████████████▍                                               | 8594/20117 [5:25:22<7:22:33,  2.30s/it] 43%|███████████████████████████████████▍                                               | 8595/20117 [5:25:24<7:19:56,  2.29s/it] 43%|███████████████████████████████████▍                                               | 8596/20117 [5:25:26<7:20:05,  2.29s/it] 43%|███████████████████████████████████▍                                               | 8597/20117 [5:25:28<7:18:33,  2.28s/it] 43%|███████████████████████████████████▍                                               | 8598/20117 [5:25:31<7:17:14,  2.28s/it] 43%|███████████████████████████████████▍                                               | 8599/20117 [5:25:33<7:16:44,  2.28s/it] 43%|███████████████████████████████████▍                                               | 8600/20117 [5:25:35<7:17:13,  2.28s/it]                                                                                                                                 {'loss': 0.1624, 'grad_norm': 0.3133401572704315, 'learning_rate': 0.00012347003845391118, 'memory/max_active (GiB)': 19.98, 'memory/max_allocated (GiB)': 19.98, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 336.61, 'epoch': 0.85}
 43%|███████████████████████████████████▍                                               | 8600/20117 [5:25:35<7:17:13,  2.28s/it] 43%|███████████████████████████████████▍                                               | 8601/20117 [5:25:37<7:14:50,  2.27s/it] 43%|███████████████████████████████████▍                                               | 8602/20117 [5:25:40<7:14:01,  2.26s/it] 43%|███████████████████████████████████▍                                               | 8603/20117 [5:25:42<7:15:16,  2.27s/it] 43%|███████████████████████████████████▍                                               | 8604/20117 [5:25:44<7:14:03,  2.26s/it] 43%|███████████████████████████████████▌                                               | 8605/20117 [5:25:47<7:16:29,  2.27s/it] 43%|███████████████████████████████████▌                                               | 8606/20117 [5:25:49<7:17:20,  2.28s/it] 43%|███████████████████████████████████▌                                               | 8607/20117 [5:25:51<7:15:04,  2.27s/it] 43%|███████████████████████████████████▌                                               | 8608/20117 [5:25:53<7:18:01,  2.28s/it] 43%|███████████████████████████████████▌                                               | 8609/20117 [5:25:56<7:18:19,  2.29s/it] 43%|███████████████████████████████████▌                                               | 8610/20117 [5:25:58<7:22:55,  2.31s/it]                                                                                                                                 {'loss': 0.207, 'grad_norm': 0.23082919418811798, 'learning_rate': 0.00012331744724208694, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 316.78, 'epoch': 0.86}
 43%|███████████████████████████████████▌                                               | 8610/20117 [5:25:58<7:22:55,  2.31s/it] 43%|███████████████████████████████████▌                                               | 8611/20117 [5:26:00<7:25:37,  2.32s/it] 43%|███████████████████████████████████▌                                               | 8612/20117 [5:26:03<7:25:00,  2.32s/it] 43%|███████████████████████████████████▌                                               | 8613/20117 [5:26:05<7:25:34,  2.32s/it] 43%|███████████████████████████████████▌                                               | 8614/20117 [5:26:07<7:21:00,  2.30s/it] 43%|███████████████████████████████████▌                                               | 8615/20117 [5:26:10<7:21:33,  2.30s/it] 43%|███████████████████████████████████▌                                               | 8616/20117 [5:26:12<7:19:44,  2.29s/it] 43%|███████████████████████████████████▌                                               | 8617/20117 [5:26:14<7:18:11,  2.29s/it] 43%|███████████████████████████████████▌                                               | 8618/20117 [5:26:17<7:21:20,  2.30s/it] 43%|███████████████████████████████████▌                                               | 8619/20117 [5:26:19<7:18:17,  2.29s/it] 43%|███████████████████████████████████▌                                               | 8620/20117 [5:26:21<7:18:32,  2.29s/it]                                                                                                                                 {'loss': 0.2465, 'grad_norm': 0.45513761043548584, 'learning_rate': 0.00012316479859446187, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 355.67, 'epoch': 0.86}
 43%|███████████████████████████████████▌                                               | 8620/20117 [5:26:21<7:18:32,  2.29s/it] 43%|███████████████████████████████████▌                                               | 8621/20117 [5:26:23<7:17:42,  2.28s/it] 43%|███████████████████████████████████▌                                               | 8622/20117 [5:26:26<7:16:53,  2.28s/it] 43%|███████████████████████████████████▌                                               | 8623/20117 [5:26:28<7:18:56,  2.29s/it] 43%|███████████████████████████████████▌                                               | 8624/20117 [5:26:30<7:16:49,  2.28s/it] 43%|███████████████████████████████████▌                                               | 8625/20117 [5:26:32<7:15:49,  2.28s/it] 43%|███████████████████████████████████▌                                               | 8626/20117 [5:26:35<7:16:43,  2.28s/it] 43%|███████████████████████████████████▌                                               | 8627/20117 [5:26:37<7:18:39,  2.29s/it] 43%|███████████████████████████████████▌                                               | 8628/20117 [5:26:39<7:22:54,  2.31s/it] 43%|███████████████████████████████████▌                                               | 8629/20117 [5:26:42<7:24:56,  2.32s/it] 43%|███████████████████████████████████▌                                               | 8630/20117 [5:26:44<7:21:57,  2.31s/it]                                                                                                                                 {'loss': 0.2563, 'grad_norm': 0.4129338562488556, 'learning_rate': 0.00012301209288704184, 'memory/max_active (GiB)': 21.38, 'memory/max_allocated (GiB)': 21.38, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 323.02, 'epoch': 0.86}
 43%|███████████████████████████████████▌                                               | 8630/20117 [5:26:44<7:21:57,  2.31s/it] 43%|███████████████████████████████████▌                                               | 8631/20117 [5:26:46<7:15:28,  2.27s/it] 43%|███████████████████████████████████▌                                               | 8632/20117 [5:26:49<7:17:05,  2.28s/it] 43%|███████████████████████████████████▌                                               | 8633/20117 [5:26:51<7:19:03,  2.29s/it] 43%|███████████████████████████████████▌                                               | 8634/20117 [5:26:53<7:15:18,  2.27s/it] 43%|███████████████████████████████████▋                                               | 8635/20117 [5:26:55<7:15:59,  2.28s/it] 43%|███████████████████████████████████▋                                               | 8636/20117 [5:26:58<7:34:09,  2.37s/it] 43%|███████████████████████████████████▋                                               | 8637/20117 [5:27:00<7:26:12,  2.33s/it] 43%|███████████████████████████████████▋                                               | 8638/20117 [5:27:02<7:23:03,  2.32s/it] 43%|███████████████████████████████████▋                                               | 8639/20117 [5:27:05<7:17:33,  2.29s/it] 43%|███████████████████████████████████▋                                               | 8640/20117 [5:27:07<7:18:44,  2.29s/it]                                                                                                                                 {'loss': 0.154, 'grad_norm': 0.37343230843544006, 'learning_rate': 0.00012285933049597335, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 326.66, 'epoch': 0.86}
 43%|███████████████████████████████████▋                                               | 8640/20117 [5:27:07<7:18:44,  2.29s/it] 43%|███████████████████████████████████▋                                               | 8641/20117 [5:27:09<7:12:53,  2.26s/it] 43%|███████████████████████████████████▋                                               | 8642/20117 [5:27:11<7:14:55,  2.27s/it] 43%|███████████████████████████████████▋                                               | 8643/20117 [5:27:14<7:15:53,  2.28s/it] 43%|███████████████████████████████████▋                                               | 8644/20117 [5:27:16<7:17:03,  2.29s/it] 43%|███████████████████████████████████▋                                               | 8645/20117 [5:27:18<7:12:09,  2.26s/it] 43%|███████████████████████████████████▋                                               | 8646/20117 [5:27:21<7:17:35,  2.29s/it] 43%|███████████████████████████████████▋                                               | 8647/20117 [5:27:23<7:16:05,  2.28s/it] 43%|███████████████████████████████████▋                                               | 8648/20117 [5:27:25<7:13:11,  2.27s/it] 43%|███████████████████████████████████▋                                               | 8649/20117 [5:27:27<7:13:20,  2.27s/it] 43%|███████████████████████████████████▋                                               | 8650/20117 [5:27:30<7:12:55,  2.27s/it]                                                                                                                                 {'loss': 0.2135, 'grad_norm': 0.32739633321762085, 'learning_rate': 0.00012270651179754243, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 340.36, 'epoch': 0.86}
 43%|███████████████████████████████████▋                                               | 8650/20117 [5:27:30<7:12:55,  2.27s/it] 43%|███████████████████████████████████▋                                               | 8651/20117 [5:27:32<7:15:28,  2.28s/it] 43%|███████████████████████████████████▋                                               | 8652/20117 [5:27:34<7:14:52,  2.28s/it] 43%|███████████████████████████████████▋                                               | 8653/20117 [5:27:36<7:10:59,  2.26s/it] 43%|███████████████████████████████████▋                                               | 8654/20117 [5:27:39<7:13:30,  2.27s/it] 43%|███████████████████████████████████▋                                               | 8655/20117 [5:27:41<7:13:24,  2.27s/it] 43%|███████████████████████████████████▋                                               | 8656/20117 [5:27:43<7:12:12,  2.26s/it] 43%|███████████████████████████████████▋                                               | 8657/20117 [5:27:46<7:12:02,  2.26s/it] 43%|███████████████████████████████████▋                                               | 8658/20117 [5:27:48<7:12:06,  2.26s/it] 43%|███████████████████████████████████▋                                               | 8659/20117 [5:27:50<7:11:58,  2.26s/it] 43%|███████████████████████████████████▋                                               | 8660/20117 [5:27:52<7:11:35,  2.26s/it]                                                                                                                                 {'loss': 0.2221, 'grad_norm': 0.5008605718612671, 'learning_rate': 0.0001225536371681738, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 295.75, 'epoch': 0.86}
 43%|███████████████████████████████████▋                                               | 8660/20117 [5:27:52<7:11:35,  2.26s/it] 43%|███████████████████████████████████▋                                               | 8661/20117 [5:27:55<7:10:55,  2.26s/it] 43%|███████████████████████████████████▋                                               | 8662/20117 [5:27:57<7:14:45,  2.28s/it] 43%|███████████████████████████████████▋                                               | 8663/20117 [5:27:59<7:12:03,  2.26s/it] 43%|███████████████████████████████████▋                                               | 8664/20117 [5:28:01<7:05:32,  2.23s/it] 43%|███████████████████████████████████▊                                               | 8665/20117 [5:28:04<7:06:19,  2.23s/it] 43%|███████████████████████████████████▊                                               | 8666/20117 [5:28:06<7:02:02,  2.21s/it] 43%|███████████████████████████████████▊                                               | 8667/20117 [5:28:08<6:58:43,  2.19s/it] 43%|███████████████████████████████████▊                                               | 8668/20117 [5:28:10<6:59:29,  2.20s/it] 43%|███████████████████████████████████▊                                               | 8669/20117 [5:28:12<7:02:55,  2.22s/it] 43%|███████████████████████████████████▊                                               | 8670/20117 [5:28:15<7:08:45,  2.25s/it]                                                                                                                                 {'loss': 0.2402, 'grad_norm': 2.8313498497009277, 'learning_rate': 0.00012240070698443, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.45, 'tokens_per_second_per_gpu': 343.07, 'epoch': 0.86}
 43%|███████████████████████████████████▊                                               | 8670/20117 [5:28:15<7:08:45,  2.25s/it] 43%|███████████████████████████████████▊                                               | 8671/20117 [5:28:17<7:18:30,  2.30s/it] 43%|███████████████████████████████████▊                                               | 8672/20117 [5:28:19<7:21:44,  2.32s/it] 43%|███████████████████████████████████▊                                               | 8673/20117 [5:28:22<7:19:09,  2.30s/it] 43%|███████████████████████████████████▊                                               | 8674/20117 [5:28:24<7:20:12,  2.31s/it] 43%|███████████████████████████████████▊                                               | 8675/20117 [5:28:26<7:19:36,  2.31s/it] 43%|███████████████████████████████████▊                                               | 8676/20117 [5:28:29<7:19:13,  2.30s/it] 43%|███████████████████████████████████▊                                               | 8677/20117 [5:28:31<7:25:23,  2.34s/it] 43%|███████████████████████████████████▊                                               | 8678/20117 [5:28:33<7:20:12,  2.31s/it] 43%|███████████████████████████████████▊                                               | 8679/20117 [5:28:35<7:13:30,  2.27s/it] 43%|███████████████████████████████████▊                                               | 8680/20117 [5:28:38<7:10:39,  2.26s/it]                                                                                                                                 {'loss': 0.2588, 'grad_norm': 3.499013662338257, 'learning_rate': 0.00012224772162301042, 'memory/max_active (GiB)': 21.58, 'memory/max_allocated (GiB)': 21.58, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 348.94, 'epoch': 0.86}
 43%|███████████████████████████████████▊                                               | 8680/20117 [5:28:38<7:10:39,  2.26s/it] 43%|███████████████████████████████████▊                                               | 8681/20117 [5:28:40<7:06:05,  2.24s/it] 43%|███████████████████████████████████▊                                               | 8682/20117 [5:28:42<7:13:37,  2.28s/it] 43%|███████████████████████████████████▊                                               | 8683/20117 [5:28:45<7:19:15,  2.31s/it] 43%|███████████████████████████████████▊                                               | 8684/20117 [5:28:47<7:19:36,  2.31s/it] 43%|███████████████████████████████████▊                                               | 8685/20117 [5:28:49<7:16:08,  2.29s/it] 43%|███████████████████████████████████▊                                               | 8686/20117 [5:28:52<7:20:24,  2.31s/it] 43%|███████████████████████████████████▊                                               | 8687/20117 [5:28:54<7:25:44,  2.34s/it] 43%|███████████████████████████████████▊                                               | 8688/20117 [5:28:56<7:21:53,  2.32s/it] 43%|███████████████████████████████████▊                                               | 8689/20117 [5:28:59<7:45:34,  2.44s/it] 43%|███████████████████████████████████▊                                               | 8690/20117 [5:29:01<7:33:39,  2.38s/it]                                                                                                                                 {'loss': 0.2485, 'grad_norm': 0.4069509208202362, 'learning_rate': 0.0001220946814607503, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 365.05, 'epoch': 0.86}
 43%|███████████████████████████████████▊                                               | 8690/20117 [5:29:01<7:33:39,  2.38s/it] 43%|███████████████████████████████████▊                                               | 8691/20117 [5:29:03<7:27:17,  2.35s/it] 43%|███████████████████████████████████▊                                               | 8692/20117 [5:29:06<7:24:04,  2.33s/it] 43%|███████████████████████████████████▊                                               | 8693/20117 [5:29:08<7:21:04,  2.32s/it] 43%|███████████████████████████████████▊                                               | 8694/20117 [5:29:10<7:19:06,  2.31s/it] 43%|███████████████████████████████████▊                                               | 8695/20117 [5:29:13<7:16:40,  2.29s/it] 43%|███████████████████████████████████▉                                               | 8696/20117 [5:29:15<7:14:07,  2.28s/it] 43%|███████████████████████████████████▉                                               | 8697/20117 [5:29:17<7:16:12,  2.29s/it] 43%|███████████████████████████████████▉                                               | 8698/20117 [5:29:19<7:14:22,  2.28s/it] 43%|███████████████████████████████████▉                                               | 8699/20117 [5:29:22<7:11:44,  2.27s/it] 43%|███████████████████████████████████▉                                               | 8700/20117 [5:29:24<7:11:12,  2.27s/it]                                                                                                                                 {'loss': 0.2119, 'grad_norm': 0.5977867841720581, 'learning_rate': 0.00012194158687461992, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 326.01, 'epoch': 0.86}
 43%|███████████████████████████████████▉                                               | 8700/20117 [5:29:24<7:11:12,  2.27s/it] 43%|███████████████████████████████████▉                                               | 8701/20117 [5:29:26<7:14:55,  2.29s/it] 43%|███████████████████████████████████▉                                               | 8702/20117 [5:29:28<7:11:59,  2.27s/it] 43%|███████████████████████████████████▉                                               | 8703/20117 [5:29:31<7:14:37,  2.28s/it] 43%|███████████████████████████████████▉                                               | 8704/20117 [5:29:33<7:10:16,  2.26s/it] 43%|███████████████████████████████████▉                                               | 8705/20117 [5:29:35<7:17:35,  2.30s/it] 43%|███████████████████████████████████▉                                               | 8706/20117 [5:29:38<7:19:54,  2.31s/it] 43%|███████████████████████████████████▉                                               | 8707/20117 [5:29:40<7:15:14,  2.29s/it] 43%|███████████████████████████████████▉                                               | 8708/20117 [5:29:42<7:14:54,  2.29s/it] 43%|███████████████████████████████████▉                                               | 8709/20117 [5:29:45<7:15:01,  2.29s/it] 43%|███████████████████████████████████▉                                               | 8710/20117 [5:29:47<7:13:54,  2.28s/it]                                                                                                                                 {'loss': 0.2719, 'grad_norm': 6.747653961181641, 'learning_rate': 0.00012178843824172361, 'memory/max_active (GiB)': 20.53, 'memory/max_allocated (GiB)': 20.53, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 390.15, 'epoch': 0.87}
 43%|███████████████████████████████████▉                                               | 8710/20117 [5:29:47<7:13:54,  2.28s/it] 43%|███████████████████████████████████▉                                               | 8711/20117 [5:29:49<7:20:39,  2.32s/it] 43%|███████████████████████████████████▉                                               | 8712/20117 [5:29:52<7:44:48,  2.45s/it] 43%|███████████████████████████████████▉                                               | 8713/20117 [5:29:55<7:53:52,  2.49s/it] 43%|███████████████████████████████████▉                                               | 8714/20117 [5:29:57<7:41:19,  2.43s/it] 43%|███████████████████████████████████▉                                               | 8715/20117 [5:29:59<7:33:26,  2.39s/it] 43%|███████████████████████████████████▉                                               | 8716/20117 [5:30:01<7:27:14,  2.35s/it] 43%|███████████████████████████████████▉                                               | 8717/20117 [5:30:04<7:21:15,  2.32s/it] 43%|███████████████████████████████████▉                                               | 8718/20117 [5:30:06<7:17:32,  2.30s/it] 43%|███████████████████████████████████▉                                               | 8719/20117 [5:30:08<7:15:20,  2.29s/it] 43%|███████████████████████████████████▉                                               | 8720/20117 [5:30:10<7:12:30,  2.28s/it]                                                                                                                                 {'loss': 0.1836, 'grad_norm': 0.46154770255088806, 'learning_rate': 0.00012163523593929884, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 343.33, 'epoch': 0.87}
 43%|███████████████████████████████████▉                                               | 8720/20117 [5:30:10<7:12:30,  2.28s/it] 43%|███████████████████████████████████▉                                               | 8721/20117 [5:30:13<7:21:56,  2.33s/it] 43%|███████████████████████████████████▉                                               | 8722/20117 [5:30:15<7:21:16,  2.32s/it] 43%|███████████████████████████████████▉                                               | 8723/20117 [5:30:17<7:15:53,  2.30s/it] 43%|███████████████████████████████████▉                                               | 8724/20117 [5:30:20<7:16:11,  2.30s/it] 43%|███████████████████████████████████▉                                               | 8725/20117 [5:30:22<7:16:43,  2.30s/it] 43%|████████████████████████████████████                                               | 8726/20117 [5:30:24<7:11:18,  2.27s/it] 43%|████████████████████████████████████                                               | 8727/20117 [5:30:27<7:16:20,  2.30s/it] 43%|████████████████████████████████████                                               | 8728/20117 [5:30:29<7:17:18,  2.30s/it] 43%|████████████████████████████████████                                               | 8729/20117 [5:30:31<7:17:25,  2.30s/it] 43%|████████████████████████████████████                                               | 8730/20117 [5:30:34<7:20:22,  2.32s/it]                                                                                                                                 {'loss': 0.2419, 'grad_norm': 0.272504985332489, 'learning_rate': 0.00012148198034471524, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 347.62, 'epoch': 0.87}
 43%|████████████████████████████████████                                               | 8730/20117 [5:30:34<7:20:22,  2.32s/it] 43%|████████████████████████████████████                                               | 8731/20117 [5:30:36<7:18:39,  2.31s/it] 43%|████████████████████████████████████                                               | 8732/20117 [5:30:38<7:19:23,  2.32s/it] 43%|████████████████████████████████████                                               | 8733/20117 [5:30:40<7:21:07,  2.32s/it] 43%|████████████████████████████████████                                               | 8734/20117 [5:30:43<7:17:46,  2.31s/it] 43%|████████████████████████████████████                                               | 8735/20117 [5:30:45<7:16:32,  2.30s/it] 43%|████████████████████████████████████                                               | 8736/20117 [5:30:47<7:18:18,  2.31s/it] 43%|████████████████████████████████████                                               | 8737/20117 [5:30:50<7:14:49,  2.29s/it] 43%|████████████████████████████████████                                               | 8738/20117 [5:30:52<7:14:56,  2.29s/it] 43%|████████████████████████████████████                                               | 8739/20117 [5:30:54<7:16:27,  2.30s/it] 43%|████████████████████████████████████                                               | 8740/20117 [5:30:57<7:16:08,  2.30s/it]                                                                                                                                 {'loss': 0.2379, 'grad_norm': 0.4848499596118927, 'learning_rate': 0.00012132867183547372, 'memory/max_active (GiB)': 19.22, 'memory/max_allocated (GiB)': 19.22, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 315.42, 'epoch': 0.87}
 43%|████████████████████████████████████                                               | 8740/20117 [5:30:57<7:16:08,  2.30s/it] 43%|████████████████████████████████████                                               | 8741/20117 [5:30:59<7:25:04,  2.35s/it] 43%|████████████████████████████████████                                               | 8742/20117 [5:31:01<7:22:29,  2.33s/it] 43%|████████████████████████████████████                                               | 8743/20117 [5:31:04<7:47:14,  2.46s/it] 43%|████████████████████████████████████                                               | 8744/20117 [5:31:06<7:33:39,  2.39s/it] 43%|████████████████████████████████████                                               | 8745/20117 [5:31:09<7:26:04,  2.35s/it] 43%|████████████████████████████████████                                               | 8746/20117 [5:31:11<7:26:51,  2.36s/it] 43%|████████████████████████████████████                                               | 8747/20117 [5:31:13<7:21:09,  2.33s/it] 43%|████████████████████████████████████                                               | 8748/20117 [5:31:16<7:24:06,  2.34s/it] 43%|████████████████████████████████████                                               | 8749/20117 [5:31:18<7:23:24,  2.34s/it] 43%|████████████████████████████████████                                               | 8750/20117 [5:31:20<7:18:45,  2.32s/it]                                                                                                                                 {'loss': 0.2358, 'grad_norm': 0.42282259464263916, 'learning_rate': 0.00012117531078920556, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 404.19, 'epoch': 0.87}
 43%|████████████████████████████████████                                               | 8750/20117 [5:31:20<7:18:45,  2.32s/it] 44%|████████████████████████████████████                                               | 8751/20117 [5:31:22<7:20:35,  2.33s/it] 44%|████████████████████████████████████                                               | 8752/20117 [5:31:25<7:16:21,  2.30s/it] 44%|████████████████████████████████████                                               | 8753/20117 [5:31:27<7:19:15,  2.32s/it] 44%|████████████████████████████████████                                               | 8754/20117 [5:31:29<7:18:14,  2.31s/it] 44%|████████████████████████████████████                                               | 8755/20117 [5:31:32<7:16:16,  2.30s/it] 44%|████████████████████████████████████▏                                              | 8756/20117 [5:31:34<7:12:45,  2.29s/it] 44%|████████████████████████████████████▏                                              | 8757/20117 [5:31:36<7:14:34,  2.30s/it] 44%|████████████████████████████████████▏                                              | 8758/20117 [5:31:38<7:11:23,  2.28s/it] 44%|████████████████████████████████████▏                                              | 8759/20117 [5:31:41<7:11:40,  2.28s/it] 44%|████████████████████████████████████▏                                              | 8760/20117 [5:31:43<7:09:48,  2.27s/it]                                                                                                                                 {'loss': 0.2602, 'grad_norm': 0.5138429999351501, 'learning_rate': 0.00012102189758367142, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 350.33, 'epoch': 0.87}
 44%|████████████████████████████████████▏                                              | 8760/20117 [5:31:43<7:09:48,  2.27s/it] 44%|████████████████████████████████████▏                                              | 8761/20117 [5:31:45<7:13:00,  2.29s/it] 44%|████████████████████████████████████▏                                              | 8762/20117 [5:31:48<7:15:56,  2.30s/it] 44%|████████████████████████████████████▏                                              | 8763/20117 [5:31:50<7:14:03,  2.29s/it] 44%|████████████████████████████████████▏                                              | 8764/20117 [5:31:52<7:15:53,  2.30s/it] 44%|████████████████████████████████████▏                                              | 8765/20117 [5:31:55<7:18:52,  2.32s/it] 44%|████████████████████████████████████▏                                              | 8766/20117 [5:31:57<7:15:02,  2.30s/it] 44%|████████████████████████████████████▏                                              | 8767/20117 [5:31:59<7:13:12,  2.29s/it] 44%|████████████████████████████████████▏                                              | 8768/20117 [5:32:01<7:14:07,  2.30s/it] 44%|████████████████████████████████████▏                                              | 8769/20117 [5:32:04<7:09:13,  2.27s/it] 44%|████████████████████████████████████▏                                              | 8770/20117 [5:32:06<7:11:13,  2.28s/it]                                                                                                                                 {'loss': 0.303, 'grad_norm': 0.2951951324939728, 'learning_rate': 0.00012086843259676041, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 396.42, 'epoch': 0.87}
 44%|████████████████████████████████████▏                                              | 8770/20117 [5:32:06<7:11:13,  2.28s/it] 44%|████████████████████████████████████▏                                              | 8771/20117 [5:32:08<7:09:03,  2.27s/it] 44%|████████████████████████████████████▏                                              | 8772/20117 [5:32:11<7:12:11,  2.29s/it] 44%|████████████████████████████████████▏                                              | 8773/20117 [5:32:13<7:12:21,  2.29s/it] 44%|████████████████████████████████████▏                                              | 8774/20117 [5:32:15<7:10:06,  2.28s/it] 44%|████████████████████████████████████▏                                              | 8775/20117 [5:32:17<7:09:07,  2.27s/it] 44%|████████████████████████████████████▏                                              | 8776/20117 [5:32:20<7:10:25,  2.28s/it] 44%|████████████████████████████████████▏                                              | 8777/20117 [5:32:22<7:10:14,  2.28s/it] 44%|████████████████████████████████████▏                                              | 8778/20117 [5:32:24<7:06:41,  2.26s/it] 44%|████████████████████████████████████▏                                              | 8779/20117 [5:32:26<7:11:14,  2.28s/it] 44%|████████████████████████████████████▏                                              | 8780/20117 [5:32:29<7:12:23,  2.29s/it]                                                                                                                                 {'loss': 0.2519, 'grad_norm': 0.4597817361354828, 'learning_rate': 0.00012071491620648934, 'memory/max_active (GiB)': 19.68, 'memory/max_allocated (GiB)': 19.68, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 338.59, 'epoch': 0.87}
 44%|████████████████████████████████████▏                                              | 8780/20117 [5:32:29<7:12:23,  2.29s/it] 44%|████████████████████████████████████▏                                              | 8781/20117 [5:32:31<7:10:47,  2.28s/it] 44%|████████████████████████████████████▏                                              | 8782/20117 [5:32:33<7:10:19,  2.28s/it] 44%|████████████████████████████████████▏                                              | 8783/20117 [5:32:36<7:10:20,  2.28s/it] 44%|████████████████████████████████████▏                                              | 8784/20117 [5:32:38<7:09:48,  2.28s/it] 44%|████████████████████████████████████▏                                              | 8785/20117 [5:32:40<7:13:56,  2.30s/it] 44%|████████████████████████████████████▏                                              | 8786/20117 [5:32:42<7:11:30,  2.28s/it] 44%|████████████████████████████████████▎                                              | 8787/20117 [5:32:45<7:10:21,  2.28s/it] 44%|████████████████████████████████████▎                                              | 8788/20117 [5:32:47<7:07:49,  2.27s/it] 44%|████████████████████████████████████▎                                              | 8789/20117 [5:32:49<7:07:45,  2.27s/it] 44%|████████████████████████████████████▎                                              | 8790/20117 [5:32:52<7:11:02,  2.28s/it]                                                                                                                                 {'loss': 0.2235, 'grad_norm': 0.13618378341197968, 'learning_rate': 0.00012056134879100138, 'memory/max_active (GiB)': 20.43, 'memory/max_allocated (GiB)': 20.43, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 325.39, 'epoch': 0.87}
 44%|████████████████████████████████████▎                                              | 8790/20117 [5:32:52<7:11:02,  2.28s/it] 44%|████████████████████████████████████▎                                              | 8791/20117 [5:32:54<7:07:29,  2.26s/it] 44%|████████████████████████████████████▎                                              | 8792/20117 [5:32:56<7:09:18,  2.27s/it] 44%|████████████████████████████████████▎                                              | 8793/20117 [5:32:58<7:08:29,  2.27s/it] 44%|████████████████████████████████████▎                                              | 8794/20117 [5:33:01<7:27:25,  2.37s/it] 44%|████████████████████████████████████▎                                              | 8795/20117 [5:33:03<7:23:01,  2.35s/it] 44%|████████████████████████████████████▎                                              | 8796/20117 [5:33:06<7:19:26,  2.33s/it] 44%|████████████████████████████████████▎                                              | 8797/20117 [5:33:08<7:18:04,  2.32s/it] 44%|████████████████████████████████████▎                                              | 8798/20117 [5:33:10<7:16:10,  2.31s/it] 44%|████████████████████████████████████▎                                              | 8799/20117 [5:33:12<7:11:16,  2.29s/it] 44%|████████████████████████████████████▎                                              | 8800/20117 [5:33:15<7:16:07,  2.31s/it]                                                                                                                                 {'loss': 0.2563, 'grad_norm': 0.5435792207717896, 'learning_rate': 0.00012040773072856566, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 346.95, 'epoch': 0.87}
 44%|████████████████████████████████████▎                                              | 8800/20117 [5:33:15<7:16:07,  2.31s/it] 44%|████████████████████████████████████▎                                              | 8801/20117 [5:33:17<7:11:20,  2.29s/it] 44%|████████████████████████████████████▎                                              | 8802/20117 [5:33:19<7:06:25,  2.26s/it] 44%|████████████████████████████████████▎                                              | 8803/20117 [5:33:21<7:09:44,  2.28s/it] 44%|████████████████████████████████████▎                                              | 8804/20117 [5:33:24<7:09:37,  2.28s/it] 44%|████████████████████████████████████▎                                              | 8805/20117 [5:33:26<7:11:11,  2.29s/it] 44%|████████████████████████████████████▎                                              | 8806/20117 [5:33:28<7:10:44,  2.28s/it] 44%|████████████████████████████████████▎                                              | 8807/20117 [5:33:31<7:08:38,  2.27s/it] 44%|████████████████████████████████████▎                                              | 8808/20117 [5:33:33<7:09:03,  2.28s/it] 44%|████████████████████████████████████▎                                              | 8809/20117 [5:33:35<7:07:16,  2.27s/it] 44%|████████████████████████████████████▎                                              | 8810/20117 [5:33:37<7:06:11,  2.26s/it]                                                                                                                                 {'loss': 0.1721, 'grad_norm': 0.24607907235622406, 'learning_rate': 0.00012025406239757588, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 305.91, 'epoch': 0.88}
 44%|████████████████████████████████████▎                                              | 8810/20117 [5:33:37<7:06:11,  2.26s/it] 44%|████████████████████████████████████▎                                              | 8811/20117 [5:33:40<7:06:13,  2.26s/it] 44%|████████████████████████████████████▎                                              | 8812/20117 [5:33:42<7:05:31,  2.26s/it] 44%|████████████████████████████████████▎                                              | 8813/20117 [5:33:44<7:03:41,  2.25s/it] 44%|████████████████████████████████████▎                                              | 8814/20117 [5:33:46<7:08:29,  2.27s/it] 44%|████████████████████████████████████▎                                              | 8815/20117 [5:33:49<7:11:35,  2.29s/it] 44%|████████████████████████████████████▎                                              | 8816/20117 [5:33:51<7:10:08,  2.28s/it] 44%|████████████████████████████████████▍                                              | 8817/20117 [5:33:53<7:11:46,  2.29s/it] 44%|████████████████████████████████████▍                                              | 8818/20117 [5:33:56<7:08:30,  2.28s/it] 44%|████████████████████████████████████▍                                              | 8819/20117 [5:33:58<7:04:47,  2.26s/it] 44%|████████████████████████████████████▍                                              | 8820/20117 [5:34:00<7:04:44,  2.26s/it]                                                                                                                                 {'loss': 0.2026, 'grad_norm': 0.3879210650920868, 'learning_rate': 0.00012010034417654962, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 305.29, 'epoch': 0.88}
 44%|████████████████████████████████████▍                                              | 8820/20117 [5:34:00<7:04:44,  2.26s/it] 44%|████████████████████████████████████▍                                              | 8821/20117 [5:34:02<7:04:30,  2.25s/it] 44%|████████████████████████████████████▍                                              | 8822/20117 [5:34:05<7:03:45,  2.25s/it] 44%|████████████████████████████████████▍                                              | 8823/20117 [5:34:07<7:01:54,  2.24s/it] 44%|████████████████████████████████████▍                                              | 8824/20117 [5:34:09<6:58:51,  2.23s/it] 44%|████████████████████████████████████▍                                              | 8825/20117 [5:34:11<7:02:04,  2.24s/it] 44%|████████████████████████████████████▍                                              | 8826/20117 [5:34:14<7:07:54,  2.27s/it] 44%|████████████████████████████████████▍                                              | 8827/20117 [5:34:16<7:05:21,  2.26s/it] 44%|████████████████████████████████████▍                                              | 8828/20117 [5:34:18<7:04:16,  2.26s/it] 44%|████████████████████████████████████▍                                              | 8829/20117 [5:34:20<7:01:08,  2.24s/it] 44%|████████████████████████████████████▍                                              | 8830/20117 [5:34:23<7:02:48,  2.25s/it]                                                                                                                                 {'loss': 0.1985, 'grad_norm': 0.4256599545478821, 'learning_rate': 0.00011994657644412734, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 330.52, 'epoch': 0.88}
 44%|████████████████████████████████████▍                                              | 8830/20117 [5:34:23<7:02:48,  2.25s/it] 44%|████████████████████████████████████▍                                              | 8831/20117 [5:34:25<7:02:52,  2.25s/it] 44%|████████████████████████████████████▍                                              | 8832/20117 [5:34:27<7:02:16,  2.25s/it] 44%|████████████████████████████████████▍                                              | 8833/20117 [5:34:29<7:03:27,  2.25s/it] 44%|████████████████████████████████████▍                                              | 8834/20117 [5:34:32<7:04:29,  2.26s/it] 44%|████████████████████████████████████▍                                              | 8835/20117 [5:34:34<7:08:47,  2.28s/it] 44%|████████████████████████████████████▍                                              | 8836/20117 [5:34:36<7:07:03,  2.27s/it] 44%|████████████████████████████████████▍                                              | 8837/20117 [5:34:38<7:05:34,  2.26s/it] 44%|████████████████████████████████████▍                                              | 8838/20117 [5:34:41<7:10:41,  2.29s/it] 44%|████████████████████████████████████▍                                              | 8839/20117 [5:34:43<7:08:36,  2.28s/it] 44%|████████████████████████████████████▍                                              | 8840/20117 [5:34:45<7:06:23,  2.27s/it]                                                                                                                                 {'loss': 0.2153, 'grad_norm': 0.2512848973274231, 'learning_rate': 0.00011979275957907146, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 321.37, 'epoch': 0.88}
 44%|████████████████████████████████████▍                                              | 8840/20117 [5:34:45<7:06:23,  2.27s/it] 44%|████████████████████████████████████▍                                              | 8841/20117 [5:34:48<7:13:12,  2.31s/it] 44%|████████████████████████████████████▍                                              | 8842/20117 [5:34:50<7:13:26,  2.31s/it] 44%|████████████████████████████████████▍                                              | 8843/20117 [5:34:52<7:14:58,  2.31s/it] 44%|████████████████████████████████████▍                                              | 8844/20117 [5:34:55<7:15:25,  2.32s/it] 44%|████████████████████████████████████▍                                              | 8845/20117 [5:34:57<7:16:16,  2.32s/it] 44%|████████████████████████████████████▍                                              | 8846/20117 [5:35:00<7:37:03,  2.43s/it] 44%|████████████████████████████████████▌                                              | 8847/20117 [5:35:02<7:30:20,  2.40s/it] 44%|████████████████████████████████████▌                                              | 8848/20117 [5:35:04<7:21:43,  2.35s/it] 44%|████████████████████████████████████▌                                              | 8849/20117 [5:35:06<7:13:34,  2.31s/it] 44%|████████████████████████████████████▌                                              | 8850/20117 [5:35:09<7:06:17,  2.27s/it]                                                                                                                                 {'loss': 0.2383, 'grad_norm': 0.38337549567222595, 'learning_rate': 0.00011963889396026547, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 359.62, 'epoch': 0.88}
 44%|████████████████████████████████████▌                                              | 8850/20117 [5:35:09<7:06:17,  2.27s/it] 44%|████████████████████████████████████▌                                              | 8851/20117 [5:35:11<7:01:38,  2.25s/it] 44%|████████████████████████████████████▌                                              | 8852/20117 [5:35:13<6:57:08,  2.22s/it] 44%|████████████████████████████████████▌                                              | 8853/20117 [5:35:15<6:54:15,  2.21s/it] 44%|████████████████████████████████████▌                                              | 8854/20117 [5:35:17<6:58:55,  2.23s/it] 44%|████████████████████████████████████▌                                              | 8855/20117 [5:35:20<7:04:27,  2.26s/it] 44%|████████████████████████████████████▌                                              | 8856/20117 [5:35:22<7:10:38,  2.29s/it] 44%|████████████████████████████████████▌                                              | 8857/20117 [5:35:24<7:12:04,  2.30s/it] 44%|████████████████████████████████████▌                                              | 8858/20117 [5:35:27<7:10:41,  2.30s/it] 44%|████████████████████████████████████▌                                              | 8859/20117 [5:35:29<7:09:58,  2.29s/it] 44%|████████████████████████████████████▌                                              | 8860/20117 [5:35:31<7:07:44,  2.28s/it]                                                                                                                                 {'loss': 0.2304, 'grad_norm': 0.3447147309780121, 'learning_rate': 0.00011948497996671286, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 332.81, 'epoch': 0.88}
 44%|████████████████████████████████████▌                                              | 8860/20117 [5:35:31<7:07:44,  2.28s/it] 44%|████████████████████████████████████▌                                              | 8861/20117 [5:35:34<7:10:01,  2.29s/it] 44%|████████████████████████████████████▌                                              | 8862/20117 [5:35:36<7:11:07,  2.30s/it] 44%|████████████████████████████████████▌                                              | 8863/20117 [5:35:38<7:02:04,  2.25s/it] 44%|████████████████████████████████████▌                                              | 8864/20117 [5:35:40<7:00:24,  2.24s/it] 44%|████████████████████████████████████▌                                              | 8865/20117 [5:35:42<7:01:09,  2.25s/it] 44%|████████████████████████████████████▌                                              | 8866/20117 [5:35:45<6:58:18,  2.23s/it] 44%|████████████████████████████████████▌                                              | 8867/20117 [5:35:47<7:05:49,  2.27s/it] 44%|████████████████████████████████████▌                                              | 8868/20117 [5:35:49<7:15:57,  2.33s/it] 44%|████████████████████████████████████▌                                              | 8869/20117 [5:35:52<7:15:40,  2.32s/it] 44%|████████████████████████████████████▌                                              | 8870/20117 [5:35:54<7:11:49,  2.30s/it]                                                                                                                                 {'loss': 0.2297, 'grad_norm': 0.40846845507621765, 'learning_rate': 0.00011933101797753637, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 384.1, 'epoch': 0.88}
 44%|████████████████████████████████████▌                                              | 8870/20117 [5:35:54<7:11:49,  2.30s/it] 44%|████████████████████████████████████▌                                              | 8871/20117 [5:35:56<7:12:28,  2.31s/it] 44%|████████████████████████████████████▌                                              | 8872/20117 [5:35:59<7:12:23,  2.31s/it] 44%|████████████████████████████████████▌                                              | 8873/20117 [5:36:01<7:12:59,  2.31s/it] 44%|████████████████████████████████████▌                                              | 8874/20117 [5:36:03<7:12:39,  2.31s/it] 44%|████████████████████████████████████▌                                              | 8875/20117 [5:36:06<7:09:03,  2.29s/it] 44%|████████████████████████████████████▌                                              | 8876/20117 [5:36:08<7:06:06,  2.27s/it] 44%|████████████████████████████████████▋                                              | 8877/20117 [5:36:10<7:10:32,  2.30s/it] 44%|████████████████████████████████████▋                                              | 8878/20117 [5:36:12<7:14:26,  2.32s/it] 44%|████████████████████████████████████▋                                              | 8879/20117 [5:36:15<7:13:31,  2.31s/it] 44%|████████████████████████████████████▋                                              | 8880/20117 [5:36:17<7:11:24,  2.30s/it]                                                                                                                                 {'loss': 0.2242, 'grad_norm': 0.5454410910606384, 'learning_rate': 0.0001191770083719769, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 387.99, 'epoch': 0.88}
 44%|████████████████████████████████████▋                                              | 8880/20117 [5:36:17<7:11:24,  2.30s/it] 44%|████████████████████████████████████▋                                              | 8881/20117 [5:36:19<7:11:49,  2.31s/it] 44%|████████████████████████████████████▋                                              | 8882/20117 [5:36:22<7:10:30,  2.30s/it] 44%|████████████████████████████████████▋                                              | 8883/20117 [5:36:24<7:08:42,  2.29s/it] 44%|████████████████████████████████████▋                                              | 8884/20117 [5:36:26<7:08:33,  2.29s/it] 44%|████████████████████████████████████▋                                              | 8885/20117 [5:36:29<7:15:06,  2.32s/it] 44%|████████████████████████████████████▋                                              | 8886/20117 [5:36:31<7:20:37,  2.35s/it] 44%|████████████████████████████████████▋                                              | 8887/20117 [5:36:33<7:20:47,  2.36s/it] 44%|████████████████████████████████████▋                                              | 8888/20117 [5:36:36<7:20:35,  2.35s/it] 44%|████████████████████████████████████▋                                              | 8889/20117 [5:36:38<7:18:50,  2.35s/it] 44%|████████████████████████████████████▋                                              | 8890/20117 [5:36:41<7:26:38,  2.39s/it]                                                                                                                                 {'loss': 0.2381, 'grad_norm': 0.4706827700138092, 'learning_rate': 0.00011902295152939262, 'memory/max_active (GiB)': 19.2, 'memory/max_allocated (GiB)': 19.2, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 304.64, 'epoch': 0.88}
 44%|████████████████████████████████████▋                                              | 8890/20117 [5:36:41<7:26:38,  2.39s/it] 44%|████████████████████████████████████▋                                              | 8891/20117 [5:36:43<7:26:31,  2.39s/it] 44%|████████████████████████████████████▋                                              | 8892/20117 [5:36:45<7:30:12,  2.41s/it] 44%|████████████████████████████████████▋                                              | 8893/20117 [5:36:48<7:29:11,  2.40s/it] 44%|████████████████████████████████████▋                                              | 8894/20117 [5:36:50<7:25:09,  2.38s/it] 44%|████████████████████████████████████▋                                              | 8895/20117 [5:36:53<7:27:59,  2.40s/it] 44%|████████████████████████████████████▋                                              | 8896/20117 [5:36:55<7:24:05,  2.37s/it] 44%|████████████████████████████████████▋                                              | 8897/20117 [5:36:57<7:31:04,  2.41s/it] 44%|████████████████████████████████████▋                                              | 8898/20117 [5:37:00<7:43:50,  2.48s/it] 44%|████████████████████████████████████▋                                              | 8899/20117 [5:37:02<7:32:23,  2.42s/it] 44%|████████████████████████████████████▋                                              | 8900/20117 [5:37:05<7:26:53,  2.39s/it]                                                                                                                                 {'loss': 0.2417, 'grad_norm': 0.6073552370071411, 'learning_rate': 0.00011886884782925816, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 315.52, 'epoch': 0.88}
 44%|████████████████████████████████████▋                                              | 8900/20117 [5:37:05<7:26:53,  2.39s/it] 44%|████████████████████████████████████▋                                              | 8901/20117 [5:37:07<7:20:51,  2.36s/it] 44%|████████████████████████████████████▋                                              | 8902/20117 [5:37:09<7:14:39,  2.33s/it] 44%|████████████████████████████████████▋                                              | 8903/20117 [5:37:11<7:13:00,  2.32s/it] 44%|████████████████████████████████████▋                                              | 8904/20117 [5:37:14<7:12:03,  2.31s/it] 44%|████████████████████████████████████▋                                              | 8905/20117 [5:37:16<7:10:04,  2.30s/it] 44%|████████████████████████████████████▋                                              | 8906/20117 [5:37:19<7:22:57,  2.37s/it] 44%|████████████████████████████████████▋                                              | 8907/20117 [5:37:21<7:32:28,  2.42s/it] 44%|████████████████████████████████████▊                                              | 8908/20117 [5:37:23<7:29:31,  2.41s/it] 44%|████████████████████████████████████▊                                              | 8909/20117 [5:37:26<7:25:16,  2.38s/it] 44%|████████████████████████████████████▊                                              | 8910/20117 [5:37:28<7:21:44,  2.36s/it]                                                                                                                                 {'loss': 0.2117, 'grad_norm': 0.3200027644634247, 'learning_rate': 0.00011871469765116346, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 306.02, 'epoch': 0.89}
 44%|████████████████████████████████████▊                                              | 8910/20117 [5:37:28<7:21:44,  2.36s/it] 44%|████████████████████████████████████▊                                              | 8911/20117 [5:37:30<7:22:31,  2.37s/it] 44%|████████████████████████████████████▊                                              | 8912/20117 [5:37:33<7:23:11,  2.37s/it] 44%|████████████████████████████████████▊                                              | 8913/20117 [5:37:35<7:19:10,  2.35s/it] 44%|████████████████████████████████████▊                                              | 8914/20117 [5:37:37<7:15:32,  2.33s/it] 44%|████████████████████████████████████▊                                              | 8915/20117 [5:37:40<7:12:37,  2.32s/it] 44%|████████████████████████████████████▊                                              | 8916/20117 [5:37:42<7:16:38,  2.34s/it] 44%|████████████████████████████████████▊                                              | 8917/20117 [5:37:44<7:15:24,  2.33s/it] 44%|████████████████████████████████████▊                                              | 8918/20117 [5:37:47<7:14:25,  2.33s/it] 44%|████████████████████████████████████▊                                              | 8919/20117 [5:37:49<7:12:36,  2.32s/it] 44%|████████████████████████████████████▊                                              | 8920/20117 [5:37:51<7:08:39,  2.30s/it]                                                                                                                                 {'loss': 0.2552, 'grad_norm': 0.42333486676216125, 'learning_rate': 0.00011856050137481301, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 355.23, 'epoch': 0.89}
 44%|████████████████████████████████████▊                                              | 8920/20117 [5:37:51<7:08:39,  2.30s/it] 44%|████████████████████████████████████▊                                              | 8921/20117 [5:37:54<7:09:39,  2.30s/it] 44%|████████████████████████████████████▊                                              | 8922/20117 [5:37:56<7:08:21,  2.30s/it] 44%|████████████████████████████████████▊                                              | 8923/20117 [5:37:58<7:03:27,  2.27s/it] 44%|████████████████████████████████████▊                                              | 8924/20117 [5:38:00<7:00:42,  2.26s/it] 44%|████████████████████████████████████▊                                              | 8925/20117 [5:38:03<6:59:31,  2.25s/it] 44%|████████████████████████████████████▊                                              | 8926/20117 [5:38:05<6:58:09,  2.24s/it] 44%|████████████████████████████████████▊                                              | 8927/20117 [5:38:07<6:58:05,  2.24s/it] 44%|████████████████████████████████████▊                                              | 8928/20117 [5:38:09<6:58:52,  2.25s/it] 44%|████████████████████████████████████▊                                              | 8929/20117 [5:38:12<7:03:52,  2.27s/it] 44%|████████████████████████████████████▊                                              | 8930/20117 [5:38:14<7:02:08,  2.26s/it]                                                                                                                                 {'loss': 0.1743, 'grad_norm': 0.2578692138195038, 'learning_rate': 0.00011840625938002481, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 356.78, 'epoch': 0.89}
 44%|████████████████████████████████████▊                                              | 8930/20117 [5:38:14<7:02:08,  2.26s/it] 44%|████████████████████████████████████▊                                              | 8931/20117 [5:38:16<7:04:51,  2.28s/it] 44%|████████████████████████████████████▊                                              | 8932/20117 [5:38:18<7:02:30,  2.27s/it] 44%|████████████████████████████████████▊                                              | 8933/20117 [5:38:21<7:03:37,  2.27s/it] 44%|████████████████████████████████████▊                                              | 8934/20117 [5:38:23<7:02:41,  2.27s/it] 44%|████████████████████████████████████▊                                              | 8935/20117 [5:38:25<7:03:45,  2.27s/it] 44%|████████████████████████████████████▊                                              | 8936/20117 [5:38:28<7:08:06,  2.30s/it] 44%|████████████████████████████████████▊                                              | 8937/20117 [5:38:30<7:05:02,  2.28s/it] 44%|████████████████████████████████████▉                                              | 8938/20117 [5:38:32<7:01:29,  2.26s/it] 44%|████████████████████████████████████▉                                              | 8939/20117 [5:38:34<7:01:01,  2.26s/it] 44%|████████████████████████████████████▉                                              | 8940/20117 [5:38:37<7:00:50,  2.26s/it]                                                                                                                                 {'loss': 0.2637, 'grad_norm': 0.3321487605571747, 'learning_rate': 0.00011825197204672952, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 424.07, 'epoch': 0.89}
 44%|████████████████████████████████████▉                                              | 8940/20117 [5:38:37<7:00:50,  2.26s/it] 44%|████████████████████████████████████▉                                              | 8941/20117 [5:38:39<6:58:49,  2.25s/it] 44%|████████████████████████████████████▉                                              | 8942/20117 [5:38:41<7:00:59,  2.26s/it] 44%|████████████████████████████████████▉                                              | 8943/20117 [5:38:43<7:00:01,  2.26s/it] 44%|████████████████████████████████████▉                                              | 8944/20117 [5:38:46<7:01:31,  2.26s/it] 44%|████████████████████████████████████▉                                              | 8945/20117 [5:38:48<6:59:34,  2.25s/it] 44%|████████████████████████████████████▉                                              | 8946/20117 [5:38:50<6:58:33,  2.25s/it] 44%|████████████████████████████████████▉                                              | 8947/20117 [5:38:52<6:58:25,  2.25s/it] 44%|████████████████████████████████████▉                                              | 8948/20117 [5:38:55<7:03:23,  2.27s/it] 44%|████████████████████████████████████▉                                              | 8949/20117 [5:38:57<7:23:20,  2.38s/it] 44%|████████████████████████████████████▉                                              | 8950/20117 [5:39:00<7:16:23,  2.34s/it]                                                                                                                                 {'loss': 0.2272, 'grad_norm': 0.3663140833377838, 'learning_rate': 0.00011809763975496944, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 360.15, 'epoch': 0.89}
 44%|████████████████████████████████████▉                                              | 8950/20117 [5:39:00<7:16:23,  2.34s/it] 44%|████████████████████████████████████▉                                              | 8951/20117 [5:39:02<7:10:31,  2.31s/it] 44%|████████████████████████████████████▉                                              | 8952/20117 [5:39:04<7:07:46,  2.30s/it] 45%|████████████████████████████████████▉                                              | 8953/20117 [5:39:06<7:07:16,  2.30s/it] 45%|████████████████████████████████████▉                                              | 8954/20117 [5:39:09<7:02:29,  2.27s/it] 45%|████████████████████████████████████▉                                              | 8955/20117 [5:39:11<7:06:48,  2.29s/it] 45%|████████████████████████████████████▉                                              | 8956/20117 [5:39:13<7:05:31,  2.29s/it] 45%|████████████████████████████████████▉                                              | 8957/20117 [5:39:15<7:03:14,  2.28s/it] 45%|████████████████████████████████████▉                                              | 8958/20117 [5:39:18<7:01:43,  2.27s/it] 45%|████████████████████████████████████▉                                              | 8959/20117 [5:39:20<6:59:00,  2.25s/it] 45%|████████████████████████████████████▉                                              | 8960/20117 [5:39:22<6:56:57,  2.24s/it]                                                                                                                                 {'loss': 0.2723, 'grad_norm': 0.2619111239910126, 'learning_rate': 0.00011794326288489761, 'memory/max_active (GiB)': 20.03, 'memory/max_allocated (GiB)': 20.03, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 357.64, 'epoch': 0.89}
 45%|████████████████████████████████████▉                                              | 8960/20117 [5:39:22<6:56:57,  2.24s/it] 45%|████████████████████████████████████▉                                              | 8961/20117 [5:39:24<6:58:46,  2.25s/it] 45%|████████████████████████████████████▉                                              | 8962/20117 [5:39:27<6:54:51,  2.23s/it] 45%|████████████████████████████████████▉                                              | 8963/20117 [5:39:29<6:56:43,  2.24s/it] 45%|████████████████████████████████████▉                                              | 8964/20117 [5:39:31<6:59:14,  2.26s/it] 45%|████████████████████████████████████▉                                              | 8965/20117 [5:39:33<6:54:50,  2.23s/it] 45%|████████████████████████████████████▉                                              | 8966/20117 [5:39:36<6:57:33,  2.25s/it] 45%|████████████████████████████████████▉                                              | 8967/20117 [5:39:38<7:04:02,  2.28s/it] 45%|█████████████████████████████████████                                              | 8968/20117 [5:39:40<7:01:19,  2.27s/it] 45%|█████████████████████████████████████                                              | 8969/20117 [5:39:43<7:03:29,  2.28s/it] 45%|█████████████████████████████████████                                              | 8970/20117 [5:39:45<7:00:54,  2.27s/it]                                                                                                                                 {'loss': 0.2633, 'grad_norm': 0.4359273612499237, 'learning_rate': 0.0001177888418167769, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 406.63, 'epoch': 0.89}
 45%|█████████████████████████████████████                                              | 8970/20117 [5:39:45<7:00:54,  2.27s/it] 45%|█████████████████████████████████████                                              | 8971/20117 [5:39:47<6:58:29,  2.25s/it] 45%|█████████████████████████████████████                                              | 8972/20117 [5:39:49<7:01:37,  2.27s/it] 45%|█████████████████████████████████████                                              | 8973/20117 [5:39:52<7:00:15,  2.26s/it] 45%|█████████████████████████████████████                                              | 8974/20117 [5:39:54<7:04:22,  2.29s/it] 45%|█████████████████████████████████████                                              | 8975/20117 [5:39:56<7:02:11,  2.27s/it] 45%|█████████████████████████████████████                                              | 8976/20117 [5:39:58<7:00:37,  2.27s/it] 45%|█████████████████████████████████████                                              | 8977/20117 [5:40:01<7:05:37,  2.29s/it] 45%|█████████████████████████████████████                                              | 8978/20117 [5:40:03<7:03:25,  2.28s/it] 45%|█████████████████████████████████████                                              | 8979/20117 [5:40:05<7:00:46,  2.27s/it] 45%|█████████████████████████████████████                                              | 8980/20117 [5:40:07<7:01:26,  2.27s/it]                                                                                                                                 {'loss': 0.2789, 'grad_norm': 0.29716983437538147, 'learning_rate': 0.00011763437693097903, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 386.97, 'epoch': 0.89}
 45%|█████████████████████████████████████                                              | 8980/20117 [5:40:07<7:01:26,  2.27s/it] 45%|█████████████████████████████████████                                              | 8981/20117 [5:40:10<7:06:10,  2.30s/it] 45%|█████████████████████████████████████                                              | 8982/20117 [5:40:12<7:06:22,  2.30s/it] 45%|█████████████████████████████████████                                              | 8983/20117 [5:40:14<7:02:30,  2.28s/it] 45%|█████████████████████████████████████                                              | 8984/20117 [5:40:17<6:58:33,  2.26s/it] 45%|█████████████████████████████████████                                              | 8985/20117 [5:40:19<7:00:08,  2.26s/it] 45%|█████████████████████████████████████                                              | 8986/20117 [5:40:21<6:58:30,  2.26s/it] 45%|█████████████████████████████████████                                              | 8987/20117 [5:40:23<6:54:14,  2.23s/it] 45%|█████████████████████████████████████                                              | 8988/20117 [5:40:25<6:53:48,  2.23s/it] 45%|█████████████████████████████████████                                              | 8989/20117 [5:40:28<6:56:50,  2.25s/it] 45%|█████████████████████████████████████                                              | 8990/20117 [5:40:30<6:56:05,  2.24s/it]                                                                                                                                 {'loss': 0.2057, 'grad_norm': 0.6009801626205444, 'learning_rate': 0.00011747986860798368, 'memory/max_active (GiB)': 20.43, 'memory/max_allocated (GiB)': 20.43, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 318.77, 'epoch': 0.89}
 45%|█████████████████████████████████████                                              | 8990/20117 [5:40:30<6:56:05,  2.24s/it] 45%|█████████████████████████████████████                                              | 8991/20117 [5:40:32<6:57:25,  2.25s/it] 45%|█████████████████████████████████████                                              | 8992/20117 [5:40:35<7:00:57,  2.27s/it] 45%|█████████████████████████████████████                                              | 8993/20117 [5:40:37<6:57:45,  2.25s/it] 45%|█████████████████████████████████████                                              | 8994/20117 [5:40:39<6:59:45,  2.26s/it] 45%|█████████████████████████████████████                                              | 8995/20117 [5:40:41<6:58:46,  2.26s/it] 45%|█████████████████████████████████████                                              | 8996/20117 [5:40:44<6:59:05,  2.26s/it] 45%|█████████████████████████████████████                                              | 8997/20117 [5:40:46<7:03:12,  2.28s/it] 45%|█████████████████████████████████████                                              | 8998/20117 [5:40:48<6:59:41,  2.26s/it] 45%|█████████████████████████████████████▏                                             | 8999/20117 [5:40:50<7:01:02,  2.27s/it] 45%|█████████████████████████████████████▏                                             | 9000/20117 [5:40:53<7:21:16,  2.38s/it]                                                                                                                                 {'loss': 0.2232, 'grad_norm': 0.4606180191040039, 'learning_rate': 0.0001173253172283775, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 283.3, 'epoch': 0.89}
 45%|█████████████████████████████████████▏                                             | 9000/20117 [5:40:53<7:21:16,  2.38s/it] 45%|█████████████████████████████████████▏                                             | 9001/20117 [5:40:55<7:12:52,  2.34s/it] 45%|█████████████████████████████████████▏                                             | 9002/20117 [5:40:58<7:10:40,  2.32s/it] 45%|█████████████████████████████████████▏                                             | 9003/20117 [5:41:00<7:04:44,  2.29s/it] 45%|█████████████████████████████████████▏                                             | 9004/20117 [5:41:02<7:07:00,  2.31s/it] 45%|█████████████████████████████████████▏                                             | 9005/20117 [5:41:04<7:04:39,  2.29s/it] 45%|█████████████████████████████████████▏                                             | 9006/20117 [5:41:07<7:02:57,  2.28s/it] 45%|█████████████████████████████████████▏                                             | 9007/20117 [5:41:09<7:01:51,  2.28s/it] 45%|█████████████████████████████████████▏                                             | 9008/20117 [5:41:11<7:01:01,  2.27s/it] 45%|█████████████████████████████████████▏                                             | 9009/20117 [5:41:13<7:00:05,  2.27s/it] 45%|█████████████████████████████████████▏                                             | 9010/20117 [5:41:16<7:13:06,  2.34s/it]                                                                                                                                 {'loss': 0.2175, 'grad_norm': 0.3887878358364105, 'learning_rate': 0.00011717072317285318, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 313.78, 'epoch': 0.9}
 45%|█████████████████████████████████████▏                                             | 9010/20117 [5:41:16<7:13:06,  2.34s/it] 45%|█████████████████████████████████████▏                                             | 9011/20117 [5:41:18<7:07:09,  2.31s/it] 45%|█████████████████████████████████████▏                                             | 9012/20117 [5:41:21<7:07:14,  2.31s/it] 45%|█████████████████████████████████████▏                                             | 9013/20117 [5:41:23<7:06:39,  2.31s/it] 45%|█████████████████████████████████████▏                                             | 9014/20117 [5:41:25<7:02:09,  2.28s/it] 45%|█████████████████████████████████████▏                                             | 9015/20117 [5:41:27<6:59:47,  2.27s/it] 45%|█████████████████████████████████████▏                                             | 9016/20117 [5:41:30<6:58:01,  2.26s/it] 45%|█████████████████████████████████████▏                                             | 9017/20117 [5:41:32<6:57:15,  2.26s/it] 45%|█████████████████████████████████████▏                                             | 9018/20117 [5:41:34<6:57:11,  2.26s/it] 45%|█████████████████████████████████████▏                                             | 9019/20117 [5:41:36<6:55:06,  2.24s/it] 45%|█████████████████████████████████████▏                                             | 9020/20117 [5:41:39<6:56:41,  2.25s/it]                                                                                                                                 {'loss': 0.2491, 'grad_norm': 0.3322821855545044, 'learning_rate': 0.0001170160868222086, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 339.47, 'epoch': 0.9}
 45%|█████████████████████████████████████▏                                             | 9020/20117 [5:41:39<6:56:41,  2.25s/it] 45%|█████████████████████████████████████▏                                             | 9021/20117 [5:41:41<6:56:28,  2.25s/it] 45%|█████████████████████████████████████▏                                             | 9022/20117 [5:41:43<6:54:43,  2.24s/it] 45%|█████████████████████████████████████▏                                             | 9023/20117 [5:41:45<6:53:15,  2.24s/it] 45%|█████████████████████████████████████▏                                             | 9024/20117 [5:41:48<6:58:13,  2.26s/it] 45%|█████████████████████████████████████▏                                             | 9025/20117 [5:41:50<6:55:33,  2.25s/it] 45%|█████████████████████████████████████▏                                             | 9026/20117 [5:41:52<6:53:37,  2.24s/it] 45%|█████████████████████████████████████▏                                             | 9027/20117 [5:41:54<6:59:27,  2.27s/it] 45%|█████████████████████████████████████▏                                             | 9028/20117 [5:41:57<6:56:30,  2.25s/it] 45%|█████████████████████████████████████▎                                             | 9029/20117 [5:41:59<6:58:28,  2.26s/it] 45%|█████████████████████████████████████▎                                             | 9030/20117 [5:42:01<6:59:25,  2.27s/it]                                                                                                                                 {'loss': 0.2568, 'grad_norm': 0.5119565725326538, 'learning_rate': 0.00011686140855734571, 'memory/max_active (GiB)': 19.66, 'memory/max_allocated (GiB)': 19.66, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 322.62, 'epoch': 0.9}
 45%|█████████████████████████████████████▎                                             | 9030/20117 [5:42:01<6:59:25,  2.27s/it] 45%|█████████████████████████████████████▎                                             | 9031/20117 [5:42:03<6:55:14,  2.25s/it] 45%|█████████████████████████████████████▎                                             | 9032/20117 [5:42:06<6:55:52,  2.25s/it] 45%|█████████████████████████████████████▎                                             | 9033/20117 [5:42:08<6:55:43,  2.25s/it] 45%|█████████████████████████████████████▎                                             | 9034/20117 [5:42:10<6:54:41,  2.25s/it] 45%|█████████████████████████████████████▎                                             | 9035/20117 [5:42:12<6:49:50,  2.22s/it] 45%|█████████████████████████████████████▎                                             | 9036/20117 [5:42:14<6:47:34,  2.21s/it] 45%|█████████████████████████████████████▎                                             | 9037/20117 [5:42:17<6:48:00,  2.21s/it] 45%|█████████████████████████████████████▎                                             | 9038/20117 [5:42:19<6:46:41,  2.20s/it] 45%|█████████████████████████████████████▎                                             | 9039/20117 [5:42:21<6:52:16,  2.23s/it] 45%|█████████████████████████████████████▎                                             | 9040/20117 [5:42:23<7:00:20,  2.28s/it]                                                                                                                                 {'loss': 0.3019, 'grad_norm': 0.32013964653015137, 'learning_rate': 0.00011670668875926982, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 356.77, 'epoch': 0.9}
 45%|█████████████████████████████████████▎                                             | 9040/20117 [5:42:23<7:00:20,  2.28s/it] 45%|█████████████████████████████████████▎                                             | 9041/20117 [5:42:26<7:08:56,  2.32s/it] 45%|█████████████████████████████████████▎                                             | 9042/20117 [5:42:28<7:10:10,  2.33s/it] 45%|█████████████████████████████████████▎                                             | 9043/20117 [5:42:31<7:09:42,  2.33s/it] 45%|█████████████████████████████████████▎                                             | 9044/20117 [5:42:33<7:11:21,  2.34s/it] 45%|█████████████████████████████████████▎                                             | 9045/20117 [5:42:35<7:09:11,  2.33s/it] 45%|█████████████████████████████████████▎                                             | 9046/20117 [5:42:38<7:10:00,  2.33s/it] 45%|█████████████████████████████████████▎                                             | 9047/20117 [5:42:40<7:06:40,  2.31s/it] 45%|█████████████████████████████████████▎                                             | 9048/20117 [5:42:42<6:59:24,  2.27s/it] 45%|█████████████████████████████████████▎                                             | 9049/20117 [5:42:44<6:52:20,  2.24s/it] 45%|█████████████████████████████████████▎                                             | 9050/20117 [5:42:46<6:47:43,  2.21s/it]                                                                                                                                 {'loss': 0.1927, 'grad_norm': 0.19132547080516815, 'learning_rate': 0.00011655192780908849, 'memory/max_active (GiB)': 18.16, 'memory/max_allocated (GiB)': 18.16, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 306.79, 'epoch': 0.9}
 45%|█████████████████████████████████████▎                                             | 9050/20117 [5:42:46<6:47:43,  2.21s/it] 45%|█████████████████████████████████████▎                                             | 9051/20117 [5:42:48<6:44:32,  2.19s/it] 45%|█████████████████████████████████████▎                                             | 9052/20117 [5:42:51<6:55:20,  2.25s/it] 45%|█████████████████████████████████████▎                                             | 9053/20117 [5:42:54<7:24:41,  2.41s/it] 45%|█████████████████████████████████████▎                                             | 9054/20117 [5:42:56<7:22:34,  2.40s/it] 45%|█████████████████████████████████████▎                                             | 9055/20117 [5:42:58<7:15:53,  2.36s/it] 45%|█████████████████████████████████████▎                                             | 9056/20117 [5:43:01<7:09:46,  2.33s/it] 45%|█████████████████████████████████████▎                                             | 9057/20117 [5:43:03<7:05:51,  2.31s/it] 45%|█████████████████████████████████████▎                                             | 9058/20117 [5:43:05<7:06:48,  2.32s/it] 45%|█████████████████████████████████████▍                                             | 9059/20117 [5:43:07<7:05:38,  2.31s/it] 45%|█████████████████████████████████████▍                                             | 9060/20117 [5:43:10<7:05:10,  2.31s/it]                                                                                                                                 {'loss': 0.2013, 'grad_norm': 0.4459245502948761, 'learning_rate': 0.00011639712608801059, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 369.24, 'epoch': 0.9}
 45%|█████████████████████████████████████▍                                             | 9060/20117 [5:43:10<7:05:10,  2.31s/it] 45%|█████████████████████████████████████▍                                             | 9061/20117 [5:43:12<7:02:58,  2.30s/it] 45%|█████████████████████████████████████▍                                             | 9062/20117 [5:43:14<7:02:49,  2.29s/it] 45%|█████████████████████████████████████▍                                             | 9063/20117 [5:43:17<7:04:10,  2.30s/it] 45%|█████████████████████████████████████▍                                             | 9064/20117 [5:43:19<7:01:32,  2.29s/it] 45%|█████████████████████████████████████▍                                             | 9065/20117 [5:43:21<7:00:56,  2.29s/it] 45%|█████████████████████████████████████▍                                             | 9066/20117 [5:43:23<6:58:22,  2.27s/it] 45%|█████████████████████████████████████▍                                             | 9067/20117 [5:43:26<6:57:17,  2.27s/it] 45%|█████████████████████████████████████▍                                             | 9068/20117 [5:43:28<7:04:08,  2.30s/it] 45%|█████████████████████████████████████▍                                             | 9069/20117 [5:43:30<7:00:38,  2.28s/it] 45%|█████████████████████████████████████▍                                             | 9070/20117 [5:43:33<6:59:08,  2.28s/it]                                                                                                                                 {'loss': 0.1513, 'grad_norm': 0.3540112376213074, 'learning_rate': 0.00011624228397734556, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 323.78, 'epoch': 0.9}
 45%|█████████████████████████████████████▍                                             | 9070/20117 [5:43:33<6:59:08,  2.28s/it] 45%|█████████████████████████████████████▍                                             | 9071/20117 [5:43:35<6:56:57,  2.26s/it] 45%|█████████████████████████████████████▍                                             | 9072/20117 [5:43:37<6:56:00,  2.26s/it] 45%|█████████████████████████████████████▍                                             | 9073/20117 [5:43:39<6:56:40,  2.26s/it] 45%|█████████████████████████████████████▍                                             | 9074/20117 [5:43:42<7:00:55,  2.29s/it] 45%|█████████████████████████████████████▍                                             | 9075/20117 [5:43:44<7:02:11,  2.29s/it] 45%|█████████████████████████████████████▍                                             | 9076/20117 [5:43:46<7:03:18,  2.30s/it] 45%|█████████████████████████████████████▍                                             | 9077/20117 [5:43:48<6:58:23,  2.27s/it] 45%|█████████████████████████████████████▍                                             | 9078/20117 [5:43:51<6:57:14,  2.27s/it] 45%|█████████████████████████████████████▍                                             | 9079/20117 [5:43:53<7:01:37,  2.29s/it] 45%|█████████████████████████████████████▍                                             | 9080/20117 [5:43:55<6:58:46,  2.28s/it]                                                                                                                                 {'loss': 0.2055, 'grad_norm': 0.3951474130153656, 'learning_rate': 0.00011608740185850219, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 358.92, 'epoch': 0.9}
 45%|█████████████████████████████████████▍                                             | 9080/20117 [5:43:55<6:58:46,  2.28s/it] 45%|█████████████████████████████████████▍                                             | 9081/20117 [5:43:58<6:59:18,  2.28s/it] 45%|█████████████████████████████████████▍                                             | 9082/20117 [5:44:00<6:57:52,  2.27s/it] 45%|█████████████████████████████████████▍                                             | 9083/20117 [5:44:02<6:56:10,  2.26s/it] 45%|█████████████████████████████████████▍                                             | 9084/20117 [5:44:04<6:56:33,  2.27s/it] 45%|█████████████████████████████████████▍                                             | 9085/20117 [5:44:07<6:56:21,  2.26s/it] 45%|█████████████████████████████████████▍                                             | 9086/20117 [5:44:09<6:59:08,  2.28s/it] 45%|█████████████████████████████████████▍                                             | 9087/20117 [5:44:11<7:00:51,  2.29s/it] 45%|█████████████████████████████████████▍                                             | 9088/20117 [5:44:14<7:03:25,  2.30s/it] 45%|█████████████████████████████████████▍                                             | 9089/20117 [5:44:16<6:58:41,  2.28s/it] 45%|█████████████████████████████████████▌                                             | 9090/20117 [5:44:18<6:58:32,  2.28s/it]                                                                                                                                 {'loss': 0.2148, 'grad_norm': 0.3915760815143585, 'learning_rate': 0.00011593248011298791, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 397.61, 'epoch': 0.9}
 45%|█████████████████████████████████████▌                                             | 9090/20117 [5:44:18<6:58:32,  2.28s/it] 45%|█████████████████████████████████████▌                                             | 9091/20117 [5:44:20<6:59:01,  2.28s/it] 45%|█████████████████████████████████████▌                                             | 9092/20117 [5:44:23<7:00:55,  2.29s/it] 45%|█████████████████████████████████████▌                                             | 9093/20117 [5:44:25<7:00:18,  2.29s/it] 45%|█████████████████████████████████████▌                                             | 9094/20117 [5:44:27<7:00:23,  2.29s/it] 45%|█████████████████████████████████████▌                                             | 9095/20117 [5:44:29<6:57:20,  2.27s/it] 45%|█████████████████████████████████████▌                                             | 9096/20117 [5:44:32<6:53:10,  2.25s/it] 45%|█████████████████████████████████████▌                                             | 9097/20117 [5:44:34<6:50:18,  2.23s/it] 45%|█████████████████████████████████████▌                                             | 9098/20117 [5:44:36<6:53:16,  2.25s/it] 45%|█████████████████████████████████████▌                                             | 9099/20117 [5:44:38<6:51:06,  2.24s/it] 45%|█████████████████████████████████████▌                                             | 9100/20117 [5:44:41<6:50:08,  2.23s/it]                                                                                                                                 {'loss': 0.187, 'grad_norm': 0.49453213810920715, 'learning_rate': 0.00011577751912240771, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 379.24, 'epoch': 0.9}
 45%|█████████████████████████████████████▌                                             | 9100/20117 [5:44:41<6:50:08,  2.23s/it] 45%|█████████████████████████████████████▌                                             | 9101/20117 [5:44:43<6:51:00,  2.24s/it] 45%|█████████████████████████████████████▌                                             | 9102/20117 [5:44:45<6:50:07,  2.23s/it] 45%|█████████████████████████████████████▌                                             | 9103/20117 [5:44:47<6:50:22,  2.24s/it] 45%|█████████████████████████████████████▌                                             | 9104/20117 [5:44:50<6:54:18,  2.26s/it] 45%|█████████████████████████████████████▌                                             | 9105/20117 [5:44:52<6:53:53,  2.26s/it] 45%|█████████████████████████████████████▌                                             | 9106/20117 [5:44:54<6:54:49,  2.26s/it] 45%|█████████████████████████████████████▌                                             | 9107/20117 [5:44:57<7:11:52,  2.35s/it] 45%|█████████████████████████████████████▌                                             | 9108/20117 [5:44:59<7:10:32,  2.35s/it] 45%|█████████████████████████████████████▌                                             | 9109/20117 [5:45:01<7:06:03,  2.32s/it] 45%|█████████████████████████████████████▌                                             | 9110/20117 [5:45:04<6:59:42,  2.29s/it]                                                                                                                                 {'loss': 0.219, 'grad_norm': 0.5185866951942444, 'learning_rate': 0.00011562251926846326, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 307.18, 'epoch': 0.91}
 45%|█████████████████████████████████████▌                                             | 9110/20117 [5:45:04<6:59:42,  2.29s/it] 45%|█████████████████████████████████████▌                                             | 9111/20117 [5:45:06<6:57:11,  2.27s/it] 45%|█████████████████████████████████████▌                                             | 9112/20117 [5:45:08<6:57:52,  2.28s/it] 45%|█████████████████████████████████████▌                                             | 9113/20117 [5:45:10<6:58:16,  2.28s/it] 45%|█████████████████████████████████████▌                                             | 9114/20117 [5:45:13<6:56:58,  2.27s/it] 45%|█████████████████████████████████████▌                                             | 9115/20117 [5:45:15<6:55:09,  2.26s/it] 45%|█████████████████████████████████████▌                                             | 9116/20117 [5:45:17<6:51:24,  2.24s/it] 45%|█████████████████████████████████████▌                                             | 9117/20117 [5:45:19<6:51:14,  2.24s/it] 45%|█████████████████████████████████████▌                                             | 9118/20117 [5:45:22<6:52:45,  2.25s/it] 45%|█████████████████████████████████████▌                                             | 9119/20117 [5:45:24<6:54:10,  2.26s/it] 45%|█████████████████████████████████████▋                                             | 9120/20117 [5:45:26<6:53:57,  2.26s/it]                                                                                                                                 {'loss': 0.2127, 'grad_norm': 0.44589415192604065, 'learning_rate': 0.00011546748093295195, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 360.35, 'epoch': 0.91}
 45%|█████████████████████████████████████▋                                             | 9120/20117 [5:45:26<6:53:57,  2.26s/it] 45%|█████████████████████████████████████▋                                             | 9121/20117 [5:45:28<6:51:42,  2.25s/it] 45%|█████████████████████████████████████▋                                             | 9122/20117 [5:45:30<6:48:41,  2.23s/it] 45%|█████████████████████████████████████▋                                             | 9123/20117 [5:45:33<6:53:25,  2.26s/it] 45%|█████████████████████████████████████▋                                             | 9124/20117 [5:45:35<6:53:37,  2.26s/it] 45%|█████████████████████████████████████▋                                             | 9125/20117 [5:45:37<6:53:16,  2.26s/it] 45%|█████████████████████████████████████▋                                             | 9126/20117 [5:45:40<6:50:50,  2.24s/it] 45%|█████████████████████████████████████▋                                             | 9127/20117 [5:45:42<6:50:14,  2.24s/it] 45%|█████████████████████████████████████▋                                             | 9128/20117 [5:45:44<6:50:49,  2.24s/it] 45%|█████████████████████████████████████▋                                             | 9129/20117 [5:45:46<6:59:43,  2.29s/it] 45%|█████████████████████████████████████▋                                             | 9130/20117 [5:45:49<6:54:54,  2.27s/it]                                                                                                                                 {'loss': 0.2057, 'grad_norm': 0.4594690501689911, 'learning_rate': 0.00011531240449776594, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 351.45, 'epoch': 0.91}
 45%|█████████████████████████████████████▋                                             | 9130/20117 [5:45:49<6:54:54,  2.27s/it] 45%|█████████████████████████████████████▋                                             | 9131/20117 [5:45:51<6:58:03,  2.28s/it] 45%|█████████████████████████████████████▋                                             | 9132/20117 [5:45:53<6:54:47,  2.27s/it] 45%|█████████████████████████████████████▋                                             | 9133/20117 [5:45:55<6:52:10,  2.25s/it] 45%|█████████████████████████████████████▋                                             | 9134/20117 [5:45:58<6:54:29,  2.26s/it] 45%|█████████████████████████████████████▋                                             | 9135/20117 [5:46:00<6:56:57,  2.28s/it] 45%|█████████████████████████████████████▋                                             | 9136/20117 [5:46:02<6:57:01,  2.28s/it] 45%|█████████████████████████████████████▋                                             | 9137/20117 [5:46:04<6:52:52,  2.26s/it] 45%|█████████████████████████████████████▋                                             | 9138/20117 [5:46:07<6:52:33,  2.25s/it] 45%|█████████████████████████████████████▋                                             | 9139/20117 [5:46:09<6:53:07,  2.26s/it] 45%|█████████████████████████████████████▋                                             | 9140/20117 [5:46:11<6:53:06,  2.26s/it]                                                                                                                                 {'loss': 0.2213, 'grad_norm': 0.4069642722606659, 'learning_rate': 0.00011515729034489133, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 331.15, 'epoch': 0.91}
 45%|█████████████████████████████████████▋                                             | 9140/20117 [5:46:11<6:53:06,  2.26s/it] 45%|█████████████████████████████████████▋                                             | 9141/20117 [5:46:14<6:57:33,  2.28s/it] 45%|█████████████████████████████████████▋                                             | 9142/20117 [5:46:16<6:56:40,  2.28s/it] 45%|█████████████████████████████████████▋                                             | 9143/20117 [5:46:18<6:55:07,  2.27s/it] 45%|█████████████████████████████████████▋                                             | 9144/20117 [5:46:20<6:54:56,  2.27s/it] 45%|█████████████████████████████████████▋                                             | 9145/20117 [5:46:23<6:58:22,  2.29s/it] 45%|█████████████████████████████████████▋                                             | 9146/20117 [5:46:25<6:56:17,  2.28s/it] 45%|█████████████████████████████████████▋                                             | 9147/20117 [5:46:27<6:56:05,  2.28s/it] 45%|█████████████████████████████████████▋                                             | 9148/20117 [5:46:29<6:53:38,  2.26s/it] 45%|█████████████████████████████████████▋                                             | 9149/20117 [5:46:32<6:51:57,  2.25s/it] 45%|█████████████████████████████████████▊                                             | 9150/20117 [5:46:34<6:54:12,  2.27s/it]                                                                                                                                 {'loss': 0.2258, 'grad_norm': 0.5103911757469177, 'learning_rate': 0.00011500213885640705, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 323.85, 'epoch': 0.91}
 45%|█████████████████████████████████████▊                                             | 9150/20117 [5:46:34<6:54:12,  2.27s/it] 45%|█████████████████████████████████████▊                                             | 9151/20117 [5:46:36<6:51:18,  2.25s/it] 45%|█████████████████████████████████████▊                                             | 9152/20117 [5:46:39<6:55:57,  2.28s/it] 45%|█████████████████████████████████████▊                                             | 9153/20117 [5:46:41<6:53:22,  2.26s/it] 46%|█████████████████████████████████████▊                                             | 9154/20117 [5:46:43<6:56:03,  2.28s/it] 46%|█████████████████████████████████████▊                                             | 9155/20117 [5:46:45<6:55:43,  2.28s/it] 46%|█████████████████████████████████████▊                                             | 9156/20117 [5:46:48<6:55:36,  2.28s/it] 46%|█████████████████████████████████████▊                                             | 9157/20117 [5:46:50<6:52:03,  2.26s/it] 46%|█████████████████████████████████████▊                                             | 9158/20117 [5:46:52<7:09:50,  2.35s/it] 46%|█████████████████████████████████████▊                                             | 9159/20117 [5:46:55<7:10:33,  2.36s/it] 46%|█████████████████████████████████████▊                                             | 9160/20117 [5:46:57<7:04:31,  2.32s/it]                                                                                                                                 {'loss': 0.2709, 'grad_norm': 0.5270497798919678, 'learning_rate': 0.00011484695041448399, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 363.69, 'epoch': 0.91}
 46%|█████████████████████████████████████▊                                             | 9160/20117 [5:46:57<7:04:31,  2.32s/it] 46%|█████████████████████████████████████▊                                             | 9161/20117 [5:46:59<7:02:59,  2.32s/it] 46%|█████████████████████████████████████▊                                             | 9162/20117 [5:47:02<7:05:34,  2.33s/it] 46%|█████████████████████████████████████▊                                             | 9163/20117 [5:47:04<7:04:09,  2.32s/it] 46%|█████████████████████████████████████▊                                             | 9164/20117 [5:47:06<7:01:55,  2.31s/it] 46%|█████████████████████████████████████▊                                             | 9165/20117 [5:47:09<6:57:39,  2.29s/it] 46%|█████████████████████████████████████▊                                             | 9166/20117 [5:47:11<7:00:45,  2.31s/it] 46%|█████████████████████████████████████▊                                             | 9167/20117 [5:47:13<6:56:47,  2.28s/it] 46%|█████████████████████████████████████▊                                             | 9168/20117 [5:47:15<6:53:20,  2.27s/it] 46%|█████████████████████████████████████▊                                             | 9169/20117 [5:47:18<7:00:45,  2.31s/it] 46%|█████████████████████████████████████▊                                             | 9170/20117 [5:47:20<6:55:08,  2.28s/it]                                                                                                                                 {'loss': 0.1935, 'grad_norm': 0.48765021562576294, 'learning_rate': 0.00011469172540138407, 'memory/max_active (GiB)': 21.39, 'memory/max_allocated (GiB)': 21.39, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 329.09, 'epoch': 0.91}
 46%|█████████████████████████████████████▊                                             | 9170/20117 [5:47:20<6:55:08,  2.28s/it] 46%|█████████████████████████████████████▊                                             | 9171/20117 [5:47:22<6:51:28,  2.26s/it] 46%|█████████████████████████████████████▊                                             | 9172/20117 [5:47:24<6:53:31,  2.27s/it] 46%|█████████████████████████████████████▊                                             | 9173/20117 [5:47:27<6:53:29,  2.27s/it] 46%|█████████████████████████████████████▊                                             | 9174/20117 [5:47:29<6:52:47,  2.26s/it] 46%|█████████████████████████████████████▊                                             | 9175/20117 [5:47:31<6:55:41,  2.28s/it] 46%|█████████████████████████████████████▊                                             | 9176/20117 [5:47:34<6:58:12,  2.29s/it] 46%|█████████████████████████████████████▊                                             | 9177/20117 [5:47:36<6:58:45,  2.30s/it] 46%|█████████████████████████████████████▊                                             | 9178/20117 [5:47:38<6:57:30,  2.29s/it] 46%|█████████████████████████████████████▊                                             | 9179/20117 [5:47:40<6:58:36,  2.30s/it] 46%|█████████████████████████████████████▉                                             | 9180/20117 [5:47:43<6:57:11,  2.29s/it]                                                                                                                                 {'loss': 0.2296, 'grad_norm': 0.4249947667121887, 'learning_rate': 0.00011453646419945934, 'memory/max_active (GiB)': 19.77, 'memory/max_allocated (GiB)': 19.77, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 412.86, 'epoch': 0.91}
 46%|█████████████████████████████████████▉                                             | 9180/20117 [5:47:43<6:57:11,  2.29s/it] 46%|█████████████████████████████████████▉                                             | 9181/20117 [5:47:45<6:56:52,  2.29s/it] 46%|█████████████████████████████████████▉                                             | 9182/20117 [5:47:47<6:56:14,  2.28s/it] 46%|█████████████████████████████████████▉                                             | 9183/20117 [5:47:50<6:55:02,  2.28s/it] 46%|█████████████████████████████████████▉                                             | 9184/20117 [5:47:52<6:53:21,  2.27s/it] 46%|█████████████████████████████████████▉                                             | 9185/20117 [5:47:54<6:57:12,  2.29s/it] 46%|█████████████████████████████████████▉                                             | 9186/20117 [5:47:56<6:56:44,  2.29s/it] 46%|█████████████████████████████████████▉                                             | 9187/20117 [5:47:59<6:55:55,  2.28s/it] 46%|█████████████████████████████████████▉                                             | 9188/20117 [5:48:01<6:52:41,  2.27s/it] 46%|█████████████████████████████████████▉                                             | 9189/20117 [5:48:03<6:54:28,  2.28s/it] 46%|█████████████████████████████████████▉                                             | 9190/20117 [5:48:06<6:54:31,  2.28s/it]                                                                                                                                 {'loss': 0.2361, 'grad_norm': 0.36251431703567505, 'learning_rate': 0.00011438116719115089, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 343.29, 'epoch': 0.91}
 46%|█████████████████████████████████████▉                                             | 9190/20117 [5:48:06<6:54:31,  2.28s/it] 46%|█████████████████████████████████████▉                                             | 9191/20117 [5:48:08<6:52:29,  2.27s/it] 46%|█████████████████████████████████████▉                                             | 9192/20117 [5:48:10<6:55:23,  2.28s/it] 46%|█████████████████████████████████████▉                                             | 9193/20117 [5:48:12<6:53:52,  2.27s/it] 46%|█████████████████████████████████████▉                                             | 9194/20117 [5:48:15<6:56:56,  2.29s/it] 46%|█████████████████████████████████████▉                                             | 9195/20117 [5:48:17<6:58:06,  2.30s/it] 46%|█████████████████████████████████████▉                                             | 9196/20117 [5:48:19<6:56:25,  2.29s/it] 46%|█████████████████████████████████████▉                                             | 9197/20117 [5:48:22<6:54:26,  2.28s/it] 46%|█████████████████████████████████████▉                                             | 9198/20117 [5:48:24<6:54:50,  2.28s/it] 46%|█████████████████████████████████████▉                                             | 9199/20117 [5:48:26<6:56:44,  2.29s/it] 46%|█████████████████████████████████████▉                                             | 9200/20117 [5:48:28<6:55:00,  2.28s/it]                                                                                                                                 {'loss': 0.2446, 'grad_norm': 0.3858913481235504, 'learning_rate': 0.00011422583475898814, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 331.61, 'epoch': 0.91}
 46%|█████████████████████████████████████▉                                             | 9200/20117 [5:48:28<6:55:00,  2.28s/it] 46%|█████████████████████████████████████▉                                             | 9201/20117 [5:48:31<6:56:21,  2.29s/it] 46%|█████████████████████████████████████▉                                             | 9202/20117 [5:48:33<6:55:18,  2.28s/it] 46%|█████████████████████████████████████▉                                             | 9203/20117 [5:48:35<6:54:44,  2.28s/it] 46%|█████████████████████████████████████▉                                             | 9204/20117 [5:48:38<6:55:32,  2.28s/it] 46%|█████████████████████████████████████▉                                             | 9205/20117 [5:48:40<6:55:15,  2.28s/it] 46%|█████████████████████████████████████▉                                             | 9206/20117 [5:48:42<6:52:32,  2.27s/it] 46%|█████████████████████████████████████▉                                             | 9207/20117 [5:48:44<6:52:39,  2.27s/it] 46%|█████████████████████████████████████▉                                             | 9208/20117 [5:48:47<6:55:25,  2.28s/it] 46%|█████████████████████████████████████▉                                             | 9209/20117 [5:48:49<6:53:48,  2.28s/it] 46%|█████████████████████████████████████▉                                             | 9210/20117 [5:48:51<6:56:34,  2.29s/it]                                                                                                                                 {'loss': 0.1683, 'grad_norm': 0.43422338366508484, 'learning_rate': 0.00011407046728558768, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 303.63, 'epoch': 0.92}
 46%|█████████████████████████████████████▉                                             | 9210/20117 [5:48:51<6:56:34,  2.29s/it] 46%|██████████████████████████████████████                                             | 9211/20117 [5:48:54<7:13:24,  2.38s/it] 46%|██████████████████████████████████████                                             | 9212/20117 [5:48:56<7:10:50,  2.37s/it] 46%|██████████████████████████████████████                                             | 9213/20117 [5:48:58<7:08:06,  2.36s/it] 46%|██████████████████████████████████████                                             | 9214/20117 [5:49:01<7:09:53,  2.37s/it] 46%|██████████████████████████████████████                                             | 9215/20117 [5:49:03<7:06:38,  2.35s/it] 46%|██████████████████████████████████████                                             | 9216/20117 [5:49:05<7:02:41,  2.33s/it] 46%|██████████████████████████████████████                                             | 9217/20117 [5:49:08<6:58:21,  2.30s/it] 46%|██████████████████████████████████████                                             | 9218/20117 [5:49:10<6:59:16,  2.31s/it] 46%|██████████████████████████████████████                                             | 9219/20117 [5:49:12<6:54:47,  2.28s/it] 46%|██████████████████████████████████████                                             | 9220/20117 [5:49:14<6:45:27,  2.23s/it]                                                                                                                                 {'loss': 0.1423, 'grad_norm': 0.43746358156204224, 'learning_rate': 0.00011391506515365245, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 337.04, 'epoch': 0.92}
 46%|██████████████████████████████████████                                             | 9220/20117 [5:49:14<6:45:27,  2.23s/it] 46%|██████████████████████████████████████                                             | 9221/20117 [5:49:17<6:41:53,  2.21s/it] 46%|██████████████████████████████████████                                             | 9222/20117 [5:49:19<6:42:24,  2.22s/it] 46%|██████████████████████████████████████                                             | 9223/20117 [5:49:21<6:46:18,  2.24s/it] 46%|██████████████████████████████████████                                             | 9224/20117 [5:49:23<6:42:36,  2.22s/it] 46%|██████████████████████████████████████                                             | 9225/20117 [5:49:25<6:44:09,  2.23s/it] 46%|██████████████████████████████████████                                             | 9226/20117 [5:49:28<6:47:17,  2.24s/it] 46%|██████████████████████████████████████                                             | 9227/20117 [5:49:30<6:58:06,  2.30s/it] 46%|██████████████████████████████████████                                             | 9228/20117 [5:49:33<7:04:44,  2.34s/it] 46%|██████████████████████████████████████                                             | 9229/20117 [5:49:35<7:04:22,  2.34s/it] 46%|██████████████████████████████████████                                             | 9230/20117 [5:49:37<7:06:23,  2.35s/it]                                                                                                                                 {'loss': 0.179, 'grad_norm': 0.30726879835128784, 'learning_rate': 0.00011375962874597073, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 312.89, 'epoch': 0.92}
 46%|██████████████████████████████████████                                             | 9230/20117 [5:49:37<7:06:23,  2.35s/it] 46%|██████████████████████████████████████                                             | 9231/20117 [5:49:40<7:03:12,  2.33s/it] 46%|██████████████████████████████████████                                             | 9232/20117 [5:49:42<7:04:37,  2.34s/it] 46%|██████████████████████████████████████                                             | 9233/20117 [5:49:44<7:04:28,  2.34s/it] 46%|██████████████████████████████████████                                             | 9234/20117 [5:49:46<6:55:55,  2.29s/it] 46%|██████████████████████████████████████                                             | 9235/20117 [5:49:49<6:49:09,  2.26s/it] 46%|██████████████████████████████████████                                             | 9236/20117 [5:49:51<6:43:35,  2.23s/it] 46%|██████████████████████████████████████                                             | 9237/20117 [5:49:53<6:42:15,  2.22s/it] 46%|██████████████████████████████████████                                             | 9238/20117 [5:49:55<6:48:23,  2.25s/it] 46%|██████████████████████████████████████                                             | 9239/20117 [5:49:58<6:59:17,  2.31s/it] 46%|██████████████████████████████████████                                             | 9240/20117 [5:50:00<7:06:18,  2.35s/it]                                                                                                                                 {'loss': 0.2571, 'grad_norm': 0.5088397264480591, 'learning_rate': 0.00011360415844541523, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 359.77, 'epoch': 0.92}
 46%|██████████████████████████████████████                                             | 9240/20117 [5:50:00<7:06:18,  2.35s/it] 46%|██████████████████████████████████████▏                                            | 9241/20117 [5:50:02<7:01:31,  2.33s/it] 46%|██████████████████████████████████████▏                                            | 9242/20117 [5:50:05<6:57:13,  2.30s/it] 46%|██████████████████████████████████████▏                                            | 9243/20117 [5:50:07<6:54:23,  2.29s/it] 46%|██████████████████████████████████████▏                                            | 9244/20117 [5:50:09<6:58:39,  2.31s/it] 46%|██████████████████████████████████████▏                                            | 9245/20117 [5:50:12<6:57:15,  2.30s/it] 46%|██████████████████████████████████████▏                                            | 9246/20117 [5:50:14<6:54:41,  2.29s/it] 46%|██████████████████████████████████████▏                                            | 9247/20117 [5:50:16<6:50:49,  2.27s/it] 46%|██████████████████████████████████████▏                                            | 9248/20117 [5:50:18<6:50:37,  2.27s/it] 46%|██████████████████████████████████████▏                                            | 9249/20117 [5:50:21<6:59:35,  2.32s/it] 46%|██████████████████████████████████████▏                                            | 9250/20117 [5:50:23<6:56:29,  2.30s/it]                                                                                                                                 {'loss': 0.2228, 'grad_norm': 0.545219898223877, 'learning_rate': 0.00011344865463494219, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 337.28, 'epoch': 0.92}
 46%|██████████████████████████████████████▏                                            | 9250/20117 [5:50:23<6:56:29,  2.30s/it] 46%|██████████████████████████████████████▏                                            | 9251/20117 [5:50:25<6:58:54,  2.31s/it] 46%|██████████████████████████████████████▏                                            | 9252/20117 [5:50:28<6:55:47,  2.30s/it] 46%|██████████████████████████████████████▏                                            | 9253/20117 [5:50:30<6:54:36,  2.29s/it] 46%|██████████████████████████████████████▏                                            | 9254/20117 [5:50:32<6:53:42,  2.29s/it] 46%|██████████████████████████████████████▏                                            | 9255/20117 [5:50:35<6:53:44,  2.29s/it] 46%|██████████████████████████████████████▏                                            | 9256/20117 [5:50:37<6:48:07,  2.25s/it] 46%|██████████████████████████████████████▏                                            | 9257/20117 [5:50:39<6:48:46,  2.26s/it] 46%|██████████████████████████████████████▏                                            | 9258/20117 [5:50:41<6:45:22,  2.24s/it] 46%|██████████████████████████████████████▏                                            | 9259/20117 [5:50:43<6:48:33,  2.26s/it] 46%|██████████████████████████████████████▏                                            | 9260/20117 [5:50:46<6:48:26,  2.26s/it]                                                                                                                                 {'loss': 0.2236, 'grad_norm': 0.5700411796569824, 'learning_rate': 0.00011329311769759035, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 340.23, 'epoch': 0.92}
 46%|██████████████████████████████████████▏                                            | 9260/20117 [5:50:46<6:48:26,  2.26s/it] 46%|██████████████████████████████████████▏                                            | 9261/20117 [5:50:48<6:47:27,  2.25s/it] 46%|██████████████████████████████████████▏                                            | 9262/20117 [5:50:50<6:45:18,  2.24s/it] 46%|██████████████████████████████████████▏                                            | 9263/20117 [5:50:52<6:45:45,  2.24s/it] 46%|██████████████████████████████████████▏                                            | 9264/20117 [5:50:55<7:02:42,  2.34s/it] 46%|██████████████████████████████████████▏                                            | 9265/20117 [5:50:57<7:01:19,  2.33s/it] 46%|██████████████████████████████████████▏                                            | 9266/20117 [5:51:00<7:02:49,  2.34s/it] 46%|██████████████████████████████████████▏                                            | 9267/20117 [5:51:02<7:02:14,  2.33s/it] 46%|██████████████████████████████████████▏                                            | 9268/20117 [5:51:04<6:59:29,  2.32s/it] 46%|██████████████████████████████████████▏                                            | 9269/20117 [5:51:07<6:56:40,  2.30s/it] 46%|██████████████████████████████████████▏                                            | 9270/20117 [5:51:09<6:57:08,  2.31s/it]                                                                                                                                 {'loss': 0.2487, 'grad_norm': 0.3975035548210144, 'learning_rate': 0.00011313754801648003, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 345.97, 'epoch': 0.92}
 46%|██████████████████████████████████████▏                                            | 9270/20117 [5:51:09<6:57:08,  2.31s/it] 46%|██████████████████████████████████████▎                                            | 9271/20117 [5:51:11<6:56:41,  2.31s/it] 46%|██████████████████████████████████████▎                                            | 9272/20117 [5:51:13<6:54:45,  2.29s/it] 46%|██████████████████████████████████████▎                                            | 9273/20117 [5:51:16<6:53:45,  2.29s/it] 46%|██████████████████████████████████████▎                                            | 9274/20117 [5:51:18<6:50:44,  2.27s/it] 46%|██████████████████████████████████████▎                                            | 9275/20117 [5:51:20<6:54:05,  2.29s/it] 46%|██████████████████████████████████████▎                                            | 9276/20117 [5:51:23<6:54:06,  2.29s/it] 46%|██████████████████████████████████████▎                                            | 9277/20117 [5:51:25<6:51:32,  2.28s/it] 46%|██████████████████████████████████████▎                                            | 9278/20117 [5:51:27<6:49:51,  2.27s/it] 46%|██████████████████████████████████████▎                                            | 9279/20117 [5:51:29<6:49:12,  2.27s/it] 46%|██████████████████████████████████████▎                                            | 9280/20117 [5:51:32<6:50:25,  2.27s/it]                                                                                                                                 {'loss': 0.2511, 'grad_norm': 0.5616829991340637, 'learning_rate': 0.00011298194597481226, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 331.41, 'epoch': 0.92}
 46%|██████████████████████████████████████▎                                            | 9280/20117 [5:51:32<6:50:25,  2.27s/it] 46%|██████████████████████████████████████▎                                            | 9281/20117 [5:51:34<6:49:58,  2.27s/it] 46%|██████████████████████████████████████▎                                            | 9282/20117 [5:51:36<6:50:27,  2.27s/it] 46%|██████████████████████████████████████▎                                            | 9283/20117 [5:51:38<6:50:11,  2.27s/it] 46%|██████████████████████████████████████▎                                            | 9284/20117 [5:51:41<6:51:28,  2.28s/it] 46%|██████████████████████████████████████▎                                            | 9285/20117 [5:51:43<6:52:00,  2.28s/it] 46%|██████████████████████████████████████▎                                            | 9286/20117 [5:51:45<6:53:38,  2.29s/it] 46%|██████████████████████████████████████▎                                            | 9287/20117 [5:51:48<6:54:09,  2.29s/it] 46%|██████████████████████████████████████▎                                            | 9288/20117 [5:51:50<6:51:41,  2.28s/it] 46%|██████████████████████████████████████▎                                            | 9289/20117 [5:51:52<6:52:43,  2.29s/it] 46%|██████████████████████████████████████▎                                            | 9290/20117 [5:51:54<6:54:02,  2.29s/it]                                                                                                                                 {'loss': 0.2809, 'grad_norm': 0.38982534408569336, 'learning_rate': 0.00011282631195586777, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 361.76, 'epoch': 0.92}
 46%|██████████████████████████████████████▎                                            | 9290/20117 [5:51:54<6:54:02,  2.29s/it] 46%|██████████████████████████████████████▎                                            | 9291/20117 [5:51:57<6:51:52,  2.28s/it] 46%|██████████████████████████████████████▎                                            | 9292/20117 [5:51:59<6:50:26,  2.27s/it] 46%|██████████████████████████████████████▎                                            | 9293/20117 [5:52:01<6:49:15,  2.27s/it] 46%|██████████████████████████████████████▎                                            | 9294/20117 [5:52:04<6:50:41,  2.28s/it] 46%|██████████████████████████████████████▎                                            | 9295/20117 [5:52:06<6:53:27,  2.29s/it] 46%|██████████████████████████████████████▎                                            | 9296/20117 [5:52:08<6:55:12,  2.30s/it] 46%|██████████████████████████████████████▎                                            | 9297/20117 [5:52:10<6:56:27,  2.31s/it] 46%|██████████████████████████████████████▎                                            | 9298/20117 [5:52:13<6:57:19,  2.31s/it] 46%|██████████████████████████████████████▎                                            | 9299/20117 [5:52:15<6:54:27,  2.30s/it] 46%|██████████████████████████████████████▎                                            | 9300/20117 [5:52:17<6:54:13,  2.30s/it]                                                                                                                                 {'loss': 0.2608, 'grad_norm': 0.531318187713623, 'learning_rate': 0.00011267064634300603, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 393.15, 'epoch': 0.92}
 46%|██████████████████████████████████████▎                                            | 9300/20117 [5:52:17<6:54:13,  2.30s/it] 46%|██████████████████████████████████████▎                                            | 9301/20117 [5:52:20<6:52:57,  2.29s/it] 46%|██████████████████████████████████████▍                                            | 9302/20117 [5:52:22<6:48:55,  2.27s/it] 46%|██████████████████████████████████████▍                                            | 9303/20117 [5:52:24<6:51:47,  2.28s/it] 46%|██████████████████████████████████████▍                                            | 9304/20117 [5:52:27<6:53:06,  2.29s/it] 46%|██████████████████████████████████████▍                                            | 9305/20117 [5:52:29<6:49:49,  2.27s/it] 46%|██████████████████████████████████████▍                                            | 9306/20117 [5:52:31<6:52:04,  2.29s/it] 46%|██████████████████████████████████████▍                                            | 9307/20117 [5:52:33<6:48:38,  2.27s/it] 46%|██████████████████████████████████████▍                                            | 9308/20117 [5:52:35<6:44:07,  2.24s/it] 46%|██████████████████████████████████████▍                                            | 9309/20117 [5:52:38<6:46:35,  2.26s/it] 46%|██████████████████████████████████████▍                                            | 9310/20117 [5:52:40<6:46:13,  2.26s/it]                                                                                                                                 {'loss': 0.2229, 'grad_norm': 0.5956375002861023, 'learning_rate': 0.00011251494951966437, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 284.92, 'epoch': 0.93}
 46%|██████████████████████████████████████▍                                            | 9310/20117 [5:52:40<6:46:13,  2.26s/it] 46%|██████████████████████████████████████▍                                            | 9311/20117 [5:52:42<6:44:38,  2.25s/it] 46%|██████████████████████████████████████▍                                            | 9312/20117 [5:52:44<6:45:22,  2.25s/it] 46%|██████████████████████████████████████▍                                            | 9313/20117 [5:52:47<6:45:11,  2.25s/it] 46%|██████████████████████████████████████▍                                            | 9314/20117 [5:52:49<6:45:36,  2.25s/it] 46%|██████████████████████████████████████▍                                            | 9315/20117 [5:52:51<6:43:34,  2.24s/it] 46%|██████████████████████████████████████▍                                            | 9316/20117 [5:52:54<7:00:04,  2.33s/it] 46%|██████████████████████████████████████▍                                            | 9317/20117 [5:52:56<6:55:41,  2.31s/it] 46%|██████████████████████████████████████▍                                            | 9318/20117 [5:52:58<6:54:54,  2.31s/it] 46%|██████████████████████████████████████▍                                            | 9319/20117 [5:53:01<6:53:24,  2.30s/it] 46%|██████████████████████████████████████▍                                            | 9320/20117 [5:53:03<6:52:37,  2.29s/it]                                                                                                                                 {'loss': 0.1985, 'grad_norm': 0.42666998505592346, 'learning_rate': 0.0001123592218693569, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 362.3, 'epoch': 0.93}
 46%|██████████████████████████████████████▍                                            | 9320/20117 [5:53:03<6:52:37,  2.29s/it] 46%|██████████████████████████████████████▍                                            | 9321/20117 [5:53:05<6:54:18,  2.30s/it] 46%|██████████████████████████████████████▍                                            | 9322/20117 [5:53:07<6:50:25,  2.28s/it] 46%|██████████████████████████████████████▍                                            | 9323/20117 [5:53:10<6:48:38,  2.27s/it] 46%|██████████████████████████████████████▍                                            | 9324/20117 [5:53:12<6:48:14,  2.27s/it] 46%|██████████████████████████████████████▍                                            | 9325/20117 [5:53:14<6:51:34,  2.29s/it] 46%|██████████████████████████████████████▍                                            | 9326/20117 [5:53:17<6:50:52,  2.28s/it] 46%|██████████████████████████████████████▍                                            | 9327/20117 [5:53:19<6:49:05,  2.27s/it] 46%|██████████████████████████████████████▍                                            | 9328/20117 [5:53:21<6:49:34,  2.28s/it] 46%|██████████████████████████████████████▍                                            | 9329/20117 [5:53:23<6:49:44,  2.28s/it] 46%|██████████████████████████████████████▍                                            | 9330/20117 [5:53:26<6:49:42,  2.28s/it]                                                                                                                                 {'loss': 0.2535, 'grad_norm': 0.4182765781879425, 'learning_rate': 0.00011220346377567381, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 348.83, 'epoch': 0.93}
 46%|██████████████████████████████████████▍                                            | 9330/20117 [5:53:26<6:49:42,  2.28s/it] 46%|██████████████████████████████████████▍                                            | 9331/20117 [5:53:28<6:53:57,  2.30s/it] 46%|██████████████████████████████████████▌                                            | 9332/20117 [5:53:30<6:53:49,  2.30s/it] 46%|██████████████████████████████████████▌                                            | 9333/20117 [5:53:33<6:53:11,  2.30s/it] 46%|██████████████████████████████████████▌                                            | 9334/20117 [5:53:35<6:54:56,  2.31s/it] 46%|██████████████████████████████████████▌                                            | 9335/20117 [5:53:37<6:51:43,  2.29s/it] 46%|██████████████████████████████████████▌                                            | 9336/20117 [5:53:40<6:53:32,  2.30s/it] 46%|██████████████████████████████████████▌                                            | 9337/20117 [5:53:42<6:49:57,  2.28s/it] 46%|██████████████████████████████████████▌                                            | 9338/20117 [5:53:44<6:51:00,  2.29s/it] 46%|██████████████████████████████████████▌                                            | 9339/20117 [5:53:46<6:51:51,  2.29s/it] 46%|██████████████████████████████████████▌                                            | 9340/20117 [5:53:49<6:52:39,  2.30s/it]                                                                                                                                 {'loss': 0.2309, 'grad_norm': 0.5867879390716553, 'learning_rate': 0.00011204767562228017, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 334.46, 'epoch': 0.93}
 46%|██████████████████████████████████████▌                                            | 9340/20117 [5:53:49<6:52:39,  2.30s/it] 46%|██████████████████████████████████████▌                                            | 9341/20117 [5:53:51<6:52:29,  2.30s/it] 46%|██████████████████████████████████████▌                                            | 9342/20117 [5:53:53<6:50:01,  2.28s/it] 46%|██████████████████████████████████████▌                                            | 9343/20117 [5:53:55<6:50:22,  2.29s/it] 46%|██████████████████████████████████████▌                                            | 9344/20117 [5:53:58<6:51:34,  2.29s/it] 46%|██████████████████████████████████████▌                                            | 9345/20117 [5:54:00<6:48:43,  2.28s/it] 46%|██████████████████████████████████████▌                                            | 9346/20117 [5:54:02<6:50:14,  2.29s/it] 46%|██████████████████████████████████████▌                                            | 9347/20117 [5:54:05<6:50:56,  2.29s/it] 46%|██████████████████████████████████████▌                                            | 9348/20117 [5:54:07<6:51:38,  2.29s/it] 46%|██████████████████████████████████████▌                                            | 9349/20117 [5:54:09<6:55:09,  2.31s/it] 46%|██████████████████████████████████████▌                                            | 9350/20117 [5:54:12<6:52:59,  2.30s/it]                                                                                                                                 {'loss': 0.232, 'grad_norm': 0.27041056752204895, 'learning_rate': 0.00011189185779291515, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 331.56, 'epoch': 0.93}
 46%|██████████████████████████████████████▌                                            | 9350/20117 [5:54:12<6:52:59,  2.30s/it] 46%|██████████████████████████████████████▌                                            | 9351/20117 [5:54:14<6:53:26,  2.30s/it] 46%|██████████████████████████████████████▌                                            | 9352/20117 [5:54:16<6:53:36,  2.31s/it] 46%|██████████████████████████████████████▌                                            | 9353/20117 [5:54:18<6:51:44,  2.30s/it] 46%|██████████████████████████████████████▌                                            | 9354/20117 [5:54:21<6:53:26,  2.30s/it] 47%|██████████████████████████████████████▌                                            | 9355/20117 [5:54:23<6:55:31,  2.32s/it] 47%|██████████████████████████████████████▌                                            | 9356/20117 [5:54:25<6:55:19,  2.32s/it] 47%|██████████████████████████████████████▌                                            | 9357/20117 [5:54:28<6:52:47,  2.30s/it] 47%|██████████████████████████████████████▌                                            | 9358/20117 [5:54:30<6:55:17,  2.32s/it] 47%|██████████████████████████████████████▌                                            | 9359/20117 [5:54:32<6:54:14,  2.31s/it] 47%|██████████████████████████████████████▌                                            | 9360/20117 [5:54:35<6:59:33,  2.34s/it]                                                                                                                                 {'loss': 0.2399, 'grad_norm': 0.5081501603126526, 'learning_rate': 0.00011173601067139099, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 336.84, 'epoch': 0.93}
 47%|██████████████████████████████████████▌                                            | 9360/20117 [5:54:35<6:59:33,  2.34s/it] 47%|██████████████████████████████████████▌                                            | 9361/20117 [5:54:37<6:59:07,  2.34s/it] 47%|██████████████████████████████████████▋                                            | 9362/20117 [5:54:39<6:55:53,  2.32s/it] 47%|██████████████████████████████████████▋                                            | 9363/20117 [5:54:42<6:58:07,  2.33s/it] 47%|██████████████████████████████████████▋                                            | 9364/20117 [5:54:44<6:57:44,  2.33s/it] 47%|██████████████████████████████████████▋                                            | 9365/20117 [5:54:46<6:56:58,  2.33s/it] 47%|██████████████████████████████████████▋                                            | 9366/20117 [5:54:49<6:52:50,  2.30s/it] 47%|██████████████████████████████████████▋                                            | 9367/20117 [5:54:51<6:54:40,  2.31s/it] 47%|██████████████████████████████████████▋                                            | 9368/20117 [5:54:54<7:10:01,  2.40s/it] 47%|██████████████████████████████████████▋                                            | 9369/20117 [5:54:56<7:06:17,  2.38s/it] 47%|██████████████████████████████████████▋                                            | 9370/20117 [5:54:58<7:06:08,  2.38s/it]                                                                                                                                 {'loss': 0.2606, 'grad_norm': 0.7966908812522888, 'learning_rate': 0.00011158013464159208, 'memory/max_active (GiB)': 21.39, 'memory/max_allocated (GiB)': 21.39, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 313.66, 'epoch': 0.93}
 47%|██████████████████████████████████████▋                                            | 9370/20117 [5:54:58<7:06:08,  2.38s/it] 47%|██████████████████████████████████████▋                                            | 9371/20117 [5:55:01<7:03:05,  2.36s/it] 47%|██████████████████████████████████████▋                                            | 9372/20117 [5:55:03<6:59:22,  2.34s/it] 47%|██████████████████████████████████████▋                                            | 9373/20117 [5:55:05<6:56:40,  2.33s/it] 47%|██████████████████████████████████████▋                                            | 9374/20117 [5:55:07<6:54:12,  2.31s/it] 47%|██████████████████████████████████████▋                                            | 9375/20117 [5:55:10<6:53:24,  2.31s/it] 47%|██████████████████████████████████████▋                                            | 9376/20117 [5:55:12<6:54:39,  2.32s/it] 47%|██████████████████████████████████████▋                                            | 9377/20117 [5:55:14<6:51:29,  2.30s/it] 47%|██████████████████████████████████████▋                                            | 9378/20117 [5:55:17<6:49:32,  2.29s/it] 47%|██████████████████████████████████████▋                                            | 9379/20117 [5:55:19<6:52:43,  2.31s/it] 47%|██████████████████████████████████████▋                                            | 9380/20117 [5:55:21<6:50:07,  2.29s/it]                                                                                                                                 {'loss': 0.1581, 'grad_norm': 0.45302364230155945, 'learning_rate': 0.00011142423008747403, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 322.72, 'epoch': 0.93}
 47%|██████████████████████████████████████▋                                            | 9380/20117 [5:55:21<6:50:07,  2.29s/it] 47%|██████████████████████████████████████▋                                            | 9381/20117 [5:55:24<6:56:27,  2.33s/it] 47%|██████████████████████████████████████▋                                            | 9382/20117 [5:55:26<6:52:43,  2.31s/it] 47%|██████████████████████████████████████▋                                            | 9383/20117 [5:55:28<6:45:49,  2.27s/it] 47%|██████████████████████████████████████▋                                            | 9384/20117 [5:55:30<6:46:55,  2.27s/it] 47%|██████████████████████████████████████▋                                            | 9385/20117 [5:55:33<6:47:16,  2.28s/it] 47%|██████████████████████████████████████▋                                            | 9386/20117 [5:55:35<6:46:02,  2.27s/it] 47%|██████████████████████████████████████▋                                            | 9387/20117 [5:55:37<6:46:29,  2.27s/it] 47%|██████████████████████████████████████▋                                            | 9388/20117 [5:55:39<6:43:41,  2.26s/it] 47%|██████████████████████████████████████▋                                            | 9389/20117 [5:55:42<6:42:15,  2.25s/it] 47%|██████████████████████████████████████▋                                            | 9390/20117 [5:55:44<6:45:00,  2.27s/it]                                                                                                                                 {'loss': 0.2115, 'grad_norm': 0.5959784984588623, 'learning_rate': 0.00011126829739306271, 'memory/max_active (GiB)': 19.1, 'memory/max_allocated (GiB)': 19.1, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 304.85, 'epoch': 0.93}
 47%|██████████████████████████████████████▋                                            | 9390/20117 [5:55:44<6:45:00,  2.27s/it] 47%|██████████████████████████████████████▋                                            | 9391/20117 [5:55:46<6:41:25,  2.25s/it] 47%|██████████████████████████████████████▊                                            | 9392/20117 [5:55:48<6:42:46,  2.25s/it] 47%|██████████████████████████████████████▊                                            | 9393/20117 [5:55:51<6:47:35,  2.28s/it] 47%|██████████████████████████████████████▊                                            | 9394/20117 [5:55:53<6:44:27,  2.26s/it] 47%|██████████████████████████████████████▊                                            | 9395/20117 [5:55:55<6:44:19,  2.26s/it] 47%|██████████████████████████████████████▊                                            | 9396/20117 [5:55:57<6:41:20,  2.25s/it] 47%|██████████████████████████████████████▊                                            | 9397/20117 [5:56:00<6:43:05,  2.26s/it] 47%|██████████████████████████████████████▊                                            | 9398/20117 [5:56:02<6:44:56,  2.27s/it] 47%|██████████████████████████████████████▊                                            | 9399/20117 [5:56:04<6:45:45,  2.27s/it] 47%|██████████████████████████████████████▊                                            | 9400/20117 [5:56:07<6:44:05,  2.26s/it]                                                                                                                                 {'loss': 0.1854, 'grad_norm': 0.5530646443367004, 'learning_rate': 0.00011111233694245328, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 347.73, 'epoch': 0.93}
 47%|██████████████████████████████████████▊                                            | 9400/20117 [5:56:07<6:44:05,  2.26s/it] 47%|██████████████████████████████████████▊                                            | 9401/20117 [5:56:09<6:46:09,  2.27s/it] 47%|██████████████████████████████████████▊                                            | 9402/20117 [5:56:11<6:44:37,  2.27s/it] 47%|██████████████████████████████████████▊                                            | 9403/20117 [5:56:13<6:44:47,  2.27s/it] 47%|██████████████████████████████████████▊                                            | 9404/20117 [5:56:16<6:46:34,  2.28s/it] 47%|██████████████████████████████████████▊                                            | 9405/20117 [5:56:18<6:44:38,  2.27s/it] 47%|██████████████████████████████████████▊                                            | 9406/20117 [5:56:20<6:41:22,  2.25s/it] 47%|██████████████████████████████████████▊                                            | 9407/20117 [5:56:22<6:35:39,  2.22s/it] 47%|██████████████████████████████████████▊                                            | 9408/20117 [5:56:25<6:36:47,  2.22s/it] 47%|██████████████████████████████████████▊                                            | 9409/20117 [5:56:27<6:35:23,  2.22s/it] 47%|██████████████████████████████████████▊                                            | 9410/20117 [5:56:29<6:34:18,  2.21s/it]                                                                                                                                 {'loss': 0.2307, 'grad_norm': 0.4127480387687683, 'learning_rate': 0.00011095634911980933, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 325.27, 'epoch': 0.94}
 47%|██████████████████████████████████████▊                                            | 9410/20117 [5:56:29<6:34:18,  2.21s/it] 47%|██████████████████████████████████████▊                                            | 9411/20117 [5:56:31<6:35:21,  2.22s/it] 47%|██████████████████████████████████████▊                                            | 9412/20117 [5:56:33<6:43:23,  2.26s/it] 47%|██████████████████████████████████████▊                                            | 9413/20117 [5:56:36<6:49:30,  2.30s/it] 47%|██████████████████████████████████████▊                                            | 9414/20117 [5:56:38<6:49:17,  2.29s/it] 47%|██████████████████████████████████████▊                                            | 9415/20117 [5:56:40<6:45:56,  2.28s/it] 47%|██████████████████████████████████████▊                                            | 9416/20117 [5:56:43<6:47:00,  2.28s/it] 47%|██████████████████████████████████████▊                                            | 9417/20117 [5:56:45<6:46:13,  2.28s/it] 47%|██████████████████████████████████████▊                                            | 9418/20117 [5:56:47<6:46:06,  2.28s/it] 47%|██████████████████████████████████████▊                                            | 9419/20117 [5:56:50<7:02:23,  2.37s/it] 47%|██████████████████████████████████████▊                                            | 9420/20117 [5:56:52<6:51:59,  2.31s/it]                                                                                                                                 {'loss': 0.1953, 'grad_norm': 0.4038192629814148, 'learning_rate': 0.0001108003343093618, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 327.3, 'epoch': 0.94}
 47%|██████████████████████████████████████▊                                            | 9420/20117 [5:56:52<6:51:59,  2.31s/it] 47%|██████████████████████████████████████▊                                            | 9421/20117 [5:56:54<6:43:31,  2.26s/it] 47%|██████████████████████████████████████▊                                            | 9422/20117 [5:56:56<6:40:58,  2.25s/it] 47%|██████████████████████████████████████▉                                            | 9423/20117 [5:56:59<6:39:45,  2.24s/it] 47%|██████████████████████████████████████▉                                            | 9424/20117 [5:57:01<6:43:08,  2.26s/it] 47%|██████████████████████████████████████▉                                            | 9425/20117 [5:57:03<6:48:48,  2.29s/it] 47%|██████████████████████████████████████▉                                            | 9426/20117 [5:57:06<6:51:24,  2.31s/it] 47%|██████████████████████████████████████▉                                            | 9427/20117 [5:57:08<6:49:37,  2.30s/it] 47%|██████████████████████████████████████▉                                            | 9428/20117 [5:57:10<6:48:49,  2.29s/it] 47%|██████████████████████████████████████▉                                            | 9429/20117 [5:57:13<6:51:25,  2.31s/it] 47%|██████████████████████████████████████▉                                            | 9430/20117 [5:57:15<6:48:21,  2.29s/it]                                                                                                                                 {'loss': 0.2505, 'grad_norm': 0.4245215654373169, 'learning_rate': 0.00011064429289540821, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 337.4, 'epoch': 0.94}
 47%|██████████████████████████████████████▉                                            | 9430/20117 [5:57:15<6:48:21,  2.29s/it] 47%|██████████████████████████████████████▉                                            | 9431/20117 [5:57:17<6:47:10,  2.29s/it] 47%|██████████████████████████████████████▉                                            | 9432/20117 [5:57:19<6:43:50,  2.27s/it] 47%|██████████████████████████████████████▉                                            | 9433/20117 [5:57:21<6:41:49,  2.26s/it] 47%|██████████████████████████████████████▉                                            | 9434/20117 [5:57:24<6:40:20,  2.25s/it] 47%|██████████████████████████████████████▉                                            | 9435/20117 [5:57:26<6:36:24,  2.23s/it] 47%|██████████████████████████████████████▉                                            | 9436/20117 [5:57:28<6:38:33,  2.24s/it] 47%|██████████████████████████████████████▉                                            | 9437/20117 [5:57:30<6:39:47,  2.25s/it] 47%|██████████████████████████████████████▉                                            | 9438/20117 [5:57:33<6:37:18,  2.23s/it] 47%|██████████████████████████████████████▉                                            | 9439/20117 [5:57:35<6:38:07,  2.24s/it] 47%|██████████████████████████████████████▉                                            | 9440/20117 [5:57:37<6:38:39,  2.24s/it]                                                                                                                                 {'loss': 0.1584, 'grad_norm': 0.4768611192703247, 'learning_rate': 0.00011048822526231148, 'memory/max_active (GiB)': 18.83, 'memory/max_allocated (GiB)': 18.83, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 314.23, 'epoch': 0.94}
 47%|██████████████████████████████████████▉                                            | 9440/20117 [5:57:37<6:38:39,  2.24s/it] 47%|██████████████████████████████████████▉                                            | 9441/20117 [5:57:39<6:39:45,  2.25s/it] 47%|██████████████████████████████████████▉                                            | 9442/20117 [5:57:42<6:41:39,  2.26s/it] 47%|██████████████████████████████████████▉                                            | 9443/20117 [5:57:44<6:44:40,  2.27s/it] 47%|██████████████████████████████████████▉                                            | 9444/20117 [5:57:46<6:45:57,  2.28s/it] 47%|██████████████████████████████████████▉                                            | 9445/20117 [5:57:49<6:43:26,  2.27s/it] 47%|██████████████████████████████████████▉                                            | 9446/20117 [5:57:51<6:39:57,  2.25s/it] 47%|██████████████████████████████████████▉                                            | 9447/20117 [5:57:53<6:41:39,  2.26s/it] 47%|██████████████████████████████████████▉                                            | 9448/20117 [5:57:55<6:42:25,  2.26s/it] 47%|██████████████████████████████████████▉                                            | 9449/20117 [5:57:58<6:41:32,  2.26s/it] 47%|██████████████████████████████████████▉                                            | 9450/20117 [5:58:00<6:45:21,  2.28s/it]                                                                                                                                 {'loss': 0.2287, 'grad_norm': 0.3262840807437897, 'learning_rate': 0.00011033213179449917, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 358.18, 'epoch': 0.94}
 47%|██████████████████████████████████████▉                                            | 9450/20117 [5:58:00<6:45:21,  2.28s/it] 47%|██████████████████████████████████████▉                                            | 9451/20117 [5:58:02<6:42:59,  2.27s/it] 47%|██████████████████████████████████████▉                                            | 9452/20117 [5:58:04<6:49:14,  2.30s/it] 47%|███████████████████████████████████████                                            | 9453/20117 [5:58:07<6:49:58,  2.31s/it] 47%|███████████████████████████████████████                                            | 9454/20117 [5:58:09<6:49:02,  2.30s/it] 47%|███████████████████████████████████████                                            | 9455/20117 [5:58:11<6:49:45,  2.31s/it] 47%|███████████████████████████████████████                                            | 9456/20117 [5:58:14<6:49:58,  2.31s/it] 47%|███████████████████████████████████████                                            | 9457/20117 [5:58:16<6:43:13,  2.27s/it] 47%|███████████████████████████████████████                                            | 9458/20117 [5:58:18<6:44:33,  2.28s/it] 47%|███████████████████████████████████████                                            | 9459/20117 [5:58:20<6:43:20,  2.27s/it] 47%|███████████████████████████████████████                                            | 9460/20117 [5:58:23<6:37:48,  2.24s/it]                                                                                                                                 {'loss': 0.206, 'grad_norm': 0.5882810354232788, 'learning_rate': 0.00011017601287646251, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 405.83, 'epoch': 0.94}
 47%|███████████████████████████████████████                                            | 9460/20117 [5:58:23<6:37:48,  2.24s/it] 47%|███████████████████████████████████████                                            | 9461/20117 [5:58:25<6:36:26,  2.23s/it] 47%|███████████████████████████████████████                                            | 9462/20117 [5:58:27<6:41:58,  2.26s/it] 47%|███████████████████████████████████████                                            | 9463/20117 [5:58:29<6:40:30,  2.26s/it] 47%|███████████████████████████████████████                                            | 9464/20117 [5:58:32<6:39:07,  2.25s/it] 47%|███████████████████████████████████████                                            | 9465/20117 [5:58:34<6:35:12,  2.23s/it] 47%|███████████████████████████████████████                                            | 9466/20117 [5:58:36<6:35:26,  2.23s/it] 47%|███████████████████████████████████████                                            | 9467/20117 [5:58:38<6:37:12,  2.24s/it] 47%|███████████████████████████████████████                                            | 9468/20117 [5:58:41<6:35:50,  2.23s/it] 47%|███████████████████████████████████████                                            | 9469/20117 [5:58:43<6:32:44,  2.21s/it] 47%|███████████████████████████████████████                                            | 9470/20117 [5:58:45<6:33:03,  2.22s/it]                                                                                                                                 {'loss': 0.2209, 'grad_norm': 0.4905533492565155, 'learning_rate': 0.0001100198688927554, 'memory/max_active (GiB)': 19.22, 'memory/max_allocated (GiB)': 19.22, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 327.29, 'epoch': 0.94}
 47%|███████████████████████████████████████                                            | 9470/20117 [5:58:45<6:33:03,  2.22s/it] 47%|███████████████████████████████████████                                            | 9471/20117 [5:58:47<6:30:57,  2.20s/it] 47%|███████████████████████████████████████                                            | 9472/20117 [5:58:49<6:34:36,  2.22s/it] 47%|███████████████████████████████████████                                            | 9473/20117 [5:58:52<6:53:28,  2.33s/it] 47%|███████████████████████████████████████                                            | 9474/20117 [5:58:54<6:44:05,  2.28s/it] 47%|███████████████████████████████████████                                            | 9475/20117 [5:58:56<6:43:37,  2.28s/it] 47%|███████████████████████████████████████                                            | 9476/20117 [5:58:59<6:40:32,  2.26s/it] 47%|███████████████████████████████████████                                            | 9477/20117 [5:59:01<6:37:22,  2.24s/it] 47%|███████████████████████████████████████                                            | 9478/20117 [5:59:03<6:38:05,  2.25s/it] 47%|███████████████████████████████████████                                            | 9479/20117 [5:59:05<6:33:45,  2.22s/it] 47%|███████████████████████████████████████                                            | 9480/20117 [5:59:07<6:35:30,  2.23s/it]                                                                                                                                 {'loss': 0.2418, 'grad_norm': 0.5608656406402588, 'learning_rate': 0.00010986370022799346, 'memory/max_active (GiB)': 20.72, 'memory/max_allocated (GiB)': 20.72, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 335.39, 'epoch': 0.94}
 47%|███████████████████████████████████████                                            | 9480/20117 [5:59:07<6:35:30,  2.23s/it] 47%|███████████████████████████████████████                                            | 9481/20117 [5:59:10<6:40:50,  2.26s/it] 47%|███████████████████████████████████████                                            | 9482/20117 [5:59:12<6:42:18,  2.27s/it] 47%|███████████████████████████████████████▏                                           | 9483/20117 [5:59:15<6:53:03,  2.33s/it] 47%|███████████████████████████████████████▏                                           | 9484/20117 [5:59:17<6:51:05,  2.32s/it] 47%|███████████████████████████████████████▏                                           | 9485/20117 [5:59:19<6:47:22,  2.30s/it] 47%|███████████████████████████████████████▏                                           | 9486/20117 [5:59:21<6:44:53,  2.29s/it] 47%|███████████████████████████████████████▏                                           | 9487/20117 [5:59:24<6:42:01,  2.27s/it] 47%|███████████████████████████████████████▏                                           | 9488/20117 [5:59:26<6:40:30,  2.26s/it] 47%|███████████████████████████████████████▏                                           | 9489/20117 [5:59:28<6:41:31,  2.27s/it] 47%|███████████████████████████████████████▏                                           | 9490/20117 [5:59:30<6:38:06,  2.25s/it]                                                                                                                                 {'loss': 0.2742, 'grad_norm': 0.4696904718875885, 'learning_rate': 0.00010970750726685309, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 399.98, 'epoch': 0.94}
 47%|███████████████████████████████████████▏                                           | 9490/20117 [5:59:30<6:38:06,  2.25s/it] 47%|███████████████████████████████████████▏                                           | 9491/20117 [5:59:33<6:36:52,  2.24s/it] 47%|███████████████████████████████████████▏                                           | 9492/20117 [5:59:35<6:38:06,  2.25s/it] 47%|███████████████████████████████████████▏                                           | 9493/20117 [5:59:37<6:32:38,  2.22s/it] 47%|███████████████████████████████████████▏                                           | 9494/20117 [5:59:39<6:33:59,  2.23s/it] 47%|███████████████████████████████████████▏                                           | 9495/20117 [5:59:41<6:35:20,  2.23s/it] 47%|███████████████████████████████████████▏                                           | 9496/20117 [5:59:44<6:33:11,  2.22s/it] 47%|███████████████████████████████████████▏                                           | 9497/20117 [5:59:46<6:34:10,  2.23s/it] 47%|███████████████████████████████████████▏                                           | 9498/20117 [5:59:48<6:36:14,  2.24s/it] 47%|███████████████████████████████████████▏                                           | 9499/20117 [5:59:50<6:34:11,  2.23s/it] 47%|███████████████████████████████████████▏                                           | 9500/20117 [5:59:53<6:38:47,  2.25s/it]                                                                                                                                 {'loss': 0.2259, 'grad_norm': 0.423969030380249, 'learning_rate': 0.00010955129039407062, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 336.84, 'epoch': 0.94}
 47%|███████████████████████████████████████▏                                           | 9500/20117 [5:59:53<6:38:47,  2.25s/it] 47%|███████████████████████████████████████▏                                           | 9501/20117 [5:59:55<6:35:07,  2.23s/it] 47%|███████████████████████████████████████▏                                           | 9502/20117 [5:59:57<6:35:26,  2.24s/it] 47%|███████████████████████████████████████▏                                           | 9503/20117 [5:59:59<6:36:52,  2.24s/it] 47%|███████████████████████████████████████▏                                           | 9504/20117 [6:00:01<6:32:04,  2.22s/it] 47%|███████████████████████████████████████▏                                           | 9505/20117 [6:00:04<6:34:15,  2.23s/it] 47%|███████████████████████████████████████▏                                           | 9506/20117 [6:00:06<6:36:29,  2.24s/it] 47%|███████████████████████████████████████▏                                           | 9507/20117 [6:00:08<6:36:22,  2.24s/it] 47%|███████████████████████████████████████▏                                           | 9508/20117 [6:00:11<6:40:43,  2.27s/it] 47%|███████████████████████████████████████▏                                           | 9509/20117 [6:00:13<6:38:09,  2.25s/it] 47%|███████████████████████████████████████▏                                           | 9510/20117 [6:00:15<6:35:26,  2.24s/it]                                                                                                                                 {'loss': 0.1855, 'grad_norm': 0.438812792301178, 'learning_rate': 0.0001093950499944412, 'memory/max_active (GiB)': 19.22, 'memory/max_allocated (GiB)': 19.22, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 318.59, 'epoch': 0.95}
 47%|███████████████████████████████████████▏                                           | 9510/20117 [6:00:15<6:35:26,  2.24s/it] 47%|███████████████████████████████████████▏                                           | 9511/20117 [6:00:17<6:38:53,  2.26s/it] 47%|███████████████████████████████████████▏                                           | 9512/20117 [6:00:20<6:36:32,  2.24s/it] 47%|███████████████████████████████████████▏                                           | 9513/20117 [6:00:22<6:36:40,  2.24s/it] 47%|███████████████████████████████████████▎                                           | 9514/20117 [6:00:24<6:35:47,  2.24s/it] 47%|███████████████████████████████████████▎                                           | 9515/20117 [6:00:26<6:36:10,  2.24s/it] 47%|███████████████████████████████████████▎                                           | 9516/20117 [6:00:28<6:35:58,  2.24s/it] 47%|███████████████████████████████████████▎                                           | 9517/20117 [6:00:31<6:35:39,  2.24s/it] 47%|███████████████████████████████████████▎                                           | 9518/20117 [6:00:33<6:42:20,  2.28s/it] 47%|███████████████████████████████████████▎                                           | 9519/20117 [6:00:35<6:41:37,  2.27s/it] 47%|███████████████████████████████████████▎                                           | 9520/20117 [6:00:38<6:39:31,  2.26s/it]                                                                                                                                 {'loss': 0.2713, 'grad_norm': 0.5361111164093018, 'learning_rate': 0.00010923878645281794, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 354.0, 'epoch': 0.95}
 47%|███████████████████████████████████████▎                                           | 9520/20117 [6:00:38<6:39:31,  2.26s/it] 47%|███████████████████████████████████████▎                                           | 9521/20117 [6:00:40<6:36:05,  2.24s/it] 47%|███████████████████████████████████████▎                                           | 9522/20117 [6:00:42<6:38:49,  2.26s/it] 47%|███████████████████████████████████████▎                                           | 9523/20117 [6:00:44<6:37:16,  2.25s/it] 47%|███████████████████████████████████████▎                                           | 9524/20117 [6:00:47<6:35:30,  2.24s/it] 47%|███████████████████████████████████████▎                                           | 9525/20117 [6:00:49<6:55:58,  2.36s/it] 47%|███████████████████████████████████████▎                                           | 9526/20117 [6:00:51<6:49:45,  2.32s/it] 47%|███████████████████████████████████████▎                                           | 9527/20117 [6:00:54<6:47:00,  2.31s/it] 47%|███████████████████████████████████████▎                                           | 9528/20117 [6:00:56<6:45:51,  2.30s/it] 47%|███████████████████████████████████████▎                                           | 9529/20117 [6:00:58<6:40:33,  2.27s/it] 47%|███████████████████████████████████████▎                                           | 9530/20117 [6:01:00<6:41:32,  2.28s/it]                                                                                                                                 {'loss': 0.2539, 'grad_norm': 0.39556553959846497, 'learning_rate': 0.000109082500154111, 'memory/max_active (GiB)': 20.44, 'memory/max_allocated (GiB)': 20.44, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 342.96, 'epoch': 0.95}
 47%|███████████████████████████████████████▎                                           | 9530/20117 [6:01:00<6:41:32,  2.28s/it] 47%|███████████████████████████████████████▎                                           | 9531/20117 [6:01:03<6:41:13,  2.27s/it] 47%|███████████████████████████████████████▎                                           | 9532/20117 [6:01:05<6:42:01,  2.28s/it] 47%|███████████████████████████████████████▎                                           | 9533/20117 [6:01:07<6:45:27,  2.30s/it] 47%|███████████████████████████████████████▎                                           | 9534/20117 [6:01:10<6:47:11,  2.31s/it] 47%|███████████████████████████████████████▎                                           | 9535/20117 [6:01:12<6:46:56,  2.31s/it] 47%|███████████████████████████████████████▎                                           | 9536/20117 [6:01:14<6:43:22,  2.29s/it] 47%|███████████████████████████████████████▎                                           | 9537/20117 [6:01:16<6:37:45,  2.26s/it] 47%|███████████████████████████████████████▎                                           | 9538/20117 [6:01:19<6:38:13,  2.26s/it] 47%|███████████████████████████████████████▎                                           | 9539/20117 [6:01:21<6:38:49,  2.26s/it] 47%|███████████████████████████████████████▎                                           | 9540/20117 [6:01:23<6:35:10,  2.24s/it]                                                                                                                                 {'loss': 0.2282, 'grad_norm': 0.42129939794540405, 'learning_rate': 0.00010892619148328654, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 375.18, 'epoch': 0.95}
 47%|███████████████████████████████████████▎                                           | 9540/20117 [6:01:23<6:35:10,  2.24s/it] 47%|███████████████████████████████████████▎                                           | 9541/20117 [6:01:25<6:36:00,  2.25s/it] 47%|███████████████████████████████████████▎                                           | 9542/20117 [6:01:28<6:34:47,  2.24s/it] 47%|███████████████████████████████████████▎                                           | 9543/20117 [6:01:30<6:37:30,  2.26s/it] 47%|███████████████████████████████████████▍                                           | 9544/20117 [6:01:32<6:38:33,  2.26s/it] 47%|███████████████████████████████████████▍                                           | 9545/20117 [6:01:34<6:40:41,  2.27s/it] 47%|███████████████████████████████████████▍                                           | 9546/20117 [6:01:37<6:39:24,  2.27s/it] 47%|███████████████████████████████████████▍                                           | 9547/20117 [6:01:39<6:41:13,  2.28s/it] 47%|███████████████████████████████████████▍                                           | 9548/20117 [6:01:41<6:37:16,  2.26s/it] 47%|███████████████████████████████████████▍                                           | 9549/20117 [6:01:43<6:36:43,  2.25s/it] 47%|███████████████████████████████████████▍                                           | 9550/20117 [6:01:46<6:40:52,  2.28s/it]                                                                                                                                 {'loss': 0.2342, 'grad_norm': 0.49444761872291565, 'learning_rate': 0.00010876986082536584, 'memory/max_active (GiB)': 19.68, 'memory/max_allocated (GiB)': 19.68, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 394.82, 'epoch': 0.95}
 47%|███████████████████████████████████████▍                                           | 9550/20117 [6:01:46<6:40:52,  2.28s/it] 47%|███████████████████████████████████████▍                                           | 9551/20117 [6:01:48<6:40:15,  2.27s/it] 47%|███████████████████████████████████████▍                                           | 9552/20117 [6:01:50<6:39:29,  2.27s/it] 47%|███████████████████████████████████████▍                                           | 9553/20117 [6:01:53<6:39:08,  2.27s/it] 47%|███████████████████████████████████████▍                                           | 9554/20117 [6:01:55<6:37:17,  2.26s/it] 47%|███████████████████████████████████████▍                                           | 9555/20117 [6:01:57<6:38:40,  2.26s/it] 48%|███████████████████████████████████████▍                                           | 9556/20117 [6:01:59<6:36:13,  2.25s/it] 48%|███████████████████████████████████████▍                                           | 9557/20117 [6:02:02<6:37:04,  2.26s/it] 48%|███████████████████████████████████████▍                                           | 9558/20117 [6:02:04<6:37:28,  2.26s/it] 48%|███████████████████████████████████████▍                                           | 9559/20117 [6:02:06<6:40:12,  2.27s/it] 48%|███████████████████████████████████████▍                                           | 9560/20117 [6:02:08<6:39:15,  2.27s/it]                                                                                                                                 {'loss': 0.1993, 'grad_norm': 0.3940083682537079, 'learning_rate': 0.0001086135085654244, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 346.47, 'epoch': 0.95}
 48%|███████████████████████████████████████▍                                           | 9560/20117 [6:02:08<6:39:15,  2.27s/it] 48%|███████████████████████████████████████▍                                           | 9561/20117 [6:02:11<6:41:36,  2.28s/it] 48%|███████████████████████████████████████▍                                           | 9562/20117 [6:02:13<6:37:18,  2.26s/it] 48%|███████████████████████████████████████▍                                           | 9563/20117 [6:02:15<6:39:25,  2.27s/it] 48%|███████████████████████████████████████▍                                           | 9564/20117 [6:02:18<6:39:44,  2.27s/it] 48%|███████████████████████████████████████▍                                           | 9565/20117 [6:02:20<6:34:31,  2.24s/it] 48%|███████████████████████████████████████▍                                           | 9566/20117 [6:02:22<6:35:40,  2.25s/it] 48%|███████████████████████████████████████▍                                           | 9567/20117 [6:02:24<6:35:23,  2.25s/it] 48%|███████████████████████████████████████▍                                           | 9568/20117 [6:02:26<6:34:12,  2.24s/it] 48%|███████████████████████████████████████▍                                           | 9569/20117 [6:02:29<6:36:41,  2.26s/it] 48%|███████████████████████████████████████▍                                           | 9570/20117 [6:02:31<6:39:08,  2.27s/it]                                                                                                                                 {'loss': 0.2479, 'grad_norm': 0.40422961115837097, 'learning_rate': 0.00010845713508859088, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 338.94, 'epoch': 0.95}
 48%|███████████████████████████████████████▍                                           | 9570/20117 [6:02:31<6:39:08,  2.27s/it] 48%|███████████████████████████████████████▍                                           | 9571/20117 [6:02:33<6:40:45,  2.28s/it] 48%|███████████████████████████████████████▍                                           | 9572/20117 [6:02:36<6:46:00,  2.31s/it] 48%|███████████████████████████████████████▍                                           | 9573/20117 [6:02:38<6:39:17,  2.27s/it] 48%|███████████████████████████████████████▌                                           | 9574/20117 [6:02:40<6:37:16,  2.26s/it] 48%|███████████████████████████████████████▌                                           | 9575/20117 [6:02:42<6:37:21,  2.26s/it] 48%|███████████████████████████████████████▌                                           | 9576/20117 [6:02:45<6:37:54,  2.26s/it] 48%|███████████████████████████████████████▌                                           | 9577/20117 [6:02:47<6:41:09,  2.28s/it] 48%|███████████████████████████████████████▌                                           | 9578/20117 [6:02:50<7:00:11,  2.39s/it] 48%|███████████████████████████████████████▌                                           | 9579/20117 [6:02:52<6:58:11,  2.38s/it] 48%|███████████████████████████████████████▌                                           | 9580/20117 [6:02:54<6:52:56,  2.35s/it]                                                                                                                                 {'loss': 0.2217, 'grad_norm': 0.5008521676063538, 'learning_rate': 0.00010830074078004615, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 390.26, 'epoch': 0.95}
 48%|███████████████████████████████████████▌                                           | 9580/20117 [6:02:54<6:52:56,  2.35s/it] 48%|███████████████████████████████████████▌                                           | 9581/20117 [6:02:57<6:47:19,  2.32s/it] 48%|███████████████████████████████████████▌                                           | 9582/20117 [6:02:59<6:43:23,  2.30s/it] 48%|███████████████████████████████████████▌                                           | 9583/20117 [6:03:01<6:42:32,  2.29s/it] 48%|███████████████████████████████████████▌                                           | 9584/20117 [6:03:03<6:37:18,  2.26s/it] 48%|███████████████████████████████████████▌                                           | 9585/20117 [6:03:06<6:53:41,  2.36s/it] 48%|███████████████████████████████████████▌                                           | 9586/20117 [6:03:08<6:47:14,  2.32s/it] 48%|███████████████████████████████████████▌                                           | 9587/20117 [6:03:10<6:44:31,  2.30s/it] 48%|███████████████████████████████████████▌                                           | 9588/20117 [6:03:13<6:46:59,  2.32s/it] 48%|███████████████████████████████████████▌                                           | 9589/20117 [6:03:15<6:39:48,  2.28s/it] 48%|███████████████████████████████████████▌                                           | 9590/20117 [6:03:17<6:38:33,  2.27s/it]                                                                                                                                 {'loss': 0.2178, 'grad_norm': 0.37146544456481934, 'learning_rate': 0.00010814432602502246, 'memory/max_active (GiB)': 18.17, 'memory/max_allocated (GiB)': 18.17, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 347.55, 'epoch': 0.95}
 48%|███████████████████████████████████████▌                                           | 9590/20117 [6:03:17<6:38:33,  2.27s/it] 48%|███████████████████████████████████████▌                                           | 9591/20117 [6:03:19<6:37:59,  2.27s/it] 48%|███████████████████████████████████████▌                                           | 9592/20117 [6:03:22<6:44:58,  2.31s/it] 48%|███████████████████████████████████████▌                                           | 9593/20117 [6:03:24<6:40:55,  2.29s/it] 48%|███████████████████████████████████████▌                                           | 9594/20117 [6:03:26<6:36:29,  2.26s/it] 48%|███████████████████████████████████████▌                                           | 9595/20117 [6:03:28<6:33:47,  2.25s/it] 48%|███████████████████████████████████████▌                                           | 9596/20117 [6:03:31<6:31:16,  2.23s/it] 48%|███████████████████████████████████████▌                                           | 9597/20117 [6:03:33<6:35:20,  2.25s/it] 48%|███████████████████████████████████████▌                                           | 9598/20117 [6:03:35<6:34:53,  2.25s/it] 48%|███████████████████████████████████████▌                                           | 9599/20117 [6:03:38<6:39:50,  2.28s/it] 48%|███████████████████████████████████████▌                                           | 9600/20117 [6:03:40<6:44:21,  2.31s/it]                                                                                                                                 {'loss': 0.2183, 'grad_norm': 0.4229485094547272, 'learning_rate': 0.00010798789120880246, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 311.66, 'epoch': 0.95}
 48%|███████████████████████████████████████▌                                           | 9600/20117 [6:03:40<6:44:21,  2.31s/it] 48%|███████████████████████████████████████▌                                           | 9601/20117 [6:03:42<6:44:03,  2.31s/it] 48%|███████████████████████████████████████▌                                           | 9602/20117 [6:03:44<6:43:13,  2.30s/it] 48%|███████████████████████████████████████▌                                           | 9603/20117 [6:03:47<6:47:44,  2.33s/it] 48%|███████████████████████████████████████▌                                           | 9604/20117 [6:03:49<6:51:16,  2.35s/it] 48%|███████████████████████████████████████▋                                           | 9605/20117 [6:03:52<6:53:12,  2.36s/it] 48%|███████████████████████████████████████▋                                           | 9606/20117 [6:03:54<6:50:02,  2.34s/it] 48%|███████████████████████████████████████▋                                           | 9607/20117 [6:03:56<6:43:25,  2.30s/it] 48%|███████████████████████████████████████▋                                           | 9608/20117 [6:03:58<6:37:57,  2.27s/it] 48%|███████████████████████████████████████▋                                           | 9609/20117 [6:04:01<6:38:18,  2.27s/it] 48%|███████████████████████████████████████▋                                           | 9610/20117 [6:04:03<6:34:20,  2.25s/it]                                                                                                                                 {'loss': 0.2663, 'grad_norm': 0.38394778966903687, 'learning_rate': 0.00010783143671671813, 'memory/max_active (GiB)': 20.57, 'memory/max_allocated (GiB)': 20.57, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 329.51, 'epoch': 0.96}
 48%|███████████████████████████████████████▋                                           | 9610/20117 [6:04:03<6:34:20,  2.25s/it] 48%|███████████████████████████████████████▋                                           | 9611/20117 [6:04:05<6:43:10,  2.30s/it] 48%|███████████████████████████████████████▋                                           | 9612/20117 [6:04:08<6:44:50,  2.31s/it] 48%|███████████████████████████████████████▋                                           | 9613/20117 [6:04:10<6:45:13,  2.31s/it] 48%|███████████████████████████████████████▋                                           | 9614/20117 [6:04:12<6:46:02,  2.32s/it] 48%|███████████████████████████████████████▋                                           | 9615/20117 [6:04:15<6:45:45,  2.32s/it] 48%|███████████████████████████████████████▋                                           | 9616/20117 [6:04:17<6:46:58,  2.33s/it] 48%|███████████████████████████████████████▋                                           | 9617/20117 [6:04:19<6:41:38,  2.30s/it] 48%|███████████████████████████████████████▋                                           | 9618/20117 [6:04:21<6:38:58,  2.28s/it] 48%|███████████████████████████████████████▋                                           | 9619/20117 [6:04:24<6:36:21,  2.27s/it] 48%|███████████████████████████████████████▋                                           | 9620/20117 [6:04:26<6:35:08,  2.26s/it]                                                                                                                                 {'loss': 0.2776, 'grad_norm': 0.4587366580963135, 'learning_rate': 0.00010767496293414996, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 464.41, 'epoch': 0.96}
 48%|███████████████████████████████████████▋                                           | 9620/20117 [6:04:26<6:35:08,  2.26s/it] 48%|███████████████████████████████████████▋                                           | 9621/20117 [6:04:28<6:36:10,  2.26s/it] 48%|███████████████████████████████████████▋                                           | 9622/20117 [6:04:30<6:38:12,  2.28s/it] 48%|███████████████████████████████████████▋                                           | 9623/20117 [6:04:33<6:35:36,  2.26s/it] 48%|███████████████████████████████████████▋                                           | 9624/20117 [6:04:35<6:36:11,  2.27s/it] 48%|███████████████████████████████████████▋                                           | 9625/20117 [6:04:37<6:36:38,  2.27s/it] 48%|███████████████████████████████████████▋                                           | 9626/20117 [6:04:40<6:37:40,  2.27s/it] 48%|███████████████████████████████████████▋                                           | 9627/20117 [6:04:42<6:37:31,  2.27s/it] 48%|███████████████████████████████████████▋                                           | 9628/20117 [6:04:44<6:34:09,  2.25s/it] 48%|███████████████████████████████████████▋                                           | 9629/20117 [6:04:46<6:32:07,  2.24s/it] 48%|███████████████████████████████████████▋                                           | 9630/20117 [6:04:48<6:31:30,  2.24s/it]                                                                                                                                 {'loss': 0.2387, 'grad_norm': 0.5589081645011902, 'learning_rate': 0.0001075184702465259, 'memory/max_active (GiB)': 21.47, 'memory/max_allocated (GiB)': 21.47, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 358.44, 'epoch': 0.96}
 48%|███████████████████████████████████████▋                                           | 9630/20117 [6:04:48<6:31:30,  2.24s/it] 48%|███████████████████████████████████████▋                                           | 9631/20117 [6:04:51<6:35:07,  2.26s/it] 48%|███████████████████████████████████████▋                                           | 9632/20117 [6:04:53<6:54:37,  2.37s/it] 48%|███████████████████████████████████████▋                                           | 9633/20117 [6:04:56<6:50:52,  2.35s/it] 48%|███████████████████████████████████████▋                                           | 9634/20117 [6:04:58<6:48:43,  2.34s/it] 48%|███████████████████████████████████████▊                                           | 9635/20117 [6:05:00<6:44:31,  2.32s/it] 48%|███████████████████████████████████████▊                                           | 9636/20117 [6:05:02<6:39:48,  2.29s/it] 48%|███████████████████████████████████████▊                                           | 9637/20117 [6:05:05<6:38:39,  2.28s/it] 48%|███████████████████████████████████████▊                                           | 9638/20117 [6:05:07<6:38:46,  2.28s/it] 48%|███████████████████████████████████████▊                                           | 9639/20117 [6:05:09<6:39:59,  2.29s/it] 48%|███████████████████████████████████████▊                                           | 9640/20117 [6:05:12<6:41:25,  2.30s/it]                                                                                                                                 {'loss': 0.2587, 'grad_norm': 0.6210941672325134, 'learning_rate': 0.0001073619590393206, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 373.16, 'epoch': 0.96}
 48%|███████████████████████████████████████▊                                           | 9640/20117 [6:05:12<6:41:25,  2.30s/it] 48%|███████████████████████████████████████▊                                           | 9641/20117 [6:05:14<6:41:06,  2.30s/it] 48%|███████████████████████████████████████▊                                           | 9642/20117 [6:05:16<6:37:36,  2.28s/it] 48%|███████████████████████████████████████▊                                           | 9643/20117 [6:05:18<6:37:11,  2.28s/it] 48%|███████████████████████████████████████▊                                           | 9644/20117 [6:05:21<6:37:01,  2.27s/it] 48%|███████████████████████████████████████▊                                           | 9645/20117 [6:05:23<6:36:30,  2.27s/it] 48%|███████████████████████████████████████▊                                           | 9646/20117 [6:05:25<6:36:49,  2.27s/it] 48%|███████████████████████████████████████▊                                           | 9647/20117 [6:05:27<6:32:05,  2.25s/it] 48%|███████████████████████████████████████▊                                           | 9648/20117 [6:05:30<6:30:47,  2.24s/it] 48%|███████████████████████████████████████▊                                           | 9649/20117 [6:05:32<6:31:36,  2.24s/it] 48%|███████████████████████████████████████▊                                           | 9650/20117 [6:05:34<6:29:07,  2.23s/it]                                                                                                                                 {'loss': 0.2378, 'grad_norm': 0.6241095066070557, 'learning_rate': 0.0001072054296980542, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 385.04, 'epoch': 0.96}
 48%|███████████████████████████████████████▊                                           | 9650/20117 [6:05:34<6:29:07,  2.23s/it] 48%|███████████████████████████████████████▊                                           | 9651/20117 [6:05:36<6:32:33,  2.25s/it] 48%|███████████████████████████████████████▊                                           | 9652/20117 [6:05:39<6:30:00,  2.24s/it] 48%|███████████████████████████████████████▊                                           | 9653/20117 [6:05:41<6:30:16,  2.24s/it] 48%|███████████████████████████████████████▊                                           | 9654/20117 [6:05:43<6:35:55,  2.27s/it] 48%|███████████████████████████████████████▊                                           | 9655/20117 [6:05:45<6:35:34,  2.27s/it] 48%|███████████████████████████████████████▊                                           | 9656/20117 [6:05:48<6:35:17,  2.27s/it] 48%|███████████████████████████████████████▊                                           | 9657/20117 [6:05:50<6:38:36,  2.29s/it] 48%|███████████████████████████████████████▊                                           | 9658/20117 [6:05:52<6:36:20,  2.27s/it] 48%|███████████████████████████████████████▊                                           | 9659/20117 [6:05:55<6:39:06,  2.29s/it] 48%|███████████████████████████████████████▊                                           | 9660/20117 [6:05:57<6:38:10,  2.28s/it]                                                                                                                                 {'loss': 0.2767, 'grad_norm': 0.40504932403564453, 'learning_rate': 0.00010704888260829156, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 384.99, 'epoch': 0.96}
 48%|███████████████████████████████████████▊                                           | 9660/20117 [6:05:57<6:38:10,  2.28s/it] 48%|███████████████████████████████████████▊                                           | 9661/20117 [6:05:59<6:35:00,  2.27s/it] 48%|███████████████████████████████████████▊                                           | 9662/20117 [6:06:01<6:36:00,  2.27s/it] 48%|███████████████████████████████████████▊                                           | 9663/20117 [6:06:04<6:33:00,  2.26s/it] 48%|███████████████████████████████████████▊                                           | 9664/20117 [6:06:06<6:32:12,  2.25s/it] 48%|███████████████████████████████████████▉                                           | 9665/20117 [6:06:08<6:37:37,  2.28s/it] 48%|███████████████████████████████████████▉                                           | 9666/20117 [6:06:11<6:41:16,  2.30s/it] 48%|███████████████████████████████████████▉                                           | 9667/20117 [6:06:13<6:40:34,  2.30s/it] 48%|███████████████████████████████████████▉                                           | 9668/20117 [6:06:15<6:38:33,  2.29s/it] 48%|███████████████████████████████████████▉                                           | 9669/20117 [6:06:17<6:35:54,  2.27s/it] 48%|███████████████████████████████████████▉                                           | 9670/20117 [6:06:20<6:35:57,  2.27s/it]                                                                                                                                 {'loss': 0.2388, 'grad_norm': 0.4956580400466919, 'learning_rate': 0.0001068923181556412, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 378.96, 'epoch': 0.96}
 48%|███████████████████████████████████████▉                                           | 9670/20117 [6:06:20<6:35:57,  2.27s/it] 48%|███████████████████████████████████████▉                                           | 9671/20117 [6:06:22<6:38:32,  2.29s/it] 48%|███████████████████████████████████████▉                                           | 9672/20117 [6:06:24<6:34:39,  2.27s/it] 48%|███████████████████████████████████████▉                                           | 9673/20117 [6:06:27<6:36:58,  2.28s/it] 48%|███████████████████████████████████████▉                                           | 9674/20117 [6:06:29<6:36:12,  2.28s/it] 48%|███████████████████████████████████████▉                                           | 9675/20117 [6:06:31<6:32:43,  2.26s/it] 48%|███████████████████████████████████████▉                                           | 9676/20117 [6:06:33<6:34:02,  2.26s/it] 48%|███████████████████████████████████████▉                                           | 9677/20117 [6:06:36<6:33:37,  2.26s/it] 48%|███████████████████████████████████████▉                                           | 9678/20117 [6:06:38<6:33:28,  2.26s/it] 48%|███████████████████████████████████████▉                                           | 9679/20117 [6:06:40<6:39:07,  2.29s/it] 48%|███████████████████████████████████████▉                                           | 9680/20117 [6:06:42<6:38:25,  2.29s/it]                                                                                                                                 {'loss': 0.2238, 'grad_norm': 0.2848256230354309, 'learning_rate': 0.00010673573672575454, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 359.23, 'epoch': 0.96}
 48%|███████████████████████████████████████▉                                           | 9680/20117 [6:06:42<6:38:25,  2.29s/it] 48%|███████████████████████████████████████▉                                           | 9681/20117 [6:06:45<6:41:41,  2.31s/it] 48%|███████████████████████████████████████▉                                           | 9682/20117 [6:06:47<6:38:33,  2.29s/it] 48%|███████████████████████████████████████▉                                           | 9683/20117 [6:06:49<6:34:50,  2.27s/it] 48%|███████████████████████████████████████▉                                           | 9684/20117 [6:06:52<6:54:00,  2.38s/it] 48%|███████████████████████████████████████▉                                           | 9685/20117 [6:06:54<6:45:17,  2.33s/it] 48%|███████████████████████████████████████▉                                           | 9686/20117 [6:06:56<6:40:58,  2.31s/it] 48%|███████████████████████████████████████▉                                           | 9687/20117 [6:06:59<6:38:53,  2.29s/it] 48%|███████████████████████████████████████▉                                           | 9688/20117 [6:07:01<6:38:51,  2.29s/it] 48%|███████████████████████████████████████▉                                           | 9689/20117 [6:07:03<6:35:52,  2.28s/it] 48%|███████████████████████████████████████▉                                           | 9690/20117 [6:07:05<6:37:11,  2.29s/it]                                                                                                                                 {'loss': 0.2305, 'grad_norm': 0.38930168747901917, 'learning_rate': 0.00010657913870432468, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 309.48, 'epoch': 0.96}
 48%|███████████████████████████████████████▉                                           | 9690/20117 [6:07:05<6:37:11,  2.29s/it] 48%|███████████████████████████████████████▉                                           | 9691/20117 [6:07:08<6:34:23,  2.27s/it] 48%|███████████████████████████████████████▉                                           | 9692/20117 [6:07:10<6:37:10,  2.29s/it] 48%|███████████████████████████████████████▉                                           | 9693/20117 [6:07:12<6:35:11,  2.27s/it] 48%|███████████████████████████████████████▉                                           | 9694/20117 [6:07:14<6:31:41,  2.25s/it] 48%|████████████████████████████████████████                                           | 9695/20117 [6:07:17<6:34:32,  2.27s/it] 48%|████████████████████████████████████████                                           | 9696/20117 [6:07:19<6:33:28,  2.27s/it] 48%|████████████████████████████████████████                                           | 9697/20117 [6:07:21<6:32:10,  2.26s/it] 48%|████████████████████████████████████████                                           | 9698/20117 [6:07:24<6:33:06,  2.26s/it] 48%|████████████████████████████████████████                                           | 9699/20117 [6:07:26<6:34:38,  2.27s/it] 48%|████████████████████████████████████████                                           | 9700/20117 [6:07:28<6:38:02,  2.29s/it]                                                                                                                                 {'loss': 0.2731, 'grad_norm': 0.4527674913406372, 'learning_rate': 0.00010642252447708563, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 349.69, 'epoch': 0.96}
 48%|████████████████████████████████████████                                           | 9700/20117 [6:07:28<6:38:02,  2.29s/it] 48%|████████████████████████████████████████                                           | 9701/20117 [6:07:30<6:37:38,  2.29s/it] 48%|████████████████████████████████████████                                           | 9702/20117 [6:07:33<6:36:49,  2.29s/it] 48%|████████████████████████████████████████                                           | 9703/20117 [6:07:35<6:34:55,  2.28s/it] 48%|████████████████████████████████████████                                           | 9704/20117 [6:07:37<6:33:09,  2.27s/it] 48%|████████████████████████████████████████                                           | 9705/20117 [6:07:40<6:33:12,  2.27s/it] 48%|████████████████████████████████████████                                           | 9706/20117 [6:07:42<6:35:33,  2.28s/it] 48%|████████████████████████████████████████                                           | 9707/20117 [6:07:44<6:33:15,  2.27s/it] 48%|████████████████████████████████████████                                           | 9708/20117 [6:07:46<6:37:05,  2.29s/it] 48%|████████████████████████████████████████                                           | 9709/20117 [6:07:49<6:39:07,  2.30s/it] 48%|████████████████████████████████████████                                           | 9710/20117 [6:07:51<6:35:25,  2.28s/it]                                                                                                                                 {'loss': 0.2635, 'grad_norm': 0.4257194697856903, 'learning_rate': 0.00010626589442981138, 'memory/max_active (GiB)': 20.63, 'memory/max_allocated (GiB)': 20.63, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 414.72, 'epoch': 0.97}
 48%|████████████████████████████████████████                                           | 9710/20117 [6:07:51<6:35:25,  2.28s/it] 48%|████████████████████████████████████████                                           | 9711/20117 [6:07:53<6:37:18,  2.29s/it] 48%|████████████████████████████████████████                                           | 9712/20117 [6:07:56<6:37:27,  2.29s/it] 48%|████████████████████████████████████████                                           | 9713/20117 [6:07:58<6:38:08,  2.30s/it] 48%|████████████████████████████████████████                                           | 9714/20117 [6:08:00<6:38:46,  2.30s/it] 48%|████████████████████████████████████████                                           | 9715/20117 [6:08:02<6:37:08,  2.29s/it] 48%|████████████████████████████████████████                                           | 9716/20117 [6:08:05<6:36:23,  2.29s/it] 48%|████████████████████████████████████████                                           | 9717/20117 [6:08:07<6:33:57,  2.27s/it] 48%|████████████████████████████████████████                                           | 9718/20117 [6:08:09<6:33:28,  2.27s/it] 48%|████████████████████████████████████████                                           | 9719/20117 [6:08:12<6:35:12,  2.28s/it] 48%|████████████████████████████████████████                                           | 9720/20117 [6:08:14<6:36:31,  2.29s/it]                                                                                                                                 {'loss': 0.2862, 'grad_norm': 0.3962489068508148, 'learning_rate': 0.00010610924894831483, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 365.4, 'epoch': 0.97}
 48%|████████████████████████████████████████                                           | 9720/20117 [6:08:14<6:36:31,  2.29s/it] 48%|████████████████████████████████████████                                           | 9721/20117 [6:08:16<6:34:43,  2.28s/it] 48%|████████████████████████████████████████                                           | 9722/20117 [6:08:18<6:32:21,  2.26s/it] 48%|████████████████████████████████████████                                           | 9723/20117 [6:08:21<6:34:52,  2.28s/it] 48%|████████████████████████████████████████                                           | 9724/20117 [6:08:23<6:31:33,  2.26s/it] 48%|████████████████████████████████████████                                           | 9725/20117 [6:08:25<6:33:12,  2.27s/it] 48%|████████████████████████████████████████▏                                          | 9726/20117 [6:08:28<6:37:50,  2.30s/it] 48%|████████████████████████████████████████▏                                          | 9727/20117 [6:08:30<6:39:45,  2.31s/it] 48%|████████████████████████████████████████▏                                          | 9728/20117 [6:08:32<6:47:32,  2.35s/it] 48%|████████████████████████████████████████▏                                          | 9729/20117 [6:08:35<6:46:45,  2.35s/it] 48%|████████████████████████████████████████▏                                          | 9730/20117 [6:08:37<6:52:03,  2.38s/it]                                                                                                                                 {'loss': 0.2353, 'grad_norm': 0.29012176394462585, 'learning_rate': 0.00010595258841844688, 'memory/max_active (GiB)': 20.72, 'memory/max_allocated (GiB)': 20.72, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 364.81, 'epoch': 0.97}
 48%|████████████████████████████████████████▏                                          | 9730/20117 [6:08:37<6:52:03,  2.38s/it] 48%|████████████████████████████████████████▏                                          | 9731/20117 [6:08:39<6:48:12,  2.36s/it] 48%|████████████████████████████████████████▏                                          | 9732/20117 [6:08:42<6:43:42,  2.33s/it] 48%|████████████████████████████████████████▏                                          | 9733/20117 [6:08:44<6:42:46,  2.33s/it] 48%|████████████████████████████████████████▏                                          | 9734/20117 [6:08:46<6:40:00,  2.31s/it] 48%|████████████████████████████████████████▏                                          | 9735/20117 [6:08:49<6:42:07,  2.32s/it] 48%|████████████████████████████████████████▏                                          | 9736/20117 [6:08:51<6:39:57,  2.31s/it] 48%|████████████████████████████████████████▏                                          | 9737/20117 [6:08:54<6:55:04,  2.40s/it] 48%|████████████████████████████████████████▏                                          | 9738/20117 [6:08:56<6:50:52,  2.38s/it] 48%|████████████████████████████████████████▏                                          | 9739/20117 [6:08:58<6:47:50,  2.36s/it] 48%|████████████████████████████████████████▏                                          | 9740/20117 [6:09:00<6:45:44,  2.35s/it]                                                                                                                                 {'loss': 0.1947, 'grad_norm': 0.5301884412765503, 'learning_rate': 0.00010579591322609559, 'memory/max_active (GiB)': 20.61, 'memory/max_allocated (GiB)': 20.61, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 369.66, 'epoch': 0.97}
 48%|████████████████████████████████████████▏                                          | 9740/20117 [6:09:00<6:45:44,  2.35s/it] 48%|████████████████████████████████████████▏                                          | 9741/20117 [6:09:03<6:44:24,  2.34s/it] 48%|████████████████████████████████████████▏                                          | 9742/20117 [6:09:05<6:41:50,  2.32s/it] 48%|████████████████████████████████████████▏                                          | 9743/20117 [6:09:07<6:43:53,  2.34s/it] 48%|████████████████████████████████████████▏                                          | 9744/20117 [6:09:10<6:43:15,  2.33s/it] 48%|████████████████████████████████████████▏                                          | 9745/20117 [6:09:12<6:38:34,  2.31s/it] 48%|████████████████████████████████████████▏                                          | 9746/20117 [6:09:14<6:38:10,  2.30s/it] 48%|████████████████████████████████████████▏                                          | 9747/20117 [6:09:17<6:39:17,  2.31s/it] 48%|████████████████████████████████████████▏                                          | 9748/20117 [6:09:19<6:38:47,  2.31s/it] 48%|████████████████████████████████████████▏                                          | 9749/20117 [6:09:21<6:35:55,  2.29s/it] 48%|████████████████████████████████████████▏                                          | 9750/20117 [6:09:23<6:35:36,  2.29s/it]                                                                                                                                 {'loss': 0.1859, 'grad_norm': 0.41399112343788147, 'learning_rate': 0.000105639223757185, 'memory/max_active (GiB)': 19.98, 'memory/max_allocated (GiB)': 19.98, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 314.55, 'epoch': 0.97}
 48%|████████████████████████████████████████▏                                          | 9750/20117 [6:09:23<6:35:36,  2.29s/it] 48%|████████████████████████████████████████▏                                          | 9751/20117 [6:09:26<6:34:38,  2.28s/it] 48%|████████████████████████████████████████▏                                          | 9752/20117 [6:09:28<6:36:14,  2.29s/it] 48%|████████████████████████████████████████▏                                          | 9753/20117 [6:09:30<6:33:34,  2.28s/it] 48%|████████████████████████████████████████▏                                          | 9754/20117 [6:09:33<6:34:39,  2.29s/it] 48%|████████████████████████████████████████▏                                          | 9755/20117 [6:09:35<6:34:49,  2.29s/it] 48%|████████████████████████████████████████▎                                          | 9756/20117 [6:09:37<6:34:02,  2.28s/it] 49%|████████████████████████████████████████▎                                          | 9757/20117 [6:09:39<6:35:34,  2.29s/it] 49%|████████████████████████████████████████▎                                          | 9758/20117 [6:09:42<6:34:00,  2.28s/it] 49%|████████████████████████████████████████▎                                          | 9759/20117 [6:09:44<6:30:30,  2.26s/it] 49%|████████████████████████████████████████▎                                          | 9760/20117 [6:09:46<6:37:33,  2.30s/it]                                                                                                                                 {'loss': 0.1971, 'grad_norm': 0.3453201353549957, 'learning_rate': 0.00010548252039767443, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 368.71, 'epoch': 0.97}
 49%|████████████████████████████████████████▎                                          | 9760/20117 [6:09:46<6:37:33,  2.30s/it] 49%|████████████████████████████████████████▎                                          | 9761/20117 [6:09:49<6:38:21,  2.31s/it] 49%|████████████████████████████████████████▎                                          | 9762/20117 [6:09:51<6:35:01,  2.29s/it] 49%|████████████████████████████████████████▎                                          | 9763/20117 [6:09:53<6:39:21,  2.31s/it] 49%|████████████████████████████████████████▎                                          | 9764/20117 [6:09:56<6:36:03,  2.30s/it] 49%|████████████████████████████████████████▎                                          | 9765/20117 [6:09:58<6:33:56,  2.28s/it] 49%|████████████████████████████████████████▎                                          | 9766/20117 [6:10:00<6:36:40,  2.30s/it] 49%|████████████████████████████████████████▎                                          | 9767/20117 [6:10:02<6:36:27,  2.30s/it] 49%|████████████████████████████████████████▎                                          | 9768/20117 [6:10:05<6:36:00,  2.30s/it] 49%|████████████████████████████████████████▎                                          | 9769/20117 [6:10:07<6:34:18,  2.29s/it] 49%|████████████████████████████████████████▎                                          | 9770/20117 [6:10:09<6:31:35,  2.27s/it]                                                                                                                                 {'loss': 0.3006, 'grad_norm': 0.5453227162361145, 'learning_rate': 0.00010532580353355734, 'memory/max_active (GiB)': 21.47, 'memory/max_allocated (GiB)': 21.47, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 420.65, 'epoch': 0.97}
 49%|████████████████████████████████████████▎                                          | 9770/20117 [6:10:09<6:31:35,  2.27s/it] 49%|████████████████████████████████████████▎                                          | 9771/20117 [6:10:12<6:35:02,  2.29s/it] 49%|████████████████████████████████████████▎                                          | 9772/20117 [6:10:14<6:35:30,  2.29s/it] 49%|████████████████████████████████████████▎                                          | 9773/20117 [6:10:16<6:33:27,  2.28s/it] 49%|████████████████████████████████████████▎                                          | 9774/20117 [6:10:18<6:37:06,  2.30s/it] 49%|████████████████████████████████████████▎                                          | 9775/20117 [6:10:21<6:36:16,  2.30s/it] 49%|████████████████████████████████████████▎                                          | 9776/20117 [6:10:23<6:33:33,  2.28s/it] 49%|████████████████████████████████████████▎                                          | 9777/20117 [6:10:25<6:33:50,  2.29s/it] 49%|████████████████████████████████████████▎                                          | 9778/20117 [6:10:28<6:31:20,  2.27s/it] 49%|████████████████████████████████████████▎                                          | 9779/20117 [6:10:30<6:30:43,  2.27s/it] 49%|████████████████████████████████████████▎                                          | 9780/20117 [6:10:32<6:24:17,  2.23s/it]                                                                                                                                 {'loss': 0.2027, 'grad_norm': 0.42438754439353943, 'learning_rate': 0.00010516907355086055, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 396.2, 'epoch': 0.97}
 49%|████████████████████████████████████████▎                                          | 9780/20117 [6:10:32<6:24:17,  2.23s/it] 49%|████████████████████████████████████████▎                                          | 9781/20117 [6:10:34<6:22:54,  2.22s/it] 49%|████████████████████████████████████████▎                                          | 9782/20117 [6:10:36<6:19:06,  2.20s/it] 49%|████████████████████████████████████████▎                                          | 9783/20117 [6:10:39<6:21:27,  2.21s/it] 49%|████████████████████████████████████████▎                                          | 9784/20117 [6:10:41<6:24:13,  2.23s/it] 49%|████████████████████████████████████████▎                                          | 9785/20117 [6:10:43<6:34:43,  2.29s/it] 49%|████████████████████████████████████████▍                                          | 9786/20117 [6:10:46<6:35:30,  2.30s/it] 49%|████████████████████████████████████████▍                                          | 9787/20117 [6:10:48<6:36:28,  2.30s/it] 49%|████████████████████████████████████████▍                                          | 9788/20117 [6:10:50<6:32:32,  2.28s/it] 49%|████████████████████████████████████████▍                                          | 9789/20117 [6:10:52<6:35:25,  2.30s/it] 49%|████████████████████████████████████████▍                                          | 9790/20117 [6:10:55<6:33:38,  2.29s/it]                                                                                                                                 {'loss': 0.1968, 'grad_norm': 0.5116370916366577, 'learning_rate': 0.00010501233083564306, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 345.38, 'epoch': 0.97}
 49%|████████████████████████████████████████▍                                          | 9790/20117 [6:10:55<6:33:38,  2.29s/it] 49%|████████████████████████████████████████▍                                          | 9791/20117 [6:10:57<6:52:10,  2.39s/it] 49%|████████████████████████████████████████▍                                          | 9792/20117 [6:11:00<6:44:03,  2.35s/it] 49%|████████████████████████████████████████▍                                          | 9793/20117 [6:11:02<6:37:00,  2.31s/it] 49%|████████████████████████████████████████▍                                          | 9794/20117 [6:11:04<6:29:27,  2.26s/it] 49%|████████████████████████████████████████▍                                          | 9795/20117 [6:11:06<6:25:22,  2.24s/it] 49%|████████████████████████████████████████▍                                          | 9796/20117 [6:11:08<6:29:27,  2.26s/it] 49%|████████████████████████████████████████▍                                          | 9797/20117 [6:11:11<6:38:27,  2.32s/it] 49%|████████████████████████████████████████▍                                          | 9798/20117 [6:11:13<6:39:24,  2.32s/it] 49%|████████████████████████████████████████▍                                          | 9799/20117 [6:11:16<6:42:33,  2.34s/it] 49%|████████████████████████████████████████▍                                          | 9800/20117 [6:11:18<6:38:25,  2.32s/it]                                                                                                                                 {'loss': 0.1676, 'grad_norm': 0.23473972082138062, 'learning_rate': 0.00010485557577399536, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 386.72, 'epoch': 0.97}
 49%|████████████████████████████████████████▍                                          | 9800/20117 [6:11:18<6:38:25,  2.32s/it] 49%|████████████████████████████████████████▍                                          | 9801/20117 [6:11:20<6:38:33,  2.32s/it] 49%|████████████████████████████████████████▍                                          | 9802/20117 [6:11:22<6:35:03,  2.30s/it] 49%|████████████████████████████████████████▍                                          | 9803/20117 [6:11:25<6:34:55,  2.30s/it] 49%|████████████████████████████████████████▍                                          | 9804/20117 [6:11:27<6:33:25,  2.29s/it] 49%|████████████████████████████████████████▍                                          | 9805/20117 [6:11:29<6:30:09,  2.27s/it] 49%|████████████████████████████████████████▍                                          | 9806/20117 [6:11:31<6:27:32,  2.26s/it] 49%|████████████████████████████████████████▍                                          | 9807/20117 [6:11:34<6:30:49,  2.27s/it] 49%|████████████████████████████████████████▍                                          | 9808/20117 [6:11:36<6:31:29,  2.28s/it] 49%|████████████████████████████████████████▍                                          | 9809/20117 [6:11:38<6:31:19,  2.28s/it] 49%|████████████████████████████████████████▍                                          | 9810/20117 [6:11:41<6:36:48,  2.31s/it]                                                                                                                                 {'loss': 0.2066, 'grad_norm': 0.5427513718605042, 'learning_rate': 0.00010469880875203827, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 372.28, 'epoch': 0.98}
 49%|████████████████████████████████████████▍                                          | 9810/20117 [6:11:41<6:36:48,  2.31s/it] 49%|████████████████████████████████████████▍                                          | 9811/20117 [6:11:43<6:33:22,  2.29s/it] 49%|████████████████████████████████████████▍                                          | 9812/20117 [6:11:45<6:32:00,  2.28s/it] 49%|████████████████████████████████████████▍                                          | 9813/20117 [6:11:47<6:29:49,  2.27s/it] 49%|████████████████████████████████████████▍                                          | 9814/20117 [6:11:50<6:33:15,  2.29s/it] 49%|████████████████████████████████████████▍                                          | 9815/20117 [6:11:52<6:31:11,  2.28s/it] 49%|████████████████████████████████████████▍                                          | 9816/20117 [6:11:54<6:31:12,  2.28s/it] 49%|████████████████████████████████████████▌                                          | 9817/20117 [6:11:57<6:31:16,  2.28s/it] 49%|████████████████████████████████████████▌                                          | 9818/20117 [6:11:59<6:30:00,  2.27s/it] 49%|████████████████████████████████████████▌                                          | 9819/20117 [6:12:01<6:28:36,  2.26s/it] 49%|████████████████████████████████████████▌                                          | 9820/20117 [6:12:03<6:27:31,  2.26s/it]                                                                                                                                 {'loss': 0.2097, 'grad_norm': 0.4717273712158203, 'learning_rate': 0.00010454203015592214, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 317.9, 'epoch': 0.98}
 49%|████████████████████████████████████████▌                                          | 9830/20117 [6:12:26<6:35:13,  2.31s/it] {'loss': 0.3073, 'grad_norm': 0.2797994017601013, 'learning_rate': 0.00010438524037182573, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 371.67, 'epoch': 0.98}
 49%|████████████████████████████████████████▌                                          | 9840/20117 [6:12:49<6:29:20,  2.27s/it] {'loss': 0.2309, 'grad_norm': 0.3694850504398346, 'learning_rate': 0.00010422843978595542, 'memory/max_active (GiB)': 19.2, 'memory/max_allocated (GiB)': 19.2, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 426.94, 'epoch': 0.98}
 49%|████████████████████████████████████████▋                                          | 9850/20117 [6:13:12<6:33:37,  2.30s/it] {'loss': 0.2697, 'grad_norm': 0.3863958716392517, 'learning_rate': 0.00010407162878454423, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 338.41, 'epoch': 0.98}
 49%|████████████████████████████████████████▋                                          | 9860/20117 [6:13:35<6:26:41,  2.26s/it] {'loss': 0.1866, 'grad_norm': 0.22021318972110748, 'learning_rate': 0.00010391480775385078, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 331.0, 'epoch': 0.98}
 49%|████████████████████████████████████████▋                                          | 9870/20117 [6:13:58<6:29:52,  2.28s/it] {'loss': 0.1992, 'grad_norm': 0.3698722720146179, 'learning_rate': 0.00010375797708015844, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 328.05, 'epoch': 0.98}
 49%|████████████████████████████████████████▊                                          | 9880/20117 [6:14:21<6:30:52,  2.29s/it] {'loss': 0.2508, 'grad_norm': 0.45455843210220337, 'learning_rate': 0.00010360113714977428, 'memory/max_active (GiB)': 21.39, 'memory/max_allocated (GiB)': 21.39, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 409.92, 'epoch': 0.98}
 49%|████████████████████████████████████████▊                                          | 9890/20117 [6:14:43<6:29:45,  2.29s/it] {'loss': 0.187, 'grad_norm': 0.3562302887439728, 'learning_rate': 0.00010344428834902822, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 262.76, 'epoch': 0.98}
 49%|████████████████████████████████████████▊                                          | 9900/20117 [6:15:07<6:36:28,  2.33s/it] {'loss': 0.2517, 'grad_norm': 0.5735921263694763, 'learning_rate': 0.00010328743106427197, 'memory/max_active (GiB)': 20.62, 'memory/max_allocated (GiB)': 20.62, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 338.79, 'epoch': 0.98}
 49%|████████████████████████████████████████▉                                          | 9910/20117 [6:15:30<6:32:11,  2.31s/it] {'loss': 0.2298, 'grad_norm': 0.47819098830223083, 'learning_rate': 0.00010313056568187818, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 351.88, 'epoch': 0.99}
 49%|████████████████████████████████████████▉                                          | 9920/20117 [6:15:53<6:30:42,  2.30s/it] {'loss': 0.1617, 'grad_norm': 0.29087749123573303, 'learning_rate': 0.00010297369258823948, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 311.57, 'epoch': 0.99}
 49%|████████████████████████████████████████▉                                          | 9930/20117 [6:16:16<6:34:10,  2.32s/it] {'loss': 0.2215, 'grad_norm': 0.5746156573295593, 'learning_rate': 0.00010281681216976742, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 360.68, 'epoch': 0.99}
 49%|█████████████████████████████████████████                                          | 9940/20117 [6:16:39<6:32:38,  2.31s/it] {'loss': 0.2165, 'grad_norm': 0.5306881666183472, 'learning_rate': 0.00010265992481289164, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 350.01, 'epoch': 0.99}
 49%|█████████████████████████████████████████                                          | 9950/20117 [6:17:02<6:28:56,  2.30s/it] {'loss': 0.2246, 'grad_norm': 0.5628572106361389, 'learning_rate': 0.00010250303090405886, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 331.08, 'epoch': 0.99}
 50%|█████████████████████████████████████████                                          | 9960/20117 [6:17:25<6:27:47,  2.29s/it] {'loss': 0.2294, 'grad_norm': 0.38296106457710266, 'learning_rate': 0.00010234613082973195, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 310.12, 'epoch': 0.99}
 50%|█████████████████████████████████████████▏                                         | 9970/20117 [6:17:47<6:18:20,  2.24s/it] {'loss': 0.2263, 'grad_norm': 0.5048671960830688, 'learning_rate': 0.00010218922497638893, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 360.03, 'epoch': 0.99}
 50%|█████████████████████████████████████████▏                                         | 9980/20117 [6:18:10<6:16:03,  2.23s/it] {'loss': 0.2391, 'grad_norm': 0.2620450258255005, 'learning_rate': 0.00010203231373052205, 'memory/max_active (GiB)': 21.37, 'memory/max_allocated (GiB)': 21.37, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 397.95, 'epoch': 0.99}
 50%|█████████████████████████████████████████▏                                         | 9990/20117 [6:18:34<6:30:17,  2.31s/it] {'loss': 0.2654, 'grad_norm': 0.37807098031044006, 'learning_rate': 0.00010187539747863693, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 413.64, 'epoch': 0.99}
 50%|████████████████████████████████████████▊                                         | 10000/20117 [6:18:57<6:31:19,  2.32s/it] {'loss': 0.2081, 'grad_norm': 0.3059203624725342, 'learning_rate': 0.00010171847660725147, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 317.3, 'epoch': 0.99}
 50%|████████████████████████████████████████▊                                         | 10010/20117 [6:19:20<6:32:18,  2.33s/it] {'loss': 0.2826, 'grad_norm': 0.38892289996147156, 'learning_rate': 0.0001015615515028949, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 379.68, 'epoch': 1.0}
 50%|████████████████████████████████████████▊                                         | 10020/20117 [6:19:43<6:26:34,  2.30s/it] {'loss': 0.2674, 'grad_norm': 0.22751818597316742, 'learning_rate': 0.00010140462255210696, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 362.08, 'epoch': 1.0}
 50%|████████████████████████████████████████▉                                         | 10030/20117 [6:20:06<6:28:32,  2.31s/it] {'loss': 0.2125, 'grad_norm': 0.3578786551952362, 'learning_rate': 0.00010124769014143678, 'memory/max_active (GiB)': 19.77, 'memory/max_allocated (GiB)': 19.77, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 367.81, 'epoch': 1.0}
 50%|████████████████████████████████████████▉                                         | 10040/20117 [6:20:30<6:36:14,  2.36s/it] {'loss': 0.1599, 'grad_norm': 0.32971230149269104, 'learning_rate': 0.00010109075465744208, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 336.7, 'epoch': 1.0}
 50%|████████████████████████████████████████▉                                         | 10040/20117 [6:20:30<6:36:14,  2.36s/it] 50%|████████████████████████████████████████▉                                         | 10041/20117 [6:20:32<6:36:35,  2.36s/it] 50%|████████████████████████████████████████▉                                         | 10042/20117 [6:20:34<6:37:01,  2.36s/it] 50%|████████████████████████████████████████▉                                         | 10043/20117 [6:20:37<6:34:33,  2.35s/it] 50%|████████████████████████████████████████▉                                         | 10044/20117 [6:20:39<6:33:00,  2.34s/it] 50%|████████████████████████████████████████▉                                         | 10045/20117 [6:20:41<6:29:17,  2.32s/it] 50%|████████████████████████████████████████▉                                         | 10046/20117 [6:20:44<6:28:49,  2.32s/it] 50%|████████████████████████████████████████▉                                         | 10047/20117 [6:20:46<6:27:42,  2.31s/it] 50%|████████████████████████████████████████▉                                         | 10048/20117 [6:20:48<6:27:28,  2.31s/it] 50%|████████████████████████████████████████▉                                         | 10049/20117 [6:20:50<6:29:20,  2.32s/it] 50%|████████████████████████████████████████▉                                         | 10050/20117 [6:20:53<6:30:48,  2.33s/it]                                                                                                                                 {'loss': 0.2485, 'grad_norm': 0.4897179901599884, 'learning_rate': 0.00010093381648668813, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 303.41, 'epoch': 1.0}
 50%|████████████████████████████████████████▉                                         | 10050/20117 [6:20:53<6:30:48,  2.33s/it] 50%|████████████████████████████████████████▉                                         | 10051/20117 [6:20:55<6:48:46,  2.44s/it] 50%|████████████████████████████████████████▉                                         | 10052/20117 [6:20:58<6:46:00,  2.42s/it] 50%|████████████████████████████████████████▉                                         | 10053/20117 [6:21:00<6:44:27,  2.41s/it] 50%|████████████████████████████████████████▉                                         | 10054/20117 [6:21:03<6:40:13,  2.39s/it] 50%|████████████████████████████████████████▉                                         | 10055/20117 [6:21:05<6:37:43,  2.37s/it] 50%|████████████████████████████████████████▉                                         | 10056/20117 [6:21:07<6:32:34,  2.34s/it] 50%|████████████████████████████████████████▉                                         | 10057/20117 [6:21:10<6:31:51,  2.34s/it] 50%|████████████████████████████████████████▉                                         | 10058/20117 [6:21:12<6:31:40,  2.34s/it] 50%|█████████████████████████████████████████                                         | 10059/20117 [6:21:13<5:37:15,  2.01s/it]
[2026-04-15 21:24:28,457] [INFO] [axolotl.core.trainers.base._save:671] [PID:2788] Saving model checkpoint to ./outputs/Qwen2.5-Coder-3B-Instruct-coding-agent/checkpoint-10059
[2026-04-15 21:24:29,998] [WARNING] [py.warnings._showwarnmsg:110] [PID:2788] /root/miniconda3/envs/py3.11/lib/python3.11/site-packages/bitsandbytes/autograd/_functions.py:186: UserWarning: MatMul8bitLt: inputs will be cast from torch.bfloat16 to float16 during quantization
  warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")

 50%|█████████████████████████████████████████                                         | 10060/20117 [6:21:17<7:15:57,  2.60s/it]                                                                                                                                 {'loss': 0.2974, 'grad_norm': 0.23543600738048553, 'learning_rate': 0.00010077687601574678, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 335.91, 'epoch': 1.0}
 50%|█████████████████████████████████████████                                         | 10060/20117 [6:21:17<7:15:57,  2.60s/it] 50%|█████████████████████████████████████████                                         | 10061/20117 [6:21:19<6:59:35,  2.50s/it] 50%|█████████████████████████████████████████                                         | 10062/20117 [6:21:22<6:48:02,  2.43s/it] 50%|█████████████████████████████████████████                                         | 10063/20117 [6:21:24<6:41:01,  2.39s/it] 50%|█████████████████████████████████████████                                         | 10064/20117 [6:21:26<6:32:35,  2.34s/it] 50%|█████████████████████████████████████████                                         | 10065/20117 [6:21:29<6:31:58,  2.34s/it] 50%|█████████████████████████████████████████                                         | 10066/20117 [6:21:31<6:32:26,  2.34s/it] 50%|█████████████████████████████████████████                                         | 10067/20117 [6:21:33<6:31:08,  2.34s/it] 50%|█████████████████████████████████████████                                         | 10068/20117 [6:21:35<6:29:49,  2.33s/it] 50%|█████████████████████████████████████████                                         | 10069/20117 [6:21:38<6:27:06,  2.31s/it] 50%|█████████████████████████████████████████                                         | 10070/20117 [6:21:40<6:27:05,  2.31s/it]                                                                                                                                 {'loss': 0.155, 'grad_norm': 0.30785781145095825, 'learning_rate': 0.00010061993363119566, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 313.67, 'epoch': 1.0}
 50%|█████████████████████████████████████████                                         | 10070/20117 [6:21:40<6:27:05,  2.31s/it] 50%|█████████████████████████████████████████                                         | 10071/20117 [6:21:42<6:24:55,  2.30s/it] 50%|█████████████████████████████████████████                                         | 10072/20117 [6:21:45<6:18:58,  2.26s/it] 50%|█████████████████████████████████████████                                         | 10073/20117 [6:21:47<6:22:56,  2.29s/it] 50%|█████████████████████████████████████████                                         | 10074/20117 [6:21:49<6:23:40,  2.29s/it] 50%|█████████████████████████████████████████                                         | 10075/20117 [6:21:51<6:24:43,  2.30s/it] 50%|█████████████████████████████████████████                                         | 10076/20117 [6:21:54<6:24:51,  2.30s/it] 50%|█████████████████████████████████████████                                         | 10077/20117 [6:21:56<6:24:24,  2.30s/it] 50%|█████████████████████████████████████████                                         | 10078/20117 [6:21:58<6:25:49,  2.31s/it] 50%|█████████████████████████████████████████                                         | 10079/20117 [6:22:01<6:22:29,  2.29s/it] 50%|█████████████████████████████████████████                                         | 10080/20117 [6:22:03<6:20:47,  2.28s/it]                                                                                                                                 {'loss': 0.2054, 'grad_norm': 0.36587706208229065, 'learning_rate': 0.00010046298971961695, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 388.65, 'epoch': 1.0}
 50%|█████████████████████████████████████████                                         | 10080/20117 [6:22:03<6:20:47,  2.28s/it] 50%|█████████████████████████████████████████                                         | 10081/20117 [6:22:05<6:25:25,  2.30s/it] 50%|█████████████████████████████████████████                                         | 10082/20117 [6:22:08<6:23:53,  2.30s/it] 50%|█████████████████████████████████████████                                         | 10083/20117 [6:22:10<6:24:44,  2.30s/it] 50%|█████████████████████████████████████████                                         | 10084/20117 [6:22:12<6:23:04,  2.29s/it] 50%|█████████████████████████████████████████                                         | 10085/20117 [6:22:14<6:24:31,  2.30s/it] 50%|█████████████████████████████████████████                                         | 10086/20117 [6:22:17<6:23:32,  2.29s/it] 50%|█████████████████████████████████████████                                         | 10087/20117 [6:22:19<6:21:04,  2.28s/it] 50%|█████████████████████████████████████████                                         | 10088/20117 [6:22:21<6:21:59,  2.29s/it] 50%|█████████████████████████████████████████                                         | 10089/20117 [6:22:24<6:20:06,  2.27s/it] 50%|█████████████████████████████████████████▏                                        | 10090/20117 [6:22:26<6:18:28,  2.26s/it]                                                                                                                                 {'loss': 0.1678, 'grad_norm': 0.5172699093818665, 'learning_rate': 0.0001003060446675967, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 356.25, 'epoch': 1.0}
 50%|█████████████████████████████████████████▏                                        | 10090/20117 [6:22:26<6:18:28,  2.26s/it] 50%|█████████████████████████████████████████▏                                        | 10091/20117 [6:22:28<6:16:17,  2.25s/it] 50%|█████████████████████████████████████████▏                                        | 10092/20117 [6:22:30<6:18:44,  2.27s/it] 50%|█████████████████████████████████████████▏                                        | 10093/20117 [6:22:33<6:20:01,  2.27s/it] 50%|█████████████████████████████████████████▏                                        | 10094/20117 [6:22:35<6:19:16,  2.27s/it] 50%|█████████████████████████████████████████▏                                        | 10095/20117 [6:22:37<6:19:51,  2.27s/it] 50%|█████████████████████████████████████████▏                                        | 10096/20117 [6:22:39<6:22:24,  2.29s/it] 50%|█████████████████████████████████████████▏                                        | 10097/20117 [6:22:42<6:22:13,  2.29s/it] 50%|█████████████████████████████████████████▏                                        | 10098/20117 [6:22:44<6:24:57,  2.31s/it] 50%|█████████████████████████████████████████▏                                        | 10099/20117 [6:22:46<6:25:34,  2.31s/it] 50%|█████████████████████████████████████████▏                                        | 10100/20117 [6:22:49<6:22:38,  2.29s/it]                                                                                                                                 {'loss': 0.1414, 'grad_norm': 0.3362785279750824, 'learning_rate': 0.00010014909886172377, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 317.06, 'epoch': 1.0}
 50%|█████████████████████████████████████████▏                                        | 10100/20117 [6:22:49<6:22:38,  2.29s/it] 50%|█████████████████████████████████████████▏                                        | 10101/20117 [6:22:51<6:22:18,  2.29s/it] 50%|█████████████████████████████████████████▏                                        | 10102/20117 [6:22:53<6:23:47,  2.30s/it] 50%|█████████████████████████████████████████▏                                        | 10103/20117 [6:22:56<6:44:15,  2.42s/it] 50%|█████████████████████████████████████████▏                                        | 10104/20117 [6:22:58<6:37:22,  2.38s/it] 50%|█████████████████████████████████████████▏                                        | 10105/20117 [6:23:00<6:30:15,  2.34s/it] 50%|█████████████████████████████████████████▏                                        | 10106/20117 [6:23:03<6:26:47,  2.32s/it] 50%|█████████████████████████████████████████▏                                        | 10107/20117 [6:23:05<6:27:22,  2.32s/it] 50%|█████████████████████████████████████████▏                                        | 10108/20117 [6:23:07<6:29:28,  2.33s/it] 50%|█████████████████████████████████████████▏                                        | 10109/20117 [6:23:10<6:29:20,  2.33s/it] 50%|█████████████████████████████████████████▏                                        | 10110/20117 [6:23:12<6:26:54,  2.32s/it]                                                                                                                                 {'loss': 0.2086, 'grad_norm': 0.5255736112594604, 'learning_rate': 9.99921526885888e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 428.98, 'epoch': 1.01}
 50%|█████████████████████████████████████████▏                                        | 10110/20117 [6:23:12<6:26:54,  2.32s/it] 50%|█████████████████████████████████████████▏                                        | 10111/20117 [6:23:14<6:25:32,  2.31s/it] 50%|█████████████████████████████████████████▏                                        | 10112/20117 [6:23:17<6:26:45,  2.32s/it] 50%|█████████████████████████████████████████▏                                        | 10113/20117 [6:23:19<6:23:51,  2.30s/it] 50%|█████████████████████████████████████████▏                                        | 10114/20117 [6:23:21<6:29:47,  2.34s/it] 50%|█████████████████████████████████████████▏                                        | 10115/20117 [6:23:24<6:27:44,  2.33s/it] 50%|█████████████████████████████████████████▏                                        | 10116/20117 [6:23:26<6:24:53,  2.31s/it] 50%|█████████████████████████████████████████▏                                        | 10117/20117 [6:23:28<6:19:19,  2.28s/it] 50%|█████████████████████████████████████████▏                                        | 10118/20117 [6:23:30<6:16:49,  2.26s/it] 50%|█████████████████████████████████████████▏                                        | 10119/20117 [6:23:33<6:14:22,  2.25s/it] 50%|█████████████████████████████████████████▎                                        | 10120/20117 [6:23:35<6:13:27,  2.24s/it]                                                                                                                                 {'loss': 0.1895, 'grad_norm': 0.4546601474285126, 'learning_rate': 9.983520653478343e-05, 'memory/max_active (GiB)': 20.64, 'memory/max_allocated (GiB)': 20.64, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 304.97, 'epoch': 1.01}
 50%|█████████████████████████████████████████▎                                        | 10120/20117 [6:23:35<6:13:27,  2.24s/it] 50%|█████████████████████████████████████████▎                                        | 10121/20117 [6:23:37<6:14:56,  2.25s/it] 50%|█████████████████████████████████████████▎                                        | 10122/20117 [6:23:39<6:16:51,  2.26s/it] 50%|█████████████████████████████████████████▎                                        | 10123/20117 [6:23:42<6:17:19,  2.27s/it] 50%|█████████████████████████████████████████▎                                        | 10124/20117 [6:23:44<6:15:29,  2.25s/it] 50%|█████████████████████████████████████████▎                                        | 10125/20117 [6:23:46<6:17:50,  2.27s/it] 50%|█████████████████████████████████████████▎                                        | 10126/20117 [6:23:48<6:16:05,  2.26s/it] 50%|█████████████████████████████████████████▎                                        | 10127/20117 [6:23:51<6:14:40,  2.25s/it] 50%|█████████████████████████████████████████▎                                        | 10128/20117 [6:23:53<6:14:42,  2.25s/it] 50%|█████████████████████████████████████████▎                                        | 10129/20117 [6:23:55<6:16:21,  2.26s/it] 50%|█████████████████████████████████████████▎                                        | 10130/20117 [6:23:57<6:17:50,  2.27s/it]                                                                                                                                 {'loss': 0.1643, 'grad_norm': 0.33736681938171387, 'learning_rate': 9.967826078689919e-05, 'memory/max_active (GiB)': 19.67, 'memory/max_allocated (GiB)': 19.67, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 337.84, 'epoch': 1.01}
 50%|█████████████████████████████████████████▎                                        | 10130/20117 [6:23:57<6:17:50,  2.27s/it] 50%|█████████████████████████████████████████▎                                        | 10131/20117 [6:24:00<6:17:06,  2.27s/it] 50%|█████████████████████████████████████████▎                                        | 10132/20117 [6:24:02<6:15:31,  2.26s/it] 50%|█████████████████████████████████████████▎                                        | 10133/20117 [6:24:04<6:15:39,  2.26s/it] 50%|█████████████████████████████████████████▎                                        | 10134/20117 [6:24:07<6:17:40,  2.27s/it] 50%|█████████████████████████████████████████▎                                        | 10135/20117 [6:24:09<6:18:17,  2.27s/it] 50%|█████████████████████████████████████████▎                                        | 10136/20117 [6:24:11<6:19:47,  2.28s/it] 50%|█████████████████████████████████████████▎                                        | 10137/20117 [6:24:13<6:19:34,  2.28s/it] 50%|█████████████████████████████████████████▎                                        | 10138/20117 [6:24:16<6:17:34,  2.27s/it] 50%|█████████████████████████████████████████▎                                        | 10139/20117 [6:24:18<6:18:41,  2.28s/it] 50%|█████████████████████████████████████████▎                                        | 10140/20117 [6:24:20<6:16:56,  2.27s/it]                                                                                                                                 {'loss': 0.1171, 'grad_norm': 0.2856805622577667, 'learning_rate': 9.952131583152665e-05, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 320.63, 'epoch': 1.01}
 50%|█████████████████████████████████████████▎                                        | 10140/20117 [6:24:20<6:16:56,  2.27s/it] 50%|█████████████████████████████████████████▎                                        | 10141/20117 [6:24:22<6:19:19,  2.28s/it] 50%|█████████████████████████████████████████▎                                        | 10142/20117 [6:24:25<6:19:00,  2.28s/it] 50%|█████████████████████████████████████████▎                                        | 10143/20117 [6:24:27<6:20:32,  2.29s/it] 50%|█████████████████████████████████████████▎                                        | 10144/20117 [6:24:29<6:19:49,  2.29s/it] 50%|█████████████████████████████████████████▎                                        | 10145/20117 [6:24:32<6:21:19,  2.29s/it] 50%|█████████████████████████████████████████▎                                        | 10146/20117 [6:24:34<6:21:59,  2.30s/it] 50%|█████████████████████████████████████████▎                                        | 10147/20117 [6:24:36<6:28:39,  2.34s/it] 50%|█████████████████████████████████████████▎                                        | 10148/20117 [6:24:39<6:21:46,  2.30s/it] 50%|█████████████████████████████████████████▎                                        | 10149/20117 [6:24:41<6:16:24,  2.27s/it] 50%|█████████████████████████████████████████▎                                        | 10150/20117 [6:24:43<6:11:57,  2.24s/it]                                                                                                                                 {'loss': 0.1597, 'grad_norm': 0.35713866353034973, 'learning_rate': 9.936437205525437e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 365.04, 'epoch': 1.01}
 50%|█████████████████████████████████████████▎                                        | 10150/20117 [6:24:43<6:11:57,  2.24s/it] 50%|█████████████████████████████████████████▍                                        | 10151/20117 [6:24:45<6:10:26,  2.23s/it] 50%|█████████████████████████████████████████▍                                        | 10152/20117 [6:24:47<6:08:02,  2.22s/it] 50%|█████████████████████████████████████████▍                                        | 10153/20117 [6:24:50<6:12:49,  2.25s/it] 50%|█████████████████████████████████████████▍                                        | 10154/20117 [6:24:52<6:13:22,  2.25s/it] 50%|█████████████████████████████████████████▍                                        | 10155/20117 [6:24:55<6:33:17,  2.37s/it] 50%|█████████████████████████████████████████▍                                        | 10156/20117 [6:24:57<6:28:36,  2.34s/it] 50%|█████████████████████████████████████████▍                                        | 10157/20117 [6:24:59<6:23:52,  2.31s/it] 50%|█████████████████████████████████████████▍                                        | 10158/20117 [6:25:01<6:24:07,  2.31s/it] 50%|█████████████████████████████████████████▍                                        | 10159/20117 [6:25:04<6:22:39,  2.31s/it] 51%|█████████████████████████████████████████▍                                        | 10160/20117 [6:25:06<6:22:00,  2.30s/it]                                                                                                                                 {'loss': 0.2034, 'grad_norm': 0.4092197120189667, 'learning_rate': 9.920742984466809e-05, 'memory/max_active (GiB)': 18.83, 'memory/max_allocated (GiB)': 18.83, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 331.18, 'epoch': 1.01}
 51%|█████████████████████████████████████████▍                                        | 10160/20117 [6:25:06<6:22:00,  2.30s/it] 51%|█████████████████████████████████████████▍                                        | 10161/20117 [6:25:08<6:21:24,  2.30s/it] 51%|█████████████████████████████████████████▍                                        | 10162/20117 [6:25:11<6:22:09,  2.30s/it] 51%|█████████████████████████████████████████▍                                        | 10163/20117 [6:25:13<6:19:06,  2.29s/it] 51%|█████████████████████████████████████████▍                                        | 10164/20117 [6:25:15<6:17:04,  2.27s/it] 51%|█████████████████████████████████████████▍                                        | 10165/20117 [6:25:17<6:14:03,  2.26s/it] 51%|█████████████████████████████████████████▍                                        | 10166/20117 [6:25:20<6:12:04,  2.24s/it] 51%|█████████████████████████████████████████▍                                        | 10167/20117 [6:25:22<6:14:33,  2.26s/it] 51%|█████████████████████████████████████████▍                                        | 10168/20117 [6:25:24<6:18:42,  2.28s/it] 51%|█████████████████████████████████████████▍                                        | 10169/20117 [6:25:26<6:19:52,  2.29s/it] 51%|█████████████████████████████████████████▍                                        | 10170/20117 [6:25:29<6:21:10,  2.30s/it]                                                                                                                                 {'loss': 0.1466, 'grad_norm': 0.36163678765296936, 'learning_rate': 9.905048958634958e-05, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 416.08, 'epoch': 1.01}
 51%|█████████████████████████████████████████▍                                        | 10170/20117 [6:25:29<6:21:10,  2.30s/it] 51%|█████████████████████████████████████████▍                                        | 10171/20117 [6:25:31<6:18:28,  2.28s/it] 51%|█████████████████████████████████████████▍                                        | 10172/20117 [6:25:33<6:19:28,  2.29s/it] 51%|█████████████████████████████████████████▍                                        | 10173/20117 [6:25:36<6:16:54,  2.27s/it] 51%|█████████████████████████████████████████▍                                        | 10174/20117 [6:25:38<6:20:24,  2.30s/it] 51%|█████████████████████████████████████████▍                                        | 10175/20117 [6:25:40<6:20:09,  2.29s/it] 51%|█████████████████████████████████████████▍                                        | 10176/20117 [6:25:42<6:18:28,  2.28s/it] 51%|█████████████████████████████████████████▍                                        | 10177/20117 [6:25:45<6:20:28,  2.30s/it] 51%|█████████████████████████████████████████▍                                        | 10178/20117 [6:25:47<6:18:58,  2.29s/it] 51%|█████████████████████████████████████████▍                                        | 10179/20117 [6:25:49<6:16:14,  2.27s/it] 51%|█████████████████████████████████████████▍                                        | 10180/20117 [6:25:52<6:15:07,  2.26s/it]                                                                                                                                 {'loss': 0.1191, 'grad_norm': 0.5861209630966187, 'learning_rate': 9.889355166687593e-05, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 317.4, 'epoch': 1.01}
 51%|█████████████████████████████████████████▍                                        | 10180/20117 [6:25:52<6:15:07,  2.26s/it] 51%|█████████████████████████████████████████▍                                        | 10181/20117 [6:25:54<6:17:56,  2.28s/it] 51%|█████████████████████████████████████████▌                                        | 10182/20117 [6:25:56<6:16:56,  2.28s/it] 51%|█████████████████████████████████████████▌                                        | 10183/20117 [6:25:58<6:15:38,  2.27s/it] 51%|█████████████████████████████████████████▌                                        | 10184/20117 [6:26:01<6:16:54,  2.28s/it] 51%|█████████████████████████████████████████▌                                        | 10185/20117 [6:26:03<6:13:39,  2.26s/it] 51%|█████████████████████████████████████████▌                                        | 10186/20117 [6:26:05<6:16:08,  2.27s/it] 51%|█████████████████████████████████████████▌                                        | 10187/20117 [6:26:07<6:14:39,  2.26s/it] 51%|█████████████████████████████████████████▌                                        | 10188/20117 [6:26:10<6:16:07,  2.27s/it] 51%|█████████████████████████████████████████▌                                        | 10189/20117 [6:26:12<6:19:38,  2.29s/it] 51%|█████████████████████████████████████████▌                                        | 10190/20117 [6:26:14<6:18:20,  2.29s/it]                                                                                                                                 {'loss': 0.1707, 'grad_norm': 0.4099638760089874, 'learning_rate': 9.873661647281836e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 380.48, 'epoch': 1.01}
 51%|█████████████████████████████████████████▌                                        | 10190/20117 [6:26:14<6:18:20,  2.29s/it] 51%|█████████████████████████████████████████▌                                        | 10191/20117 [6:26:17<6:22:35,  2.31s/it] 51%|█████████████████████████████████████████▌                                        | 10192/20117 [6:26:19<6:24:44,  2.33s/it] 51%|█████████████████████████████████████████▌                                        | 10193/20117 [6:26:21<6:23:35,  2.32s/it] 51%|█████████████████████████████████████████▌                                        | 10194/20117 [6:26:24<6:21:06,  2.30s/it] 51%|█████████████████████████████████████████▌                                        | 10195/20117 [6:26:26<6:16:30,  2.28s/it] 51%|█████████████████████████████████████████▌                                        | 10196/20117 [6:26:28<6:18:56,  2.29s/it] 51%|█████████████████████████████████████████▌                                        | 10197/20117 [6:26:30<6:18:49,  2.29s/it] 51%|█████████████████████████████████████████▌                                        | 10198/20117 [6:26:33<6:18:31,  2.29s/it] 51%|█████████████████████████████████████████▌                                        | 10199/20117 [6:26:35<6:20:12,  2.30s/it] 51%|█████████████████████████████████████████▌                                        | 10200/20117 [6:26:37<6:21:12,  2.31s/it]                                                                                                                                 {'loss': 0.2481, 'grad_norm': 0.5533301830291748, 'learning_rate': 9.857968439074142e-05, 'memory/max_active (GiB)': 20.57, 'memory/max_allocated (GiB)': 20.57, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 355.99, 'epoch': 1.01}
 51%|█████████████████████████████████████████▌                                        | 10200/20117 [6:26:37<6:21:12,  2.31s/it] 51%|█████████████████████████████████████████▌                                        | 10201/20117 [6:26:40<6:20:05,  2.30s/it] 51%|█████████████████████████████████████████▌                                        | 10202/20117 [6:26:42<6:25:44,  2.33s/it] 51%|█████████████████████████████████████████▌                                        | 10203/20117 [6:26:44<6:27:12,  2.34s/it] 51%|█████████████████████████████████████████▌                                        | 10204/20117 [6:26:47<6:24:02,  2.32s/it] 51%|█████████████████████████████████████████▌                                        | 10205/20117 [6:26:49<6:24:36,  2.33s/it] 51%|█████████████████████████████████████████▌                                        | 10206/20117 [6:26:51<6:27:47,  2.35s/it] 51%|█████████████████████████████████████████▌                                        | 10207/20117 [6:26:54<6:45:31,  2.46s/it] 51%|█████████████████████████████████████████▌                                        | 10208/20117 [6:26:57<6:42:03,  2.43s/it] 51%|█████████████████████████████████████████▌                                        | 10209/20117 [6:26:59<6:41:00,  2.43s/it] 51%|█████████████████████████████████████████▌                                        | 10210/20117 [6:27:01<6:34:59,  2.39s/it]                                                                                                                                 {'loss': 0.1997, 'grad_norm': 0.4641655385494232, 'learning_rate': 9.842275580720205e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 392.6, 'epoch': 1.02}
 51%|█████████████████████████████████████████▌                                        | 10210/20117 [6:27:01<6:34:59,  2.39s/it] 51%|█████████████████████████████████████████▌                                        | 10211/20117 [6:27:04<6:32:08,  2.38s/it] 51%|█████████████████████████████████████████▋                                        | 10212/20117 [6:27:06<6:24:49,  2.33s/it] 51%|█████████████████████████████████████████▋                                        | 10213/20117 [6:27:08<6:21:26,  2.31s/it] 51%|█████████████████████████████████████████▋                                        | 10214/20117 [6:27:10<6:20:40,  2.31s/it] 51%|█████████████████████████████████████████▋                                        | 10215/20117 [6:27:13<6:20:11,  2.30s/it] 51%|█████████████████████████████████████████▋                                        | 10216/20117 [6:27:15<6:16:00,  2.28s/it] 51%|█████████████████████████████████████████▋                                        | 10217/20117 [6:27:17<6:18:02,  2.29s/it] 51%|█████████████████████████████████████████▋                                        | 10218/20117 [6:27:20<6:16:03,  2.28s/it] 51%|█████████████████████████████████████████▋                                        | 10219/20117 [6:27:22<6:17:18,  2.29s/it] 51%|█████████████████████████████████████████▋                                        | 10220/20117 [6:27:24<6:16:21,  2.28s/it]                                                                                                                                 {'loss': 0.1672, 'grad_norm': 0.5175334215164185, 'learning_rate': 9.826583110874847e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 338.18, 'epoch': 1.02}
 51%|█████████████████████████████████████████▋                                        | 10220/20117 [6:27:24<6:16:21,  2.28s/it] 51%|█████████████████████████████████████████▋                                        | 10221/20117 [6:27:26<6:15:49,  2.28s/it] 51%|█████████████████████████████████████████▋                                        | 10222/20117 [6:27:29<6:14:20,  2.27s/it] 51%|█████████████████████████████████████████▋                                        | 10223/20117 [6:27:31<6:14:03,  2.27s/it] 51%|█████████████████████████████████████████▋                                        | 10224/20117 [6:27:33<6:14:29,  2.27s/it] 51%|█████████████████████████████████████████▋                                        | 10225/20117 [6:27:35<6:15:28,  2.28s/it] 51%|█████████████████████████████████████████▋                                        | 10226/20117 [6:27:38<6:16:09,  2.28s/it] 51%|█████████████████████████████████████████▋                                        | 10227/20117 [6:27:40<6:18:06,  2.29s/it] 51%|█████████████████████████████████████████▋                                        | 10228/20117 [6:27:42<6:13:29,  2.27s/it] 51%|█████████████████████████████████████████▋                                        | 10229/20117 [6:27:45<6:17:51,  2.29s/it] 51%|█████████████████████████████████████████▋                                        | 10230/20117 [6:27:47<6:14:00,  2.27s/it]                                                                                                                                 {'loss': 0.152, 'grad_norm': 0.2824496924877167, 'learning_rate': 9.810891068191942e-05, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 384.17, 'epoch': 1.02}
 51%|█████████████████████████████████████████▋                                        | 10230/20117 [6:27:47<6:14:00,  2.27s/it] 51%|█████████████████████████████████████████▋                                        | 10231/20117 [6:27:49<6:14:07,  2.27s/it] 51%|█████████████████████████████████████████▋                                        | 10232/20117 [6:27:51<6:13:56,  2.27s/it] 51%|█████████████████████████████████████████▋                                        | 10233/20117 [6:27:54<6:16:23,  2.28s/it] 51%|█████████████████████████████████████████▋                                        | 10234/20117 [6:27:56<6:17:12,  2.29s/it] 51%|█████████████████████████████████████████▋                                        | 10235/20117 [6:27:58<6:15:50,  2.28s/it] 51%|█████████████████████████████████████████▋                                        | 10236/20117 [6:28:00<6:12:44,  2.26s/it] 51%|█████████████████████████████████████████▋                                        | 10237/20117 [6:28:03<6:13:35,  2.27s/it] 51%|█████████████████████████████████████████▋                                        | 10238/20117 [6:28:05<6:13:49,  2.27s/it] 51%|█████████████████████████████████████████▋                                        | 10239/20117 [6:28:07<6:13:34,  2.27s/it] 51%|█████████████████████████████████████████▋                                        | 10240/20117 [6:28:10<6:15:08,  2.28s/it]                                                                                                                                 {'loss': 0.1287, 'grad_norm': 0.5829957127571106, 'learning_rate': 9.795199491324302e-05, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 306.4, 'epoch': 1.02}
 51%|█████████████████████████████████████████▋                                        | 10240/20117 [6:28:10<6:15:08,  2.28s/it] 51%|█████████████████████████████████████████▋                                        | 10241/20117 [6:28:12<6:11:53,  2.26s/it] 51%|█████████████████████████████████████████▋                                        | 10242/20117 [6:28:14<6:14:32,  2.28s/it] 51%|█████████████████████████████████████████▊                                        | 10243/20117 [6:28:16<6:13:01,  2.27s/it] 51%|█████████████████████████████████████████▊                                        | 10244/20117 [6:28:19<6:12:34,  2.26s/it] 51%|█████████████████████████████████████████▊                                        | 10245/20117 [6:28:21<6:13:41,  2.27s/it] 51%|█████████████████████████████████████████▊                                        | 10246/20117 [6:28:23<6:16:45,  2.29s/it] 51%|█████████████████████████████████████████▊                                        | 10247/20117 [6:28:26<6:14:41,  2.28s/it] 51%|█████████████████████████████████████████▊                                        | 10248/20117 [6:28:28<6:13:28,  2.27s/it] 51%|█████████████████████████████████████████▊                                        | 10249/20117 [6:28:30<6:10:41,  2.25s/it] 51%|█████████████████████████████████████████▊                                        | 10250/20117 [6:28:32<6:08:47,  2.24s/it]                                                                                                                                 {'loss': 0.1509, 'grad_norm': 0.33510005474090576, 'learning_rate': 9.779508418923604e-05, 'memory/max_active (GiB)': 18.17, 'memory/max_allocated (GiB)': 18.17, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 368.11, 'epoch': 1.02}
 51%|█████████████████████████████████████████▊                                        | 10250/20117 [6:28:32<6:08:47,  2.24s/it] 51%|█████████████████████████████████████████▊                                        | 10251/20117 [6:28:34<6:10:27,  2.25s/it] 51%|█████████████████████████████████████████▊                                        | 10252/20117 [6:28:37<6:12:46,  2.27s/it] 51%|█████████████████████████████████████████▊                                        | 10253/20117 [6:28:39<6:14:55,  2.28s/it] 51%|█████████████████████████████████████████▊                                        | 10254/20117 [6:28:41<6:17:08,  2.29s/it] 51%|█████████████████████████████████████████▊                                        | 10255/20117 [6:28:44<6:14:26,  2.28s/it] 51%|█████████████████████████████████████████▊                                        | 10256/20117 [6:28:46<6:14:27,  2.28s/it] 51%|█████████████████████████████████████████▊                                        | 10257/20117 [6:28:48<6:13:59,  2.28s/it] 51%|█████████████████████████████████████████▊                                        | 10258/20117 [6:28:50<6:13:33,  2.27s/it] 51%|█████████████████████████████████████████▊                                        | 10259/20117 [6:28:53<6:14:58,  2.28s/it] 51%|█████████████████████████████████████████▊                                        | 10260/20117 [6:28:55<6:13:21,  2.27s/it]                                                                                                                                 {'loss': 0.1762, 'grad_norm': 0.4509134888648987, 'learning_rate': 9.763817889640267e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 370.17, 'epoch': 1.02}
 51%|█████████████████████████████████████████▊                                        | 10260/20117 [6:28:55<6:13:21,  2.27s/it] 51%|█████████████████████████████████████████▊                                        | 10261/20117 [6:28:57<6:12:00,  2.26s/it] 51%|█████████████████████████████████████████▊                                        | 10262/20117 [6:29:00<6:31:24,  2.38s/it] 51%|█████████████████████████████████████████▊                                        | 10263/20117 [6:29:02<6:25:03,  2.34s/it] 51%|█████████████████████████████████████████▊                                        | 10264/20117 [6:29:04<6:21:44,  2.32s/it] 51%|█████████████████████████████████████████▊                                        | 10265/20117 [6:29:07<6:18:55,  2.31s/it] 51%|█████████████████████████████████████████▊                                        | 10266/20117 [6:29:09<6:17:28,  2.30s/it] 51%|█████████████████████████████████████████▊                                        | 10267/20117 [6:29:11<6:17:30,  2.30s/it] 51%|█████████████████████████████████████████▊                                        | 10268/20117 [6:29:14<6:17:15,  2.30s/it] 51%|█████████████████████████████████████████▊                                        | 10269/20117 [6:29:16<6:16:36,  2.29s/it] 51%|█████████████████████████████████████████▊                                        | 10270/20117 [6:29:18<6:18:13,  2.30s/it]                                                                                                                                 {'loss': 0.1569, 'grad_norm': 0.5804489254951477, 'learning_rate': 9.74812794212339e-05, 'memory/max_active (GiB)': 18.83, 'memory/max_allocated (GiB)': 18.83, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 321.78, 'epoch': 1.02}
 51%|█████████████████████████████████████████▊                                        | 10270/20117 [6:29:18<6:18:13,  2.30s/it] 51%|█████████████████████████████████████████▊                                        | 10271/20117 [6:29:21<6:18:02,  2.30s/it] 51%|█████████████████████████████████████████▊                                        | 10272/20117 [6:29:23<6:16:04,  2.29s/it] 51%|█████████████████████████████████████████▊                                        | 10273/20117 [6:29:25<6:18:59,  2.31s/it] 51%|█████████████████████████████████████████▉                                        | 10274/20117 [6:29:27<6:16:53,  2.30s/it] 51%|█████████████████████████████████████████▉                                        | 10275/20117 [6:29:30<6:15:15,  2.29s/it] 51%|█████████████████████████████████████████▉                                        | 10276/20117 [6:29:32<6:18:16,  2.31s/it] 51%|█████████████████████████████████████████▉                                        | 10277/20117 [6:29:34<6:16:43,  2.30s/it] 51%|█████████████████████████████████████████▉                                        | 10278/20117 [6:29:37<6:16:41,  2.30s/it] 51%|█████████████████████████████████████████▉                                        | 10279/20117 [6:29:39<6:13:43,  2.28s/it] 51%|█████████████████████████████████████████▉                                        | 10280/20117 [6:29:41<6:12:17,  2.27s/it]                                                                                                                                 {'loss': 0.1669, 'grad_norm': 0.5096241235733032, 'learning_rate': 9.732438615020623e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 333.99, 'epoch': 1.02}
 51%|█████████████████████████████████████████▉                                        | 10280/20117 [6:29:41<6:12:17,  2.27s/it] 51%|█████████████████████████████████████████▉                                        | 10281/20117 [6:29:43<6:14:39,  2.29s/it] 51%|█████████████████████████████████████████▉                                        | 10282/20117 [6:29:46<6:10:26,  2.26s/it] 51%|█████████████████████████████████████████▉                                        | 10283/20117 [6:29:48<6:15:26,  2.29s/it] 51%|█████████████████████████████████████████▉                                        | 10284/20117 [6:29:50<6:14:11,  2.28s/it] 51%|█████████████████████████████████████████▉                                        | 10285/20117 [6:29:52<6:10:36,  2.26s/it] 51%|█████████████████████████████████████████▉                                        | 10286/20117 [6:29:55<6:10:06,  2.26s/it] 51%|█████████████████████████████████████████▉                                        | 10287/20117 [6:29:57<6:10:33,  2.26s/it] 51%|█████████████████████████████████████████▉                                        | 10288/20117 [6:29:59<6:08:47,  2.25s/it] 51%|█████████████████████████████████████████▉                                        | 10289/20117 [6:30:01<6:10:14,  2.26s/it] 51%|█████████████████████████████████████████▉                                        | 10290/20117 [6:30:04<6:12:49,  2.28s/it]                                                                                                                                 {'loss': 0.1857, 'grad_norm': 0.6617136001586914, 'learning_rate': 9.716749946978102e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 260.5, 'epoch': 1.02}
 51%|█████████████████████████████████████████▉                                        | 10290/20117 [6:30:04<6:12:49,  2.28s/it] 51%|█████████████████████████████████████████▉                                        | 10291/20117 [6:30:06<6:12:33,  2.27s/it] 51%|█████████████████████████████████████████▉                                        | 10292/20117 [6:30:08<6:13:46,  2.28s/it] 51%|█████████████████████████████████████████▉                                        | 10293/20117 [6:30:11<6:11:50,  2.27s/it] 51%|█████████████████████████████████████████▉                                        | 10294/20117 [6:30:13<6:11:33,  2.27s/it] 51%|█████████████████████████████████████████▉                                        | 10295/20117 [6:30:15<6:10:45,  2.26s/it] 51%|█████████████████████████████████████████▉                                        | 10296/20117 [6:30:17<6:10:24,  2.26s/it] 51%|█████████████████████████████████████████▉                                        | 10297/20117 [6:30:20<6:10:58,  2.27s/it] 51%|█████████████████████████████████████████▉                                        | 10298/20117 [6:30:22<6:13:15,  2.28s/it] 51%|█████████████████████████████████████████▉                                        | 10299/20117 [6:30:24<6:12:13,  2.27s/it] 51%|█████████████████████████████████████████▉                                        | 10300/20117 [6:30:26<6:10:35,  2.27s/it]                                                                                                                                 {'loss': 0.1787, 'grad_norm': 0.5293213725090027, 'learning_rate': 9.701061976640323e-05, 'memory/max_active (GiB)': 19.22, 'memory/max_allocated (GiB)': 19.22, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 382.76, 'epoch': 1.02}
 51%|█████████████████████████████████████████▉                                        | 10300/20117 [6:30:26<6:10:35,  2.27s/it] 51%|█████████████████████████████████████████▉                                        | 10301/20117 [6:30:29<6:11:43,  2.27s/it] 51%|█████████████████████████████████████████▉                                        | 10302/20117 [6:30:31<6:09:31,  2.26s/it] 51%|█████████████████████████████████████████▉                                        | 10303/20117 [6:30:33<6:14:10,  2.29s/it] 51%|██████████████████████████████████████████                                        | 10304/20117 [6:30:36<6:16:38,  2.30s/it] 51%|██████████████████████████████████████████                                        | 10305/20117 [6:30:38<6:12:19,  2.28s/it] 51%|██████████████████████████████████████████                                        | 10306/20117 [6:30:40<6:13:33,  2.28s/it] 51%|██████████████████████████████████████████                                        | 10307/20117 [6:30:42<6:13:07,  2.28s/it] 51%|██████████████████████████████████████████                                        | 10308/20117 [6:30:45<6:12:44,  2.28s/it] 51%|██████████████████████████████████████████                                        | 10309/20117 [6:30:47<6:11:21,  2.27s/it] 51%|██████████████████████████████████████████                                        | 10310/20117 [6:30:49<6:09:27,  2.26s/it]                                                                                                                                 {'loss': 0.1412, 'grad_norm': 0.6056047081947327, 'learning_rate': 9.685374742650083e-05, 'memory/max_active (GiB)': 20.62, 'memory/max_allocated (GiB)': 20.62, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 350.37, 'epoch': 1.02}
 51%|██████████████████████████████████████████                                        | 10310/20117 [6:30:49<6:09:27,  2.26s/it] 51%|██████████████████████████████████████████                                        | 10311/20117 [6:30:52<6:10:30,  2.27s/it] 51%|██████████████████████████████████████████                                        | 10312/20117 [6:30:54<6:08:29,  2.25s/it] 51%|██████████████████████████████████████████                                        | 10313/20117 [6:30:56<6:24:09,  2.35s/it] 51%|██████████████████████████████████████████                                        | 10314/20117 [6:30:59<6:18:38,  2.32s/it] 51%|██████████████████████████████████████████                                        | 10315/20117 [6:31:01<6:12:08,  2.28s/it] 51%|██████████████████████████████████████████                                        | 10316/20117 [6:31:03<6:11:18,  2.27s/it] 51%|██████████████████████████████████████████                                        | 10317/20117 [6:31:05<6:13:55,  2.29s/it] 51%|██████████████████████████████████████████                                        | 10318/20117 [6:31:08<6:12:58,  2.28s/it] 51%|██████████████████████████████████████████                                        | 10319/20117 [6:31:10<6:12:45,  2.28s/it] 51%|██████████████████████████████████████████                                        | 10320/20117 [6:31:12<6:12:59,  2.28s/it]                                                                                                                                 {'loss': 0.1149, 'grad_norm': 0.2792462110519409, 'learning_rate': 9.669688283648344e-05, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 269.87, 'epoch': 1.03}
 51%|██████████████████████████████████████████                                        | 10320/20117 [6:31:12<6:12:59,  2.28s/it] 51%|██████████████████████████████████████████                                        | 10321/20117 [6:31:14<6:15:16,  2.30s/it] 51%|██████████████████████████████████████████                                        | 10322/20117 [6:31:17<6:14:26,  2.29s/it] 51%|██████████████████████████████████████████                                        | 10323/20117 [6:31:19<6:13:39,  2.29s/it] 51%|██████████████████████████████████████████                                        | 10324/20117 [6:31:21<6:12:30,  2.28s/it] 51%|██████████████████████████████████████████                                        | 10325/20117 [6:31:24<6:10:11,  2.27s/it] 51%|██████████████████████████████████████████                                        | 10326/20117 [6:31:26<6:06:39,  2.25s/it] 51%|██████████████████████████████████████████                                        | 10327/20117 [6:31:28<6:07:45,  2.25s/it] 51%|██████████████████████████████████████████                                        | 10328/20117 [6:31:30<6:08:53,  2.26s/it] 51%|██████████████████████████████████████████                                        | 10329/20117 [6:31:33<6:07:58,  2.26s/it] 51%|██████████████████████████████████████████                                        | 10330/20117 [6:31:35<6:11:27,  2.28s/it]                                                                                                                                 {'loss': 0.1829, 'grad_norm': 0.4464641511440277, 'learning_rate': 9.654002638274176e-05, 'memory/max_active (GiB)': 20.63, 'memory/max_allocated (GiB)': 20.63, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 411.8, 'epoch': 1.03}
 51%|██████████████████████████████████████████                                        | 10330/20117 [6:31:35<6:11:27,  2.28s/it] 51%|██████████████████████████████████████████                                        | 10331/20117 [6:31:37<6:10:12,  2.27s/it] 51%|██████████████████████████████████████████                                        | 10332/20117 [6:31:39<6:08:40,  2.26s/it] 51%|██████████████████████████████████████████                                        | 10333/20117 [6:31:42<6:08:16,  2.26s/it] 51%|██████████████████████████████████████████                                        | 10334/20117 [6:31:44<6:06:01,  2.24s/it] 51%|██████████████████████████████████████████▏                                       | 10335/20117 [6:31:46<6:00:35,  2.21s/it] 51%|██████████████████████████████████████████▏                                       | 10336/20117 [6:31:48<6:04:05,  2.23s/it] 51%|██████████████████████████████████████████▏                                       | 10337/20117 [6:31:50<5:59:42,  2.21s/it] 51%|██████████████████████████████████████████▏                                       | 10338/20117 [6:31:53<5:55:52,  2.18s/it] 51%|██████████████████████████████████████████▏                                       | 10339/20117 [6:31:55<5:56:48,  2.19s/it] 51%|██████████████████████████████████████████▏                                       | 10340/20117 [6:31:57<6:04:53,  2.24s/it]                                                                                                                                 {'loss': 0.1695, 'grad_norm': 0.4715180993080139, 'learning_rate': 9.638317845164639e-05, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 271.86, 'epoch': 1.03}
 51%|██████████████████████████████████████████▏                                       | 10340/20117 [6:31:57<6:04:53,  2.24s/it] 51%|██████████████████████████████████████████▏                                       | 10341/20117 [6:31:59<6:10:03,  2.27s/it] 51%|██████████████████████████████████████████▏                                       | 10342/20117 [6:32:02<6:11:23,  2.28s/it] 51%|██████████████████████████████████████████▏                                       | 10343/20117 [6:32:04<6:11:01,  2.28s/it] 51%|██████████████████████████████████████████▏                                       | 10344/20117 [6:32:06<6:12:16,  2.29s/it] 51%|██████████████████████████████████████████▏                                       | 10345/20117 [6:32:09<6:16:25,  2.31s/it] 51%|██████████████████████████████████████████▏                                       | 10346/20117 [6:32:11<6:20:24,  2.34s/it] 51%|██████████████████████████████████████████▏                                       | 10347/20117 [6:32:13<6:15:08,  2.30s/it] 51%|██████████████████████████████████████████▏                                       | 10348/20117 [6:32:16<6:12:59,  2.29s/it] 51%|██████████████████████████████████████████▏                                       | 10349/20117 [6:32:18<6:07:24,  2.26s/it] 51%|██████████████████████████████████████████▏                                       | 10350/20117 [6:32:20<6:02:16,  2.23s/it]                                                                                                                                 {'loss': 0.1405, 'grad_norm': 0.43288466334342957, 'learning_rate': 9.622633942954693e-05, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 371.24, 'epoch': 1.03}
 51%|██████████████████████████████████████████▏                                       | 10350/20117 [6:32:20<6:02:16,  2.23s/it] 51%|██████████████████████████████████████████▏                                       | 10351/20117 [6:32:22<5:58:55,  2.21s/it] 51%|██████████████████████████████████████████▏                                       | 10352/20117 [6:32:24<5:55:05,  2.18s/it] 51%|██████████████████████████████████████████▏                                       | 10353/20117 [6:32:26<5:57:18,  2.20s/it] 51%|██████████████████████████████████████████▏                                       | 10354/20117 [6:32:29<6:03:29,  2.23s/it] 51%|██████████████████████████████████████████▏                                       | 10355/20117 [6:32:31<6:08:13,  2.26s/it] 51%|██████████████████████████████████████████▏                                       | 10356/20117 [6:32:33<6:14:01,  2.30s/it] 51%|██████████████████████████████████████████▏                                       | 10357/20117 [6:32:36<6:14:08,  2.30s/it] 51%|██████████████████████████████████████████▏                                       | 10358/20117 [6:32:38<6:16:15,  2.31s/it] 51%|██████████████████████████████████████████▏                                       | 10359/20117 [6:32:40<6:16:52,  2.32s/it] 51%|██████████████████████████████████████████▏                                       | 10360/20117 [6:32:43<6:15:56,  2.31s/it]                                                                                                                                 {'loss': 0.1613, 'grad_norm': 0.4133097529411316, 'learning_rate': 9.606950970277106e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 335.26, 'epoch': 1.03}
 51%|██████████████████████████████████████████▏                                       | 10360/20117 [6:32:43<6:15:56,  2.31s/it] 52%|██████████████████████████████████████████▏                                       | 10361/20117 [6:32:45<6:10:24,  2.28s/it] 52%|██████████████████████████████████████████▏                                       | 10362/20117 [6:32:47<6:08:17,  2.27s/it] 52%|██████████████████████████████████████████▏                                       | 10363/20117 [6:32:49<6:06:10,  2.25s/it] 52%|██████████████████████████████████████████▏                                       | 10364/20117 [6:32:52<6:06:34,  2.26s/it] 52%|██████████████████████████████████████████▏                                       | 10365/20117 [6:32:54<6:20:13,  2.34s/it] 52%|██████████████████████████████████████████▎                                       | 10366/20117 [6:32:57<6:19:46,  2.34s/it] 52%|██████████████████████████████████████████▎                                       | 10367/20117 [6:32:59<6:18:05,  2.33s/it] 52%|██████████████████████████████████████████▎                                       | 10368/20117 [6:33:01<6:15:42,  2.31s/it] 52%|██████████████████████████████████████████▎                                       | 10369/20117 [6:33:03<6:14:00,  2.30s/it] 52%|██████████████████████████████████████████▎                                       | 10370/20117 [6:33:06<6:11:34,  2.29s/it]                                                                                                                                 {'loss': 0.1809, 'grad_norm': 0.36935463547706604, 'learning_rate': 9.591268965762348e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 363.08, 'epoch': 1.03}
 52%|██████████████████████████████████████████▎                                       | 10370/20117 [6:33:06<6:11:34,  2.29s/it] 52%|██████████████████████████████████████████▎                                       | 10371/20117 [6:33:08<6:10:15,  2.28s/it] 52%|██████████████████████████████████████████▎                                       | 10372/20117 [6:33:10<6:11:51,  2.29s/it] 52%|██████████████████████████████████████████▎                                       | 10373/20117 [6:33:12<6:08:55,  2.27s/it] 52%|██████████████████████████████████████████▎                                       | 10374/20117 [6:33:15<6:05:26,  2.25s/it] 52%|██████████████████████████████████████████▎                                       | 10375/20117 [6:33:17<6:05:17,  2.25s/it] 52%|██████████████████████████████████████████▎                                       | 10376/20117 [6:33:19<6:06:28,  2.26s/it] 52%|██████████████████████████████████████████▎                                       | 10377/20117 [6:33:21<6:04:01,  2.24s/it] 52%|██████████████████████████████████████████▎                                       | 10378/20117 [6:33:24<6:06:20,  2.26s/it] 52%|██████████████████████████████████████████▎                                       | 10379/20117 [6:33:26<6:13:15,  2.30s/it] 52%|██████████████████████████████████████████▎                                       | 10380/20117 [6:33:28<6:14:47,  2.31s/it]                                                                                                                                 {'loss': 0.1507, 'grad_norm': 0.6092913746833801, 'learning_rate': 9.57558796803852e-05, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 296.26, 'epoch': 1.03}
 52%|██████████████████████████████████████████▎                                       | 10380/20117 [6:33:28<6:14:47,  2.31s/it] 52%|██████████████████████████████████████████▎                                       | 10381/20117 [6:33:31<6:14:46,  2.31s/it] 52%|██████████████████████████████████████████▎                                       | 10382/20117 [6:33:33<6:12:36,  2.30s/it] 52%|██████████████████████████████████████████▎                                       | 10383/20117 [6:33:35<6:09:43,  2.28s/it] 52%|██████████████████████████████████████████▎                                       | 10384/20117 [6:33:37<6:07:15,  2.26s/it] 52%|██████████████████████████████████████████▎                                       | 10385/20117 [6:33:40<6:07:36,  2.27s/it] 52%|██████████████████████████████████████████▎                                       | 10386/20117 [6:33:42<6:08:20,  2.27s/it] 52%|██████████████████████████████████████████▎                                       | 10387/20117 [6:33:44<6:07:36,  2.27s/it] 52%|██████████████████████████████████████████▎                                       | 10388/20117 [6:33:47<6:09:07,  2.28s/it] 52%|██████████████████████████████████████████▎                                       | 10389/20117 [6:33:49<6:09:29,  2.28s/it] 52%|██████████████████████████████████████████▎                                       | 10390/20117 [6:33:51<6:06:46,  2.26s/it]                                                                                                                                 {'loss': 0.1674, 'grad_norm': 0.24895969033241272, 'learning_rate': 9.559908015731223e-05, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 306.56, 'epoch': 1.03}
 52%|██████████████████████████████████████████▎                                       | 10390/20117 [6:33:51<6:06:46,  2.26s/it] 52%|██████████████████████████████████████████▎                                       | 10391/20117 [6:33:53<6:09:46,  2.28s/it] 52%|██████████████████████████████████████████▎                                       | 10392/20117 [6:33:56<6:09:51,  2.28s/it] 52%|██████████████████████████████████████████▎                                       | 10393/20117 [6:33:58<6:10:09,  2.28s/it] 52%|██████████████████████████████████████████▎                                       | 10394/20117 [6:34:00<6:07:27,  2.27s/it] 52%|██████████████████████████████████████████▎                                       | 10395/20117 [6:34:02<6:07:09,  2.27s/it] 52%|██████████████████████████████████████████▍                                       | 10396/20117 [6:34:05<6:08:13,  2.27s/it] 52%|██████████████████████████████████████████▍                                       | 10397/20117 [6:34:07<6:08:11,  2.27s/it] 52%|██████████████████████████████████████████▍                                       | 10398/20117 [6:34:09<6:12:07,  2.30s/it] 52%|██████████████████████████████████████████▍                                       | 10399/20117 [6:34:12<6:11:16,  2.29s/it] 52%|██████████████████████████████████████████▍                                       | 10400/20117 [6:34:14<6:12:30,  2.30s/it]                                                                                                                                 {'loss': 0.1683, 'grad_norm': 0.3319539725780487, 'learning_rate': 9.544229147463502e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 365.18, 'epoch': 1.03}
 52%|██████████████████████████████████████████▍                                       | 10400/20117 [6:34:14<6:12:30,  2.30s/it] 52%|██████████████████████████████████████████▍                                       | 10401/20117 [6:34:16<6:12:16,  2.30s/it] 52%|██████████████████████████████████████████▍                                       | 10402/20117 [6:34:18<6:09:24,  2.28s/it] 52%|██████████████████████████████████████████▍                                       | 10403/20117 [6:34:21<6:08:26,  2.28s/it] 52%|██████████████████████████████████████████▍                                       | 10404/20117 [6:34:23<6:09:10,  2.28s/it] 52%|██████████████████████████████████████████▍                                       | 10405/20117 [6:34:25<6:06:55,  2.27s/it] 52%|██████████████████████████████████████████▍                                       | 10406/20117 [6:34:28<6:06:11,  2.26s/it] 52%|██████████████████████████████████████████▍                                       | 10407/20117 [6:34:30<6:05:49,  2.26s/it] 52%|██████████████████████████████████████████▍                                       | 10408/20117 [6:34:32<6:05:28,  2.26s/it] 52%|██████████████████████████████████████████▍                                       | 10409/20117 [6:34:34<6:03:12,  2.24s/it] 52%|██████████████████████████████████████████▍                                       | 10410/20117 [6:34:36<6:03:04,  2.24s/it]                                                                                                                                 {'loss': 0.1347, 'grad_norm': 0.3731245994567871, 'learning_rate': 9.528551401855718e-05, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 304.91, 'epoch': 1.03}
 52%|██████████████████████████████████████████▍                                       | 10410/20117 [6:34:36<6:03:04,  2.24s/it] 52%|██████████████████████████████████████████▍                                       | 10411/20117 [6:34:39<6:05:25,  2.26s/it] 52%|██████████████████████████████████████████▍                                       | 10412/20117 [6:34:41<6:09:10,  2.28s/it] 52%|██████████████████████████████████████████▍                                       | 10413/20117 [6:34:43<6:11:43,  2.30s/it] 52%|██████████████████████████████████████████▍                                       | 10414/20117 [6:34:46<6:09:23,  2.28s/it] 52%|██████████████████████████████████████████▍                                       | 10415/20117 [6:34:48<6:06:46,  2.27s/it] 52%|██████████████████████████████████████████▍                                       | 10416/20117 [6:34:50<6:10:01,  2.29s/it] 52%|██████████████████████████████████████████▍                                       | 10417/20117 [6:34:53<6:25:48,  2.39s/it] 52%|██████████████████████████████████████████▍                                       | 10418/20117 [6:34:55<6:19:24,  2.35s/it] 52%|██████████████████████████████████████████▍                                       | 10419/20117 [6:34:57<6:15:03,  2.32s/it] 52%|██████████████████████████████████████████▍                                       | 10420/20117 [6:35:00<6:08:15,  2.28s/it]                                                                                                                                 {'loss': 0.1558, 'grad_norm': 0.42913174629211426, 'learning_rate': 9.512874817525474e-05, 'memory/max_active (GiB)': 21.4, 'memory/max_allocated (GiB)': 21.4, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 368.64, 'epoch': 1.04}
 52%|██████████████████████████████████████████▍                                       | 10420/20117 [6:35:00<6:08:15,  2.28s/it] 52%|██████████████████████████████████████████▍                                       | 10421/20117 [6:35:02<6:06:40,  2.27s/it] 52%|██████████████████████████████████████████▍                                       | 10422/20117 [6:35:04<6:05:17,  2.26s/it] 52%|██████████████████████████████████████████▍                                       | 10423/20117 [6:35:06<6:05:47,  2.26s/it] 52%|██████████████████████████████████████████▍                                       | 10424/20117 [6:35:09<6:04:14,  2.25s/it] 52%|██████████████████████████████████████████▍                                       | 10425/20117 [6:35:11<6:06:36,  2.27s/it] 52%|██████████████████████████████████████████▍                                       | 10426/20117 [6:35:13<6:05:00,  2.26s/it] 52%|██████████████████████████████████████████▌                                       | 10427/20117 [6:35:15<6:05:52,  2.27s/it] 52%|██████████████████████████████████████████▌                                       | 10428/20117 [6:35:18<6:04:46,  2.26s/it] 52%|██████████████████████████████████████████▌                                       | 10429/20117 [6:35:20<6:04:57,  2.26s/it] 52%|██████████████████████████████████████████▌                                       | 10430/20117 [6:35:22<6:05:34,  2.26s/it]                                                                                                                                 {'loss': 0.1816, 'grad_norm': 0.5695579648017883, 'learning_rate': 9.49719943308751e-05, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 382.9, 'epoch': 1.04}
 52%|██████████████████████████████████████████▌                                       | 10430/20117 [6:35:22<6:05:34,  2.26s/it] 52%|██████████████████████████████████████████▌                                       | 10431/20117 [6:35:24<6:06:41,  2.27s/it] 52%|██████████████████████████████████████████▌                                       | 10432/20117 [6:35:27<6:08:21,  2.28s/it] 52%|██████████████████████████████████████████▌                                       | 10433/20117 [6:35:29<6:11:55,  2.30s/it] 52%|██████████████████████████████████████████▌                                       | 10434/20117 [6:35:31<6:10:30,  2.30s/it] 52%|██████████████████████████████████████████▌                                       | 10435/20117 [6:35:34<6:09:49,  2.29s/it] 52%|██████████████████████████████████████████▌                                       | 10436/20117 [6:35:36<6:08:17,  2.28s/it] 52%|██████████████████████████████████████████▌                                       | 10437/20117 [6:35:38<6:07:20,  2.28s/it] 52%|██████████████████████████████████████████▌                                       | 10438/20117 [6:35:40<6:05:03,  2.26s/it] 52%|██████████████████████████████████████████▌                                       | 10439/20117 [6:35:43<6:01:59,  2.24s/it] 52%|██████████████████████████████████████████▌                                       | 10440/20117 [6:35:45<6:05:24,  2.27s/it]                                                                                                                                 {'loss': 0.1885, 'grad_norm': 0.659853458404541, 'learning_rate': 9.481525287153616e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 362.22, 'epoch': 1.04}
 52%|██████████████████████████████████████████▌                                       | 10440/20117 [6:35:45<6:05:24,  2.27s/it] 52%|██████████████████████████████████████████▌                                       | 10441/20117 [6:35:47<6:08:17,  2.28s/it] 52%|██████████████████████████████████████████▌                                       | 10442/20117 [6:35:50<6:10:25,  2.30s/it] 52%|██████████████████████████████████████████▌                                       | 10443/20117 [6:35:52<6:10:55,  2.30s/it] 52%|██████████████████████████████████████████▌                                       | 10444/20117 [6:35:54<6:10:43,  2.30s/it] 52%|██████████████████████████████████████████▌                                       | 10445/20117 [6:35:56<6:09:50,  2.29s/it] 52%|██████████████████████████████████████████▌                                       | 10446/20117 [6:35:59<6:10:20,  2.30s/it] 52%|██████████████████████████████████████████▌                                       | 10447/20117 [6:36:01<6:10:53,  2.30s/it] 52%|██████████████████████████████████████████▌                                       | 10448/20117 [6:36:03<6:10:12,  2.30s/it] 52%|██████████████████████████████████████████▌                                       | 10449/20117 [6:36:06<6:08:40,  2.29s/it] 52%|██████████████████████████████████████████▌                                       | 10450/20117 [6:36:08<6:09:23,  2.29s/it]                                                                                                                                 {'loss': 0.192, 'grad_norm': 0.5355104207992554, 'learning_rate': 9.465852418332518e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 422.56, 'epoch': 1.04}
 52%|██████████████████████████████████████████▌                                       | 10450/20117 [6:36:08<6:09:23,  2.29s/it] 52%|██████████████████████████████████████████▌                                       | 10451/20117 [6:36:10<6:09:09,  2.29s/it] 52%|██████████████████████████████████████████▌                                       | 10452/20117 [6:36:13<6:11:57,  2.31s/it] 52%|██████████████████████████████████████████▌                                       | 10453/20117 [6:36:15<6:10:54,  2.30s/it] 52%|██████████████████████████████████████████▌                                       | 10454/20117 [6:36:17<6:07:55,  2.28s/it] 52%|██████████████████████████████████████████▌                                       | 10455/20117 [6:36:19<6:10:24,  2.30s/it] 52%|██████████████████████████████████████████▌                                       | 10456/20117 [6:36:22<6:10:48,  2.30s/it] 52%|██████████████████████████████████████████▌                                       | 10457/20117 [6:36:24<6:10:00,  2.30s/it] 52%|██████████████████████████████████████████▋                                       | 10458/20117 [6:36:26<6:07:44,  2.28s/it] 52%|██████████████████████████████████████████▋                                       | 10459/20117 [6:36:29<6:09:59,  2.30s/it] 52%|██████████████████████████████████████████▋                                       | 10460/20117 [6:36:31<6:09:17,  2.29s/it]                                                                                                                                 {'loss': 0.1468, 'grad_norm': 0.6618303060531616, 'learning_rate': 9.450180865229807e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 331.93, 'epoch': 1.04}
 52%|██████████████████████████████████████████▋                                       | 10460/20117 [6:36:31<6:09:17,  2.29s/it] 52%|██████████████████████████████████████████▋                                       | 10461/20117 [6:36:33<6:11:59,  2.31s/it] 52%|██████████████████████████████████████████▋                                       | 10462/20117 [6:36:36<6:10:39,  2.30s/it] 52%|██████████████████████████████████████████▋                                       | 10463/20117 [6:36:38<6:11:15,  2.31s/it] 52%|██████████████████████████████████████████▋                                       | 10464/20117 [6:36:40<6:09:45,  2.30s/it] 52%|██████████████████████████████████████████▋                                       | 10465/20117 [6:36:42<6:10:43,  2.30s/it] 52%|██████████████████████████████████████████▋                                       | 10466/20117 [6:36:45<6:12:13,  2.31s/it] 52%|██████████████████████████████████████████▋                                       | 10467/20117 [6:36:47<6:09:52,  2.30s/it] 52%|██████████████████████████████████████████▋                                       | 10468/20117 [6:36:50<6:30:14,  2.43s/it] 52%|██████████████████████████████████████████▋                                       | 10469/20117 [6:36:52<6:21:19,  2.37s/it] 52%|██████████████████████████████████████████▋                                       | 10470/20117 [6:36:54<6:18:57,  2.36s/it]                                                                                                                                 {'loss': 0.1703, 'grad_norm': 0.4398798644542694, 'learning_rate': 9.434510666447838e-05, 'memory/max_active (GiB)': 19.8, 'memory/max_allocated (GiB)': 19.8, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 332.46, 'epoch': 1.04}
 52%|██████████████████████████████████████████▋                                       | 10470/20117 [6:36:54<6:18:57,  2.36s/it] 52%|██████████████████████████████████████████▋                                       | 10471/20117 [6:36:57<6:17:12,  2.35s/it] 52%|██████████████████████████████████████████▋                                       | 10472/20117 [6:36:59<6:15:11,  2.33s/it] 52%|██████████████████████████████████████████▋                                       | 10473/20117 [6:37:01<6:09:19,  2.30s/it] 52%|██████████████████████████████████████████▋                                       | 10474/20117 [6:37:03<6:06:44,  2.28s/it] 52%|██████████████████████████████████████████▋                                       | 10475/20117 [6:37:06<6:03:27,  2.26s/it] 52%|██████████████████████████████████████████▋                                       | 10476/20117 [6:37:08<6:05:06,  2.27s/it] 52%|██████████████████████████████████████████▋                                       | 10477/20117 [6:37:10<6:10:08,  2.30s/it] 52%|██████████████████████████████████████████▋                                       | 10478/20117 [6:37:13<6:07:06,  2.29s/it] 52%|██████████████████████████████████████████▋                                       | 10479/20117 [6:37:15<6:04:13,  2.27s/it] 52%|██████████████████████████████████████████▋                                       | 10480/20117 [6:37:17<6:02:23,  2.26s/it]                                                                                                                                 {'loss': 0.1676, 'grad_norm': 0.25756803154945374, 'learning_rate': 9.41884186058561e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 332.3, 'epoch': 1.04}
 52%|██████████████████████████████████████████▋                                       | 10480/20117 [6:37:17<6:02:23,  2.26s/it] 52%|██████████████████████████████████████████▋                                       | 10481/20117 [6:37:19<6:01:09,  2.25s/it] 52%|██████████████████████████████████████████▋                                       | 10482/20117 [6:37:22<6:01:14,  2.25s/it] 52%|██████████████████████████████████████████▋                                       | 10483/20117 [6:37:24<6:04:10,  2.27s/it] 52%|██████████████████████████████████████████▋                                       | 10484/20117 [6:37:26<6:03:31,  2.26s/it] 52%|██████████████████████████████████████████▋                                       | 10485/20117 [6:37:28<6:05:43,  2.28s/it] 52%|██████████████████████████████████████████▋                                       | 10486/20117 [6:37:31<6:05:02,  2.27s/it] 52%|██████████████████████████████████████████▋                                       | 10487/20117 [6:37:33<6:06:34,  2.28s/it] 52%|██████████████████████████████████████████▊                                       | 10488/20117 [6:37:35<6:08:36,  2.30s/it] 52%|██████████████████████████████████████████▊                                       | 10489/20117 [6:37:38<6:07:15,  2.29s/it] 52%|██████████████████████████████████████████▊                                       | 10490/20117 [6:37:40<6:13:32,  2.33s/it]                                                                                                                                 {'loss': 0.1372, 'grad_norm': 0.2897964417934418, 'learning_rate': 9.403174486238714e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 315.04, 'epoch': 1.04}
 52%|██████████████████████████████████████████▊                                       | 10490/20117 [6:37:40<6:13:32,  2.33s/it] 52%|██████████████████████████████████████████▊                                       | 10491/20117 [6:37:42<6:11:43,  2.32s/it] 52%|██████████████████████████████████████████▊                                       | 10492/20117 [6:37:45<6:10:45,  2.31s/it] 52%|██████████████████████████████████████████▊                                       | 10493/20117 [6:37:47<6:12:58,  2.33s/it] 52%|██████████████████████████████████████████▊                                       | 10494/20117 [6:37:49<6:10:21,  2.31s/it] 52%|██████████████████████████████████████████▊                                       | 10495/20117 [6:37:51<6:07:20,  2.29s/it] 52%|██████████████████████████████████████████▊                                       | 10496/20117 [6:37:54<6:07:40,  2.29s/it] 52%|██████████████████████████████████████████▊                                       | 10497/20117 [6:37:56<6:08:19,  2.30s/it] 52%|██████████████████████████████████████████▊                                       | 10498/20117 [6:37:58<6:09:06,  2.30s/it] 52%|██████████████████████████████████████████▊                                       | 10499/20117 [6:38:01<6:09:36,  2.31s/it] 52%|██████████████████████████████████████████▊                                       | 10500/20117 [6:38:03<6:10:55,  2.31s/it]                                                                                                                                 {'loss': 0.1843, 'grad_norm': 0.2930709719657898, 'learning_rate': 9.387508581999197e-05, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 364.97, 'epoch': 1.04}
 52%|██████████████████████████████████████████▊                                       | 10500/20117 [6:38:03<6:10:55,  2.31s/it] 52%|██████████████████████████████████████████▊                                       | 10501/20117 [6:38:05<6:11:18,  2.32s/it] 52%|██████████████████████████████████████████▊                                       | 10502/20117 [6:38:08<6:11:08,  2.32s/it] 52%|██████████████████████████████████████████▊                                       | 10503/20117 [6:38:10<6:10:11,  2.31s/it] 52%|██████████████████████████████████████████▊                                       | 10504/20117 [6:38:12<6:10:16,  2.31s/it] 52%|██████████████████████████████████████████▊                                       | 10505/20117 [6:38:15<6:10:02,  2.31s/it] 52%|██████████████████████████████████████████▊                                       | 10506/20117 [6:38:17<6:11:55,  2.32s/it] 52%|██████████████████████████████████████████▊                                       | 10507/20117 [6:38:19<6:12:07,  2.32s/it] 52%|██████████████████████████████████████████▊                                       | 10508/20117 [6:38:22<6:08:30,  2.30s/it] 52%|██████████████████████████████████████████▊                                       | 10509/20117 [6:38:24<6:07:43,  2.30s/it] 52%|██████████████████████████████████████████▊                                       | 10510/20117 [6:38:26<6:10:44,  2.32s/it]                                                                                                                                 {'loss': 0.2168, 'grad_norm': 0.3903422951698303, 'learning_rate': 9.371844186455501e-05, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 377.16, 'epoch': 1.04}
 52%|██████████████████████████████████████████▊                                       | 10510/20117 [6:38:26<6:10:44,  2.32s/it] 52%|██████████████████████████████████████████▊                                       | 10511/20117 [6:38:28<6:11:51,  2.32s/it] 52%|██████████████████████████████████████████▊                                       | 10512/20117 [6:38:31<6:11:06,  2.32s/it] 52%|██████████████████████████████████████████▊                                       | 10513/20117 [6:38:33<6:13:38,  2.33s/it] 52%|██████████████████████████████████████████▊                                       | 10514/20117 [6:38:36<6:13:58,  2.34s/it] 52%|██████████████████████████████████████████▊                                       | 10515/20117 [6:38:38<6:12:57,  2.33s/it] 52%|██████████████████████████████████████████▊                                       | 10516/20117 [6:38:40<6:13:56,  2.34s/it] 52%|██████████████████████████████████████████▊                                       | 10517/20117 [6:38:42<6:11:24,  2.32s/it] 52%|██████████████████████████████████████████▊                                       | 10518/20117 [6:38:45<6:12:20,  2.33s/it] 52%|██████████████████████████████████████████▉                                       | 10519/20117 [6:38:47<6:11:38,  2.32s/it] 52%|██████████████████████████████████████████▉                                       | 10520/20117 [6:38:49<6:12:08,  2.33s/it]                                                                                                                                 {'loss': 0.1676, 'grad_norm': 0.3957577049732208, 'learning_rate': 9.356181338192332e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 306.27, 'epoch': 1.05}
 52%|██████████████████████████████████████████▉                                       | 10520/20117 [6:38:49<6:12:08,  2.33s/it] 52%|██████████████████████████████████████████▉                                       | 10521/20117 [6:38:52<6:05:51,  2.29s/it] 52%|██████████████████████████████████████████▉                                       | 10522/20117 [6:38:54<6:16:05,  2.35s/it] 52%|██████████████████████████████████████████▉                                       | 10523/20117 [6:38:56<6:10:57,  2.32s/it] 52%|██████████████████████████████████████████▉                                       | 10524/20117 [6:38:59<6:06:04,  2.29s/it] 52%|██████████████████████████████████████████▉                                       | 10525/20117 [6:39:01<5:59:32,  2.25s/it] 52%|██████████████████████████████████████████▉                                       | 10526/20117 [6:39:03<5:58:32,  2.24s/it] 52%|██████████████████████████████████████████▉                                       | 10527/20117 [6:39:05<5:58:00,  2.24s/it] 52%|██████████████████████████████████████████▉                                       | 10528/20117 [6:39:07<5:57:09,  2.23s/it] 52%|██████████████████████████████████████████▉                                       | 10529/20117 [6:39:10<5:58:37,  2.24s/it] 52%|██████████████████████████████████████████▉                                       | 10530/20117 [6:39:12<6:03:39,  2.28s/it]                                                                                                                                 {'loss': 0.1574, 'grad_norm': 0.5264449119567871, 'learning_rate': 9.340520075790606e-05, 'memory/max_active (GiB)': 21.37, 'memory/max_allocated (GiB)': 21.37, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 322.3, 'epoch': 1.05}
 52%|██████████████████████████████████████████▉                                       | 10530/20117 [6:39:12<6:03:39,  2.28s/it] 52%|██████████████████████████████████████████▉                                       | 10531/20117 [6:39:14<6:04:50,  2.28s/it] 52%|██████████████████████████████████████████▉                                       | 10532/20117 [6:39:17<6:03:34,  2.28s/it] 52%|██████████████████████████████████████████▉                                       | 10533/20117 [6:39:19<6:03:11,  2.27s/it] 52%|██████████████████████████████████████████▉                                       | 10534/20117 [6:39:21<6:03:08,  2.27s/it] 52%|██████████████████████████████████████████▉                                       | 10535/20117 [6:39:23<6:00:54,  2.26s/it] 52%|██████████████████████████████████████████▉                                       | 10536/20117 [6:39:26<5:59:52,  2.25s/it] 52%|██████████████████████████████████████████▉                                       | 10537/20117 [6:39:28<5:59:55,  2.25s/it] 52%|██████████████████████████████████████████▉                                       | 10538/20117 [6:39:30<5:58:09,  2.24s/it] 52%|██████████████████████████████████████████▉                                       | 10539/20117 [6:39:32<5:57:47,  2.24s/it] 52%|██████████████████████████████████████████▉                                       | 10540/20117 [6:39:35<5:54:48,  2.22s/it]                                                                                                                                 {'loss': 0.1808, 'grad_norm': 0.5075907111167908, 'learning_rate': 9.324860437827312e-05, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 375.27, 'epoch': 1.05}
 52%|██████████████████████████████████████████▉                                       | 10540/20117 [6:39:35<5:54:48,  2.22s/it] 52%|██████████████████████████████████████████▉                                       | 10541/20117 [6:39:37<5:52:05,  2.21s/it] 52%|██████████████████████████████████████████▉                                       | 10542/20117 [6:39:39<5:53:08,  2.21s/it] 52%|██████████████████████████████████████████▉                                       | 10543/20117 [6:39:41<5:58:44,  2.25s/it] 52%|██████████████████████████████████████████▉                                       | 10544/20117 [6:39:44<5:58:33,  2.25s/it] 52%|██████████████████████████████████████████▉                                       | 10545/20117 [6:39:46<6:01:43,  2.27s/it] 52%|██████████████████████████████████████████▉                                       | 10546/20117 [6:39:48<6:03:38,  2.28s/it] 52%|██████████████████████████████████████████▉                                       | 10547/20117 [6:39:50<6:02:04,  2.27s/it] 52%|██████████████████████████████████████████▉                                       | 10548/20117 [6:39:53<6:00:49,  2.26s/it] 52%|██████████████████████████████████████████▉                                       | 10549/20117 [6:39:55<6:01:36,  2.27s/it] 52%|███████████████████████████████████████████                                       | 10550/20117 [6:39:57<6:03:20,  2.28s/it]                                                                                                                                 {'loss': 0.1528, 'grad_norm': 0.28377336263656616, 'learning_rate': 9.309202462875457e-05, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 356.22, 'epoch': 1.05}
 52%|███████████████████████████████████████████                                       | 10550/20117 [6:39:57<6:03:20,  2.28s/it] 52%|███████████████████████████████████████████                                       | 10551/20117 [6:39:59<6:03:00,  2.28s/it] 52%|███████████████████████████████████████████                                       | 10552/20117 [6:40:02<6:02:44,  2.28s/it] 52%|███████████████████████████████████████████                                       | 10553/20117 [6:40:04<6:03:10,  2.28s/it] 52%|███████████████████████████████████████████                                       | 10554/20117 [6:40:06<6:03:35,  2.28s/it] 52%|███████████████████████████████████████████                                       | 10555/20117 [6:40:09<6:01:15,  2.27s/it] 52%|███████████████████████████████████████████                                       | 10556/20117 [6:40:11<5:59:08,  2.25s/it] 52%|███████████████████████████████████████████                                       | 10557/20117 [6:40:13<5:57:11,  2.24s/it] 52%|███████████████████████████████████████████                                       | 10558/20117 [6:40:15<5:56:11,  2.24s/it] 52%|███████████████████████████████████████████                                       | 10559/20117 [6:40:17<5:57:55,  2.25s/it] 52%|███████████████████████████████████████████                                       | 10560/20117 [6:40:20<5:56:20,  2.24s/it]                                                                                                                                 {'loss': 0.1717, 'grad_norm': 0.4088289439678192, 'learning_rate': 9.293546189503938e-05, 'memory/max_active (GiB)': 19.8, 'memory/max_allocated (GiB)': 19.8, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 355.64, 'epoch': 1.05}
 52%|███████████████████████████████████████████                                       | 10560/20117 [6:40:20<5:56:20,  2.24s/it] 52%|███████████████████████████████████████████                                       | 10561/20117 [6:40:22<5:56:21,  2.24s/it] 53%|███████████████████████████████████████████                                       | 10562/20117 [6:40:24<5:58:54,  2.25s/it] 53%|███████████████████████████████████████████                                       | 10563/20117 [6:40:27<6:00:31,  2.26s/it] 53%|███████████████████████████████████████████                                       | 10564/20117 [6:40:29<5:56:53,  2.24s/it] 53%|███████████████████████████████████████████                                       | 10565/20117 [6:40:31<5:57:15,  2.24s/it] 53%|███████████████████████████████████████████                                       | 10566/20117 [6:40:33<5:59:36,  2.26s/it] 53%|███████████████████████████████████████████                                       | 10567/20117 [6:40:36<6:01:25,  2.27s/it] 53%|███████████████████████████████████████████                                       | 10568/20117 [6:40:38<6:01:18,  2.27s/it] 53%|███████████████████████████████████████████                                       | 10569/20117 [6:40:40<5:57:35,  2.25s/it] 53%|███████████████████████████████████████████                                       | 10570/20117 [6:40:42<5:57:32,  2.25s/it]                                                                                                                                 {'loss': 0.1417, 'grad_norm': 0.1233486533164978, 'learning_rate': 9.27789165627747e-05, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 378.24, 'epoch': 1.05}
 53%|███████████████████████████████████████████                                       | 10570/20117 [6:40:42<5:57:32,  2.25s/it] 53%|███████████████████████████████████████████                                       | 10571/20117 [6:40:45<5:59:16,  2.26s/it] 53%|███████████████████████████████████████████                                       | 10572/20117 [6:40:47<5:57:54,  2.25s/it] 53%|███████████████████████████████████████████                                       | 10573/20117 [6:40:49<6:01:58,  2.28s/it] 53%|███████████████████████████████████████████                                       | 10574/20117 [6:40:52<6:16:53,  2.37s/it] 53%|███████████████████████████████████████████                                       | 10575/20117 [6:40:54<6:13:41,  2.35s/it] 53%|███████████████████████████████████████████                                       | 10576/20117 [6:40:56<6:06:43,  2.31s/it] 53%|███████████████████████████████████████████                                       | 10577/20117 [6:40:58<6:03:52,  2.29s/it] 53%|███████████████████████████████████████████                                       | 10578/20117 [6:41:01<6:01:44,  2.28s/it] 53%|███████████████████████████████████████████                                       | 10579/20117 [6:41:03<5:58:34,  2.26s/it] 53%|███████████████████████████████████████████▏                                      | 10580/20117 [6:41:05<5:56:54,  2.25s/it]                                                                                                                                 {'loss': 0.131, 'grad_norm': 0.54539555311203, 'learning_rate': 9.26223890175647e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 321.17, 'epoch': 1.05}
 53%|███████████████████████████████████████████▏                                      | 10580/20117 [6:41:05<5:56:54,  2.25s/it] 53%|███████████████████████████████████████████▏                                      | 10581/20117 [6:41:07<5:57:07,  2.25s/it] 53%|███████████████████████████████████████████▏                                      | 10582/20117 [6:41:10<5:57:29,  2.25s/it] 53%|███████████████████████████████████████████▏                                      | 10583/20117 [6:41:12<5:58:52,  2.26s/it] 53%|███████████████████████████████████████████▏                                      | 10584/20117 [6:41:14<5:58:25,  2.26s/it] 53%|███████████████████████████████████████████▏                                      | 10585/20117 [6:41:16<5:58:51,  2.26s/it] 53%|███████████████████████████████████████████▏                                      | 10586/20117 [6:41:19<5:56:44,  2.25s/it] 53%|███████████████████████████████████████████▏                                      | 10587/20117 [6:41:21<5:55:17,  2.24s/it] 53%|███████████████████████████████████████████▏                                      | 10588/20117 [6:41:23<5:59:36,  2.26s/it] 53%|███████████████████████████████████████████▏                                      | 10589/20117 [6:41:25<5:58:06,  2.26s/it] 53%|███████████████████████████████████████████▏                                      | 10590/20117 [6:41:28<5:57:28,  2.25s/it]                                                                                                                                 {'loss': 0.1601, 'grad_norm': 0.3036106526851654, 'learning_rate': 9.246587964496984e-05, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 327.08, 'epoch': 1.05}
 53%|███████████████████████████████████████████▏                                      | 10590/20117 [6:41:28<5:57:28,  2.25s/it] 53%|███████████████████████████████████████████▏                                      | 10591/20117 [6:41:30<5:57:27,  2.25s/it] 53%|███████████████████████████████████████████▏                                      | 10592/20117 [6:41:32<5:58:33,  2.26s/it] 53%|███████████████████████████████████████████▏                                      | 10593/20117 [6:41:34<5:56:34,  2.25s/it] 53%|███████████████████████████████████████████▏                                      | 10594/20117 [6:41:37<5:56:01,  2.24s/it] 53%|███████████████████████████████████████████▏                                      | 10595/20117 [6:41:39<5:57:32,  2.25s/it] 53%|███████████████████████████████████████████▏                                      | 10596/20117 [6:41:41<5:58:46,  2.26s/it] 53%|███████████████████████████████████████████▏                                      | 10597/20117 [6:41:43<5:59:51,  2.27s/it] 53%|███████████████████████████████████████████▏                                      | 10598/20117 [6:41:46<6:00:19,  2.27s/it] 53%|███████████████████████████████████████████▏                                      | 10599/20117 [6:41:48<6:01:10,  2.28s/it] 53%|███████████████████████████████████████████▏                                      | 10600/20117 [6:41:50<6:00:43,  2.27s/it]                                                                                                                                 {'loss': 0.1604, 'grad_norm': 0.27965182065963745, 'learning_rate': 9.230938883050581e-05, 'memory/max_active (GiB)': 18.84, 'memory/max_allocated (GiB)': 18.84, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 350.41, 'epoch': 1.05}
 53%|███████████████████████████████████████████▏                                      | 10600/20117 [6:41:50<6:00:43,  2.27s/it] 53%|███████████████████████████████████████████▏                                      | 10601/20117 [6:41:53<5:59:47,  2.27s/it] 53%|███████████████████████████████████████████▏                                      | 10602/20117 [6:41:55<5:58:51,  2.26s/it] 53%|███████████████████████████████████████████▏                                      | 10603/20117 [6:41:57<6:00:58,  2.28s/it] 53%|███████████████████████████████████████████▏                                      | 10604/20117 [6:41:59<6:02:23,  2.29s/it] 53%|███████████████████████████████████████████▏                                      | 10605/20117 [6:42:02<6:04:49,  2.30s/it] 53%|███████████████████████████████████████████▏                                      | 10606/20117 [6:42:04<6:04:17,  2.30s/it] 53%|███████████████████████████████████████████▏                                      | 10607/20117 [6:42:06<6:04:35,  2.30s/it] 53%|███████████████████████████████████████████▏                                      | 10608/20117 [6:42:09<6:02:54,  2.29s/it] 53%|███████████████████████████████████████████▏                                      | 10609/20117 [6:42:11<6:06:35,  2.31s/it] 53%|███████████████████████████████████████████▏                                      | 10610/20117 [6:42:13<6:12:30,  2.35s/it]                                                                                                                                 {'loss': 0.1158, 'grad_norm': 0.33622997999191284, 'learning_rate': 9.215291695964252e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 317.77, 'epoch': 1.05}
 53%|███████████████████████████████████████████▏                                      | 10610/20117 [6:42:13<6:12:30,  2.35s/it] 53%|███████████████████████████████████████████▎                                      | 10611/20117 [6:42:16<6:10:07,  2.34s/it] 53%|███████████████████████████████████████████▎                                      | 10612/20117 [6:42:18<6:07:27,  2.32s/it] 53%|███████████████████████████████████████████▎                                      | 10613/20117 [6:42:20<6:03:40,  2.30s/it] 53%|███████████████████████████████████████████▎                                      | 10614/20117 [6:42:23<6:04:52,  2.30s/it] 53%|███████████████████████████████████████████▎                                      | 10615/20117 [6:42:25<6:03:28,  2.30s/it] 53%|███████████████████████████████████████████▎                                      | 10616/20117 [6:42:27<6:02:14,  2.29s/it] 53%|███████████████████████████████████████████▎                                      | 10617/20117 [6:42:29<6:03:37,  2.30s/it] 53%|███████████████████████████████████████████▎                                      | 10618/20117 [6:42:32<6:02:53,  2.29s/it] 53%|███████████████████████████████████████████▎                                      | 10619/20117 [6:42:34<6:03:18,  2.30s/it] 53%|███████████████████████████████████████████▎                                      | 10620/20117 [6:42:36<6:02:51,  2.29s/it]                                                                                                                                 {'loss': 0.1452, 'grad_norm': 0.35641202330589294, 'learning_rate': 9.199646441780332e-05, 'memory/max_active (GiB)': 18.17, 'memory/max_allocated (GiB)': 18.17, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 362.54, 'epoch': 1.06}
 53%|███████████████████████████████████████████▎                                      | 10620/20117 [6:42:36<6:02:51,  2.29s/it] 53%|███████████████████████████████████████████▎                                      | 10621/20117 [6:42:39<6:00:03,  2.28s/it] 53%|███████████████████████████████████████████▎                                      | 10622/20117 [6:42:41<6:01:46,  2.29s/it] 53%|███████████████████████████████████████████▎                                      | 10623/20117 [6:42:43<6:05:02,  2.31s/it] 53%|███████████████████████████████████████████▎                                      | 10624/20117 [6:42:46<6:09:37,  2.34s/it] 53%|███████████████████████████████████████████▎                                      | 10625/20117 [6:42:48<6:10:34,  2.34s/it] 53%|███████████████████████████████████████████▎                                      | 10626/20117 [6:42:50<6:08:21,  2.33s/it] 53%|███████████████████████████████████████████▎                                      | 10627/20117 [6:42:53<6:23:10,  2.42s/it] 53%|███████████████████████████████████████████▎                                      | 10628/20117 [6:42:55<6:15:16,  2.37s/it] 53%|███████████████████████████████████████████▎                                      | 10629/20117 [6:42:57<6:09:43,  2.34s/it] 53%|███████████████████████████████████████████▎                                      | 10630/20117 [6:43:00<6:05:55,  2.31s/it]                                                                                                                                 {'loss': 0.2526, 'grad_norm': 0.612156331539154, 'learning_rate': 9.184003159036379e-05, 'memory/max_active (GiB)': 20.77, 'memory/max_allocated (GiB)': 20.77, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 406.46, 'epoch': 1.06}
 53%|███████████████████████████████████████████▎                                      | 10630/20117 [6:43:00<6:05:55,  2.31s/it] 53%|███████████████████████████████████████████▎                                      | 10631/20117 [6:43:02<6:05:34,  2.31s/it] 53%|███████████████████████████████████████████▎                                      | 10632/20117 [6:43:04<6:04:32,  2.31s/it] 53%|███████████████████████████████████████████▎                                      | 10633/20117 [6:43:07<6:04:10,  2.30s/it] 53%|███████████████████████████████████████████▎                                      | 10634/20117 [6:43:09<6:05:23,  2.31s/it] 53%|███████████████████████████████████████████▎                                      | 10635/20117 [6:43:11<6:05:40,  2.31s/it] 53%|███████████████████████████████████████████▎                                      | 10636/20117 [6:43:14<6:03:50,  2.30s/it] 53%|███████████████████████████████████████████▎                                      | 10637/20117 [6:43:16<6:05:39,  2.31s/it] 53%|███████████████████████████████████████████▎                                      | 10638/20117 [6:43:18<6:05:24,  2.31s/it] 53%|███████████████████████████████████████████▎                                      | 10639/20117 [6:43:20<6:03:21,  2.30s/it] 53%|███████████████████████████████████████████▎                                      | 10640/20117 [6:43:23<6:00:22,  2.28s/it]                                                                                                                                 {'loss': 0.162, 'grad_norm': 0.38296809792518616, 'learning_rate': 9.168361886265113e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 319.6, 'epoch': 1.06}
 53%|███████████████████████████████████████████▎                                      | 10640/20117 [6:43:23<6:00:22,  2.28s/it] 53%|███████████████████████████████████████████▎                                      | 10641/20117 [6:43:25<5:59:47,  2.28s/it] 53%|███████████████████████████████████████████▍                                      | 10642/20117 [6:43:27<6:01:45,  2.29s/it] 53%|███████████████████████████████████████████▍                                      | 10643/20117 [6:43:30<6:00:16,  2.28s/it] 53%|███████████████████████████████████████████▍                                      | 10644/20117 [6:43:32<6:00:03,  2.28s/it] 53%|███████████████████████████████████████████▍                                      | 10645/20117 [6:43:34<6:02:09,  2.29s/it] 53%|███████████████████████████████████████████▍                                      | 10646/20117 [6:43:36<6:00:11,  2.28s/it] 53%|███████████████████████████████████████████▍                                      | 10647/20117 [6:43:39<6:01:26,  2.29s/it] 53%|███████████████████████████████████████████▍                                      | 10648/20117 [6:43:41<6:04:06,  2.31s/it] 53%|███████████████████████████████████████████▍                                      | 10649/20117 [6:43:43<6:01:48,  2.29s/it] 53%|███████████████████████████████████████████▍                                      | 10650/20117 [6:43:46<6:05:35,  2.32s/it]                                                                                                                                 {'loss': 0.1507, 'grad_norm': 0.35586997866630554, 'learning_rate': 9.15272266199429e-05, 'memory/max_active (GiB)': 21.4, 'memory/max_allocated (GiB)': 21.4, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 336.67, 'epoch': 1.06}
 53%|███████████████████████████████████████████▍                                      | 10650/20117 [6:43:46<6:05:35,  2.32s/it] 53%|███████████████████████████████████████████▍                                      | 10651/20117 [6:43:48<6:05:03,  2.31s/it] 53%|███████████████████████████████████████████▍                                      | 10652/20117 [6:43:50<6:07:22,  2.33s/it] 53%|███████████████████████████████████████████▍                                      | 10653/20117 [6:43:53<6:05:27,  2.32s/it] 53%|███████████████████████████████████████████▍                                      | 10654/20117 [6:43:55<6:07:12,  2.33s/it] 53%|███████████████████████████████████████████▍                                      | 10655/20117 [6:43:57<6:05:16,  2.32s/it] 53%|███████████████████████████████████████████▍                                      | 10656/20117 [6:44:00<6:03:45,  2.31s/it] 53%|███████████████████████████████████████████▍                                      | 10657/20117 [6:44:02<6:00:46,  2.29s/it] 53%|███████████████████████████████████████████▍                                      | 10658/20117 [6:44:04<5:59:28,  2.28s/it] 53%|███████████████████████████████████████████▍                                      | 10659/20117 [6:44:06<5:56:59,  2.26s/it] 53%|███████████████████████████████████████████▍                                      | 10660/20117 [6:44:09<5:57:11,  2.27s/it]                                                                                                                                 {'loss': 0.1771, 'grad_norm': 0.2866521179676056, 'learning_rate': 9.13708552474663e-05, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 376.96, 'epoch': 1.06}
 53%|███████████████████████████████████████████▍                                      | 10660/20117 [6:44:09<5:57:11,  2.27s/it] 53%|███████████████████████████████████████████▍                                      | 10661/20117 [6:44:11<5:59:13,  2.28s/it] 53%|███████████████████████████████████████████▍                                      | 10662/20117 [6:44:13<5:59:33,  2.28s/it] 53%|███████████████████████████████████████████▍                                      | 10663/20117 [6:44:15<6:00:57,  2.29s/it] 53%|███████████████████████████████████████████▍                                      | 10664/20117 [6:44:18<5:58:44,  2.28s/it] 53%|███████████████████████████████████████████▍                                      | 10665/20117 [6:44:20<5:59:49,  2.28s/it] 53%|███████████████████████████████████████████▍                                      | 10666/20117 [6:44:22<6:01:55,  2.30s/it] 53%|███████████████████████████████████████████▍                                      | 10667/20117 [6:44:25<6:02:58,  2.30s/it] 53%|███████████████████████████████████████████▍                                      | 10668/20117 [6:44:27<6:04:17,  2.31s/it] 53%|███████████████████████████████████████████▍                                      | 10669/20117 [6:44:29<6:06:11,  2.33s/it] 53%|███████████████████████████████████████████▍                                      | 10670/20117 [6:44:32<6:08:16,  2.34s/it]                                                                                                                                 {'loss': 0.1399, 'grad_norm': 0.3502759635448456, 'learning_rate': 9.1214505130397e-05, 'memory/max_active (GiB)': 20.64, 'memory/max_allocated (GiB)': 20.64, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 341.29, 'epoch': 1.06}
 53%|███████████████████████████████████████████▍                                      | 10670/20117 [6:44:32<6:08:16,  2.34s/it] 53%|███████████████████████████████████████████▍                                      | 10671/20117 [6:44:34<6:07:29,  2.33s/it] 53%|███████████████████████████████████████████▌                                      | 10672/20117 [6:44:36<6:07:58,  2.34s/it] 53%|███████████████████████████████████████████▌                                      | 10673/20117 [6:44:39<6:05:37,  2.32s/it] 53%|███████████████████████████████████████████▌                                      | 10674/20117 [6:44:41<6:04:20,  2.31s/it] 53%|███████████████████████████████████████████▌                                      | 10675/20117 [6:44:43<6:04:44,  2.32s/it] 53%|███████████████████████████████████████████▌                                      | 10676/20117 [6:44:46<6:04:59,  2.32s/it] 53%|███████████████████████████████████████████▌                                      | 10677/20117 [6:44:48<6:08:38,  2.34s/it] 53%|███████████████████████████████████████████▌                                      | 10678/20117 [6:44:50<6:07:21,  2.34s/it] 53%|███████████████████████████████████████████▌                                      | 10679/20117 [6:44:53<6:07:47,  2.34s/it] 53%|███████████████████████████████████████████▌                                      | 10680/20117 [6:44:55<6:10:01,  2.35s/it]                                                                                                                                 {'loss': 0.1584, 'grad_norm': 0.3619881570339203, 'learning_rate': 9.105817665385846e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 336.33, 'epoch': 1.06}
 53%|███████████████████████████████████████████▌                                      | 10680/20117 [6:44:55<6:10:01,  2.35s/it] 53%|███████████████████████████████████████████▌                                      | 10681/20117 [6:44:58<6:25:43,  2.45s/it] 53%|███████████████████████████████████████████▌                                      | 10682/20117 [6:45:00<6:19:31,  2.41s/it] 53%|███████████████████████████████████████████▌                                      | 10683/20117 [6:45:02<6:16:15,  2.39s/it] 53%|███████████████████████████████████████████▌                                      | 10684/20117 [6:45:05<6:14:35,  2.38s/it] 53%|███████████████████████████████████████████▌                                      | 10685/20117 [6:45:07<6:16:05,  2.39s/it] 53%|███████████████████████████████████████████▌                                      | 10686/20117 [6:45:09<6:11:03,  2.36s/it] 53%|███████████████████████████████████████████▌                                      | 10687/20117 [6:45:12<6:08:07,  2.34s/it] 53%|███████████████████████████████████████████▌                                      | 10688/20117 [6:45:14<6:03:55,  2.32s/it] 53%|███████████████████████████████████████████▌                                      | 10689/20117 [6:45:16<6:03:07,  2.31s/it] 53%|███████████████████████████████████████████▌                                      | 10690/20117 [6:45:19<6:00:46,  2.30s/it]                                                                                                                                 {'loss': 0.2043, 'grad_norm': 0.6374160051345825, 'learning_rate': 9.090187020292068e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 381.46, 'epoch': 1.06}
 53%|███████████████████████████████████████████▌                                      | 10690/20117 [6:45:19<6:00:46,  2.30s/it] 53%|███████████████████████████████████████████▌                                      | 10691/20117 [6:45:21<6:01:34,  2.30s/it] 53%|███████████████████████████████████████████▌                                      | 10692/20117 [6:45:23<6:01:31,  2.30s/it] 53%|███████████████████████████████████████████▌                                      | 10693/20117 [6:45:26<6:00:17,  2.29s/it] 53%|███████████████████████████████████████████▌                                      | 10694/20117 [6:45:28<5:59:40,  2.29s/it] 53%|███████████████████████████████████████████▌                                      | 10695/20117 [6:45:30<6:01:13,  2.30s/it] 53%|███████████████████████████████████████████▌                                      | 10696/20117 [6:45:32<5:59:05,  2.29s/it] 53%|███████████████████████████████████████████▌                                      | 10697/20117 [6:45:35<5:59:39,  2.29s/it] 53%|███████████████████████████████████████████▌                                      | 10698/20117 [6:45:37<6:00:12,  2.29s/it] 53%|███████████████████████████████████████████▌                                      | 10699/20117 [6:45:39<6:00:13,  2.29s/it] 53%|███████████████████████████████████████████▌                                      | 10700/20117 [6:45:42<6:02:20,  2.31s/it]                                                                                                                                 {'loss': 0.1564, 'grad_norm': 0.486200749874115, 'learning_rate': 9.074558616259954e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 337.08, 'epoch': 1.06}
 53%|███████████████████████████████████████████▌                                      | 10700/20117 [6:45:42<6:02:20,  2.31s/it] 53%|███████████████████████████████████████████▌                                      | 10701/20117 [6:45:44<6:04:04,  2.32s/it] 53%|███████████████████████████████████████████▌                                      | 10702/20117 [6:45:46<6:01:15,  2.30s/it] 53%|███████████████████████████████████████████▋                                      | 10703/20117 [6:45:48<5:59:48,  2.29s/it] 53%|███████████████████████████████████████████▋                                      | 10704/20117 [6:45:51<5:59:15,  2.29s/it] 53%|███████████████████████████████████████████▋                                      | 10705/20117 [6:45:53<6:00:31,  2.30s/it] 53%|███████████████████████████████████████████▋                                      | 10706/20117 [6:45:55<6:00:22,  2.30s/it] 53%|███████████████████████████████████████████▋                                      | 10707/20117 [6:45:58<5:59:22,  2.29s/it] 53%|███████████████████████████████████████████▋                                      | 10708/20117 [6:46:00<5:56:52,  2.28s/it] 53%|███████████████████████████████████████████▋                                      | 10709/20117 [6:46:02<5:58:14,  2.28s/it] 53%|███████████████████████████████████████████▋                                      | 10710/20117 [6:46:04<5:55:25,  2.27s/it]                                                                                                                                 {'loss': 0.1719, 'grad_norm': 0.8452271223068237, 'learning_rate': 9.058932491785564e-05, 'memory/max_active (GiB)': 18.83, 'memory/max_allocated (GiB)': 18.83, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 342.56, 'epoch': 1.06}
 53%|███████████████████████████████████████████▋                                      | 10710/20117 [6:46:04<5:55:25,  2.27s/it] 53%|███████████████████████████████████████████▋                                      | 10711/20117 [6:46:07<5:58:05,  2.28s/it] 53%|███████████████████████████████████████████▋                                      | 10712/20117 [6:46:09<5:52:19,  2.25s/it] 53%|███████████████████████████████████████████▋                                      | 10713/20117 [6:46:11<5:52:59,  2.25s/it] 53%|███████████████████████████████████████████▋                                      | 10714/20117 [6:46:13<5:52:12,  2.25s/it] 53%|███████████████████████████████████████████▋                                      | 10715/20117 [6:46:16<5:51:13,  2.24s/it] 53%|███████████████████████████████████████████▋                                      | 10716/20117 [6:46:18<5:51:35,  2.24s/it] 53%|███████████████████████████████████████████▋                                      | 10717/20117 [6:46:20<5:52:30,  2.25s/it] 53%|███████████████████████████████████████████▋                                      | 10718/20117 [6:46:22<5:55:27,  2.27s/it] 53%|███████████████████████████████████████████▋                                      | 10719/20117 [6:46:25<5:58:48,  2.29s/it] 53%|███████████████████████████████████████████▋                                      | 10720/20117 [6:46:27<5:59:40,  2.30s/it]                                                                                                                                 {'loss': 0.184, 'grad_norm': 0.5657733082771301, 'learning_rate': 9.043308685359344e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 331.5, 'epoch': 1.07}
 53%|███████████████████████████████████████████▋                                      | 10720/20117 [6:46:27<5:59:40,  2.30s/it] 53%|███████████████████████████████████████████▋                                      | 10721/20117 [6:46:29<5:59:52,  2.30s/it] 53%|███████████████████████████████████████████▋                                      | 10722/20117 [6:46:32<5:58:33,  2.29s/it] 53%|███████████████████████████████████████████▋                                      | 10723/20117 [6:46:34<5:57:39,  2.28s/it] 53%|███████████████████████████████████████████▋                                      | 10724/20117 [6:46:36<5:59:59,  2.30s/it] 53%|███████████████████████████████████████████▋                                      | 10725/20117 [6:46:39<6:01:20,  2.31s/it] 53%|███████████████████████████████████████████▋                                      | 10726/20117 [6:46:41<5:57:28,  2.28s/it] 53%|███████████████████████████████████████████▋                                      | 10727/20117 [6:46:43<5:52:42,  2.25s/it] 53%|███████████████████████████████████████████▋                                      | 10728/20117 [6:46:45<5:48:39,  2.23s/it] 53%|███████████████████████████████████████████▋                                      | 10729/20117 [6:46:47<5:50:09,  2.24s/it] 53%|███████████████████████████████████████████▋                                      | 10730/20117 [6:46:50<5:55:06,  2.27s/it]                                                                                                                                 {'loss': 0.1483, 'grad_norm': 0.3643028438091278, 'learning_rate': 9.027687235466038e-05, 'memory/max_active (GiB)': 18.83, 'memory/max_allocated (GiB)': 18.83, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 335.73, 'epoch': 1.07}
 53%|███████████████████████████████████████████▋                                      | 10730/20117 [6:46:50<5:55:06,  2.27s/it] 53%|███████████████████████████████████████████▋                                      | 10731/20117 [6:46:52<5:57:36,  2.29s/it] 53%|███████████████████████████████████████████▋                                      | 10732/20117 [6:46:54<5:58:42,  2.29s/it] 53%|███████████████████████████████████████████▋                                      | 10733/20117 [6:46:57<6:18:00,  2.42s/it] 53%|███████████████████████████████████████████▊                                      | 10734/20117 [6:47:00<6:16:00,  2.40s/it] 53%|███████████████████████████████████████████▊                                      | 10735/20117 [6:47:02<6:10:30,  2.37s/it] 53%|███████████████████████████████████████████▊                                      | 10736/20117 [6:47:04<6:07:30,  2.35s/it] 53%|███████████████████████████████████████████▊                                      | 10737/20117 [6:47:06<6:07:52,  2.35s/it] 53%|███████████████████████████████████████████▊                                      | 10738/20117 [6:47:09<6:11:53,  2.38s/it] 53%|███████████████████████████████████████████▊                                      | 10739/20117 [6:47:11<6:10:10,  2.37s/it] 53%|███████████████████████████████████████████▊                                      | 10740/20117 [6:47:14<6:05:37,  2.34s/it]                                                                                                                                 {'loss': 0.1743, 'grad_norm': 0.6992392539978027, 'learning_rate': 9.012068180584569e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 370.01, 'epoch': 1.07}
 53%|███████████████████████████████████████████▊                                      | 10740/20117 [6:47:14<6:05:37,  2.34s/it] 53%|███████████████████████████████████████████▊                                      | 10741/20117 [6:47:16<6:01:09,  2.31s/it] 53%|███████████████████████████████████████████▊                                      | 10742/20117 [6:47:18<5:58:30,  2.29s/it] 53%|███████████████████████████████████████████▊                                      | 10743/20117 [6:47:20<5:58:05,  2.29s/it] 53%|███████████████████████████████████████████▊                                      | 10744/20117 [6:47:23<5:57:07,  2.29s/it] 53%|███████████████████████████████████████████▊                                      | 10745/20117 [6:47:25<5:58:26,  2.29s/it] 53%|███████████████████████████████████████████▊                                      | 10746/20117 [6:47:27<5:57:13,  2.29s/it] 53%|███████████████████████████████████████████▊                                      | 10747/20117 [6:47:29<5:56:13,  2.28s/it] 53%|███████████████████████████████████████████▊                                      | 10748/20117 [6:47:32<5:56:31,  2.28s/it] 53%|███████████████████████████████████████████▊                                      | 10749/20117 [6:47:34<5:56:31,  2.28s/it] 53%|███████████████████████████████████████████▊                                      | 10750/20117 [6:47:36<5:57:39,  2.29s/it]                                                                                                                                 {'loss': 0.1825, 'grad_norm': 0.5826465487480164, 'learning_rate': 8.996451559187981e-05, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 345.47, 'epoch': 1.07}
 53%|███████████████████████████████████████████▊                                      | 10750/20117 [6:47:36<5:57:39,  2.29s/it] 53%|███████████████████████████████████████████▊                                      | 10751/20117 [6:47:39<5:57:40,  2.29s/it] 53%|███████████████████████████████████████████▊                                      | 10752/20117 [6:47:41<5:57:59,  2.29s/it] 53%|███████████████████████████████████████████▊                                      | 10753/20117 [6:47:43<5:56:07,  2.28s/it] 53%|███████████████████████████████████████████▊                                      | 10754/20117 [6:47:46<5:59:31,  2.30s/it] 53%|███████████████████████████████████████████▊                                      | 10755/20117 [6:47:48<5:58:12,  2.30s/it] 53%|███████████████████████████████████████████▊                                      | 10756/20117 [6:47:50<6:00:51,  2.31s/it] 53%|███████████████████████████████████████████▊                                      | 10757/20117 [6:47:52<5:58:02,  2.30s/it] 53%|███████████████████████████████████████████▊                                      | 10758/20117 [6:47:55<5:55:21,  2.28s/it] 53%|███████████████████████████████████████████▊                                      | 10759/20117 [6:47:57<5:56:01,  2.28s/it] 53%|███████████████████████████████████████████▊                                      | 10760/20117 [6:47:59<5:55:55,  2.28s/it]                                                                                                                                 {'loss': 0.205, 'grad_norm': 0.3841560184955597, 'learning_rate': 8.980837409743304e-05, 'memory/max_active (GiB)': 20.57, 'memory/max_allocated (GiB)': 20.57, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 396.05, 'epoch': 1.07}
 53%|███████████████████████████████████████████▊                                      | 10760/20117 [6:47:59<5:55:55,  2.28s/it] 53%|███████████████████████████████████████████▊                                      | 10761/20117 [6:48:02<5:59:56,  2.31s/it] 53%|███████████████████████████████████████████▊                                      | 10762/20117 [6:48:04<5:57:07,  2.29s/it] 54%|███████████████████████████████████████████▊                                      | 10763/20117 [6:48:06<5:55:01,  2.28s/it] 54%|███████████████████████████████████████████▉                                      | 10764/20117 [6:48:08<5:54:59,  2.28s/it] 54%|███████████████████████████████████████████▉                                      | 10765/20117 [6:48:11<5:58:19,  2.30s/it] 54%|███████████████████████████████████████████▉                                      | 10766/20117 [6:48:13<5:56:49,  2.29s/it] 54%|███████████████████████████████████████████▉                                      | 10767/20117 [6:48:15<5:57:48,  2.30s/it] 54%|███████████████████████████████████████████▉                                      | 10768/20117 [6:48:18<5:59:43,  2.31s/it] 54%|███████████████████████████████████████████▉                                      | 10769/20117 [6:48:20<6:01:03,  2.32s/it] 54%|███████████████████████████████████████████▉                                      | 10770/20117 [6:48:22<6:00:44,  2.32s/it]                                                                                                                                 {'loss': 0.1274, 'grad_norm': 0.6366297006607056, 'learning_rate': 8.965225770711493e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 318.68, 'epoch': 1.07}
 54%|███████████████████████████████████████████▉                                      | 10770/20117 [6:48:22<6:00:44,  2.32s/it] 54%|███████████████████████████████████████████▉                                      | 10771/20117 [6:48:25<6:01:42,  2.32s/it] 54%|███████████████████████████████████████████▉                                      | 10772/20117 [6:48:27<6:03:00,  2.33s/it] 54%|███████████████████████████████████████████▉                                      | 10773/20117 [6:48:29<5:59:44,  2.31s/it] 54%|███████████████████████████████████████████▉                                      | 10774/20117 [6:48:32<5:58:47,  2.30s/it] 54%|███████████████████████████████████████████▉                                      | 10775/20117 [6:48:34<5:59:37,  2.31s/it] 54%|███████████████████████████████████████████▉                                      | 10776/20117 [6:48:36<5:59:10,  2.31s/it] 54%|███████████████████████████████████████████▉                                      | 10777/20117 [6:48:38<5:57:40,  2.30s/it] 54%|███████████████████████████████████████████▉                                      | 10778/20117 [6:48:41<5:55:36,  2.28s/it] 54%|███████████████████████████████████████████▉                                      | 10779/20117 [6:48:43<5:56:01,  2.29s/it] 54%|███████████████████████████████████████████▉                                      | 10780/20117 [6:48:45<5:55:10,  2.28s/it]                                                                                                                                 {'loss': 0.1986, 'grad_norm': 0.3577064275741577, 'learning_rate': 8.94961668054731e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 380.85, 'epoch': 1.07}
 54%|███████████████████████████████████████████▉                                      | 10780/20117 [6:48:45<5:55:10,  2.28s/it] 54%|███████████████████████████████████████████▉                                      | 10781/20117 [6:48:48<5:56:51,  2.29s/it] 54%|███████████████████████████████████████████▉                                      | 10782/20117 [6:48:50<5:56:11,  2.29s/it] 54%|███████████████████████████████████████████▉                                      | 10783/20117 [6:48:52<5:54:17,  2.28s/it] 54%|███████████████████████████████████████████▉                                      | 10784/20117 [6:48:54<5:53:18,  2.27s/it] 54%|███████████████████████████████████████████▉                                      | 10785/20117 [6:48:57<5:53:29,  2.27s/it] 54%|███████████████████████████████████████████▉                                      | 10786/20117 [6:48:59<6:10:47,  2.38s/it] 54%|███████████████████████████████████████████▉                                      | 10787/20117 [6:49:02<6:09:44,  2.38s/it] 54%|███████████████████████████████████████████▉                                      | 10788/20117 [6:49:04<6:07:00,  2.36s/it] 54%|███████████████████████████████████████████▉                                      | 10789/20117 [6:49:06<6:05:46,  2.35s/it] 54%|███████████████████████████████████████████▉                                      | 10790/20117 [6:49:09<6:01:28,  2.33s/it]                                                                                                                                 {'loss': 0.1747, 'grad_norm': 0.4104318618774414, 'learning_rate': 8.934010177699252e-05, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 301.88, 'epoch': 1.07}
 54%|███████████████████████████████████████████▉                                      | 10790/20117 [6:49:09<6:01:28,  2.33s/it] 54%|███████████████████████████████████████████▉                                      | 10791/20117 [6:49:11<6:03:08,  2.34s/it] 54%|███████████████████████████████████████████▉                                      | 10792/20117 [6:49:13<5:59:20,  2.31s/it] 54%|███████████████████████████████████████████▉                                      | 10793/20117 [6:49:15<5:57:49,  2.30s/it] 54%|███████████████████████████████████████████▉                                      | 10794/20117 [6:49:18<5:56:02,  2.29s/it] 54%|████████████████████████████████████████████                                      | 10795/20117 [6:49:20<5:53:30,  2.28s/it] 54%|████████████████████████████████████████████                                      | 10796/20117 [6:49:22<5:53:40,  2.28s/it] 54%|████████████████████████████████████████████                                      | 10797/20117 [6:49:25<5:54:52,  2.28s/it] 54%|████████████████████████████████████████████                                      | 10798/20117 [6:49:27<5:58:25,  2.31s/it] 54%|████████████████████████████████████████████                                      | 10799/20117 [6:49:29<6:00:24,  2.32s/it] 54%|████████████████████████████████████████████                                      | 10800/20117 [6:49:32<5:58:14,  2.31s/it]                                                                                                                                 {'loss': 0.1995, 'grad_norm': 0.6288489103317261, 'learning_rate': 8.918406300609424e-05, 'memory/max_active (GiB)': 19.22, 'memory/max_allocated (GiB)': 19.22, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 334.51, 'epoch': 1.07}
 54%|████████████████████████████████████████████                                      | 10800/20117 [6:49:32<5:58:14,  2.31s/it] 54%|████████████████████████████████████████████                                      | 10801/20117 [6:49:34<5:58:16,  2.31s/it] 54%|████████████████████████████████████████████                                      | 10802/20117 [6:49:36<5:58:02,  2.31s/it] 54%|████████████████████████████████████████████                                      | 10803/20117 [6:49:38<5:58:25,  2.31s/it] 54%|████████████████████████████████████████████                                      | 10804/20117 [6:49:41<5:56:03,  2.29s/it] 54%|████████████████████████████████████████████                                      | 10805/20117 [6:49:43<5:54:24,  2.28s/it] 54%|████████████████████████████████████████████                                      | 10806/20117 [6:49:45<5:56:22,  2.30s/it] 54%|████████████████████████████████████████████                                      | 10807/20117 [6:49:48<5:59:21,  2.32s/it] 54%|████████████████████████████████████████████                                      | 10808/20117 [6:49:50<5:55:34,  2.29s/it] 54%|████████████████████████████████████████████                                      | 10809/20117 [6:49:52<5:55:24,  2.29s/it] 54%|████████████████████████████████████████████                                      | 10810/20117 [6:49:54<5:55:41,  2.29s/it]                                                                                                                                 {'loss': 0.1503, 'grad_norm': 0.4445495009422302, 'learning_rate': 8.902805087713482e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 300.06, 'epoch': 1.07}
 54%|████████████████████████████████████████████                                      | 10810/20117 [6:49:54<5:55:41,  2.29s/it] 54%|████████████████████████████████████████████                                      | 10811/20117 [6:49:57<5:55:52,  2.29s/it] 54%|████████████████████████████████████████████                                      | 10812/20117 [6:49:59<6:01:45,  2.33s/it] 54%|████████████████████████████████████████████                                      | 10813/20117 [6:50:02<6:02:51,  2.34s/it] 54%|████████████████████████████████████████████                                      | 10814/20117 [6:50:04<5:58:22,  2.31s/it] 54%|████████████████████████████████████████████                                      | 10815/20117 [6:50:06<5:54:07,  2.28s/it] 54%|████████████████████████████████████████████                                      | 10816/20117 [6:50:08<5:53:58,  2.28s/it] 54%|████████████████████████████████████████████                                      | 10817/20117 [6:50:11<5:56:31,  2.30s/it] 54%|████████████████████████████████████████████                                      | 10818/20117 [6:50:13<5:59:30,  2.32s/it] 54%|████████████████████████████████████████████                                      | 10819/20117 [6:50:15<5:57:36,  2.31s/it] 54%|████████████████████████████████████████████                                      | 10820/20117 [6:50:18<5:58:09,  2.31s/it]                                                                                                                                 {'loss': 0.1843, 'grad_norm': 0.4280208945274353, 'learning_rate': 8.887206577440502e-05, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 371.92, 'epoch': 1.08}
 54%|████████████████████████████████████████████                                      | 10820/20117 [6:50:18<5:58:09,  2.31s/it] 54%|████████████████████████████████████████████                                      | 10821/20117 [6:50:20<5:55:41,  2.30s/it] 54%|████████████████████████████████████████████                                      | 10822/20117 [6:50:22<5:53:24,  2.28s/it] 54%|████████████████████████████████████████████                                      | 10823/20117 [6:50:24<5:55:26,  2.29s/it] 54%|████████████████████████████████████████████                                      | 10824/20117 [6:50:27<5:58:11,  2.31s/it] 54%|████████████████████████████████████████████                                      | 10825/20117 [6:50:29<6:00:50,  2.33s/it] 54%|████████████████████████████████████████████▏                                     | 10826/20117 [6:50:31<5:59:24,  2.32s/it] 54%|████████████████████████████████████████████▏                                     | 10827/20117 [6:50:34<5:59:31,  2.32s/it] 54%|████████████████████████████████████████████▏                                     | 10828/20117 [6:50:36<6:02:22,  2.34s/it] 54%|████████████████████████████████████████████▏                                     | 10829/20117 [6:50:39<6:02:07,  2.34s/it] 54%|████████████████████████████████████████████▏                                     | 10830/20117 [6:50:41<6:00:36,  2.33s/it]                                                                                                                                 {'loss': 0.202, 'grad_norm': 0.5971205830574036, 'learning_rate': 8.871610808212918e-05, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 361.5, 'epoch': 1.08}
 54%|████████████████████████████████████████████▏                                     | 10830/20117 [6:50:41<6:00:36,  2.33s/it] 54%|████████████████████████████████████████████▏                                     | 10831/20117 [6:50:43<5:58:31,  2.32s/it] 54%|████████████████████████████████████████████▏                                     | 10832/20117 [6:50:45<5:58:49,  2.32s/it] 54%|████████████████████████████████████████████▏                                     | 10833/20117 [6:50:48<5:57:45,  2.31s/it] 54%|████████████████████████████████████████████▏                                     | 10834/20117 [6:50:50<5:57:42,  2.31s/it] 54%|████████████████████████████████████████████▏                                     | 10835/20117 [6:50:52<6:00:08,  2.33s/it] 54%|████████████████████████████████████████████▏                                     | 10836/20117 [6:50:55<6:00:41,  2.33s/it] 54%|████████████████████████████████████████████▏                                     | 10837/20117 [6:50:57<5:59:17,  2.32s/it] 54%|████████████████████████████████████████████▏                                     | 10838/20117 [6:50:59<5:59:25,  2.32s/it] 54%|████████████████████████████████████████████▏                                     | 10839/20117 [6:51:02<6:01:20,  2.34s/it] 54%|████████████████████████████████████████████▏                                     | 10840/20117 [6:51:04<6:18:24,  2.45s/it]                                                                                                                                 {'loss': 0.191, 'grad_norm': 0.20293696224689484, 'learning_rate': 8.856017818446402e-05, 'memory/max_active (GiB)': 20.46, 'memory/max_allocated (GiB)': 20.46, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 292.31, 'epoch': 1.08}
 54%|████████████████████████████████████████████▏                                     | 10840/20117 [6:51:04<6:18:24,  2.45s/it] 54%|████████████████████████████████████████████▏                                     | 10841/20117 [6:51:07<6:12:22,  2.41s/it] 54%|████████████████████████████████████████████▏                                     | 10842/20117 [6:51:09<6:07:18,  2.38s/it] 54%|████████████████████████████████████████████▏                                     | 10843/20117 [6:51:12<6:10:49,  2.40s/it] 54%|████████████████████████████████████████████▏                                     | 10844/20117 [6:51:14<6:08:04,  2.38s/it] 54%|████████████████████████████████████████████▏                                     | 10845/20117 [6:51:16<6:02:53,  2.35s/it] 54%|████████████████████████████████████████████▏                                     | 10846/20117 [6:51:18<6:03:25,  2.35s/it] 54%|████████████████████████████████████████████▏                                     | 10847/20117 [6:51:21<5:59:15,  2.33s/it] 54%|████████████████████████████████████████████▏                                     | 10848/20117 [6:51:23<5:59:45,  2.33s/it] 54%|████████████████████████████████████████████▏                                     | 10849/20117 [6:51:25<5:59:57,  2.33s/it] 54%|████████████████████████████████████████████▏                                     | 10850/20117 [6:51:28<6:01:20,  2.34s/it]                                                                                                                                 {'loss': 0.1699, 'grad_norm': 0.5575738549232483, 'learning_rate': 8.840427646549788e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 313.54, 'epoch': 1.08}
 54%|████████████████████████████████████████████▏                                     | 10850/20117 [6:51:28<6:01:20,  2.34s/it] 54%|████████████████████████████████████████████▏                                     | 10851/20117 [6:51:30<5:57:48,  2.32s/it] 54%|████████████████████████████████████████████▏                                     | 10852/20117 [6:51:32<5:58:46,  2.32s/it] 54%|████████████████████████████████████████████▏                                     | 10853/20117 [6:51:35<5:55:40,  2.30s/it] 54%|████████████████████████████████████████████▏                                     | 10854/20117 [6:51:37<5:57:00,  2.31s/it] 54%|████████████████████████████████████████████▏                                     | 10855/20117 [6:51:39<5:56:10,  2.31s/it] 54%|████████████████████████████████████████████▎                                     | 10856/20117 [6:51:42<5:55:54,  2.31s/it] 54%|████████████████████████████████████████████▎                                     | 10857/20117 [6:51:44<5:55:44,  2.30s/it] 54%|████████████████████████████████████████████▎                                     | 10858/20117 [6:51:46<5:57:30,  2.32s/it] 54%|████████████████████████████████████████████▎                                     | 10859/20117 [6:51:49<5:57:29,  2.32s/it] 54%|████████████████████████████████████████████▎                                     | 10860/20117 [6:51:51<5:54:07,  2.30s/it]                                                                                                                                 {'loss': 0.191, 'grad_norm': 0.5824712514877319, 'learning_rate': 8.824840330924959e-05, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 344.34, 'epoch': 1.08}
 54%|████████████████████████████████████████████▎                                     | 10860/20117 [6:51:51<5:54:07,  2.30s/it] 54%|████████████████████████████████████████████▎                                     | 10861/20117 [6:51:53<5:56:03,  2.31s/it] 54%|████████████████████████████████████████████▎                                     | 10862/20117 [6:51:55<5:59:24,  2.33s/it] 54%|████████████████████████████████████████████▎                                     | 10863/20117 [6:51:58<5:55:48,  2.31s/it] 54%|████████████████████████████████████████████▎                                     | 10864/20117 [6:52:00<5:57:06,  2.32s/it] 54%|████████████████████████████████████████████▎                                     | 10865/20117 [6:52:02<6:01:16,  2.34s/it] 54%|████████████████████████████████████████████▎                                     | 10866/20117 [6:52:05<5:59:00,  2.33s/it] 54%|████████████████████████████████████████████▎                                     | 10867/20117 [6:52:07<5:57:30,  2.32s/it] 54%|████████████████████████████████████████████▎                                     | 10868/20117 [6:52:09<5:57:03,  2.32s/it] 54%|████████████████████████████████████████████▎                                     | 10869/20117 [6:52:12<5:56:55,  2.32s/it] 54%|████████████████████████████████████████████▎                                     | 10870/20117 [6:52:14<5:55:15,  2.31s/it]                                                                                                                                 {'loss': 0.1255, 'grad_norm': 0.6753024458885193, 'learning_rate': 8.809255909966771e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 298.62, 'epoch': 1.08}
 54%|████████████████████████████████████████████▎                                     | 10870/20117 [6:52:14<5:55:15,  2.31s/it] 54%|████████████████████████████████████████████▎                                     | 10871/20117 [6:52:16<5:55:04,  2.30s/it] 54%|████████████████████████████████████████████▎                                     | 10872/20117 [6:52:19<5:53:14,  2.29s/it] 54%|████████████████████████████████████████████▎                                     | 10873/20117 [6:52:21<5:58:09,  2.32s/it] 54%|████████████████████████████████████████████▎                                     | 10874/20117 [6:52:23<5:57:59,  2.32s/it] 54%|████████████████████████████████████████████▎                                     | 10875/20117 [6:52:26<5:57:54,  2.32s/it] 54%|████████████████████████████████████████████▎                                     | 10876/20117 [6:52:28<5:57:14,  2.32s/it] 54%|████████████████████████████████████████████▎                                     | 10877/20117 [6:52:30<5:57:26,  2.32s/it] 54%|████████████████████████████████████████████▎                                     | 10878/20117 [6:52:33<5:55:33,  2.31s/it] 54%|████████████████████████████████████████████▎                                     | 10879/20117 [6:52:35<5:55:16,  2.31s/it] 54%|████████████████████████████████████████████▎                                     | 10880/20117 [6:52:37<5:54:46,  2.30s/it]                                                                                                                                 {'loss': 0.1856, 'grad_norm': 0.6008898019790649, 'learning_rate': 8.793674422062949e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 399.03, 'epoch': 1.08}
 54%|████████████████████████████████████████████▎                                     | 10880/20117 [6:52:37<5:54:46,  2.30s/it] 54%|████████████████████████████████████████████▎                                     | 10881/20117 [6:52:39<5:51:56,  2.29s/it] 54%|████████████████████████████████████████████▎                                     | 10882/20117 [6:52:42<5:51:45,  2.29s/it] 54%|████████████████████████████████████████████▎                                     | 10883/20117 [6:52:44<5:49:21,  2.27s/it] 54%|████████████████████████████████████████████▎                                     | 10884/20117 [6:52:46<5:50:22,  2.28s/it] 54%|████████████████████████████████████████████▎                                     | 10885/20117 [6:52:48<5:48:44,  2.27s/it] 54%|████████████████████████████████████████████▎                                     | 10886/20117 [6:52:51<5:51:00,  2.28s/it] 54%|████████████████████████████████████████████▍                                     | 10887/20117 [6:52:53<5:52:56,  2.29s/it] 54%|████████████████████████████████████████████▍                                     | 10888/20117 [6:52:55<5:53:13,  2.30s/it] 54%|████████████████████████████████████████████▍                                     | 10889/20117 [6:52:58<5:53:18,  2.30s/it] 54%|████████████████████████████████████████████▍                                     | 10890/20117 [6:53:00<5:51:04,  2.28s/it]                                                                                                                                 {'loss': 0.1631, 'grad_norm': 0.507643461227417, 'learning_rate': 8.778095905593986e-05, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 359.61, 'epoch': 1.08}
 54%|████████████████████████████████████████████▍                                     | 10890/20117 [6:53:00<5:51:04,  2.28s/it] 54%|████████████████████████████████████████████▍                                     | 10891/20117 [6:53:03<6:10:25,  2.41s/it] 54%|████████████████████████████████████████████▍                                     | 10892/20117 [6:53:05<6:02:38,  2.36s/it] 54%|████████████████████████████████████████████▍                                     | 10893/20117 [6:53:07<6:02:50,  2.36s/it] 54%|████████████████████████████████████████████▍                                     | 10894/20117 [6:53:10<6:03:51,  2.37s/it] 54%|████████████████████████████████████████████▍                                     | 10895/20117 [6:53:12<5:59:19,  2.34s/it] 54%|████████████████████████████████████████████▍                                     | 10896/20117 [6:53:14<5:54:37,  2.31s/it] 54%|████████████████████████████████████████████▍                                     | 10897/20117 [6:53:16<5:47:09,  2.26s/it] 54%|████████████████████████████████████████████▍                                     | 10898/20117 [6:53:19<5:47:31,  2.26s/it] 54%|████████████████████████████████████████████▍                                     | 10899/20117 [6:53:21<5:44:38,  2.24s/it] 54%|████████████████████████████████████████████▍                                     | 10900/20117 [6:53:23<5:40:55,  2.22s/it]                                                                                                                                 {'loss': 0.1861, 'grad_norm': 0.43057945370674133, 'learning_rate': 8.762520398933065e-05, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 359.02, 'epoch': 1.08}
 54%|████████████████████████████████████████████▍                                     | 10900/20117 [6:53:23<5:40:55,  2.22s/it] 54%|████████████████████████████████████████████▍                                     | 10901/20117 [6:53:25<5:41:44,  2.22s/it] 54%|████████████████████████████████████████████▍                                     | 10902/20117 [6:53:27<5:44:58,  2.25s/it] 54%|████████████████████████████████████████████▍                                     | 10903/20117 [6:53:30<5:46:23,  2.26s/it] 54%|████████████████████████████████████████████▍                                     | 10904/20117 [6:53:32<5:48:30,  2.27s/it] 54%|████████████████████████████████████████████▍                                     | 10905/20117 [6:53:34<5:52:49,  2.30s/it] 54%|████████████████████████████████████████████▍                                     | 10906/20117 [6:53:37<5:50:34,  2.28s/it] 54%|████████████████████████████████████████████▍                                     | 10907/20117 [6:53:39<5:49:14,  2.28s/it] 54%|████████████████████████████████████████████▍                                     | 10908/20117 [6:53:41<5:47:52,  2.27s/it] 54%|████████████████████████████████████████████▍                                     | 10909/20117 [6:53:43<5:46:01,  2.25s/it] 54%|████████████████████████████████████████████▍                                     | 10910/20117 [6:53:46<5:50:10,  2.28s/it]                                                                                                                                 {'loss': 0.1294, 'grad_norm': 0.20500341057777405, 'learning_rate': 8.746947940445946e-05, 'memory/max_active (GiB)': 19.8, 'memory/max_allocated (GiB)': 19.8, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 333.37, 'epoch': 1.08}
 54%|████████████████████████████████████████████▍                                     | 10910/20117 [6:53:46<5:50:10,  2.28s/it] 54%|████████████████████████████████████████████▍                                     | 10911/20117 [6:53:48<5:48:20,  2.27s/it] 54%|████████████████████████████████████████████▍                                     | 10912/20117 [6:53:50<5:46:34,  2.26s/it] 54%|████████████████████████████████████████████▍                                     | 10913/20117 [6:53:52<5:43:38,  2.24s/it] 54%|████████████████████████████████████████████▍                                     | 10914/20117 [6:53:54<5:38:56,  2.21s/it] 54%|████████████████████████████████████████████▍                                     | 10915/20117 [6:53:57<5:35:00,  2.18s/it] 54%|████████████████████████████████████████████▍                                     | 10916/20117 [6:53:59<5:41:47,  2.23s/it] 54%|████████████████████████████████████████████▍                                     | 10917/20117 [6:54:01<5:47:44,  2.27s/it] 54%|████████████████████████████████████████████▌                                     | 10918/20117 [6:54:04<5:49:30,  2.28s/it] 54%|████████████████████████████████████████████▌                                     | 10919/20117 [6:54:06<5:57:19,  2.33s/it] 54%|████████████████████████████████████████████▌                                     | 10920/20117 [6:54:08<5:58:58,  2.34s/it]                                                                                                                                 {'loss': 0.1633, 'grad_norm': 0.5553747415542603, 'learning_rate': 8.73137856849089e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 269.33, 'epoch': 1.09}
 54%|████████████████████████████████████████████▌                                     | 10930/20117 [6:54:32<5:53:13,  2.31s/it] {'loss': 0.1754, 'grad_norm': 0.4564213156700134, 'learning_rate': 8.715812321418546e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 367.99, 'epoch': 1.09}
 54%|████████████████████████████████████████████▌                                     | 10940/20117 [6:54:55<5:52:28,  2.30s/it] {'loss': 0.1829, 'grad_norm': 0.3952537178993225, 'learning_rate': 8.700249237571879e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 370.04, 'epoch': 1.09}
 54%|████████████████████████████████████████████▋                                     | 10950/20117 [6:55:18<5:58:54,  2.35s/it] {'loss': 0.1488, 'grad_norm': 0.45536085963249207, 'learning_rate': 8.684689355286045e-05, 'memory/max_active (GiB)': 21.4, 'memory/max_allocated (GiB)': 21.4, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 357.96, 'epoch': 1.09}
 54%|████████████████████████████████████████████▋                                     | 10960/20117 [6:55:41<5:46:37,  2.27s/it] {'loss': 0.187, 'grad_norm': 0.45513296127319336, 'learning_rate': 8.669132712888328e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 376.08, 'epoch': 1.09}
 55%|████████████████████████████████████████████▋                                     | 10970/20117 [6:56:04<5:48:55,  2.29s/it] {'loss': 0.142, 'grad_norm': 0.44525572657585144, 'learning_rate': 8.653579348698021e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 307.76, 'epoch': 1.09}
 55%|████████████████████████████████████████████▊                                     | 10980/20117 [6:56:27<5:50:14,  2.30s/it] {'loss': 0.2135, 'grad_norm': 0.5557878613471985, 'learning_rate': 8.638029301026351e-05, 'memory/max_active (GiB)': 19.82, 'memory/max_allocated (GiB)': 19.82, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 408.3, 'epoch': 1.09}
 55%|████████████████████████████████████████████▊                                     | 10990/20117 [6:56:50<5:55:52,  2.34s/it] {'loss': 0.174, 'grad_norm': 0.3728674054145813, 'learning_rate': 8.622482608176374e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 309.34, 'epoch': 1.09}
 55%|████████████████████████████████████████████▊                                     | 11000/20117 [6:57:13<5:51:47,  2.32s/it] {'loss': 0.1752, 'grad_norm': 0.38084620237350464, 'learning_rate': 8.606939308442877e-05, 'memory/max_active (GiB)': 19.24, 'memory/max_allocated (GiB)': 19.24, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 280.4, 'epoch': 1.09}
 55%|████████████████████████████████████████████▉                                     | 11010/20117 [6:57:36<5:50:40,  2.31s/it] {'loss': 0.1787, 'grad_norm': 0.47729748487472534, 'learning_rate': 8.591399440112296e-05, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 344.64, 'epoch': 1.09}
 55%|████████████████████████████████████████████▉                                     | 11020/20117 [6:58:00<5:51:09,  2.32s/it] {'loss': 0.176, 'grad_norm': 0.567870020866394, 'learning_rate': 8.575863041462603e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 353.03, 'epoch': 1.1}
 55%|████████████████████████████████████████████▉                                     | 11030/20117 [6:58:23<5:45:41,  2.28s/it] {'loss': 0.1765, 'grad_norm': 0.37033653259277344, 'learning_rate': 8.560330150763243e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 329.79, 'epoch': 1.1}
 55%|█████████████████████████████████████████████                                     | 11040/20117 [6:58:46<5:50:01,  2.31s/it] {'loss': 0.2017, 'grad_norm': 0.6800894141197205, 'learning_rate': 8.544800806274998e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 303.81, 'epoch': 1.1}
 55%|█████████████████████████████████████████████                                     | 11050/20117 [6:59:09<5:48:44,  2.31s/it] {'loss': 0.245, 'grad_norm': 0.6861316561698914, 'learning_rate': 8.529275046249934e-05, 'memory/max_active (GiB)': 20.53, 'memory/max_allocated (GiB)': 20.53, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 400.94, 'epoch': 1.1}
 55%|█████████████████████████████████████████████                                     | 11060/20117 [6:59:32<5:47:58,  2.31s/it] {'loss': 0.1653, 'grad_norm': 0.5849536657333374, 'learning_rate': 8.513752908931273e-05, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 383.81, 'epoch': 1.1}
 55%|█████████████████████████████████████████████                                     | 11070/20117 [6:59:55<5:44:20,  2.28s/it] {'loss': 0.1368, 'grad_norm': 0.43626952171325684, 'learning_rate': 8.498234432553328e-05, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 332.97, 'epoch': 1.1}
 55%|█████████████████████████████████████████████▏                                    | 11080/20117 [7:00:17<5:43:56,  2.28s/it] {'loss': 0.1739, 'grad_norm': 0.5107852816581726, 'learning_rate': 8.482719655341374e-05, 'memory/max_active (GiB)': 21.54, 'memory/max_allocated (GiB)': 21.54, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 333.13, 'epoch': 1.1}
 55%|█████████████████████████████████████████████▏                                    | 11090/20117 [7:00:40<5:40:13,  2.26s/it] {'loss': 0.1172, 'grad_norm': 0.4791623651981354, 'learning_rate': 8.467208615511599e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 310.57, 'epoch': 1.1}
 55%|█████████████████████████████████████████████▏                                    | 11100/20117 [7:01:02<5:46:03,  2.30s/it] {'loss': 0.168, 'grad_norm': 0.4793543219566345, 'learning_rate': 8.451701351270965e-05, 'memory/max_active (GiB)': 19.2, 'memory/max_allocated (GiB)': 19.2, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 280.35, 'epoch': 1.1}
 55%|█████████████████████████████████████████████▏                                    | 11100/20117 [7:01:02<5:46:03,  2.30s/it] 55%|█████████████████████████████████████████████▏                                    | 11101/20117 [7:01:05<5:48:19,  2.32s/it] 55%|█████████████████████████████████████████████▎                                    | 11102/20117 [7:01:07<5:49:35,  2.33s/it] 55%|█████████████████████████████████████████████▎                                    | 11103/20117 [7:01:10<5:52:14,  2.34s/it] 55%|█████████████████████████████████████████████▎                                    | 11104/20117 [7:01:12<5:53:24,  2.35s/it] 55%|█████████████████████████████████████████████▎                                    | 11105/20117 [7:01:14<5:48:38,  2.32s/it] 55%|█████████████████████████████████████████████▎                                    | 11106/20117 [7:01:16<5:47:03,  2.31s/it] 55%|█████████████████████████████████████████████▎                                    | 11107/20117 [7:01:19<5:47:19,  2.31s/it] 55%|█████████████████████████████████████████████▎                                    | 11108/20117 [7:01:21<5:45:20,  2.30s/it] 55%|█████████████████████████████████████████████▎                                    | 11109/20117 [7:01:23<5:43:49,  2.29s/it] 55%|█████████████████████████████████████████████▎                                    | 11110/20117 [7:01:26<5:44:49,  2.30s/it]                                                                                                                                 {'loss': 0.1585, 'grad_norm': 0.41162610054016113, 'learning_rate': 8.436197900817145e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 368.7, 'epoch': 1.1}
 55%|█████████████████████████████████████████████▎                                    | 11110/20117 [7:01:26<5:44:49,  2.30s/it] 55%|█████████████████████████████████████████████▎                                    | 11111/20117 [7:01:28<5:44:56,  2.30s/it] 55%|█████████████████████████████████████████████▎                                    | 11112/20117 [7:01:30<5:45:38,  2.30s/it] 55%|█████████████████████████████████████████████▎                                    | 11113/20117 [7:01:32<5:42:02,  2.28s/it] 55%|█████████████████████████████████████████████▎                                    | 11114/20117 [7:01:35<5:39:25,  2.26s/it] 55%|█████████████████████████████████████████████▎                                    | 11115/20117 [7:01:37<5:41:19,  2.28s/it] 55%|█████████████████████████████████████████████▎                                    | 11116/20117 [7:01:39<5:43:43,  2.29s/it] 55%|█████████████████████████████████████████████▎                                    | 11117/20117 [7:01:42<5:42:49,  2.29s/it] 55%|█████████████████████████████████████████████▎                                    | 11118/20117 [7:01:44<5:44:08,  2.29s/it] 55%|█████████████████████████████████████████████▎                                    | 11119/20117 [7:01:46<5:40:52,  2.27s/it] 55%|█████████████████████████████████████████████▎                                    | 11120/20117 [7:01:48<5:35:47,  2.24s/it]                                                                                                                                 {'loss': 0.165, 'grad_norm': 0.7751170992851257, 'learning_rate': 8.420698302338407e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 345.65, 'epoch': 1.11}
 55%|█████████████████████████████████████████████▎                                    | 11120/20117 [7:01:48<5:35:47,  2.24s/it] 55%|█████████████████████████████████████████████▎                                    | 11121/20117 [7:01:50<5:31:58,  2.21s/it] 55%|█████████████████████████████████████████████▎                                    | 11122/20117 [7:01:53<5:30:48,  2.21s/it] 55%|█████████████████████████████████████████████▎                                    | 11123/20117 [7:01:55<5:30:42,  2.21s/it] 55%|█████████████████████████████████████████████▎                                    | 11124/20117 [7:01:57<5:29:51,  2.20s/it] 55%|█████████████████████████████████████████████▎                                    | 11125/20117 [7:01:59<5:28:27,  2.19s/it] 55%|█████████████████████████████████████████████▎                                    | 11126/20117 [7:02:01<5:29:35,  2.20s/it] 55%|█████████████████████████████████████████████▎                                    | 11127/20117 [7:02:04<5:31:13,  2.21s/it] 55%|█████████████████████████████████████████████▎                                    | 11128/20117 [7:02:06<5:29:43,  2.20s/it] 55%|█████████████████████████████████████████████▎                                    | 11129/20117 [7:02:08<5:27:35,  2.19s/it] 55%|█████████████████████████████████████████████▎                                    | 11130/20117 [7:02:10<5:26:44,  2.18s/it]                                                                                                                                 {'loss': 0.1529, 'grad_norm': 0.2921695113182068, 'learning_rate': 8.405202594013546e-05, 'memory/max_active (GiB)': 19.81, 'memory/max_allocated (GiB)': 19.81, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 338.72, 'epoch': 1.11}
 55%|█████████████████████████████████████████████▎                                    | 11130/20117 [7:02:10<5:26:44,  2.18s/it] 55%|█████████████████████████████████████████████▎                                    | 11131/20117 [7:02:12<5:27:39,  2.19s/it] 55%|█████████████████████████████████████████████▍                                    | 11132/20117 [7:02:15<5:27:45,  2.19s/it] 55%|█████████████████████████████████████████████▍                                    | 11133/20117 [7:02:17<5:29:36,  2.20s/it] 55%|█████████████████████████████████████████████▍                                    | 11134/20117 [7:02:19<5:31:08,  2.21s/it] 55%|█████████████████████████████████████████████▍                                    | 11135/20117 [7:02:21<5:37:01,  2.25s/it] 55%|█████████████████████████████████████████████▍                                    | 11136/20117 [7:02:24<5:43:34,  2.30s/it] 55%|█████████████████████████████████████████████▍                                    | 11137/20117 [7:02:26<5:42:52,  2.29s/it] 55%|█████████████████████████████████████████████▍                                    | 11138/20117 [7:02:28<5:39:49,  2.27s/it] 55%|█████████████████████████████████████████████▍                                    | 11139/20117 [7:02:31<5:39:26,  2.27s/it] 55%|█████████████████████████████████████████████▍                                    | 11140/20117 [7:02:33<5:36:25,  2.25s/it]                                                                                                                                 {'loss': 0.201, 'grad_norm': 0.5489593744277954, 'learning_rate': 8.389710814011764e-05, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 401.46, 'epoch': 1.11}
 55%|█████████████████████████████████████████████▍                                    | 11140/20117 [7:02:33<5:36:25,  2.25s/it] 55%|█████████████████████████████████████████████▍                                    | 11141/20117 [7:02:35<5:34:08,  2.23s/it] 55%|█████████████████████████████████████████████▍                                    | 11142/20117 [7:02:37<5:39:57,  2.27s/it] 55%|█████████████████████████████████████████████▍                                    | 11143/20117 [7:02:40<5:49:12,  2.33s/it] 55%|█████████████████████████████████████████████▍                                    | 11144/20117 [7:02:42<5:54:45,  2.37s/it] 55%|█████████████████████████████████████████████▍                                    | 11145/20117 [7:02:45<6:00:05,  2.41s/it] 55%|█████████████████████████████████████████████▍                                    | 11146/20117 [7:02:47<6:04:41,  2.44s/it] 55%|█████████████████████████████████████████████▍                                    | 11147/20117 [7:02:50<6:18:20,  2.53s/it] 55%|█████████████████████████████████████████████▍                                    | 11148/20117 [7:02:53<6:24:22,  2.57s/it] 55%|█████████████████████████████████████████████▍                                    | 11149/20117 [7:02:55<6:24:51,  2.57s/it] 55%|█████████████████████████████████████████████▍                                    | 11150/20117 [7:02:58<6:21:59,  2.56s/it]                                                                                                                                 {'loss': 0.1832, 'grad_norm': 0.5984140634536743, 'learning_rate': 8.37422300049259e-05, 'memory/max_active (GiB)': 20.46, 'memory/max_allocated (GiB)': 20.46, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 314.29, 'epoch': 1.11}
 55%|█████████████████████████████████████████████▍                                    | 11150/20117 [7:02:58<6:21:59,  2.56s/it] 55%|█████████████████████████████████████████████▍                                    | 11151/20117 [7:03:01<6:54:15,  2.77s/it] 55%|█████████████████████████████████████████████▍                                    | 11152/20117 [7:03:04<6:44:21,  2.71s/it] 55%|█████████████████████████████████████████████▍                                    | 11153/20117 [7:03:06<6:33:35,  2.63s/it] 55%|█████████████████████████████████████████████▍                                    | 11154/20117 [7:03:09<6:25:40,  2.58s/it] 55%|█████████████████████████████████████████████▍                                    | 11155/20117 [7:03:11<6:25:26,  2.58s/it] 55%|█████████████████████████████████████████████▍                                    | 11156/20117 [7:03:14<6:40:44,  2.68s/it] 55%|█████████████████████████████████████████████▍                                    | 11157/20117 [7:03:17<6:34:12,  2.64s/it] 55%|█████████████████████████████████████████████▍                                    | 11158/20117 [7:03:19<6:28:35,  2.60s/it] 55%|█████████████████████████████████████████████▍                                    | 11159/20117 [7:03:22<6:22:11,  2.56s/it] 55%|█████████████████████████████████████████████▍                                    | 11160/20117 [7:03:24<6:18:43,  2.54s/it]                                                                                                                                 {'loss': 0.1515, 'grad_norm': 0.4428146779537201, 'learning_rate': 8.358739191605783e-05, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 295.44, 'epoch': 1.11}
 55%|█████████████████████████████████████████████▍                                    | 11160/20117 [7:03:24<6:18:43,  2.54s/it] 55%|█████████████████████████████████████████████▍                                    | 11161/20117 [7:03:27<6:23:04,  2.57s/it] 55%|█████████████████████████████████████████████▍                                    | 11162/20117 [7:03:29<6:22:13,  2.56s/it] 55%|█████████████████████████████████████████████▌                                    | 11163/20117 [7:03:32<6:17:36,  2.53s/it] 55%|█████████████████████████████████████████████▌                                    | 11164/20117 [7:03:34<6:16:20,  2.52s/it] 56%|█████████████████████████████████████████████▌                                    | 11165/20117 [7:03:37<6:15:21,  2.52s/it] 56%|█████████████████████████████████████████████▌                                    | 11166/20117 [7:03:39<6:16:34,  2.52s/it] 56%|█████████████████████████████████████████████▌                                    | 11167/20117 [7:03:42<6:17:39,  2.53s/it] 56%|█████████████████████████████████████████████▌                                    | 11168/20117 [7:03:44<6:18:52,  2.54s/it] 56%|█████████████████████████████████████████████▌                                    | 11169/20117 [7:03:47<6:20:05,  2.55s/it] 56%|█████████████████████████████████████████████▌                                    | 11170/20117 [7:03:50<6:27:07,  2.60s/it]                                                                                                                                 {'loss': 0.1363, 'grad_norm': 0.4830903112888336, 'learning_rate': 8.343259425491234e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 313.11, 'epoch': 1.11}
 56%|█████████████████████████████████████████████▌                                    | 11170/20117 [7:03:50<6:27:07,  2.60s/it] 56%|█████████████████████████████████████████████▌                                    | 11171/20117 [7:03:52<6:25:06,  2.58s/it] 56%|█████████████████████████████████████████████▌                                    | 11172/20117 [7:03:55<6:32:25,  2.63s/it] 56%|█████████████████████████████████████████████▌                                    | 11173/20117 [7:03:58<6:35:47,  2.66s/it] 56%|█████████████████████████████████████████████▌                                    | 11174/20117 [7:04:00<6:37:52,  2.67s/it] 56%|█████████████████████████████████████████████▌                                    | 11175/20117 [7:04:03<6:50:32,  2.75s/it] 56%|█████████████████████████████████████████████▌                                    | 11176/20117 [7:04:06<6:42:41,  2.70s/it] 56%|█████████████████████████████████████████████▌                                    | 11177/20117 [7:04:08<6:34:02,  2.64s/it] 56%|█████████████████████████████████████████████▌                                    | 11178/20117 [7:04:11<6:29:32,  2.61s/it] 56%|█████████████████████████████████████████████▌                                    | 11179/20117 [7:04:13<6:25:16,  2.59s/it] 56%|█████████████████████████████████████████████▌                                    | 11180/20117 [7:04:16<6:31:36,  2.63s/it]                                                                                                                                 {'loss': 0.1691, 'grad_norm': 0.8339301943778992, 'learning_rate': 8.327783740278882e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 309.19, 'epoch': 1.11}
 56%|█████████████████████████████████████████████▌                                    | 11180/20117 [7:04:16<6:31:36,  2.63s/it] 56%|█████████████████████████████████████████████▌                                    | 11181/20117 [7:04:19<6:31:48,  2.63s/it] 56%|█████████████████████████████████████████████▌                                    | 11182/20117 [7:04:21<6:26:14,  2.59s/it] 56%|█████████████████████████████████████████████▌                                    | 11183/20117 [7:04:24<6:22:57,  2.57s/it] 56%|█████████████████████████████████████████████▌                                    | 11184/20117 [7:04:26<6:21:09,  2.56s/it] 56%|█████████████████████████████████████████████▌                                    | 11185/20117 [7:04:29<6:20:31,  2.56s/it] 56%|█████████████████████████████████████████████▌                                    | 11186/20117 [7:04:31<6:21:46,  2.56s/it] 56%|█████████████████████████████████████████████▌                                    | 11187/20117 [7:04:34<6:22:04,  2.57s/it] 56%|█████████████████████████████████████████████▌                                    | 11188/20117 [7:04:37<6:22:10,  2.57s/it] 56%|█████████████████████████████████████████████▌                                    | 11189/20117 [7:04:39<6:19:32,  2.55s/it] 56%|█████████████████████████████████████████████▌                                    | 11190/20117 [7:04:42<6:17:52,  2.54s/it]                                                                                                                                 {'loss': 0.1616, 'grad_norm': 0.30678462982177734, 'learning_rate': 8.312312174088606e-05, 'memory/max_active (GiB)': 18.18, 'memory/max_allocated (GiB)': 18.18, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 349.77, 'epoch': 1.11}
 56%|█████████████████████████████████████████████▌                                    | 11190/20117 [7:04:42<6:17:52,  2.54s/it] 56%|█████████████████████████████████████████████▌                                    | 11191/20117 [7:04:44<6:17:13,  2.54s/it] 56%|█████████████████████████████████████████████▌                                    | 11192/20117 [7:04:47<6:16:46,  2.53s/it] 56%|█████████████████████████████████████████████▌                                    | 11193/20117 [7:04:49<6:22:32,  2.57s/it] 56%|█████████████████████████████████████████████▋                                    | 11194/20117 [7:04:52<6:21:19,  2.56s/it] 56%|█████████████████████████████████████████████▋                                    | 11195/20117 [7:04:54<6:19:41,  2.55s/it] 56%|█████████████████████████████████████████████▋                                    | 11196/20117 [7:04:57<6:17:54,  2.54s/it] 56%|█████████████████████████████████████████████▋                                    | 11197/20117 [7:05:00<6:19:59,  2.56s/it] 56%|█████████████████████████████████████████████▋                                    | 11198/20117 [7:05:02<6:21:39,  2.57s/it] 56%|█████████████████████████████████████████████▋                                    | 11199/20117 [7:05:05<6:24:06,  2.58s/it] 56%|█████████████████████████████████████████████▋                                    | 11200/20117 [7:05:07<6:23:20,  2.58s/it]                                                                                                                                 {'loss': 0.1807, 'grad_norm': 0.1882571429014206, 'learning_rate': 8.296844765030147e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 331.03, 'epoch': 1.11}
 56%|█████████████████████████████████████████████▋                                    | 11200/20117 [7:05:07<6:23:20,  2.58s/it] 56%|█████████████████████████████████████████████▋                                    | 11201/20117 [7:05:10<6:19:49,  2.56s/it] 56%|█████████████████████████████████████████████▋                                    | 11202/20117 [7:05:12<6:15:26,  2.53s/it] 56%|█████████████████████████████████████████████▋                                    | 11203/20117 [7:05:15<6:41:06,  2.70s/it] 56%|█████████████████████████████████████████████▋                                    | 11204/20117 [7:05:18<6:37:09,  2.67s/it] 56%|█████████████████████████████████████████████▋                                    | 11205/20117 [7:05:20<6:29:48,  2.62s/it] 56%|█████████████████████████████████████████████▋                                    | 11206/20117 [7:05:23<6:25:23,  2.59s/it] 56%|█████████████████████████████████████████████▋                                    | 11207/20117 [7:05:26<6:21:15,  2.57s/it] 56%|█████████████████████████████████████████████▋                                    | 11208/20117 [7:05:28<6:24:05,  2.59s/it] 56%|█████████████████████████████████████████████▋                                    | 11209/20117 [7:05:30<6:13:10,  2.51s/it] 56%|█████████████████████████████████████████████▋                                    | 11210/20117 [7:05:33<6:04:32,  2.46s/it]                                                                                                                                 {'loss': 0.1784, 'grad_norm': 0.5480899214744568, 'learning_rate': 8.281381551203e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 367.42, 'epoch': 1.11}
 56%|█████████████████████████████████████████████▋                                    | 11210/20117 [7:05:33<6:04:32,  2.46s/it] 56%|█████████████████████████████████████████████▋                                    | 11211/20117 [7:05:35<6:01:05,  2.43s/it] 56%|█████████████████████████████████████████████▋                                    | 11212/20117 [7:05:38<6:02:11,  2.44s/it] 56%|█████████████████████████████████████████████▋                                    | 11213/20117 [7:05:40<5:55:50,  2.40s/it] 56%|█████████████████████████████████████████████▋                                    | 11214/20117 [7:05:42<5:53:33,  2.38s/it] 56%|█████████████████████████████████████████████▋                                    | 11215/20117 [7:05:45<5:59:04,  2.42s/it] 56%|█████████████████████████████████████████████▋                                    | 11216/20117 [7:05:47<6:03:27,  2.45s/it] 56%|█████████████████████████████████████████████▋                                    | 11217/20117 [7:05:50<6:04:19,  2.46s/it] 56%|█████████████████████████████████████████████▋                                    | 11218/20117 [7:05:52<6:02:48,  2.45s/it] 56%|█████████████████████████████████████████████▋                                    | 11219/20117 [7:05:55<6:06:07,  2.47s/it] 56%|█████████████████████████████████████████████▋                                    | 11220/20117 [7:05:57<6:09:40,  2.49s/it]                                                                                                                                 {'loss': 0.1421, 'grad_norm': 0.3690171241760254, 'learning_rate': 8.265922570696336e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 338.8, 'epoch': 1.12}
 56%|█████████████████████████████████████████████▋                                    | 11220/20117 [7:05:57<6:09:40,  2.49s/it] 56%|█████████████████████████████████████████████▋                                    | 11221/20117 [7:06:00<6:11:00,  2.50s/it] 56%|█████████████████████████████████████████████▋                                    | 11222/20117 [7:06:02<6:03:59,  2.46s/it] 56%|█████████████████████████████████████████████▋                                    | 11223/20117 [7:06:05<6:04:46,  2.46s/it] 56%|█████████████████████████████████████████████▊                                    | 11224/20117 [7:06:07<6:00:01,  2.43s/it] 56%|█████████████████████████████████████████████▊                                    | 11225/20117 [7:06:09<5:57:09,  2.41s/it] 56%|█████████████████████████████████████████████▊                                    | 11226/20117 [7:06:12<6:00:24,  2.43s/it] 56%|█████████████████████████████████████████████▊                                    | 11227/20117 [7:06:14<6:06:31,  2.47s/it] 56%|█████████████████████████████████████████████▊                                    | 11228/20117 [7:06:17<6:09:03,  2.49s/it] 56%|█████████████████████████████████████████████▊                                    | 11229/20117 [7:06:19<6:11:55,  2.51s/it] 56%|█████████████████████████████████████████████▊                                    | 11230/20117 [7:06:22<6:10:59,  2.50s/it]                                                                                                                                 {'loss': 0.1304, 'grad_norm': 0.3193921744823456, 'learning_rate': 8.250467861588879e-05, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 316.67, 'epoch': 1.12}
 56%|█████████████████████████████████████████████▊                                    | 11230/20117 [7:06:22<6:10:59,  2.50s/it] 56%|█████████████████████████████████████████████▊                                    | 11231/20117 [7:06:25<6:11:56,  2.51s/it] 56%|█████████████████████████████████████████████▊                                    | 11232/20117 [7:06:27<6:12:22,  2.51s/it] 56%|█████████████████████████████████████████████▊                                    | 11233/20117 [7:06:30<6:11:40,  2.51s/it] 56%|█████████████████████████████████████████████▊                                    | 11234/20117 [7:06:32<6:12:32,  2.52s/it] 56%|█████████████████████████████████████████████▊                                    | 11235/20117 [7:06:35<6:12:46,  2.52s/it] 56%|█████████████████████████████████████████████▊                                    | 11236/20117 [7:06:37<6:13:33,  2.52s/it] 56%|█████████████████████████████████████████████▊                                    | 11237/20117 [7:06:40<6:20:03,  2.57s/it] 56%|█████████████████████████████████████████████▊                                    | 11238/20117 [7:06:42<6:18:58,  2.56s/it] 56%|█████████████████████████████████████████████▊                                    | 11239/20117 [7:06:45<6:18:59,  2.56s/it] 56%|█████████████████████████████████████████████▊                                    | 11240/20117 [7:06:47<6:17:39,  2.55s/it]                                                                                                                                 {'loss': 0.1901, 'grad_norm': 0.4590331017971039, 'learning_rate': 8.235017461948858e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 371.35, 'epoch': 1.12}
 56%|█████████████████████████████████████████████▊                                    | 11240/20117 [7:06:47<6:17:39,  2.55s/it] 56%|█████████████████████████████████████████████▊                                    | 11241/20117 [7:06:50<6:18:36,  2.56s/it] 56%|█████████████████████████████████████████████▊                                    | 11242/20117 [7:06:53<6:17:58,  2.56s/it] 56%|█████████████████████████████████████████████▊                                    | 11243/20117 [7:06:55<6:18:40,  2.56s/it] 56%|█████████████████████████████████████████████▊                                    | 11244/20117 [7:06:58<6:18:06,  2.56s/it] 56%|█████████████████████████████████████████████▊                                    | 11245/20117 [7:07:00<6:18:33,  2.56s/it] 56%|█████████████████████████████████████████████▊                                    | 11246/20117 [7:07:03<6:17:08,  2.55s/it] 56%|█████████████████████████████████████████████▊                                    | 11247/20117 [7:07:05<6:18:26,  2.56s/it] 56%|█████████████████████████████████████████████▊                                    | 11248/20117 [7:07:08<6:17:54,  2.56s/it] 56%|█████████████████████████████████████████████▊                                    | 11249/20117 [7:07:11<6:22:11,  2.59s/it] 56%|█████████████████████████████████████████████▊                                    | 11250/20117 [7:07:13<6:20:32,  2.57s/it]                                                                                                                                 {'loss': 0.1895, 'grad_norm': 0.5001057386398315, 'learning_rate': 8.219571409833862e-05, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 379.59, 'epoch': 1.12}
 56%|█████████████████████████████████████████████▊                                    | 11250/20117 [7:07:13<6:20:32,  2.57s/it] 56%|█████████████████████████████████████████████▊                                    | 11251/20117 [7:07:16<6:20:54,  2.58s/it] 56%|█████████████████████████████████████████████▊                                    | 11252/20117 [7:07:18<6:18:32,  2.56s/it] 56%|█████████████████████████████████████████████▊                                    | 11253/20117 [7:07:21<6:17:22,  2.55s/it] 56%|█████████████████████████████████████████████▊                                    | 11254/20117 [7:07:23<6:18:24,  2.56s/it] 56%|█████████████████████████████████████████████▉                                    | 11255/20117 [7:07:26<6:40:33,  2.71s/it] 56%|█████████████████████████████████████████████▉                                    | 11256/20117 [7:07:29<6:34:16,  2.67s/it] 56%|█████████████████████████████████████████████▉                                    | 11257/20117 [7:07:32<6:30:24,  2.64s/it] 56%|█████████████████████████████████████████████▉                                    | 11258/20117 [7:07:34<6:27:55,  2.63s/it] 56%|█████████████████████████████████████████████▉                                    | 11259/20117 [7:07:37<6:25:20,  2.61s/it] 56%|█████████████████████████████████████████████▉                                    | 11260/20117 [7:07:39<6:22:36,  2.59s/it]                                                                                                                                 {'loss': 0.1579, 'grad_norm': 0.5432204008102417, 'learning_rate': 8.204129743290783e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 294.03, 'epoch': 1.12}
 56%|█████████████████████████████████████████████▉                                    | 11260/20117 [7:07:39<6:22:36,  2.59s/it] 56%|█████████████████████████████████████████████▉                                    | 11261/20117 [7:07:42<6:22:00,  2.59s/it] 56%|█████████████████████████████████████████████▉                                    | 11262/20117 [7:07:44<6:20:44,  2.58s/it] 56%|█████████████████████████████████████████████▉                                    | 11263/20117 [7:07:47<6:20:49,  2.58s/it] 56%|█████████████████████████████████████████████▉                                    | 11264/20117 [7:07:50<6:21:31,  2.59s/it] 56%|█████████████████████████████████████████████▉                                    | 11265/20117 [7:07:52<6:22:15,  2.59s/it] 56%|█████████████████████████████████████████████▉                                    | 11266/20117 [7:07:55<6:18:38,  2.57s/it] 56%|█████████████████████████████████████████████▉                                    | 11267/20117 [7:07:57<6:19:10,  2.57s/it] 56%|█████████████████████████████████████████████▉                                    | 11268/20117 [7:08:00<6:19:02,  2.57s/it] 56%|█████████████████████████████████████████████▉                                    | 11269/20117 [7:08:02<6:21:55,  2.59s/it] 56%|█████████████████████████████████████████████▉                                    | 11270/20117 [7:08:05<6:23:16,  2.60s/it]                                                                                                                                 {'loss': 0.23, 'grad_norm': 0.4477890431880951, 'learning_rate': 8.188692500355716e-05, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 348.24, 'epoch': 1.12}
 56%|█████████████████████████████████████████████▉                                    | 11270/20117 [7:08:05<6:23:16,  2.60s/it] 56%|█████████████████████████████████████████████▉                                    | 11271/20117 [7:08:08<6:21:30,  2.59s/it] 56%|█████████████████████████████████████████████▉                                    | 11272/20117 [7:08:10<6:20:33,  2.58s/it] 56%|█████████████████████████████████████████████▉                                    | 11273/20117 [7:08:13<6:17:41,  2.56s/it] 56%|█████████████████████████████████████████████▉                                    | 11274/20117 [7:08:15<6:18:56,  2.57s/it] 56%|█████████████████████████████████████████████▉                                    | 11275/20117 [7:08:18<6:17:57,  2.56s/it] 56%|█████████████████████████████████████████████▉                                    | 11276/20117 [7:08:20<6:17:32,  2.56s/it] 56%|█████████████████████████████████████████████▉                                    | 11277/20117 [7:08:23<6:16:39,  2.56s/it] 56%|█████████████████████████████████████████████▉                                    | 11278/20117 [7:08:25<6:09:22,  2.51s/it] 56%|█████████████████████████████████████████████▉                                    | 11279/20117 [7:08:28<6:03:23,  2.47s/it] 56%|█████████████████████████████████████████████▉                                    | 11280/20117 [7:08:30<5:58:44,  2.44s/it]                                                                                                                                 {'loss': 0.1724, 'grad_norm': 0.5133992433547974, 'learning_rate': 8.173259719053847e-05, 'memory/max_active (GiB)': 19.22, 'memory/max_allocated (GiB)': 19.22, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 379.58, 'epoch': 1.12}
 56%|█████████████████████████████████████████████▉                                    | 11280/20117 [7:08:30<5:58:44,  2.44s/it] 56%|█████████████████████████████████████████████▉                                    | 11281/20117 [7:08:33<6:04:00,  2.47s/it] 56%|█████████████████████████████████████████████▉                                    | 11282/20117 [7:08:35<6:07:06,  2.49s/it] 56%|█████████████████████████████████████████████▉                                    | 11283/20117 [7:08:38<6:08:30,  2.50s/it] 56%|█████████████████████████████████████████████▉                                    | 11284/20117 [7:08:40<6:06:37,  2.49s/it] 56%|█████████████████████████████████████████████▉                                    | 11285/20117 [7:08:43<6:05:54,  2.49s/it] 56%|██████████████████████████████████████████████                                    | 11286/20117 [7:08:45<6:07:55,  2.50s/it] 56%|██████████████████████████████████████████████                                    | 11287/20117 [7:08:48<6:08:10,  2.50s/it] 56%|██████████████████████████████████████████████                                    | 11288/20117 [7:08:50<6:09:21,  2.51s/it] 56%|██████████████████████████████████████████████                                    | 11289/20117 [7:08:53<6:06:21,  2.49s/it] 56%|██████████████████████████████████████████████                                    | 11290/20117 [7:08:55<6:02:38,  2.46s/it]                                                                                                                                 {'loss': 0.1225, 'grad_norm': 0.369495153427124, 'learning_rate': 8.157831437399383e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 341.16, 'epoch': 1.12}
 56%|██████████████████████████████████████████████                                    | 11290/20117 [7:08:55<6:02:38,  2.46s/it] 56%|██████████████████████████████████████████████                                    | 11291/20117 [7:08:58<6:06:28,  2.49s/it] 56%|██████████████████████████████████████████████                                    | 11292/20117 [7:09:00<6:08:57,  2.51s/it] 56%|██████████████████████████████████████████████                                    | 11293/20117 [7:09:03<6:11:22,  2.53s/it] 56%|██████████████████████████████████████████████                                    | 11294/20117 [7:09:05<6:13:55,  2.54s/it] 56%|██████████████████████████████████████████████                                    | 11295/20117 [7:09:08<6:14:39,  2.55s/it] 56%|██████████████████████████████████████████████                                    | 11296/20117 [7:09:10<6:15:19,  2.55s/it] 56%|██████████████████████████████████████████████                                    | 11297/20117 [7:09:13<6:15:02,  2.55s/it] 56%|██████████████████████████████████████████████                                    | 11298/20117 [7:09:16<6:15:57,  2.56s/it] 56%|██████████████████████████████████████████████                                    | 11299/20117 [7:09:18<6:14:29,  2.55s/it] 56%|██████████████████████████████████████████████                                    | 11300/20117 [7:09:21<6:15:23,  2.55s/it]                                                                                                                                 {'loss': 0.1445, 'grad_norm': 0.30792126059532166, 'learning_rate': 8.142407693395438e-05, 'memory/max_active (GiB)': 21.38, 'memory/max_allocated (GiB)': 21.38, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 301.05, 'epoch': 1.12}
 56%|██████████████████████████████████████████████                                    | 11300/20117 [7:09:21<6:15:23,  2.55s/it] 56%|██████████████████████████████████████████████                                    | 11301/20117 [7:09:23<6:15:46,  2.56s/it] 56%|██████████████████████████████████████████████                                    | 11302/20117 [7:09:26<6:19:52,  2.59s/it] 56%|██████████████████████████████████████████████                                    | 11303/20117 [7:09:28<6:20:29,  2.59s/it] 56%|██████████████████████████████████████████████                                    | 11304/20117 [7:09:31<6:16:35,  2.56s/it] 56%|██████████████████████████████████████████████                                    | 11305/20117 [7:09:34<6:17:08,  2.57s/it] 56%|██████████████████████████████████████████████                                    | 11306/20117 [7:09:37<6:38:10,  2.71s/it] 56%|██████████████████████████████████████████████                                    | 11307/20117 [7:09:39<6:33:00,  2.68s/it] 56%|██████████████████████████████████████████████                                    | 11308/20117 [7:09:42<6:24:53,  2.62s/it] 56%|██████████████████████████████████████████████                                    | 11309/20117 [7:09:44<6:26:12,  2.63s/it] 56%|██████████████████████████████████████████████                                    | 11310/20117 [7:09:47<6:25:50,  2.63s/it]                                                                                                                                 {'loss': 0.1675, 'grad_norm': 0.3198167085647583, 'learning_rate': 8.126988525033958e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 310.65, 'epoch': 1.12}
 56%|██████████████████████████████████████████████                                    | 11310/20117 [7:09:47<6:25:50,  2.63s/it] 56%|██████████████████████████████████████████████                                    | 11311/20117 [7:09:50<6:26:43,  2.63s/it] 56%|██████████████████████████████████████████████                                    | 11312/20117 [7:09:52<6:26:31,  2.63s/it] 56%|██████████████████████████████████████████████                                    | 11313/20117 [7:09:55<6:32:52,  2.68s/it] 56%|██████████████████████████████████████████████                                    | 11314/20117 [7:09:58<6:30:08,  2.66s/it] 56%|██████████████████████████████████████████████                                    | 11315/20117 [7:10:00<6:26:58,  2.64s/it] 56%|██████████████████████████████████████████████▏                                   | 11316/20117 [7:10:03<6:24:54,  2.62s/it] 56%|██████████████████████████████████████████████▏                                   | 11317/20117 [7:10:05<6:22:58,  2.61s/it] 56%|██████████████████████████████████████████████▏                                   | 11318/20117 [7:10:08<6:22:18,  2.61s/it] 56%|██████████████████████████████████████████████▏                                   | 11319/20117 [7:10:11<6:22:32,  2.61s/it] 56%|██████████████████████████████████████████████▏                                   | 11320/20117 [7:10:13<6:22:46,  2.61s/it]                                                                                                                                 {'loss': 0.2455, 'grad_norm': 0.6837435960769653, 'learning_rate': 8.111573970295607e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 354.62, 'epoch': 1.13}
 56%|██████████████████████████████████████████████▏                                   | 11320/20117 [7:10:13<6:22:46,  2.61s/it] 56%|██████████████████████████████████████████████▏                                   | 11321/20117 [7:10:16<6:22:03,  2.61s/it] 56%|██████████████████████████████████████████████▏                                   | 11322/20117 [7:10:18<6:23:09,  2.61s/it] 56%|██████████████████████████████████████████████▏                                   | 11323/20117 [7:10:21<6:23:31,  2.62s/it] 56%|██████████████████████████████████████████████▏                                   | 11324/20117 [7:10:24<6:24:59,  2.63s/it] 56%|██████████████████████████████████████████████▏                                   | 11325/20117 [7:10:26<6:23:04,  2.61s/it] 56%|██████████████████████████████████████████████▏                                   | 11326/20117 [7:10:29<6:27:10,  2.64s/it] 56%|██████████████████████████████████████████████▏                                   | 11327/20117 [7:10:32<6:29:14,  2.66s/it] 56%|██████████████████████████████████████████████▏                                   | 11328/20117 [7:10:34<6:26:39,  2.64s/it] 56%|██████████████████████████████████████████████▏                                   | 11329/20117 [7:10:37<6:25:17,  2.63s/it] 56%|██████████████████████████████████████████████▏                                   | 11330/20117 [7:10:40<6:23:25,  2.62s/it]                                                                                                                                 {'loss': 0.1768, 'grad_norm': 0.5170619487762451, 'learning_rate': 8.096164067149701e-05, 'memory/max_active (GiB)': 20.57, 'memory/max_allocated (GiB)': 20.57, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 287.06, 'epoch': 1.13}
 56%|██████████████████████████████████████████████▏                                   | 11330/20117 [7:10:40<6:23:25,  2.62s/it] 56%|██████████████████████████████████████████████▏                                   | 11331/20117 [7:10:42<6:24:28,  2.63s/it] 56%|██████████████████████████████████████████████▏                                   | 11332/20117 [7:10:45<6:26:47,  2.64s/it] 56%|██████████████████████████████████████████████▏                                   | 11333/20117 [7:10:48<6:27:53,  2.65s/it] 56%|██████████████████████████████████████████████▏                                   | 11334/20117 [7:10:50<6:27:36,  2.65s/it] 56%|██████████████████████████████████████████████▏                                   | 11335/20117 [7:10:53<6:37:53,  2.72s/it] 56%|██████████████████████████████████████████████▏                                   | 11336/20117 [7:10:56<6:31:00,  2.67s/it] 56%|██████████████████████████████████████████████▏                                   | 11337/20117 [7:10:58<6:28:04,  2.65s/it] 56%|██████████████████████████████████████████████▏                                   | 11338/20117 [7:11:01<6:26:38,  2.64s/it] 56%|██████████████████████████████████████████████▏                                   | 11339/20117 [7:11:03<6:25:12,  2.63s/it] 56%|██████████████████████████████████████████████▏                                   | 11340/20117 [7:11:06<6:25:25,  2.63s/it]                                                                                                                                 {'loss': 0.1772, 'grad_norm': 0.4704976975917816, 'learning_rate': 8.080758853554075e-05, 'memory/max_active (GiB)': 20.63, 'memory/max_allocated (GiB)': 20.63, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 308.38, 'epoch': 1.13}
 56%|██████████████████████████████████████████████▏                                   | 11340/20117 [7:11:06<6:25:25,  2.63s/it] 56%|██████████████████████████████████████████████▏                                   | 11341/20117 [7:11:09<6:26:16,  2.64s/it] 56%|██████████████████████████████████████████████▏                                   | 11342/20117 [7:11:11<6:24:17,  2.63s/it] 56%|██████████████████████████████████████████████▏                                   | 11343/20117 [7:11:14<6:23:01,  2.62s/it] 56%|██████████████████████████████████████████████▏                                   | 11344/20117 [7:11:17<6:23:22,  2.62s/it] 56%|██████████████████████████████████████████████▏                                   | 11345/20117 [7:11:19<6:23:15,  2.62s/it] 56%|██████████████████████████████████████████████▏                                   | 11346/20117 [7:11:22<6:21:43,  2.61s/it] 56%|██████████████████████████████████████████████▎                                   | 11347/20117 [7:11:24<6:18:29,  2.59s/it] 56%|██████████████████████████████████████████████▎                                   | 11348/20117 [7:11:27<6:12:46,  2.55s/it] 56%|██████████████████████████████████████████████▎                                   | 11349/20117 [7:11:29<6:11:39,  2.54s/it] 56%|██████████████████████████████████████████████▎                                   | 11350/20117 [7:11:32<6:07:28,  2.51s/it]                                                                                                                                 {'loss': 0.1593, 'grad_norm': 0.5932011008262634, 'learning_rate': 8.065358367455038e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 348.84, 'epoch': 1.13}
 56%|██████████████████████████████████████████████▎                                   | 11350/20117 [7:11:32<6:07:28,  2.51s/it] 56%|██████████████████████████████████████████████▎                                   | 11351/20117 [7:11:34<6:02:43,  2.48s/it] 56%|██████████████████████████████████████████████▎                                   | 11352/20117 [7:11:37<6:00:26,  2.47s/it] 56%|██████████████████████████████████████████████▎                                   | 11353/20117 [7:11:39<6:02:08,  2.48s/it] 56%|██████████████████████████████████████████████▎                                   | 11354/20117 [7:11:42<6:03:19,  2.49s/it] 56%|██████████████████████████████████████████████▎                                   | 11355/20117 [7:11:44<6:01:05,  2.47s/it] 56%|██████████████████████████████████████████████▎                                   | 11356/20117 [7:11:46<5:58:06,  2.45s/it] 56%|██████████████████████████████████████████████▎                                   | 11357/20117 [7:11:49<5:58:38,  2.46s/it] 56%|██████████████████████████████████████████████▎                                   | 11358/20117 [7:11:51<5:59:23,  2.46s/it] 56%|██████████████████████████████████████████████▎                                   | 11359/20117 [7:11:54<5:57:55,  2.45s/it] 56%|██████████████████████████████████████████████▎                                   | 11360/20117 [7:11:57<6:20:05,  2.60s/it]                                                                                                                                 {'loss': 0.1394, 'grad_norm': 0.6632476449012756, 'learning_rate': 8.049962646787235e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 250.63, 'epoch': 1.13}
 56%|██████████████████████████████████████████████▎                                   | 11360/20117 [7:11:57<6:20:05,  2.60s/it] 56%|██████████████████████████████████████████████▎                                   | 11361/20117 [7:11:59<6:12:04,  2.55s/it] 56%|██████████████████████████████████████████████▎                                   | 11362/20117 [7:12:02<6:03:04,  2.49s/it] 56%|██████████████████████████████████████████████▎                                   | 11363/20117 [7:12:04<6:10:09,  2.54s/it] 56%|██████████████████████████████████████████████▎                                   | 11364/20117 [7:12:07<6:11:13,  2.54s/it] 56%|██████████████████████████████████████████████▎                                   | 11365/20117 [7:12:09<6:12:10,  2.55s/it] 56%|██████████████████████████████████████████████▎                                   | 11366/20117 [7:12:12<6:14:02,  2.56s/it] 57%|██████████████████████████████████████████████▎                                   | 11367/20117 [7:12:14<6:13:01,  2.56s/it] 57%|██████████████████████████████████████████████▎                                   | 11368/20117 [7:12:17<6:10:16,  2.54s/it] 57%|██████████████████████████████████████████████▎                                   | 11369/20117 [7:12:19<6:08:20,  2.53s/it] 57%|██████████████████████████████████████████████▎                                   | 11370/20117 [7:12:22<6:08:33,  2.53s/it]                                                                                                                                 {'loss': 0.1832, 'grad_norm': 0.5370444655418396, 'learning_rate': 8.034571729473587e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 326.76, 'epoch': 1.13}
 57%|██████████████████████████████████████████████▎                                   | 11370/20117 [7:12:22<6:08:33,  2.53s/it] 57%|██████████████████████████████████████████████▎                                   | 11371/20117 [7:12:25<6:10:34,  2.54s/it] 57%|██████████████████████████████████████████████▎                                   | 11372/20117 [7:12:27<6:10:08,  2.54s/it] 57%|██████████████████████████████████████████████▎                                   | 11373/20117 [7:12:30<6:12:18,  2.55s/it] 57%|██████████████████████████████████████████████▎                                   | 11374/20117 [7:12:32<6:11:31,  2.55s/it] 57%|██████████████████████████████████████████████▎                                   | 11375/20117 [7:12:35<6:12:29,  2.56s/it] 57%|██████████████████████████████████████████████▎                                   | 11376/20117 [7:12:37<6:12:23,  2.56s/it] 57%|██████████████████████████████████████████████▎                                   | 11377/20117 [7:12:40<6:12:25,  2.56s/it] 57%|██████████████████████████████████████████████▍                                   | 11378/20117 [7:12:43<6:30:00,  2.68s/it] 57%|██████████████████████████████████████████████▍                                   | 11379/20117 [7:12:46<6:37:07,  2.73s/it] 57%|██████████████████████████████████████████████▍                                   | 11380/20117 [7:12:48<6:27:43,  2.66s/it]                                                                                                                                 {'loss': 0.1373, 'grad_norm': 0.42281845211982727, 'learning_rate': 8.019185653425168e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 307.3, 'epoch': 1.13}
 57%|██████████████████████████████████████████████▍                                   | 11380/20117 [7:12:48<6:27:43,  2.66s/it] 57%|██████████████████████████████████████████████▍                                   | 11381/20117 [7:12:51<6:21:10,  2.62s/it] 57%|██████████████████████████████████████████████▍                                   | 11382/20117 [7:12:53<6:17:28,  2.59s/it] 57%|██████████████████████████████████████████████▍                                   | 11383/20117 [7:12:56<6:13:34,  2.57s/it] 57%|██████████████████████████████████████████████▍                                   | 11384/20117 [7:12:58<6:12:22,  2.56s/it] 57%|██████████████████████████████████████████████▍                                   | 11385/20117 [7:13:01<6:09:58,  2.54s/it] 57%|██████████████████████████████████████████████▍                                   | 11386/20117 [7:13:03<6:09:41,  2.54s/it] 57%|██████████████████████████████████████████████▍                                   | 11387/20117 [7:13:06<6:09:43,  2.54s/it] 57%|██████████████████████████████████████████████▍                                   | 11388/20117 [7:13:08<6:11:11,  2.55s/it] 57%|██████████████████████████████████████████████▍                                   | 11389/20117 [7:13:11<6:09:33,  2.54s/it] 57%|██████████████████████████████████████████████▍                                   | 11390/20117 [7:13:14<6:09:57,  2.54s/it]                                                                                                                                 {'loss': 0.1607, 'grad_norm': 0.27983081340789795, 'learning_rate': 8.00380445654114e-05, 'memory/max_active (GiB)': 19.2, 'memory/max_allocated (GiB)': 19.2, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 330.95, 'epoch': 1.13}
 57%|██████████████████████████████████████████████▍                                   | 11390/20117 [7:13:14<6:09:57,  2.54s/it] 57%|██████████████████████████████████████████████▍                                   | 11391/20117 [7:13:16<6:11:20,  2.55s/it] 57%|██████████████████████████████████████████████▍                                   | 11392/20117 [7:13:19<6:12:24,  2.56s/it] 57%|██████████████████████████████████████████████▍                                   | 11393/20117 [7:13:21<6:09:43,  2.54s/it] 57%|██████████████████████████████████████████████▍                                   | 11394/20117 [7:13:24<6:11:44,  2.56s/it] 57%|██████████████████████████████████████████████▍                                   | 11395/20117 [7:13:26<6:14:59,  2.58s/it] 57%|██████████████████████████████████████████████▍                                   | 11396/20117 [7:13:29<6:16:06,  2.59s/it] 57%|██████████████████████████████████████████████▍                                   | 11397/20117 [7:13:32<6:14:20,  2.58s/it] 57%|██████████████████████████████████████████████▍                                   | 11398/20117 [7:13:34<6:13:11,  2.57s/it] 57%|██████████████████████████████████████████████▍                                   | 11399/20117 [7:13:37<6:12:46,  2.57s/it] 57%|██████████████████████████████████████████████▍                                   | 11400/20117 [7:13:39<6:09:14,  2.54s/it]                                                                                                                                 {'loss': 0.2021, 'grad_norm': 0.6440351009368896, 'learning_rate': 7.988428176708644e-05, 'memory/max_active (GiB)': 21.38, 'memory/max_allocated (GiB)': 21.38, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 397.31, 'epoch': 1.13}
 57%|██████████████████████████████████████████████▍                                   | 11400/20117 [7:13:39<6:09:14,  2.54s/it] 57%|██████████████████████████████████████████████▍                                   | 11401/20117 [7:13:42<6:12:40,  2.57s/it] 57%|██████████████████████████████████████████████▍                                   | 11402/20117 [7:13:44<6:08:33,  2.54s/it] 57%|██████████████████████████████████████████████▍                                   | 11403/20117 [7:13:47<6:07:32,  2.53s/it] 57%|██████████████████████████████████████████████▍                                   | 11404/20117 [7:13:49<6:09:06,  2.54s/it] 57%|██████████████████████████████████████████████▍                                   | 11405/20117 [7:13:52<6:09:18,  2.54s/it] 57%|██████████████████████████████████████████████▍                                   | 11406/20117 [7:13:54<6:10:23,  2.55s/it] 57%|██████████████████████████████████████████████▍                                   | 11407/20117 [7:13:57<6:12:29,  2.57s/it] 57%|██████████████████████████████████████████████▌                                   | 11408/20117 [7:14:00<6:13:58,  2.58s/it] 57%|██████████████████████████████████████████████▌                                   | 11409/20117 [7:14:02<6:18:08,  2.61s/it] 57%|██████████████████████████████████████████████▌                                   | 11410/20117 [7:14:05<6:16:50,  2.60s/it]                                                                                                                                 {'loss': 0.2105, 'grad_norm': 0.4090474843978882, 'learning_rate': 7.9730568518027e-05, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 342.86, 'epoch': 1.13}
 57%|██████████████████████████████████████████████▌                                   | 11410/20117 [7:14:05<6:16:50,  2.60s/it] 57%|██████████████████████████████████████████████▌                                   | 11411/20117 [7:14:08<6:18:20,  2.61s/it] 57%|██████████████████████████████████████████████▌                                   | 11412/20117 [7:14:11<6:37:14,  2.74s/it] 57%|██████████████████████████████████████████████▌                                   | 11413/20117 [7:14:13<6:31:43,  2.70s/it] 57%|██████████████████████████████████████████████▌                                   | 11414/20117 [7:14:16<6:30:26,  2.69s/it] 57%|██████████████████████████████████████████████▌                                   | 11415/20117 [7:14:18<6:24:02,  2.65s/it] 57%|██████████████████████████████████████████████▌                                   | 11416/20117 [7:14:21<6:21:35,  2.63s/it] 57%|██████████████████████████████████████████████▌                                   | 11417/20117 [7:14:24<6:17:29,  2.60s/it] 57%|██████████████████████████████████████████████▌                                   | 11418/20117 [7:14:26<6:17:24,  2.60s/it] 57%|██████████████████████████████████████████████▌                                   | 11419/20117 [7:14:29<6:12:30,  2.57s/it] 57%|██████████████████████████████████████████████▌                                   | 11420/20117 [7:14:31<6:05:01,  2.52s/it]                                                                                                                                 {'loss': 0.1956, 'grad_norm': 0.2903016209602356, 'learning_rate': 7.957690519686137e-05, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 362.76, 'epoch': 1.14}
 57%|██████████████████████████████████████████████▌                                   | 11420/20117 [7:14:31<6:05:01,  2.52s/it] 57%|██████████████████████████████████████████████▌                                   | 11421/20117 [7:14:33<5:57:11,  2.46s/it] 57%|██████████████████████████████████████████████▌                                   | 11422/20117 [7:14:36<5:59:43,  2.48s/it] 57%|██████████████████████████████████████████████▌                                   | 11423/20117 [7:14:38<6:01:55,  2.50s/it] 57%|██████████████████████████████████████████████▌                                   | 11424/20117 [7:14:41<6:03:38,  2.51s/it] 57%|██████████████████████████████████████████████▌                                   | 11425/20117 [7:14:43<6:00:19,  2.49s/it] 57%|██████████████████████████████████████████████▌                                   | 11426/20117 [7:14:46<6:01:57,  2.50s/it] 57%|██████████████████████████████████████████████▌                                   | 11427/20117 [7:14:48<6:02:09,  2.50s/it] 57%|██████████████████████████████████████████████▌                                   | 11428/20117 [7:14:51<5:59:16,  2.48s/it] 57%|██████████████████████████████████████████████▌                                   | 11429/20117 [7:14:53<6:00:29,  2.49s/it] 57%|██████████████████████████████████████████████▌                                   | 11430/20117 [7:14:56<6:05:15,  2.52s/it]                                                                                                                                 {'loss': 0.1982, 'grad_norm': 0.5340930223464966, 'learning_rate': 7.942329218209474e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 312.98, 'epoch': 1.14}
 57%|██████████████████████████████████████████████▌                                   | 11430/20117 [7:14:56<6:05:15,  2.52s/it] 57%|██████████████████████████████████████████████▌                                   | 11431/20117 [7:14:58<6:00:55,  2.49s/it] 57%|██████████████████████████████████████████████▌                                   | 11432/20117 [7:15:01<5:58:52,  2.48s/it] 57%|██████████████████████████████████████████████▌                                   | 11433/20117 [7:15:03<6:02:42,  2.51s/it] 57%|██████████████████████████████████████████████▌                                   | 11434/20117 [7:15:06<6:04:02,  2.52s/it] 57%|██████████████████████████████████████████████▌                                   | 11435/20117 [7:15:08<6:01:35,  2.50s/it] 57%|██████████████████████████████████████████████▌                                   | 11436/20117 [7:15:11<6:03:53,  2.52s/it] 57%|██████████████████████████████████████████████▌                                   | 11437/20117 [7:15:14<6:05:31,  2.53s/it] 57%|██████████████████████████████████████████████▌                                   | 11438/20117 [7:15:16<6:07:40,  2.54s/it] 57%|██████████████████████████████████████████████▋                                   | 11439/20117 [7:15:19<6:08:13,  2.55s/it] 57%|██████████████████████████████████████████████▋                                   | 11440/20117 [7:15:21<6:07:24,  2.54s/it]                                                                                                                                 {'loss': 0.1499, 'grad_norm': 0.515466570854187, 'learning_rate': 7.926972985210848e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 289.34, 'epoch': 1.14}
 57%|██████████████████████████████████████████████▋                                   | 11440/20117 [7:15:21<6:07:24,  2.54s/it] 57%|██████████████████████████████████████████████▋                                   | 11441/20117 [7:15:24<6:07:30,  2.54s/it] 57%|██████████████████████████████████████████████▋                                   | 11442/20117 [7:15:26<6:07:14,  2.54s/it] 57%|██████████████████████████████████████████████▋                                   | 11443/20117 [7:15:29<6:11:22,  2.57s/it] 57%|██████████████████████████████████████████████▋                                   | 11444/20117 [7:15:32<6:13:41,  2.59s/it] 57%|██████████████████████████████████████████████▋                                   | 11445/20117 [7:15:34<6:10:02,  2.56s/it] 57%|██████████████████████████████████████████████▋                                   | 11446/20117 [7:15:37<6:08:15,  2.55s/it] 57%|██████████████████████████████████████████████▋                                   | 11447/20117 [7:15:39<6:08:22,  2.55s/it] 57%|██████████████████████████████████████████████▋                                   | 11448/20117 [7:15:42<6:07:22,  2.54s/it] 57%|██████████████████████████████████████████████▋                                   | 11449/20117 [7:15:44<6:06:32,  2.54s/it] 57%|██████████████████████████████████████████████▋                                   | 11450/20117 [7:15:47<6:07:18,  2.54s/it]                                                                                                                                 {'loss': 0.1676, 'grad_norm': 0.29830217361450195, 'learning_rate': 7.911621858515901e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 360.42, 'epoch': 1.14}
 57%|██████████████████████████████████████████████▋                                   | 11450/20117 [7:15:47<6:07:18,  2.54s/it] 57%|██████████████████████████████████████████████▋                                   | 11451/20117 [7:15:49<6:04:47,  2.53s/it] 57%|██████████████████████████████████████████████▋                                   | 11452/20117 [7:15:52<6:05:56,  2.53s/it] 57%|██████████████████████████████████████████████▋                                   | 11453/20117 [7:15:54<6:10:06,  2.56s/it] 57%|██████████████████████████████████████████████▋                                   | 11454/20117 [7:15:57<6:09:07,  2.56s/it] 57%|██████████████████████████████████████████████▋                                   | 11455/20117 [7:15:59<6:07:39,  2.55s/it] 57%|██████████████████████████████████████████████▋                                   | 11456/20117 [7:16:02<6:08:18,  2.55s/it] 57%|██████████████████████████████████████████████▋                                   | 11457/20117 [7:16:05<6:09:25,  2.56s/it] 57%|██████████████████████████████████████████████▋                                   | 11458/20117 [7:16:07<6:09:22,  2.56s/it] 57%|██████████████████████████████████████████████▋                                   | 11459/20117 [7:16:10<6:08:53,  2.56s/it] 57%|██████████████████████████████████████████████▋                                   | 11460/20117 [7:16:12<6:08:05,  2.55s/it]                                                                                                                                 {'loss': 0.1715, 'grad_norm': 0.4327114224433899, 'learning_rate': 7.896275875937709e-05, 'memory/max_active (GiB)': 19.68, 'memory/max_allocated (GiB)': 19.68, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 334.12, 'epoch': 1.14}
 57%|██████████████████████████████████████████████▋                                   | 11460/20117 [7:16:12<6:08:05,  2.55s/it] 57%|██████████████████████████████████████████████▋                                   | 11461/20117 [7:16:15<6:08:48,  2.56s/it] 57%|██████████████████████████████████████████████▋                                   | 11462/20117 [7:16:17<6:07:59,  2.55s/it] 57%|██████████████████████████████████████████████▋                                   | 11463/20117 [7:16:20<6:07:20,  2.55s/it] 57%|██████████████████████████████████████████████▋                                   | 11464/20117 [7:16:22<6:06:51,  2.54s/it] 57%|██████████████████████████████████████████████▋                                   | 11465/20117 [7:16:25<6:24:49,  2.67s/it] 57%|██████████████████████████████████████████████▋                                   | 11466/20117 [7:16:28<6:18:12,  2.62s/it] 57%|██████████████████████████████████████████████▋                                   | 11467/20117 [7:16:30<6:14:31,  2.60s/it] 57%|██████████████████████████████████████████████▋                                   | 11468/20117 [7:16:33<6:13:39,  2.59s/it] 57%|██████████████████████████████████████████████▋                                   | 11469/20117 [7:16:36<6:14:40,  2.60s/it] 57%|██████████████████████████████████████████████▊                                   | 11470/20117 [7:16:38<6:14:36,  2.60s/it]                                                                                                                                 {'loss': 0.1645, 'grad_norm': 0.5234288573265076, 'learning_rate': 7.880935075276663e-05, 'memory/max_active (GiB)': 19.22, 'memory/max_allocated (GiB)': 19.22, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 263.98, 'epoch': 1.14}
 57%|██████████████████████████████████████████████▊                                   | 11470/20117 [7:16:38<6:14:36,  2.60s/it] 57%|██████████████████████████████████████████████▊                                   | 11471/20117 [7:16:41<6:12:31,  2.59s/it] 57%|██████████████████████████████████████████████▊                                   | 11472/20117 [7:16:43<6:09:31,  2.56s/it] 57%|██████████████████████████████████████████████▊                                   | 11473/20117 [7:16:46<6:11:09,  2.58s/it] 57%|██████████████████████████████████████████████▊                                   | 11474/20117 [7:16:48<6:09:59,  2.57s/it] 57%|██████████████████████████████████████████████▊                                   | 11475/20117 [7:16:51<6:08:34,  2.56s/it] 57%|██████████████████████████████████████████████▊                                   | 11476/20117 [7:16:54<6:07:25,  2.55s/it] 57%|██████████████████████████████████████████████▊                                   | 11477/20117 [7:16:56<6:07:16,  2.55s/it] 57%|██████████████████████████████████████████████▊                                   | 11478/20117 [7:16:59<6:09:29,  2.57s/it] 57%|██████████████████████████████████████████████▊                                   | 11479/20117 [7:17:01<6:09:08,  2.56s/it] 57%|██████████████████████████████████████████████▊                                   | 11480/20117 [7:17:04<6:08:57,  2.56s/it]                                                                                                                                 {'loss': 0.1548, 'grad_norm': 0.35436221957206726, 'learning_rate': 7.865599494320402e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 328.71, 'epoch': 1.14}
 57%|██████████████████████████████████████████████▊                                   | 11480/20117 [7:17:04<6:08:57,  2.56s/it] 57%|██████████████████████████████████████████████▊                                   | 11481/20117 [7:17:06<6:07:52,  2.56s/it] 57%|██████████████████████████████████████████████▊                                   | 11482/20117 [7:17:09<6:08:02,  2.56s/it] 57%|██████████████████████████████████████████████▊                                   | 11483/20117 [7:17:11<6:08:00,  2.56s/it] 57%|██████████████████████████████████████████████▊                                   | 11484/20117 [7:17:14<6:05:29,  2.54s/it] 57%|██████████████████████████████████████████████▊                                   | 11485/20117 [7:17:16<6:05:25,  2.54s/it] 57%|██████████████████████████████████████████████▊                                   | 11486/20117 [7:17:19<6:04:31,  2.53s/it] 57%|██████████████████████████████████████████████▊                                   | 11487/20117 [7:17:22<6:03:34,  2.53s/it] 57%|██████████████████████████████████████████████▊                                   | 11488/20117 [7:17:24<5:59:42,  2.50s/it] 57%|██████████████████████████████████████████████▊                                   | 11489/20117 [7:17:26<5:53:08,  2.46s/it] 57%|██████████████████████████████████████████████▊                                   | 11490/20117 [7:17:29<5:47:58,  2.42s/it]                                                                                                                                 {'loss': 0.1843, 'grad_norm': 0.5308985710144043, 'learning_rate': 7.850269170843702e-05, 'memory/max_active (GiB)': 19.23, 'memory/max_allocated (GiB)': 19.23, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 341.98, 'epoch': 1.14}
 57%|██████████████████████████████████████████████▊                                   | 11490/20117 [7:17:29<5:47:58,  2.42s/it] 57%|██████████████████████████████████████████████▊                                   | 11491/20117 [7:17:31<5:49:10,  2.43s/it] 57%|██████████████████████████████████████████████▊                                   | 11492/20117 [7:17:34<5:53:48,  2.46s/it] 57%|██████████████████████████████████████████████▊                                   | 11493/20117 [7:17:36<5:57:09,  2.48s/it] 57%|██████████████████████████████████████████████▊                                   | 11494/20117 [7:17:39<5:55:10,  2.47s/it] 57%|██████████████████████████████████████████████▊                                   | 11495/20117 [7:17:41<5:57:35,  2.49s/it] 57%|██████████████████████████████████████████████▊                                   | 11496/20117 [7:17:44<5:53:53,  2.46s/it] 57%|██████████████████████████████████████████████▊                                   | 11497/20117 [7:17:46<6:00:06,  2.51s/it] 57%|██████████████████████████████████████████████▊                                   | 11498/20117 [7:17:49<6:07:39,  2.56s/it] 57%|██████████████████████████████████████████████▊                                   | 11499/20117 [7:17:51<6:05:01,  2.54s/it] 57%|██████████████████████████████████████████████▉                                   | 11500/20117 [7:17:54<5:59:26,  2.50s/it]                                                                                                                                 {'loss': 0.2258, 'grad_norm': 0.4634507894515991, 'learning_rate': 7.834944142608394e-05, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 409.58, 'epoch': 1.14}
 57%|██████████████████████████████████████████████▉                                   | 11500/20117 [7:17:54<5:59:26,  2.50s/it] 57%|██████████████████████████████████████████████▉                                   | 11501/20117 [7:17:56<5:52:42,  2.46s/it] 57%|██████████████████████████████████████████████▉                                   | 11502/20117 [7:17:59<5:56:32,  2.48s/it] 57%|██████████████████████████████████████████████▉                                   | 11503/20117 [7:18:01<6:09:08,  2.57s/it] 57%|██████████████████████████████████████████████▉                                   | 11504/20117 [7:18:04<6:08:22,  2.57s/it] 57%|██████████████████████████████████████████████▉                                   | 11505/20117 [7:18:07<6:07:34,  2.56s/it] 57%|██████████████████████████████████████████████▉                                   | 11506/20117 [7:18:09<6:06:38,  2.55s/it] 57%|██████████████████████████████████████████████▉                                   | 11507/20117 [7:18:12<6:06:02,  2.55s/it] 57%|██████████████████████████████████████████████▉                                   | 11508/20117 [7:18:14<6:07:37,  2.56s/it] 57%|██████████████████████████████████████████████▉                                   | 11509/20117 [7:18:17<6:06:52,  2.56s/it] 57%|██████████████████████████████████████████████▉                                   | 11510/20117 [7:18:19<6:06:26,  2.55s/it]                                                                                                                                 {'loss': 0.1471, 'grad_norm': 0.5410236716270447, 'learning_rate': 7.819624447363252e-05, 'memory/max_active (GiB)': 19.8, 'memory/max_allocated (GiB)': 19.8, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 252.15, 'epoch': 1.14}
 57%|██████████████████████████████████████████████▉                                   | 11510/20117 [7:18:19<6:06:26,  2.55s/it] 57%|██████████████████████████████████████████████▉                                   | 11511/20117 [7:18:22<6:07:32,  2.56s/it] 57%|██████████████████████████████████████████████▉                                   | 11512/20117 [7:18:24<6:07:45,  2.56s/it] 57%|██████████████████████████████████████████████▉                                   | 11513/20117 [7:18:27<6:17:19,  2.63s/it] 57%|██████████████████████████████████████████████▉                                   | 11514/20117 [7:18:30<6:13:47,  2.61s/it] 57%|██████████████████████████████████████████████▉                                   | 11515/20117 [7:18:32<6:13:46,  2.61s/it] 57%|██████████████████████████████████████████████▉                                   | 11516/20117 [7:18:35<6:10:53,  2.59s/it] 57%|██████████████████████████████████████████████▉                                   | 11517/20117 [7:18:37<6:08:06,  2.57s/it] 57%|██████████████████████████████████████████████▉                                   | 11518/20117 [7:18:40<6:08:44,  2.57s/it] 57%|██████████████████████████████████████████████▉                                   | 11519/20117 [7:18:43<6:26:01,  2.69s/it] 57%|██████████████████████████████████████████████▉                                   | 11520/20117 [7:18:46<6:21:25,  2.66s/it]                                                                                                                                 {'loss': 0.1358, 'grad_norm': 0.3912404477596283, 'learning_rate': 7.80431012284393e-05, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 310.18, 'epoch': 1.15}
 57%|██████████████████████████████████████████████▉                                   | 11520/20117 [7:18:46<6:21:25,  2.66s/it] 57%|██████████████████████████████████████████████▉                                   | 11521/20117 [7:18:48<6:18:21,  2.64s/it] 57%|██████████████████████████████████████████████▉                                   | 11522/20117 [7:18:51<6:16:36,  2.63s/it] 57%|██████████████████████████████████████████████▉                                   | 11523/20117 [7:18:53<6:14:12,  2.61s/it] 57%|██████████████████████████████████████████████▉                                   | 11524/20117 [7:18:56<6:13:15,  2.61s/it] 57%|██████████████████████████████████████████████▉                                   | 11525/20117 [7:18:59<6:13:18,  2.61s/it] 57%|██████████████████████████████████████████████▉                                   | 11526/20117 [7:19:01<6:10:41,  2.59s/it] 57%|██████████████████████████████████████████████▉                                   | 11527/20117 [7:19:04<6:08:32,  2.57s/it] 57%|██████████████████████████████████████████████▉                                   | 11528/20117 [7:19:06<6:06:35,  2.56s/it] 57%|██████████████████████████████████████████████▉                                   | 11529/20117 [7:19:09<6:06:17,  2.56s/it] 57%|██████████████████████████████████████████████▉                                   | 11530/20117 [7:19:11<6:15:01,  2.62s/it]                                                                                                                                 {'loss': 0.1441, 'grad_norm': 0.2692343592643738, 'learning_rate': 7.789001206772849e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 230.07, 'epoch': 1.15}
 57%|██████████████████████████████████████████████▉                                   | 11530/20117 [7:19:11<6:15:01,  2.62s/it] 57%|███████████████████████████████████████████████                                   | 11531/20117 [7:19:14<6:12:00,  2.60s/it] 57%|███████████████████████████████████████████████                                   | 11532/20117 [7:19:17<6:07:40,  2.57s/it] 57%|███████████████████████████████████████████████                                   | 11533/20117 [7:19:19<6:06:21,  2.56s/it] 57%|███████████████████████████████████████████████                                   | 11534/20117 [7:19:22<6:12:14,  2.60s/it] 57%|███████████████████████████████████████████████                                   | 11535/20117 [7:19:24<6:14:22,  2.62s/it] 57%|███████████████████████████████████████████████                                   | 11536/20117 [7:19:27<6:16:35,  2.63s/it] 57%|███████████████████████████████████████████████                                   | 11537/20117 [7:19:30<6:16:37,  2.63s/it] 57%|███████████████████████████████████████████████                                   | 11538/20117 [7:19:32<6:13:35,  2.61s/it] 57%|███████████████████████████████████████████████                                   | 11539/20117 [7:19:35<6:10:11,  2.59s/it] 57%|███████████████████████████████████████████████                                   | 11540/20117 [7:19:37<6:07:10,  2.57s/it]                                                                                                                                 {'loss': 0.1424, 'grad_norm': 0.44046950340270996, 'learning_rate': 7.773697736859098e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 351.3, 'epoch': 1.15}
 57%|███████████████████████████████████████████████                                   | 11540/20117 [7:19:37<6:07:10,  2.57s/it] 57%|███████████████████████████████████████████████                                   | 11541/20117 [7:19:40<6:05:32,  2.56s/it] 57%|███████████████████████████████████████████████                                   | 11542/20117 [7:19:42<6:05:56,  2.56s/it] 57%|███████████████████████████████████████████████                                   | 11543/20117 [7:19:45<6:05:27,  2.56s/it] 57%|███████████████████████████████████████████████                                   | 11544/20117 [7:19:48<6:05:02,  2.55s/it] 57%|███████████████████████████████████████████████                                   | 11545/20117 [7:19:50<6:02:40,  2.54s/it] 57%|███████████████████████████████████████████████                                   | 11546/20117 [7:19:53<6:05:00,  2.56s/it] 57%|███████████████████████████████████████████████                                   | 11547/20117 [7:19:55<6:05:00,  2.56s/it] 57%|███████████████████████████████████████████████                                   | 11548/20117 [7:19:58<6:06:47,  2.57s/it] 57%|███████████████████████████████████████████████                                   | 11549/20117 [7:20:00<6:08:55,  2.58s/it] 57%|███████████████████████████████████████████████                                   | 11550/20117 [7:20:03<6:08:27,  2.58s/it]                                                                                                                                 {'loss': 0.194, 'grad_norm': 0.48867931962013245, 'learning_rate': 7.758399750798364e-05, 'memory/max_active (GiB)': 19.77, 'memory/max_allocated (GiB)': 19.77, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 280.59, 'epoch': 1.15}
 57%|███████████████████████████████████████████████                                   | 11550/20117 [7:20:03<6:08:27,  2.58s/it] 57%|███████████████████████████████████████████████                                   | 11551/20117 [7:20:06<6:09:55,  2.59s/it] 57%|███████████████████████████████████████████████                                   | 11552/20117 [7:20:08<6:08:15,  2.58s/it] 57%|███████████████████████████████████████████████                                   | 11553/20117 [7:20:11<6:10:12,  2.59s/it] 57%|███████████████████████████████████████████████                                   | 11554/20117 [7:20:14<6:22:29,  2.68s/it] 57%|███████████████████████████████████████████████                                   | 11555/20117 [7:20:17<6:28:52,  2.73s/it] 57%|███████████████████████████████████████████████                                   | 11556/20117 [7:20:19<6:22:15,  2.68s/it] 57%|███████████████████████████████████████████████                                   | 11557/20117 [7:20:22<6:13:16,  2.62s/it] 57%|███████████████████████████████████████████████                                   | 11558/20117 [7:20:24<6:11:14,  2.60s/it] 57%|███████████████████████████████████████████████                                   | 11559/20117 [7:20:27<6:09:46,  2.59s/it] 57%|███████████████████████████████████████████████                                   | 11560/20117 [7:20:29<6:07:52,  2.58s/it]                                                                                                                                 {'loss': 0.1481, 'grad_norm': 0.3554067313671112, 'learning_rate': 7.743107286272812e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 367.0, 'epoch': 1.15}
 57%|███████████████████████████████████████████████                                   | 11560/20117 [7:20:29<6:07:52,  2.58s/it] 57%|███████████████████████████████████████████████                                   | 11561/20117 [7:20:32<6:11:01,  2.60s/it] 57%|███████████████████████████████████████████████▏                                  | 11562/20117 [7:20:34<6:08:54,  2.59s/it] 57%|███████████████████████████████████████████████▏                                  | 11563/20117 [7:20:37<6:05:15,  2.56s/it] 57%|███████████████████████████████████████████████▏                                  | 11564/20117 [7:20:39<5:58:41,  2.52s/it] 57%|███████████████████████████████████████████████▏                                  | 11565/20117 [7:20:42<5:55:53,  2.50s/it] 57%|███████████████████████████████████████████████▏                                  | 11566/20117 [7:20:44<5:57:29,  2.51s/it] 57%|███████████████████████████████████████████████▏                                  | 11567/20117 [7:20:47<5:59:45,  2.52s/it] 58%|███████████████████████████████████████████████▏                                  | 11568/20117 [7:20:49<6:02:29,  2.54s/it] 58%|███████████████████████████████████████████████▏                                  | 11569/20117 [7:20:52<6:04:55,  2.56s/it] 58%|███████████████████████████████████████████████▏                                  | 11570/20117 [7:20:55<6:04:22,  2.56s/it]                                                                                                                                 {'loss': 0.1755, 'grad_norm': 0.4472406506538391, 'learning_rate': 7.727820380951022e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 291.81, 'epoch': 1.15}
 58%|███████████████████████████████████████████████▏                                  | 11570/20117 [7:20:55<6:04:22,  2.56s/it] 58%|███████████████████████████████████████████████▏                                  | 11571/20117 [7:20:58<6:24:23,  2.70s/it] 58%|███████████████████████████████████████████████▏                                  | 11572/20117 [7:21:00<6:15:58,  2.64s/it] 58%|███████████████████████████████████████████████▏                                  | 11573/20117 [7:21:03<6:11:43,  2.61s/it] 58%|███████████████████████████████████████████████▏                                  | 11574/20117 [7:21:05<6:13:16,  2.62s/it] 58%|███████████████████████████████████████████████▏                                  | 11575/20117 [7:21:08<6:09:15,  2.59s/it] 58%|███████████████████████████████████████████████▏                                  | 11576/20117 [7:21:10<6:08:36,  2.59s/it] 58%|███████████████████████████████████████████████▏                                  | 11577/20117 [7:21:13<6:06:29,  2.57s/it] 58%|███████████████████████████████████████████████▏                                  | 11578/20117 [7:21:16<6:04:33,  2.56s/it] 58%|███████████████████████████████████████████████▏                                  | 11579/20117 [7:21:18<6:02:24,  2.55s/it] 58%|███████████████████████████████████████████████▏                                  | 11580/20117 [7:21:21<6:09:25,  2.60s/it]                                                                                                                                 {'loss': 0.18, 'grad_norm': 0.47527438402175903, 'learning_rate': 7.712539072487867e-05, 'memory/max_active (GiB)': 19.99, 'memory/max_allocated (GiB)': 19.99, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 312.04, 'epoch': 1.15}
 58%|███████████████████████████████████████████████▏                                  | 11580/20117 [7:21:21<6:09:25,  2.60s/it] 58%|███████████████████████████████████████████████▏                                  | 11581/20117 [7:21:23<6:12:32,  2.62s/it] 58%|███████████████████████████████████████████████▏                                  | 11582/20117 [7:21:26<6:07:57,  2.59s/it] 58%|███████████████████████████████████████████████▏                                  | 11583/20117 [7:21:29<6:09:36,  2.60s/it] 58%|███████████████████████████████████████████████▏                                  | 11584/20117 [7:21:31<6:06:32,  2.58s/it] 58%|███████████████████████████████████████████████▏                                  | 11585/20117 [7:21:34<6:06:31,  2.58s/it] 58%|███████████████████████████████████████████████▏                                  | 11586/20117 [7:21:36<6:06:29,  2.58s/it] 58%|███████████████████████████████████████████████▏                                  | 11587/20117 [7:21:39<6:05:56,  2.57s/it] 58%|███████████████████████████████████████████████▏                                  | 11588/20117 [7:21:41<6:04:48,  2.57s/it] 58%|███████████████████████████████████████████████▏                                  | 11589/20117 [7:21:44<6:08:23,  2.59s/it] 58%|███████████████████████████████████████████████▏                                  | 11590/20117 [7:21:47<6:08:07,  2.59s/it]                                                                                                                                 {'loss': 0.1479, 'grad_norm': 0.4945676624774933, 'learning_rate': 7.697263398524448e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 271.43, 'epoch': 1.15}
 58%|███████████████████████████████████████████████▏                                  | 11590/20117 [7:21:47<6:08:07,  2.59s/it] 58%|███████████████████████████████████████████████▏                                  | 11591/20117 [7:21:49<6:07:10,  2.58s/it] 58%|███████████████████████████████████████████████▎                                  | 11592/20117 [7:21:52<6:05:00,  2.57s/it] 58%|███████████████████████████████████████████████▎                                  | 11593/20117 [7:21:54<6:06:40,  2.58s/it] 58%|███████████████████████████████████████████████▎                                  | 11594/20117 [7:21:57<6:06:24,  2.58s/it] 58%|███████████████████████████████████████████████▎                                  | 11595/20117 [7:22:00<6:07:22,  2.59s/it] 58%|███████████████████████████████████████████████▎                                  | 11596/20117 [7:22:02<6:08:44,  2.60s/it] 58%|███████████████████████████████████████████████▎                                  | 11597/20117 [7:22:05<6:09:25,  2.60s/it] 58%|███████████████████████████████████████████████▎                                  | 11598/20117 [7:22:07<6:06:11,  2.58s/it] 58%|███████████████████████████████████████████████▎                                  | 11599/20117 [7:22:10<6:05:52,  2.58s/it] 58%|███████████████████████████████████████████████▎                                  | 11600/20117 [7:22:12<6:03:56,  2.56s/it]                                                                                                                                 {'loss': 0.1455, 'grad_norm': 0.3966761529445648, 'learning_rate': 7.681993396687968e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 324.63, 'epoch': 1.15}
 58%|███████████████████████████████████████████████▎                                  | 11600/20117 [7:22:12<6:03:56,  2.56s/it] 58%|███████████████████████████████████████████████▎                                  | 11601/20117 [7:22:15<6:03:13,  2.56s/it] 58%|███████████████████████████████████████████████▎                                  | 11602/20117 [7:22:17<6:02:42,  2.56s/it] 58%|███████████████████████████████████████████████▎                                  | 11603/20117 [7:22:20<6:08:16,  2.60s/it] 58%|███████████████████████████████████████████████▎                                  | 11604/20117 [7:22:23<6:05:30,  2.58s/it] 58%|███████████████████████████████████████████████▎                                  | 11605/20117 [7:22:25<6:05:26,  2.58s/it] 58%|███████████████████████████████████████████████▎                                  | 11606/20117 [7:22:28<6:04:17,  2.57s/it] 58%|███████████████████████████████████████████████▎                                  | 11607/20117 [7:22:30<6:03:09,  2.56s/it] 58%|███████████████████████████████████████████████▎                                  | 11608/20117 [7:22:33<6:03:03,  2.56s/it] 58%|███████████████████████████████████████████████▎                                  | 11609/20117 [7:22:35<6:02:37,  2.56s/it] 58%|███████████████████████████████████████████████▎                                  | 11610/20117 [7:22:38<6:04:20,  2.57s/it]                                                                                                                                 {'loss': 0.1397, 'grad_norm': 0.34350907802581787, 'learning_rate': 7.666729104591678e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 292.42, 'epoch': 1.15}
 58%|███████████████████████████████████████████████▎                                  | 11610/20117 [7:22:38<6:04:20,  2.57s/it] 58%|███████████████████████████████████████████████▎                                  | 11611/20117 [7:22:41<6:03:14,  2.56s/it] 58%|███████████████████████████████████████████████▎                                  | 11612/20117 [7:22:43<6:05:34,  2.58s/it] 58%|███████████████████████████████████████████████▎                                  | 11613/20117 [7:22:46<6:05:49,  2.58s/it] 58%|███████████████████████████████████████████████▎                                  | 11614/20117 [7:22:48<6:04:40,  2.57s/it] 58%|███████████████████████████████████████████████▎                                  | 11615/20117 [7:22:51<6:02:25,  2.56s/it] 58%|███████████████████████████████████████████████▎                                  | 11616/20117 [7:22:53<6:03:27,  2.57s/it] 58%|███████████████████████████████████████████████▎                                  | 11617/20117 [7:22:56<6:02:49,  2.56s/it] 58%|███████████████████████████████████████████████▎                                  | 11618/20117 [7:22:58<5:58:58,  2.53s/it] 58%|███████████████████████████████████████████████▎                                  | 11619/20117 [7:23:01<5:59:24,  2.54s/it] 58%|███████████████████████████████████████████████▎                                  | 11620/20117 [7:23:03<5:52:16,  2.49s/it]                                                                                                                                 {'loss': 0.1825, 'grad_norm': 0.5656160712242126, 'learning_rate': 7.651470559834747e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 367.89, 'epoch': 1.16}
 58%|███████████████████████████████████████████████▎                                  | 11620/20117 [7:23:03<5:52:16,  2.49s/it] 58%|███████████████████████████████████████████████▎                                  | 11621/20117 [7:23:06<5:48:40,  2.46s/it] 58%|███████████████████████████████████████████████▎                                  | 11622/20117 [7:23:08<5:50:23,  2.47s/it] 58%|███████████████████████████████████████████████▍                                  | 11623/20117 [7:23:11<5:54:30,  2.50s/it] 58%|███████████████████████████████████████████████▍                                  | 11624/20117 [7:23:14<6:11:13,  2.62s/it] 58%|███████████████████████████████████████████████▍                                  | 11625/20117 [7:23:16<6:04:56,  2.58s/it] 58%|███████████████████████████████████████████████▍                                  | 11626/20117 [7:23:19<6:00:56,  2.55s/it] 58%|███████████████████████████████████████████████▍                                  | 11627/20117 [7:23:21<5:59:22,  2.54s/it] 58%|███████████████████████████████████████████████▍                                  | 11628/20117 [7:23:24<5:57:34,  2.53s/it] 58%|███████████████████████████████████████████████▍                                  | 11629/20117 [7:23:26<5:53:20,  2.50s/it] 58%|███████████████████████████████████████████████▍                                  | 11630/20117 [7:23:29<5:51:10,  2.48s/it]                                                                                                                                 {'loss': 0.1718, 'grad_norm': 0.4410739541053772, 'learning_rate': 7.636217800002203e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 365.13, 'epoch': 1.16}
 58%|███████████████████████████████████████████████▍                                  | 11630/20117 [7:23:29<5:51:10,  2.48s/it] 58%|███████████████████████████████████████████████▍                                  | 11631/20117 [7:23:31<5:46:11,  2.45s/it] 58%|███████████████████████████████████████████████▍                                  | 11632/20117 [7:23:33<5:45:05,  2.44s/it] 58%|███████████████████████████████████████████████▍                                  | 11633/20117 [7:23:36<5:49:37,  2.47s/it] 58%|███████████████████████████████████████████████▍                                  | 11634/20117 [7:23:39<5:54:32,  2.51s/it] 58%|███████████████████████████████████████████████▍                                  | 11635/20117 [7:23:41<5:56:15,  2.52s/it] 58%|███████████████████████████████████████████████▍                                  | 11636/20117 [7:23:44<5:59:05,  2.54s/it] 58%|███████████████████████████████████████████████▍                                  | 11637/20117 [7:23:46<5:57:23,  2.53s/it] 58%|███████████████████████████████████████████████▍                                  | 11638/20117 [7:23:49<5:59:03,  2.54s/it] 58%|███████████████████████████████████████████████▍                                  | 11639/20117 [7:23:51<5:57:55,  2.53s/it] 58%|███████████████████████████████████████████████▍                                  | 11640/20117 [7:23:54<5:57:56,  2.53s/it]                                                                                                                                 {'loss': 0.1638, 'grad_norm': 0.5351715087890625, 'learning_rate': 7.620970862664811e-05, 'memory/max_active (GiB)': 21.53, 'memory/max_allocated (GiB)': 21.53, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 301.44, 'epoch': 1.16}
 58%|███████████████████████████████████████████████▍                                  | 11640/20117 [7:23:54<5:57:56,  2.53s/it] 58%|███████████████████████████████████████████████▍                                  | 11641/20117 [7:23:56<6:00:38,  2.55s/it] 58%|███████████████████████████████████████████████▍                                  | 11642/20117 [7:23:59<5:59:24,  2.54s/it] 58%|███████████████████████████████████████████████▍                                  | 11643/20117 [7:24:01<5:59:18,  2.54s/it] 58%|███████████████████████████████████████████████▍                                  | 11644/20117 [7:24:04<5:58:50,  2.54s/it] 58%|███████████████████████████████████████████████▍                                  | 11645/20117 [7:24:07<5:59:24,  2.55s/it] 58%|███████████████████████████████████████████████▍                                  | 11646/20117 [7:24:09<5:59:10,  2.54s/it] 58%|███████████████████████████████████████████████▍                                  | 11647/20117 [7:24:12<5:59:34,  2.55s/it] 58%|███████████████████████████████████████████████▍                                  | 11648/20117 [7:24:14<5:59:27,  2.55s/it] 58%|███████████████████████████████████████████████▍                                  | 11649/20117 [7:24:17<5:59:47,  2.55s/it] 58%|███████████████████████████████████████████████▍                                  | 11650/20117 [7:24:19<6:00:28,  2.55s/it]                                                                                                                                 {'loss': 0.2124, 'grad_norm': 0.5122233033180237, 'learning_rate': 7.605729785379005e-05, 'memory/max_active (GiB)': 20.46, 'memory/max_allocated (GiB)': 20.46, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 339.63, 'epoch': 1.16}
 58%|███████████████████████████████████████████████▍                                  | 11650/20117 [7:24:19<6:00:28,  2.55s/it] 58%|███████████████████████████████████████████████▍                                  | 11651/20117 [7:24:22<6:01:33,  2.56s/it] 58%|███████████████████████████████████████████████▍                                  | 11652/20117 [7:24:24<6:01:31,  2.56s/it] 58%|███████████████████████████████████████████████▍                                  | 11653/20117 [7:24:27<6:01:04,  2.56s/it] 58%|███████████████████████████████████████████████▌                                  | 11654/20117 [7:24:30<6:01:00,  2.56s/it] 58%|███████████████████████████████████████████████▌                                  | 11655/20117 [7:24:32<6:02:28,  2.57s/it] 58%|███████████████████████████████████████████████▌                                  | 11656/20117 [7:24:35<6:02:30,  2.57s/it] 58%|███████████████████████████████████████████████▌                                  | 11657/20117 [7:24:37<6:04:03,  2.58s/it] 58%|███████████████████████████████████████████████▌                                  | 11658/20117 [7:24:40<6:03:23,  2.58s/it] 58%|███████████████████████████████████████████████▌                                  | 11659/20117 [7:24:42<6:02:32,  2.57s/it] 58%|███████████████████████████████████████████████▌                                  | 11660/20117 [7:24:45<6:01:22,  2.56s/it]                                                                                                                                 {'loss': 0.1758, 'grad_norm': 0.49694183468818665, 'learning_rate': 7.590494605686781e-05, 'memory/max_active (GiB)': 18.83, 'memory/max_allocated (GiB)': 18.83, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 332.59, 'epoch': 1.16}
 58%|███████████████████████████████████████████████▌                                  | 11660/20117 [7:24:45<6:01:22,  2.56s/it] 58%|███████████████████████████████████████████████▌                                  | 11661/20117 [7:24:48<6:01:03,  2.56s/it] 58%|███████████████████████████████████████████████▌                                  | 11662/20117 [7:24:50<6:02:21,  2.57s/it] 58%|███████████████████████████████████████████████▌                                  | 11663/20117 [7:24:53<6:01:54,  2.57s/it] 58%|███████████████████████████████████████████████▌                                  | 11664/20117 [7:24:55<6:03:36,  2.58s/it] 58%|███████████████████████████████████████████████▌                                  | 11665/20117 [7:24:58<6:03:02,  2.58s/it] 58%|███████████████████████████████████████████████▌                                  | 11666/20117 [7:25:01<6:02:51,  2.58s/it] 58%|███████████████████████████████████████████████▌                                  | 11667/20117 [7:25:03<6:01:10,  2.56s/it] 58%|███████████████████████████████████████████████▌                                  | 11668/20117 [7:25:06<6:00:22,  2.56s/it] 58%|███████████████████████████████████████████████▌                                  | 11669/20117 [7:25:08<6:03:43,  2.58s/it] 58%|███████████████████████████████████████████████▌                                  | 11670/20117 [7:25:11<6:03:04,  2.58s/it]                                                                                                                                 {'loss': 0.1506, 'grad_norm': 0.5573239922523499, 'learning_rate': 7.5752653611156e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 326.32, 'epoch': 1.16}
 58%|███████████████████████████████████████████████▌                                  | 11670/20117 [7:25:11<6:03:04,  2.58s/it] 58%|███████████████████████████████████████████████▌                                  | 11671/20117 [7:25:13<6:01:30,  2.57s/it] 58%|███████████████████████████████████████████████▌                                  | 11672/20117 [7:25:16<5:57:54,  2.54s/it] 58%|███████████████████████████████████████████████▌                                  | 11673/20117 [7:25:18<5:57:30,  2.54s/it] 58%|███████████████████████████████████████████████▌                                  | 11674/20117 [7:25:21<5:57:47,  2.54s/it] 58%|███████████████████████████████████████████████▌                                  | 11675/20117 [7:25:23<5:56:58,  2.54s/it] 58%|███████████████████████████████████████████████▌                                  | 11676/20117 [7:25:26<6:01:02,  2.57s/it] 58%|███████████████████████████████████████████████▌                                  | 11677/20117 [7:25:29<6:01:16,  2.57s/it] 58%|███████████████████████████████████████████████▌                                  | 11678/20117 [7:25:31<6:13:29,  2.66s/it] 58%|███████████████████████████████████████████████▌                                  | 11679/20117 [7:25:34<6:08:41,  2.62s/it] 58%|███████████████████████████████████████████████▌                                  | 11680/20117 [7:25:37<6:06:22,  2.61s/it]                                                                                                                                 {'loss': 0.1212, 'grad_norm': 0.4641178250312805, 'learning_rate': 7.560042089178319e-05, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 253.53, 'epoch': 1.16}
 58%|███████████████████████████████████████████████▌                                  | 11680/20117 [7:25:37<6:06:22,  2.61s/it] 58%|███████████████████████████████████████████████▌                                  | 11681/20117 [7:25:39<6:04:24,  2.59s/it] 58%|███████████████████████████████████████████████▌                                  | 11682/20117 [7:25:42<6:05:11,  2.60s/it] 58%|███████████████████████████████████████████████▌                                  | 11683/20117 [7:25:44<6:03:07,  2.58s/it] 58%|███████████████████████████████████████████████▋                                  | 11684/20117 [7:25:47<6:03:26,  2.59s/it] 58%|███████████████████████████████████████████████▋                                  | 11685/20117 [7:25:49<6:02:21,  2.58s/it] 58%|███████████████████████████████████████████████▋                                  | 11686/20117 [7:25:52<6:00:14,  2.56s/it] 58%|███████████████████████████████████████████████▋                                  | 11687/20117 [7:25:55<5:57:27,  2.54s/it] 58%|███████████████████████████████████████████████▋                                  | 11688/20117 [7:25:57<6:00:22,  2.57s/it] 58%|███████████████████████████████████████████████▋                                  | 11689/20117 [7:26:00<6:04:23,  2.59s/it] 58%|███████████████████████████████████████████████▋                                  | 11690/20117 [7:26:02<5:58:40,  2.55s/it]                                                                                                                                 {'loss': 0.1489, 'grad_norm': 0.2381938248872757, 'learning_rate': 7.544824827373064e-05, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 326.27, 'epoch': 1.16}
 58%|███████████████████████████████████████████████▋                                  | 11690/20117 [7:26:02<5:58:40,  2.55s/it] 58%|███████████████████████████████████████████████▋                                  | 11691/20117 [7:26:05<5:53:42,  2.52s/it] 58%|███████████████████████████████████████████████▋                                  | 11692/20117 [7:26:07<5:54:48,  2.53s/it] 58%|███████████████████████████████████████████████▋                                  | 11693/20117 [7:26:10<5:56:18,  2.54s/it] 58%|███████████████████████████████████████████████▋                                  | 11694/20117 [7:26:12<5:53:21,  2.52s/it] 58%|███████████████████████████████████████████████▋                                  | 11695/20117 [7:26:15<5:53:04,  2.52s/it] 58%|███████████████████████████████████████████████▋                                  | 11696/20117 [7:26:17<5:53:24,  2.52s/it] 58%|███████████████████████████████████████████████▋                                  | 11697/20117 [7:26:20<5:52:15,  2.51s/it] 58%|███████████████████████████████████████████████▋                                  | 11698/20117 [7:26:22<5:51:41,  2.51s/it] 58%|███████████████████████████████████████████████▋                                  | 11699/20117 [7:26:25<5:51:38,  2.51s/it] 58%|███████████████████████████████████████████████▋                                  | 11700/20117 [7:26:27<5:49:06,  2.49s/it]                                                                                                                                 {'loss': 0.2159, 'grad_norm': 0.5554631948471069, 'learning_rate': 7.529613613183174e-05, 'memory/max_active (GiB)': 19.22, 'memory/max_allocated (GiB)': 19.22, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 373.55, 'epoch': 1.16}
 58%|███████████████████████████████████████████████▋                                  | 11700/20117 [7:26:27<5:49:06,  2.49s/it] 58%|███████████████████████████████████████████████▋                                  | 11701/20117 [7:26:30<5:46:01,  2.47s/it] 58%|███████████████████████████████████████████████▋                                  | 11702/20117 [7:26:32<5:49:48,  2.49s/it] 58%|███████████████████████████████████████████████▋                                  | 11703/20117 [7:26:35<5:52:24,  2.51s/it] 58%|███████████████████████████████████████████████▋                                  | 11704/20117 [7:26:37<5:55:44,  2.54s/it] 58%|███████████████████████████████████████████████▋                                  | 11705/20117 [7:26:40<5:54:19,  2.53s/it] 58%|███████████████████████████████████████████████▋                                  | 11706/20117 [7:26:42<5:54:32,  2.53s/it] 58%|███████████████████████████████████████████████▋                                  | 11707/20117 [7:26:45<5:55:28,  2.54s/it] 58%|███████████████████████████████████████████████▋                                  | 11708/20117 [7:26:48<5:56:58,  2.55s/it] 58%|███████████████████████████████████████████████▋                                  | 11709/20117 [7:26:50<5:55:36,  2.54s/it] 58%|███████████████████████████████████████████████▋                                  | 11710/20117 [7:26:53<6:02:18,  2.59s/it]                                                                                                                                 {'loss': 0.1905, 'grad_norm': 0.3771873414516449, 'learning_rate': 7.514408484077081e-05, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 313.67, 'epoch': 1.16}
 58%|███████████████████████████████████████████████▋                                  | 11710/20117 [7:26:53<6:02:18,  2.59s/it] 58%|███████████████████████████████████████████████▋                                  | 11711/20117 [7:26:55<6:00:11,  2.57s/it] 58%|███████████████████████████████████████████████▋                                  | 11712/20117 [7:26:58<5:58:47,  2.56s/it] 58%|███████████████████████████████████████████████▋                                  | 11713/20117 [7:27:00<5:57:51,  2.55s/it] 58%|███████████████████████████████████████████████▋                                  | 11714/20117 [7:27:03<5:57:14,  2.55s/it] 58%|███████████████████████████████████████████████▊                                  | 11715/20117 [7:27:05<5:56:10,  2.54s/it] 58%|███████████████████████████████████████████████▊                                  | 11716/20117 [7:27:08<5:55:59,  2.54s/it] 58%|███████████████████████████████████████████████▊                                  | 11717/20117 [7:27:11<5:57:25,  2.55s/it] 58%|███████████████████████████████████████████████▊                                  | 11718/20117 [7:27:13<5:57:51,  2.56s/it] 58%|███████████████████████████████████████████████▊                                  | 11719/20117 [7:27:16<5:58:33,  2.56s/it] 58%|███████████████████████████████████████████████▊                                  | 11720/20117 [7:27:18<5:59:51,  2.57s/it]                                                                                                                                 {'loss': 0.1409, 'grad_norm': 0.5581318736076355, 'learning_rate': 7.499209477508238e-05, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 259.67, 'epoch': 1.17}
 58%|███████████████████████████████████████████████▊                                  | 11720/20117 [7:27:18<5:59:51,  2.57s/it] 58%|███████████████████████████████████████████████▊                                  | 11721/20117 [7:27:21<6:01:25,  2.58s/it] 58%|███████████████████████████████████████████████▊                                  | 11722/20117 [7:27:23<6:01:34,  2.58s/it] 58%|███████████████████████████████████████████████▊                                  | 11723/20117 [7:27:26<6:07:43,  2.63s/it] 58%|███████████████████████████████████████████████▊                                  | 11724/20117 [7:27:29<6:15:18,  2.68s/it] 58%|███████████████████████████████████████████████▊                                  | 11725/20117 [7:27:32<6:11:12,  2.65s/it] 58%|███████████████████████████████████████████████▊                                  | 11726/20117 [7:27:34<6:08:16,  2.63s/it] 58%|███████████████████████████████████████████████▊                                  | 11727/20117 [7:27:37<6:04:33,  2.61s/it] 58%|███████████████████████████████████████████████▊                                  | 11728/20117 [7:27:39<6:03:19,  2.60s/it] 58%|███████████████████████████████████████████████▊                                  | 11729/20117 [7:27:42<6:22:23,  2.74s/it] 58%|███████████████████████████████████████████████▊                                  | 11730/20117 [7:27:45<6:16:27,  2.69s/it]                                                                                                                                 {'loss': 0.1236, 'grad_norm': 0.4785107970237732, 'learning_rate': 7.484016630915003e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 309.21, 'epoch': 1.17}
 58%|███████████████████████████████████████████████▊                                  | 11730/20117 [7:27:45<6:16:27,  2.69s/it] 58%|███████████████████████████████████████████████▊                                  | 11731/20117 [7:27:48<6:11:26,  2.66s/it] 58%|███████████████████████████████████████████████▊                                  | 11732/20117 [7:27:50<6:08:04,  2.63s/it] 58%|███████████████████████████████████████████████▊                                  | 11733/20117 [7:27:53<6:03:47,  2.60s/it] 58%|███████████████████████████████████████████████▊                                  | 11734/20117 [7:27:55<6:01:55,  2.59s/it] 58%|███████████████████████████████████████████████▊                                  | 11735/20117 [7:27:58<5:58:24,  2.57s/it] 58%|███████████████████████████████████████████████▊                                  | 11736/20117 [7:28:00<6:00:32,  2.58s/it] 58%|███████████████████████████████████████████████▊                                  | 11737/20117 [7:28:03<5:59:40,  2.58s/it] 58%|███████████████████████████████████████████████▊                                  | 11738/20117 [7:28:05<5:59:06,  2.57s/it] 58%|███████████████████████████████████████████████▊                                  | 11739/20117 [7:28:08<5:59:00,  2.57s/it] 58%|███████████████████████████████████████████████▊                                  | 11740/20117 [7:28:11<6:00:45,  2.58s/it]                                                                                                                                 {'loss': 0.1259, 'grad_norm': 0.48833346366882324, 'learning_rate': 7.468829981720574e-05, 'memory/max_active (GiB)': 21.41, 'memory/max_allocated (GiB)': 21.41, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 348.37, 'epoch': 1.17}
 58%|███████████████████████████████████████████████▊                                  | 11740/20117 [7:28:11<6:00:45,  2.58s/it] 58%|███████████████████████████████████████████████▊                                  | 11741/20117 [7:28:13<6:00:00,  2.58s/it] 58%|███████████████████████████████████████████████▊                                  | 11742/20117 [7:28:16<5:58:03,  2.57s/it] 58%|███████████████████████████████████████████████▊                                  | 11743/20117 [7:28:18<5:55:53,  2.55s/it] 58%|███████████████████████████████████████████████▊                                  | 11744/20117 [7:28:21<5:57:04,  2.56s/it] 58%|███████████████████████████████████████████████▊                                  | 11745/20117 [7:28:23<5:57:09,  2.56s/it] 58%|███████████████████████████████████████████████▉                                  | 11746/20117 [7:28:26<5:57:30,  2.56s/it] 58%|███████████████████████████████████████████████▉                                  | 11747/20117 [7:28:29<5:58:23,  2.57s/it] 58%|███████████████████████████████████████████████▉                                  | 11748/20117 [7:28:31<5:57:03,  2.56s/it] 58%|███████████████████████████████████████████████▉                                  | 11749/20117 [7:28:34<5:56:14,  2.55s/it] 58%|███████████████████████████████████████████████▉                                  | 11750/20117 [7:28:36<5:57:34,  2.56s/it]                                                                                                                                 {'loss': 0.1447, 'grad_norm': 0.3307853639125824, 'learning_rate': 7.453649567332871e-05, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 261.54, 'epoch': 1.17}
 58%|███████████████████████████████████████████████▉                                  | 11750/20117 [7:28:36<5:57:34,  2.56s/it] 58%|███████████████████████████████████████████████▉                                  | 11751/20117 [7:28:39<5:56:56,  2.56s/it] 58%|███████████████████████████████████████████████▉                                  | 11752/20117 [7:28:41<5:59:11,  2.58s/it] 58%|███████████████████████████████████████████████▉                                  | 11753/20117 [7:28:44<5:57:50,  2.57s/it] 58%|███████████████████████████████████████████████▉                                  | 11754/20117 [7:28:47<6:02:29,  2.60s/it] 58%|███████████████████████████████████████████████▉                                  | 11755/20117 [7:28:49<6:05:05,  2.62s/it] 58%|███████████████████████████████████████████████▉                                  | 11756/20117 [7:28:52<6:06:18,  2.63s/it] 58%|███████████████████████████████████████████████▉                                  | 11757/20117 [7:28:54<6:01:51,  2.60s/it] 58%|███████████████████████████████████████████████▉                                  | 11758/20117 [7:28:57<5:53:41,  2.54s/it] 58%|███████████████████████████████████████████████▉                                  | 11759/20117 [7:28:59<5:48:46,  2.50s/it] 58%|███████████████████████████████████████████████▉                                  | 11760/20117 [7:29:02<5:45:41,  2.48s/it]                                                                                                                                 {'loss': 0.1734, 'grad_norm': 0.6457106471061707, 'learning_rate': 7.438475425144469e-05, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 323.76, 'epoch': 1.17}
 58%|███████████████████████████████████████████████▉                                  | 11760/20117 [7:29:02<5:45:41,  2.48s/it] 58%|███████████████████████████████████████████████▉                                  | 11761/20117 [7:29:04<5:53:18,  2.54s/it] 58%|███████████████████████████████████████████████▉                                  | 11762/20117 [7:29:07<5:55:10,  2.55s/it] 58%|███████████████████████████████████████████████▉                                  | 11763/20117 [7:29:10<5:57:06,  2.56s/it] 58%|███████████████████████████████████████████████▉                                  | 11764/20117 [7:29:12<5:52:24,  2.53s/it] 58%|███████████████████████████████████████████████▉                                  | 11765/20117 [7:29:15<5:52:52,  2.54s/it] 58%|███████████████████████████████████████████████▉                                  | 11766/20117 [7:29:17<5:57:10,  2.57s/it] 58%|███████████████████████████████████████████████▉                                  | 11767/20117 [7:29:20<5:53:22,  2.54s/it] 58%|███████████████████████████████████████████████▉                                  | 11768/20117 [7:29:22<5:54:18,  2.55s/it] 59%|███████████████████████████████████████████████▉                                  | 11769/20117 [7:29:25<5:50:48,  2.52s/it] 59%|███████████████████████████████████████████████▉                                  | 11770/20117 [7:29:27<5:45:37,  2.48s/it]                                                                                                                                 {'loss': 0.1597, 'grad_norm': 0.4975489675998688, 'learning_rate': 7.423307592532484e-05, 'memory/max_active (GiB)': 18.18, 'memory/max_allocated (GiB)': 18.18, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 333.31, 'epoch': 1.17}
 59%|███████████████████████████████████████████████▉                                  | 11770/20117 [7:29:27<5:45:37,  2.48s/it] 59%|███████████████████████████████████████████████▉                                  | 11771/20117 [7:29:29<5:41:46,  2.46s/it] 59%|███████████████████████████████████████████████▉                                  | 11772/20117 [7:29:32<5:48:12,  2.50s/it] 59%|███████████████████████████████████████████████▉                                  | 11773/20117 [7:29:35<5:52:29,  2.53s/it] 59%|███████████████████████████████████████████████▉                                  | 11774/20117 [7:29:37<5:57:17,  2.57s/it] 59%|███████████████████████████████████████████████▉                                  | 11775/20117 [7:29:40<5:58:03,  2.58s/it] 59%|████████████████████████████████████████████████                                  | 11776/20117 [7:29:43<5:58:57,  2.58s/it] 59%|████████████████████████████████████████████████                                  | 11777/20117 [7:29:45<5:57:50,  2.57s/it] 59%|████████████████████████████████████████████████                                  | 11778/20117 [7:29:48<5:59:10,  2.58s/it] 59%|████████████████████████████████████████████████                                  | 11779/20117 [7:29:50<5:57:38,  2.57s/it] 59%|████████████████████████████████████████████████                                  | 11780/20117 [7:29:53<5:57:49,  2.58s/it]                                                                                                                                 {'loss': 0.1448, 'grad_norm': 0.5595365166664124, 'learning_rate': 7.408146106858496e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 302.61, 'epoch': 1.17}
 59%|████████████████████████████████████████████████                                  | 11780/20117 [7:29:53<5:57:49,  2.58s/it] 59%|████████████████████████████████████████████████                                  | 11781/20117 [7:29:56<6:16:11,  2.71s/it] 59%|████████████████████████████████████████████████                                  | 11782/20117 [7:29:58<6:08:22,  2.65s/it] 59%|████████████████████████████████████████████████                                  | 11783/20117 [7:30:01<6:05:19,  2.63s/it] 59%|████████████████████████████████████████████████                                  | 11784/20117 [7:30:03<6:01:51,  2.61s/it] 59%|████████████████████████████████████████████████                                  | 11785/20117 [7:30:06<6:00:29,  2.60s/it] 59%|████████████████████████████████████████████████                                  | 11786/20117 [7:30:09<6:00:03,  2.59s/it] 59%|████████████████████████████████████████████████                                  | 11787/20117 [7:30:11<5:58:33,  2.58s/it] 59%|████████████████████████████████████████████████                                  | 11788/20117 [7:30:14<5:56:12,  2.57s/it] 59%|████████████████████████████████████████████████                                  | 11789/20117 [7:30:16<5:54:10,  2.55s/it] 59%|████████████████████████████████████████████████                                  | 11790/20117 [7:30:19<5:53:37,  2.55s/it]                                                                                                                                 {'loss': 0.1829, 'grad_norm': 0.5762649178504944, 'learning_rate': 7.392991005468449e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 323.36, 'epoch': 1.17}
 59%|████████████████████████████████████████████████                                  | 11790/20117 [7:30:19<5:53:37,  2.55s/it] 59%|████████████████████████████████████████████████                                  | 11791/20117 [7:30:21<5:51:39,  2.53s/it] 59%|████████████████████████████████████████████████                                  | 11792/20117 [7:30:24<5:50:22,  2.53s/it] 59%|████████████████████████████████████████████████                                  | 11793/20117 [7:30:26<5:50:03,  2.52s/it] 59%|████████████████████████████████████████████████                                  | 11794/20117 [7:30:29<5:46:55,  2.50s/it] 59%|████████████████████████████████████████████████                                  | 11795/20117 [7:30:31<5:46:42,  2.50s/it] 59%|████████████████████████████████████████████████                                  | 11796/20117 [7:30:34<5:43:04,  2.47s/it] 59%|████████████████████████████████████████████████                                  | 11797/20117 [7:30:36<5:44:11,  2.48s/it] 59%|████████████████████████████████████████████████                                  | 11798/20117 [7:30:39<5:44:01,  2.48s/it] 59%|████████████████████████████████████████████████                                  | 11799/20117 [7:30:41<5:43:24,  2.48s/it] 59%|████████████████████████████████████████████████                                  | 11800/20117 [7:30:44<5:45:34,  2.49s/it]                                                                                                                                 {'loss': 0.1615, 'grad_norm': 0.4875398278236389, 'learning_rate': 7.377842325692557e-05, 'memory/max_active (GiB)': 21.54, 'memory/max_allocated (GiB)': 21.54, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 304.43, 'epoch': 1.17}
 59%|████████████████████████████████████████████████                                  | 11800/20117 [7:30:44<5:45:34,  2.49s/it] 59%|████████████████████████████████████████████████                                  | 11801/20117 [7:30:46<5:44:59,  2.49s/it] 59%|████████████████████████████████████████████████                                  | 11802/20117 [7:30:49<5:47:35,  2.51s/it] 59%|████████████████████████████████████████████████                                  | 11803/20117 [7:30:51<5:45:52,  2.50s/it] 59%|████████████████████████████████████████████████                                  | 11804/20117 [7:30:54<5:48:12,  2.51s/it] 59%|████████████████████████████████████████████████                                  | 11805/20117 [7:30:56<5:47:22,  2.51s/it] 59%|████████████████████████████████████████████████                                  | 11806/20117 [7:30:59<5:47:01,  2.51s/it] 59%|████████████████████████████████████████████████▏                                 | 11807/20117 [7:31:01<5:47:10,  2.51s/it] 59%|████████████████████████████████████████████████▏                                 | 11808/20117 [7:31:04<5:48:06,  2.51s/it] 59%|████████████████████████████████████████████████▏                                 | 11809/20117 [7:31:06<5:47:58,  2.51s/it] 59%|████████████████████████████████████████████████▏                                 | 11810/20117 [7:31:09<5:45:51,  2.50s/it]                                                                                                                                 {'loss': 0.1935, 'grad_norm': 0.7832923531532288, 'learning_rate': 7.362700104845226e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 332.32, 'epoch': 1.17}
 59%|████████████████████████████████████████████████▏                                 | 11810/20117 [7:31:09<5:45:51,  2.50s/it] 59%|████████████████████████████████████████████████▏                                 | 11811/20117 [7:31:11<5:46:18,  2.50s/it] 59%|████████████████████████████████████████████████▏                                 | 11812/20117 [7:31:14<5:49:17,  2.52s/it] 59%|████████████████████████████████████████████████▏                                 | 11813/20117 [7:31:16<5:51:45,  2.54s/it] 59%|████████████████████████████████████████████████▏                                 | 11814/20117 [7:31:19<5:50:39,  2.53s/it] 59%|████████████████████████████████████████████████▏                                 | 11815/20117 [7:31:21<5:48:22,  2.52s/it] 59%|████████████████████████████████████████████████▏                                 | 11816/20117 [7:31:24<5:49:47,  2.53s/it] 59%|████████████████████████████████████████████████▏                                 | 11817/20117 [7:31:26<5:49:39,  2.53s/it] 59%|████████████████████████████████████████████████▏                                 | 11818/20117 [7:31:29<5:49:12,  2.52s/it] 59%|████████████████████████████████████████████████▏                                 | 11819/20117 [7:31:32<5:50:58,  2.54s/it] 59%|████████████████████████████████████████████████▏                                 | 11820/20117 [7:31:34<5:51:10,  2.54s/it]                                                                                                                                 {'loss': 0.1337, 'grad_norm': 0.39051079750061035, 'learning_rate': 7.34756438022494e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 287.23, 'epoch': 1.18}
 59%|████████████████████████████████████████████████▏                                 | 11820/20117 [7:31:34<5:51:10,  2.54s/it] 59%|████████████████████████████████████████████████▏                                 | 11821/20117 [7:31:37<5:50:15,  2.53s/it] 59%|████████████████████████████████████████████████▏                                 | 11822/20117 [7:31:39<5:51:01,  2.54s/it] 59%|████████████████████████████████████████████████▏                                 | 11823/20117 [7:31:42<5:50:51,  2.54s/it] 59%|████████████████████████████████████████████████▏                                 | 11824/20117 [7:31:44<5:52:07,  2.55s/it] 59%|████████████████████████████████████████████████▏                                 | 11825/20117 [7:31:47<6:08:10,  2.66s/it] 59%|████████████████████████████████████████████████▏                                 | 11826/20117 [7:31:50<6:06:51,  2.65s/it] 59%|████████████████████████████████████████████████▏                                 | 11827/20117 [7:31:52<6:01:37,  2.62s/it] 59%|████████████████████████████████████████████████▏                                 | 11828/20117 [7:31:55<5:59:04,  2.60s/it] 59%|████████████████████████████████████████████████▏                                 | 11829/20117 [7:31:57<5:56:53,  2.58s/it] 59%|████████████████████████████████████████████████▏                                 | 11830/20117 [7:32:00<5:49:14,  2.53s/it]                                                                                                                                 {'loss': 0.1535, 'grad_norm': 0.3700825273990631, 'learning_rate': 7.332435189114194e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 341.88, 'epoch': 1.18}
 59%|████████████████████████████████████████████████▏                                 | 11830/20117 [7:32:00<5:49:14,  2.53s/it] 59%|████████████████████████████████████████████████▏                                 | 11831/20117 [7:32:02<5:43:19,  2.49s/it] 59%|████████████████████████████████████████████████▏                                 | 11832/20117 [7:32:05<5:54:59,  2.57s/it] 59%|████████████████████████████████████████████████▏                                 | 11833/20117 [7:32:08<6:14:02,  2.71s/it] 59%|████████████████████████████████████████████████▏                                 | 11834/20117 [7:32:11<6:04:51,  2.64s/it] 59%|████████████████████████████████████████████████▏                                 | 11835/20117 [7:32:13<5:56:02,  2.58s/it] 59%|████████████████████████████████████████████████▏                                 | 11836/20117 [7:32:15<5:50:51,  2.54s/it] 59%|████████████████████████████████████████████████▏                                 | 11837/20117 [7:32:18<5:47:47,  2.52s/it] 59%|████████████████████████████████████████████████▎                                 | 11838/20117 [7:32:20<5:42:04,  2.48s/it] 59%|████████████████████████████████████████████████▎                                 | 11839/20117 [7:32:23<5:41:49,  2.48s/it] 59%|████████████████████████████████████████████████▎                                 | 11840/20117 [7:32:25<5:38:44,  2.46s/it]                                                                                                                                 {'loss': 0.1586, 'grad_norm': 0.6411652565002441, 'learning_rate': 7.317312568779375e-05, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 334.43, 'epoch': 1.18}
 59%|████████████████████████████████████████████████▎                                 | 11840/20117 [7:32:25<5:38:44,  2.46s/it] 59%|████████████████████████████████████████████████▎                                 | 11841/20117 [7:32:28<5:34:15,  2.42s/it] 59%|████████████████████████████████████████████████▎                                 | 11842/20117 [7:32:30<5:30:51,  2.40s/it] 59%|████████████████████████████████████████████████▎                                 | 11843/20117 [7:32:32<5:35:54,  2.44s/it] 59%|████████████████████████████████████████████████▎                                 | 11844/20117 [7:32:35<5:39:39,  2.46s/it] 59%|████████████████████████████████████████████████▎                                 | 11845/20117 [7:32:37<5:42:41,  2.49s/it] 59%|████████████████████████████████████████████████▎                                 | 11846/20117 [7:32:40<5:42:10,  2.48s/it] 59%|████████████████████████████████████████████████▎                                 | 11847/20117 [7:32:43<5:55:20,  2.58s/it] 59%|████████████████████████████████████████████████▎                                 | 11848/20117 [7:32:46<6:05:37,  2.65s/it] 59%|████████████████████████████████████████████████▎                                 | 11849/20117 [7:32:48<6:11:53,  2.70s/it] 59%|████████████████████████████████████████████████▎                                 | 11850/20117 [7:32:51<6:05:10,  2.65s/it]                                                                                                                                 {'loss': 0.1639, 'grad_norm': 0.46753886342048645, 'learning_rate': 7.302196556470701e-05, 'memory/max_active (GiB)': 19.66, 'memory/max_allocated (GiB)': 19.66, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 329.98, 'epoch': 1.18}
 59%|████████████████████████████████████████████████▎                                 | 11850/20117 [7:32:51<6:05:10,  2.65s/it] 59%|████████████████████████████████████████████████▎                                 | 11851/20117 [7:32:53<6:03:17,  2.64s/it] 59%|████████████████████████████████████████████████▎                                 | 11852/20117 [7:32:56<5:58:27,  2.60s/it] 59%|████████████████████████████████████████████████▎                                 | 11853/20117 [7:32:59<5:55:28,  2.58s/it] 59%|████████████████████████████████████████████████▎                                 | 11854/20117 [7:33:01<5:53:20,  2.57s/it] 59%|████████████████████████████████████████████████▎                                 | 11855/20117 [7:33:04<5:52:16,  2.56s/it] 59%|████████████████████████████████████████████████▎                                 | 11856/20117 [7:33:06<5:52:53,  2.56s/it] 59%|████████████████████████████████████████████████▎                                 | 11857/20117 [7:33:09<5:51:31,  2.55s/it] 59%|████████████████████████████████████████████████▎                                 | 11858/20117 [7:33:11<5:50:35,  2.55s/it] 59%|████████████████████████████████████████████████▎                                 | 11859/20117 [7:33:14<5:50:14,  2.54s/it] 59%|████████████████████████████████████████████████▎                                 | 11860/20117 [7:33:16<5:49:30,  2.54s/it]                                                                                                                                 {'loss': 0.2674, 'grad_norm': 0.7362424731254578, 'learning_rate': 7.287087189422099e-05, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 360.74, 'epoch': 1.18}
 59%|████████████████████████████████████████████████▎                                 | 11860/20117 [7:33:16<5:49:30,  2.54s/it] 59%|████████████████████████████████████████████████▎                                 | 11861/20117 [7:33:19<5:49:45,  2.54s/it] 59%|████████████████████████████████████████████████▎                                 | 11862/20117 [7:33:21<5:49:56,  2.54s/it] 59%|████████████████████████████████████████████████▎                                 | 11863/20117 [7:33:24<5:50:24,  2.55s/it] 59%|████████████████████████████████████████████████▎                                 | 11864/20117 [7:33:26<5:48:45,  2.54s/it] 59%|████████████████████████████████████████████████▎                                 | 11865/20117 [7:33:29<5:48:49,  2.54s/it] 59%|████████████████████████████████████████████████▎                                 | 11866/20117 [7:33:32<5:47:40,  2.53s/it] 59%|████████████████████████████████████████████████▎                                 | 11867/20117 [7:33:34<5:45:59,  2.52s/it] 59%|████████████████████████████████████████████████▍                                 | 11868/20117 [7:33:37<5:46:02,  2.52s/it] 59%|████████████████████████████████████████████████▍                                 | 11869/20117 [7:33:39<5:46:36,  2.52s/it] 59%|████████████████████████████████████████████████▍                                 | 11870/20117 [7:33:42<5:45:00,  2.51s/it]                                                                                                                                 {'loss': 0.166, 'grad_norm': 0.5120696425437927, 'learning_rate': 7.271984504851141e-05, 'memory/max_active (GiB)': 19.81, 'memory/max_allocated (GiB)': 19.81, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 321.01, 'epoch': 1.18}
 59%|████████████████████████████████████████████████▍                                 | 11870/20117 [7:33:42<5:45:00,  2.51s/it] 59%|████████████████████████████████████████████████▍                                 | 11871/20117 [7:33:44<5:43:04,  2.50s/it] 59%|████████████████████████████████████████████████▍                                 | 11872/20117 [7:33:47<5:43:52,  2.50s/it] 59%|████████████████████████████████████████████████▍                                 | 11873/20117 [7:33:49<5:43:12,  2.50s/it] 59%|████████████████████████████████████████████████▍                                 | 11874/20117 [7:33:51<5:41:23,  2.48s/it] 59%|████████████████████████████████████████████████▍                                 | 11875/20117 [7:33:54<5:44:10,  2.51s/it] 59%|████████████████████████████████████████████████▍                                 | 11876/20117 [7:33:57<5:44:34,  2.51s/it] 59%|████████████████████████████████████████████████▍                                 | 11877/20117 [7:33:59<5:45:13,  2.51s/it] 59%|████████████████████████████████████████████████▍                                 | 11878/20117 [7:34:02<5:45:13,  2.51s/it] 59%|████████████████████████████████████████████████▍                                 | 11879/20117 [7:34:04<5:46:37,  2.52s/it] 59%|████████████████████████████████████████████████▍                                 | 11880/20117 [7:34:07<5:47:26,  2.53s/it]                                                                                                                                 {'loss': 0.1663, 'grad_norm': 0.5142634510993958, 'learning_rate': 7.256888539958923e-05, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 348.0, 'epoch': 1.18}
 59%|████████████████████████████████████████████████▍                                 | 11880/20117 [7:34:07<5:47:26,  2.53s/it] 59%|████████████████████████████████████████████████▍                                 | 11881/20117 [7:34:09<5:49:46,  2.55s/it] 59%|████████████████████████████████████████████████▍                                 | 11882/20117 [7:34:12<5:55:06,  2.59s/it] 59%|████████████████████████████████████████████████▍                                 | 11883/20117 [7:34:14<5:52:12,  2.57s/it] 59%|████████████████████████████████████████████████▍                                 | 11884/20117 [7:34:17<6:09:01,  2.69s/it] 59%|████████████████████████████████████████████████▍                                 | 11885/20117 [7:34:20<6:02:59,  2.65s/it] 59%|████████████████████████████████████████████████▍                                 | 11886/20117 [7:34:23<5:58:56,  2.62s/it] 59%|████████████████████████████████████████████████▍                                 | 11887/20117 [7:34:25<5:54:54,  2.59s/it] 59%|████████████████████████████████████████████████▍                                 | 11888/20117 [7:34:28<5:52:57,  2.57s/it] 59%|████████████████████████████████████████████████▍                                 | 11889/20117 [7:34:30<5:50:46,  2.56s/it] 59%|████████████████████████████████████████████████▍                                 | 11890/20117 [7:34:33<5:50:47,  2.56s/it]                                                                                                                                 {'loss': 0.2012, 'grad_norm': 0.5077565908432007, 'learning_rate': 7.241799331930006e-05, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 333.88, 'epoch': 1.18}
 59%|████████████████████████████████████████████████▍                                 | 11890/20117 [7:34:33<5:50:47,  2.56s/it] 59%|████████████████████████████████████████████████▍                                 | 11891/20117 [7:34:35<5:50:44,  2.56s/it] 59%|████████████████████████████████████████████████▍                                 | 11892/20117 [7:34:38<5:51:43,  2.57s/it] 59%|████████████████████████████████████████████████▍                                 | 11893/20117 [7:34:40<5:51:54,  2.57s/it] 59%|████████████████████████████████████████████████▍                                 | 11894/20117 [7:34:43<5:50:05,  2.55s/it] 59%|████████████████████████████████████████████████▍                                 | 11895/20117 [7:34:45<5:47:47,  2.54s/it] 59%|████████████████████████████████████████████████▍                                 | 11896/20117 [7:34:48<5:47:34,  2.54s/it] 59%|████████████████████████████████████████████████▍                                 | 11897/20117 [7:34:50<5:44:55,  2.52s/it] 59%|████████████████████████████████████████████████▍                                 | 11898/20117 [7:34:53<5:44:00,  2.51s/it] 59%|████████████████████████████████████████████████▌                                 | 11899/20117 [7:34:55<5:40:02,  2.48s/it] 59%|████████████████████████████████████████████████▌                                 | 11900/20117 [7:34:58<5:34:40,  2.44s/it]                                                                                                                                 {'loss': 0.1986, 'grad_norm': 0.5016006231307983, 'learning_rate': 7.226716917932289e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 349.37, 'epoch': 1.18}
 59%|████████████████████████████████████████████████▌                                 | 11900/20117 [7:34:58<5:34:40,  2.44s/it] 59%|████████████████████████████████████████████████▌                                 | 11901/20117 [7:35:00<5:31:28,  2.42s/it] 59%|████████████████████████████████████████████████▌                                 | 11902/20117 [7:35:03<5:34:25,  2.44s/it] 59%|████████████████████████████████████████████████▌                                 | 11903/20117 [7:35:05<5:36:44,  2.46s/it] 59%|████████████████████████████████████████████████▌                                 | 11904/20117 [7:35:08<5:38:20,  2.47s/it] 59%|████████████████████████████████████████████████▌                                 | 11905/20117 [7:35:10<5:36:09,  2.46s/it] 59%|████████████████████████████████████████████████▌                                 | 11906/20117 [7:35:12<5:38:10,  2.47s/it] 59%|████████████████████████████████████████████████▌                                 | 11907/20117 [7:35:15<5:38:51,  2.48s/it] 59%|████████████████████████████████████████████████▌                                 | 11908/20117 [7:35:18<5:45:21,  2.52s/it] 59%|████████████████████████████████████████████████▌                                 | 11909/20117 [7:35:20<5:44:26,  2.52s/it] 59%|████████████████████████████████████████████████▌                                 | 11910/20117 [7:35:22<5:38:17,  2.47s/it]                                                                                                                                 {'loss': 0.1379, 'grad_norm': 0.5038326382637024, 'learning_rate': 7.21164133511695e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 331.98, 'epoch': 1.18}
 59%|████████████████████████████████████████████████▌                                 | 11910/20117 [7:35:22<5:38:17,  2.47s/it] 59%|████████████████████████████████████████████████▌                                 | 11911/20117 [7:35:25<5:32:11,  2.43s/it] 59%|████████████████████████████████████████████████▌                                 | 11912/20117 [7:35:27<5:28:09,  2.40s/it] 59%|████████████████████████████████████████████████▌                                 | 11913/20117 [7:35:30<5:32:08,  2.43s/it] 59%|████████████████████████████████████████████████▌                                 | 11914/20117 [7:35:32<5:36:11,  2.46s/it] 59%|████████████████████████████████████████████████▌                                 | 11915/20117 [7:35:35<5:37:28,  2.47s/it] 59%|████████████████████████████████████████████████▌                                 | 11916/20117 [7:35:37<5:37:45,  2.47s/it] 59%|████████████████████████████████████████████████▌                                 | 11917/20117 [7:35:40<5:38:10,  2.47s/it] 59%|████████████████████████████████████████████████▌                                 | 11918/20117 [7:35:42<5:38:56,  2.48s/it] 59%|████████████████████████████████████████████████▌                                 | 11919/20117 [7:35:45<5:41:14,  2.50s/it] 59%|████████████████████████████████████████████████▌                                 | 11920/20117 [7:35:47<5:43:21,  2.51s/it]                                                                                                                                 {'loss': 0.1479, 'grad_norm': 0.44399407505989075, 'learning_rate': 7.196572620618336e-05, 'memory/max_active (GiB)': 20.64, 'memory/max_allocated (GiB)': 20.64, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 295.3, 'epoch': 1.19}
 59%|████████████████████████████████████████████████▌                                 | 11920/20117 [7:35:47<5:43:21,  2.51s/it] 59%|████████████████████████████████████████████████▌                                 | 11921/20117 [7:35:50<5:44:06,  2.52s/it] 59%|████████████████████████████████████████████████▌                                 | 11922/20117 [7:35:52<5:45:13,  2.53s/it] 59%|████████████████████████████████████████████████▌                                 | 11923/20117 [7:35:55<5:44:32,  2.52s/it] 59%|████████████████████████████████████████████████▌                                 | 11924/20117 [7:35:57<5:42:35,  2.51s/it] 59%|████████████████████████████████████████████████▌                                 | 11925/20117 [7:36:00<5:42:06,  2.51s/it] 59%|████████████████████████████████████████████████▌                                 | 11926/20117 [7:36:02<5:40:32,  2.49s/it] 59%|████████████████████████████████████████████████▌                                 | 11927/20117 [7:36:05<5:41:48,  2.50s/it] 59%|████████████████████████████████████████████████▌                                 | 11928/20117 [7:36:07<5:40:04,  2.49s/it] 59%|████████████████████████████████████████████████▌                                 | 11929/20117 [7:36:10<5:41:07,  2.50s/it] 59%|████████████████████████████████████████████████▋                                 | 11930/20117 [7:36:12<5:40:58,  2.50s/it]                                                                                                                                 {'loss': 0.2141, 'grad_norm': 0.5076953172683716, 'learning_rate': 7.181510811553874e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 342.74, 'epoch': 1.19}
 59%|████████████████████████████████████████████████▋                                 | 11930/20117 [7:36:12<5:40:58,  2.50s/it] 59%|████████████████████████████████████████████████▋                                 | 11931/20117 [7:36:15<5:41:21,  2.50s/it] 59%|████████████████████████████████████████████████▋                                 | 11932/20117 [7:36:17<5:40:39,  2.50s/it] 59%|████████████████████████████████████████████████▋                                 | 11933/20117 [7:36:20<5:43:08,  2.52s/it] 59%|████████████████████████████████████████████████▋                                 | 11934/20117 [7:36:22<5:44:52,  2.53s/it] 59%|████████████████████████████████████████████████▋                                 | 11935/20117 [7:36:25<5:43:58,  2.52s/it] 59%|████████████████████████████████████████████████▋                                 | 11936/20117 [7:36:27<5:45:50,  2.54s/it] 59%|████████████████████████████████████████████████▋                                 | 11937/20117 [7:36:30<5:56:11,  2.61s/it] 59%|████████████████████████████████████████████████▋                                 | 11938/20117 [7:36:33<5:52:11,  2.58s/it] 59%|████████████████████████████████████████████████▋                                 | 11939/20117 [7:36:35<5:49:19,  2.56s/it] 59%|████████████████████████████████████████████████▋                                 | 11940/20117 [7:36:38<5:48:02,  2.55s/it]                                                                                                                                 {'loss': 0.1896, 'grad_norm': 0.39150622487068176, 'learning_rate': 7.166455945023989e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 341.43, 'epoch': 1.19}
 59%|████████████████████████████████████████████████▋                                 | 11940/20117 [7:36:38<5:48:02,  2.55s/it] 59%|████████████████████████████████████████████████▋                                 | 11941/20117 [7:36:40<5:45:46,  2.54s/it] 59%|████████████████████████████████████████████████▋                                 | 11942/20117 [7:36:43<5:43:35,  2.52s/it] 59%|████████████████████████████████████████████████▋                                 | 11943/20117 [7:36:45<5:42:02,  2.51s/it] 59%|████████████████████████████████████████████████▋                                 | 11944/20117 [7:36:48<5:41:45,  2.51s/it] 59%|████████████████████████████████████████████████▋                                 | 11945/20117 [7:36:50<5:43:46,  2.52s/it] 59%|████████████████████████████████████████████████▋                                 | 11946/20117 [7:36:53<5:42:55,  2.52s/it] 59%|████████████████████████████████████████████████▋                                 | 11947/20117 [7:36:55<5:42:06,  2.51s/it] 59%|████████████████████████████████████████████████▋                                 | 11948/20117 [7:36:58<5:40:07,  2.50s/it] 59%|████████████████████████████████████████████████▋                                 | 11949/20117 [7:37:00<5:42:08,  2.51s/it] 59%|████████████████████████████████████████████████▋                                 | 11950/20117 [7:37:03<5:43:10,  2.52s/it]                                                                                                                                 {'loss': 0.1467, 'grad_norm': 0.305494099855423, 'learning_rate': 7.151408058111991e-05, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 271.68, 'epoch': 1.19}
 59%|████████████████████████████████████████████████▋                                 | 11950/20117 [7:37:03<5:43:10,  2.52s/it] 59%|████████████████████████████████████████████████▋                                 | 11951/20117 [7:37:05<5:43:04,  2.52s/it] 59%|████████████████████████████████████████████████▋                                 | 11952/20117 [7:37:08<5:43:28,  2.52s/it] 59%|████████████████████████████████████████████████▋                                 | 11953/20117 [7:37:10<5:43:38,  2.53s/it] 59%|████████████████████████████████████████████████▋                                 | 11954/20117 [7:37:13<5:40:03,  2.50s/it] 59%|████████████████████████████████████████████████▋                                 | 11955/20117 [7:37:15<5:40:25,  2.50s/it] 59%|████████████████████████████████████████████████▋                                 | 11956/20117 [7:37:18<5:40:39,  2.50s/it] 59%|████████████████████████████████████████████████▋                                 | 11957/20117 [7:37:20<5:37:30,  2.48s/it] 59%|████████████████████████████████████████████████▋                                 | 11958/20117 [7:37:23<5:38:49,  2.49s/it] 59%|████████████████████████████████████████████████▋                                 | 11959/20117 [7:37:25<5:40:45,  2.51s/it] 59%|████████████████████████████████████████████████▊                                 | 11960/20117 [7:37:28<5:40:46,  2.51s/it]                                                                                                                                 {'loss': 0.1725, 'grad_norm': 0.5669624209403992, 'learning_rate': 7.136367187884014e-05, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 305.75, 'epoch': 1.19}
 59%|████████████████████████████████████████████████▊                                 | 11960/20117 [7:37:28<5:40:46,  2.51s/it] 59%|████████████████████████████████████████████████▊                                 | 11961/20117 [7:37:30<5:41:31,  2.51s/it] 59%|████████████████████████████████████████████████▊                                 | 11962/20117 [7:37:33<5:39:23,  2.50s/it] 59%|████████████████████████████████████████████████▊                                 | 11963/20117 [7:37:35<5:41:03,  2.51s/it] 59%|████████████████████████████████████████████████▊                                 | 11964/20117 [7:37:38<5:42:13,  2.52s/it] 59%|████████████████████████████████████████████████▊                                 | 11965/20117 [7:37:40<5:41:47,  2.52s/it] 59%|████████████████████████████████████████████████▊                                 | 11966/20117 [7:37:43<5:41:34,  2.51s/it] 59%|████████████████████████████████████████████████▊                                 | 11967/20117 [7:37:45<5:40:42,  2.51s/it] 59%|████████████████████████████████████████████████▊                                 | 11968/20117 [7:37:48<5:42:48,  2.52s/it] 59%|████████████████████████████████████████████████▊                                 | 11969/20117 [7:37:51<5:41:52,  2.52s/it] 60%|████████████████████████████████████████████████▊                                 | 11970/20117 [7:37:53<5:42:43,  2.52s/it]                                                                                                                                 {'loss': 0.164, 'grad_norm': 0.7032368183135986, 'learning_rate': 7.121333371388889e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 247.59, 'epoch': 1.19}
 60%|████████████████████████████████████████████████▊                                 | 11970/20117 [7:37:53<5:42:43,  2.52s/it] 60%|████████████████████████████████████████████████▊                                 | 11971/20117 [7:37:56<5:43:15,  2.53s/it] 60%|████████████████████████████████████████████████▊                                 | 11972/20117 [7:37:58<5:44:06,  2.53s/it] 60%|████████████████████████████████████████████████▊                                 | 11973/20117 [7:38:01<5:41:08,  2.51s/it] 60%|████████████████████████████████████████████████▊                                 | 11974/20117 [7:38:03<5:36:30,  2.48s/it] 60%|████████████████████████████████████████████████▊                                 | 11975/20117 [7:38:05<5:31:43,  2.44s/it] 60%|████████████████████████████████████████████████▊                                 | 11976/20117 [7:38:08<5:29:45,  2.43s/it] 60%|████████████████████████████████████████████████▊                                 | 11977/20117 [7:38:10<5:36:23,  2.48s/it] 60%|████████████████████████████████████████████████▊                                 | 11978/20117 [7:38:13<5:38:08,  2.49s/it] 60%|████████████████████████████████████████████████▊                                 | 11979/20117 [7:38:15<5:37:18,  2.49s/it] 60%|████████████████████████████████████████████████▊                                 | 11980/20117 [7:38:18<5:33:29,  2.46s/it]                                                                                                                                 {'loss': 0.1808, 'grad_norm': 0.5729434490203857, 'learning_rate': 7.106306645658095e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 358.52, 'epoch': 1.19}
 60%|████████████████████████████████████████████████▊                                 | 11980/20117 [7:38:18<5:33:29,  2.46s/it] 60%|████████████████████████████████████████████████▊                                 | 11981/20117 [7:38:20<5:34:05,  2.46s/it] 60%|████████████████████████████████████████████████▊                                 | 11982/20117 [7:38:23<5:32:41,  2.45s/it] 60%|████████████████████████████████████████████████▊                                 | 11983/20117 [7:38:25<5:33:06,  2.46s/it] 60%|████████████████████████████████████████████████▊                                 | 11984/20117 [7:38:28<5:32:15,  2.45s/it] 60%|████████████████████████████████████████████████▊                                 | 11985/20117 [7:38:30<5:30:59,  2.44s/it] 60%|████████████████████████████████████████████████▊                                 | 11986/20117 [7:38:32<5:30:03,  2.44s/it] 60%|████████████████████████████████████████████████▊                                 | 11987/20117 [7:38:35<5:26:31,  2.41s/it] 60%|████████████████████████████████████████████████▊                                 | 11988/20117 [7:38:37<5:24:05,  2.39s/it] 60%|████████████████████████████████████████████████▊                                 | 11989/20117 [7:38:40<5:45:03,  2.55s/it] 60%|████████████████████████████████████████████████▊                                 | 11990/20117 [7:38:43<5:44:20,  2.54s/it]                                                                                                                                 {'loss': 0.1459, 'grad_norm': 0.6809588074684143, 'learning_rate': 7.091287047705626e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 316.28, 'epoch': 1.19}
 60%|████████████████████████████████████████████████▊                                 | 11990/20117 [7:38:43<5:44:20,  2.54s/it] 60%|████████████████████████████████████████████████▉                                 | 11991/20117 [7:38:45<5:48:37,  2.57s/it] 60%|████████████████████████████████████████████████▉                                 | 11992/20117 [7:38:48<6:06:35,  2.71s/it] 60%|████████████████████████████████████████████████▉                                 | 11993/20117 [7:38:51<6:01:20,  2.67s/it] 60%|████████████████████████████████████████████████▉                                 | 11994/20117 [7:38:54<6:08:16,  2.72s/it] 60%|████████████████████████████████████████████████▉                                 | 11995/20117 [7:38:56<5:58:49,  2.65s/it] 60%|████████████████████████████████████████████████▉                                 | 11996/20117 [7:38:59<5:50:47,  2.59s/it] 60%|████████████████████████████████████████████████▉                                 | 11997/20117 [7:39:01<5:48:14,  2.57s/it] 60%|████████████████████████████████████████████████▉                                 | 11998/20117 [7:39:04<5:46:10,  2.56s/it] 60%|████████████████████████████████████████████████▉                                 | 11999/20117 [7:39:06<5:43:23,  2.54s/it] 60%|████████████████████████████████████████████████▉                                 | 12000/20117 [7:39:09<5:43:42,  2.54s/it]                                                                                                                                 {'loss': 0.1697, 'grad_norm': 0.34328803420066833, 'learning_rate': 7.076274614527934e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 338.71, 'epoch': 1.19}
 60%|████████████████████████████████████████████████▉                                 | 12000/20117 [7:39:09<5:43:42,  2.54s/it] 60%|████████████████████████████████████████████████▉                                 | 12001/20117 [7:39:11<5:42:15,  2.53s/it] 60%|████████████████████████████████████████████████▉                                 | 12002/20117 [7:39:14<5:39:39,  2.51s/it] 60%|████████████████████████████████████████████████▉                                 | 12003/20117 [7:39:16<5:42:58,  2.54s/it] 60%|████████████████████████████████████████████████▉                                 | 12004/20117 [7:39:19<5:42:15,  2.53s/it] 60%|████████████████████████████████████████████████▉                                 | 12005/20117 [7:39:21<5:39:21,  2.51s/it] 60%|████████████████████████████████████████████████▉                                 | 12006/20117 [7:39:24<5:41:17,  2.52s/it] 60%|████████████████████████████████████████████████▉                                 | 12007/20117 [7:39:26<5:42:15,  2.53s/it] 60%|████████████████████████████████████████████████▉                                 | 12008/20117 [7:39:29<5:43:38,  2.54s/it] 60%|████████████████████████████████████████████████▉                                 | 12009/20117 [7:39:31<5:41:19,  2.53s/it] 60%|████████████████████████████████████████████████▉                                 | 12010/20117 [7:39:34<5:38:52,  2.51s/it]                                                                                                                                 {'loss': 0.1699, 'grad_norm': 0.7516032457351685, 'learning_rate': 7.061269383103804e-05, 'memory/max_active (GiB)': 18.17, 'memory/max_allocated (GiB)': 18.17, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 342.16, 'epoch': 1.19}
 60%|████████████████████████████████████████████████▉                                 | 12010/20117 [7:39:34<5:38:52,  2.51s/it] 60%|████████████████████████████████████████████████▉                                 | 12011/20117 [7:39:36<5:41:12,  2.53s/it] 60%|████████████████████████████████████████████████▉                                 | 12012/20117 [7:39:39<5:43:29,  2.54s/it] 60%|████████████████████████████████████████████████▉                                 | 12013/20117 [7:39:42<5:42:44,  2.54s/it] 60%|████████████████████████████████████████████████▉                                 | 12014/20117 [7:39:44<5:44:00,  2.55s/it] 60%|████████████████████████████████████████████████▉                                 | 12015/20117 [7:39:47<5:45:59,  2.56s/it] 60%|████████████████████████████████████████████████▉                                 | 12016/20117 [7:39:49<5:46:59,  2.57s/it] 60%|████████████████████████████████████████████████▉                                 | 12017/20117 [7:39:52<5:45:52,  2.56s/it] 60%|████████████████████████████████████████████████▉                                 | 12018/20117 [7:39:54<5:44:21,  2.55s/it] 60%|████████████████████████████████████████████████▉                                 | 12019/20117 [7:39:57<5:42:13,  2.54s/it] 60%|████████████████████████████████████████████████▉                                 | 12020/20117 [7:39:59<5:41:54,  2.53s/it]                                                                                                                                 {'loss': 0.1585, 'grad_norm': 0.4296916425228119, 'learning_rate': 7.046271390394303e-05, 'memory/max_active (GiB)': 21.38, 'memory/max_allocated (GiB)': 21.38, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 367.62, 'epoch': 1.19}
 60%|████████████████████████████████████████████████▉                                 | 12020/20117 [7:39:59<5:41:54,  2.53s/it] 60%|████████████████████████████████████████████████▉                                 | 12021/20117 [7:40:02<5:39:33,  2.52s/it] 60%|█████████████████████████████████████████████████                                 | 12022/20117 [7:40:04<5:38:00,  2.51s/it] 60%|█████████████████████████████████████████████████                                 | 12023/20117 [7:40:07<5:36:48,  2.50s/it] 60%|█████████████████████████████████████████████████                                 | 12024/20117 [7:40:09<5:36:46,  2.50s/it] 60%|█████████████████████████████████████████████████                                 | 12025/20117 [7:40:12<5:42:11,  2.54s/it] 60%|█████████████████████████████████████████████████                                 | 12026/20117 [7:40:14<5:42:24,  2.54s/it] 60%|█████████████████████████████████████████████████                                 | 12027/20117 [7:40:17<5:42:26,  2.54s/it] 60%|█████████████████████████████████████████████████                                 | 12028/20117 [7:40:20<5:45:12,  2.56s/it] 60%|█████████████████████████████████████████████████                                 | 12029/20117 [7:40:22<5:45:53,  2.57s/it] 60%|█████████████████████████████████████████████████                                 | 12030/20117 [7:40:25<5:41:44,  2.54s/it]                                                                                                                                 {'loss': 0.1239, 'grad_norm': 0.2905307412147522, 'learning_rate': 7.031280673342648e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 303.39, 'epoch': 1.2}
 60%|█████████████████████████████████████████████████                                 | 12030/20117 [7:40:25<5:41:44,  2.54s/it] 60%|█████████████████████████████████████████████████                                 | 12031/20117 [7:40:27<5:40:43,  2.53s/it] 60%|█████████████████████████████████████████████████                                 | 12032/20117 [7:40:30<5:38:45,  2.51s/it] 60%|█████████████████████████████████████████████████                                 | 12033/20117 [7:40:32<5:37:15,  2.50s/it] 60%|█████████████████████████████████████████████████                                 | 12034/20117 [7:40:35<5:39:39,  2.52s/it] 60%|█████████████████████████████████████████████████                                 | 12035/20117 [7:40:37<5:39:36,  2.52s/it] 60%|█████████████████████████████████████████████████                                 | 12036/20117 [7:40:40<5:39:41,  2.52s/it] 60%|█████████████████████████████████████████████████                                 | 12037/20117 [7:40:42<5:39:39,  2.52s/it] 60%|█████████████████████████████████████████████████                                 | 12038/20117 [7:40:45<5:41:12,  2.53s/it] 60%|█████████████████████████████████████████████████                                 | 12039/20117 [7:40:47<5:44:04,  2.56s/it] 60%|█████████████████████████████████████████████████                                 | 12040/20117 [7:40:50<5:44:44,  2.56s/it]                                                                                                                                 {'loss': 0.1541, 'grad_norm': 0.2632843554019928, 'learning_rate': 7.016297268874152e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 354.37, 'epoch': 1.2}
 60%|█████████████████████████████████████████████████                                 | 12040/20117 [7:40:50<5:44:44,  2.56s/it] 60%|█████████████████████████████████████████████████                                 | 12041/20117 [7:40:53<5:44:06,  2.56s/it] 60%|█████████████████████████████████████████████████                                 | 12042/20117 [7:40:55<5:43:10,  2.55s/it] 60%|█████████████████████████████████████████████████                                 | 12043/20117 [7:40:58<5:58:34,  2.66s/it] 60%|█████████████████████████████████████████████████                                 | 12044/20117 [7:41:00<5:50:57,  2.61s/it] 60%|█████████████████████████████████████████████████                                 | 12045/20117 [7:41:03<5:47:39,  2.58s/it] 60%|█████████████████████████████████████████████████                                 | 12046/20117 [7:41:06<5:47:16,  2.58s/it] 60%|█████████████████████████████████████████████████                                 | 12047/20117 [7:41:08<5:45:37,  2.57s/it] 60%|█████████████████████████████████████████████████                                 | 12048/20117 [7:41:11<5:46:17,  2.57s/it] 60%|█████████████████████████████████████████████████                                 | 12049/20117 [7:41:13<5:40:06,  2.53s/it] 60%|█████████████████████████████████████████████████                                 | 12050/20117 [7:41:15<5:31:35,  2.47s/it]                                                                                                                                 {'loss': 0.125, 'grad_norm': 0.41677621006965637, 'learning_rate': 7.001321213896099e-05, 'memory/max_active (GiB)': 19.81, 'memory/max_allocated (GiB)': 19.81, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 372.01, 'epoch': 1.2}
 60%|█████████████████████████████████████████████████                                 | 12050/20117 [7:41:15<5:31:35,  2.47s/it] 60%|█████████████████████████████████████████████████                                 | 12051/20117 [7:41:18<5:30:13,  2.46s/it] 60%|█████████████████████████████████████████████████▏                                | 12052/20117 [7:41:20<5:32:08,  2.47s/it] 60%|█████████████████████████████████████████████████▏                                | 12053/20117 [7:41:23<5:34:00,  2.49s/it] 60%|█████████████████████████████████████████████████▏                                | 12054/20117 [7:41:25<5:34:57,  2.49s/it] 60%|█████████████████████████████████████████████████▏                                | 12055/20117 [7:41:28<5:32:17,  2.47s/it] 60%|█████████████████████████████████████████████████▏                                | 12056/20117 [7:41:30<5:31:53,  2.47s/it] 60%|█████████████████████████████████████████████████▏                                | 12057/20117 [7:41:33<5:31:25,  2.47s/it] 60%|█████████████████████████████████████████████████▏                                | 12058/20117 [7:41:35<5:29:47,  2.46s/it] 60%|█████████████████████████████████████████████████▏                                | 12059/20117 [7:41:38<5:29:25,  2.45s/it] 60%|█████████████████████████████████████████████████▏                                | 12060/20117 [7:41:40<5:29:08,  2.45s/it]                                                                                                                                 {'loss': 0.1617, 'grad_norm': 0.6199091672897339, 'learning_rate': 6.98635254529768e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 336.05, 'epoch': 1.2}
 60%|█████████████████████████████████████████████████▏                                | 12060/20117 [7:41:40<5:29:08,  2.45s/it] 60%|█████████████████████████████████████████████████▏                                | 12061/20117 [7:41:42<5:25:03,  2.42s/it] 60%|█████████████████████████████████████████████████▏                                | 12062/20117 [7:41:45<5:23:52,  2.41s/it] 60%|█████████████████████████████████████████████████▏                                | 12063/20117 [7:41:47<5:27:55,  2.44s/it] 60%|█████████████████████████████████████████████████▏                                | 12064/20117 [7:41:50<5:31:52,  2.47s/it] 60%|█████████████████████████████████████████████████▏                                | 12065/20117 [7:41:52<5:33:30,  2.49s/it] 60%|█████████████████████████████████████████████████▏                                | 12066/20117 [7:41:55<5:35:08,  2.50s/it] 60%|█████████████████████████████████████████████████▏                                | 12067/20117 [7:41:57<5:33:50,  2.49s/it] 60%|█████████████████████████████████████████████████▏                                | 12068/20117 [7:42:00<5:35:01,  2.50s/it] 60%|█████████████████████████████████████████████████▏                                | 12069/20117 [7:42:02<5:32:22,  2.48s/it] 60%|█████████████████████████████████████████████████▏                                | 12070/20117 [7:42:05<5:33:10,  2.48s/it]                                                                                                                                 {'loss': 0.1585, 'grad_norm': 0.6814326643943787, 'learning_rate': 6.971391299949895e-05, 'memory/max_active (GiB)': 20.57, 'memory/max_allocated (GiB)': 20.57, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 285.66, 'epoch': 1.2}
 60%|█████████████████████████████████████████████████▏                                | 12070/20117 [7:42:05<5:33:10,  2.48s/it] 60%|█████████████████████████████████████████████████▏                                | 12071/20117 [7:42:07<5:34:30,  2.49s/it] 60%|█████████████████████████████████████████████████▏                                | 12072/20117 [7:42:10<5:34:24,  2.49s/it] 60%|█████████████████████████████████████████████████▏                                | 12073/20117 [7:42:12<5:36:23,  2.51s/it] 60%|█████████████████████████████████████████████████▏                                | 12074/20117 [7:42:15<5:38:38,  2.53s/it] 60%|█████████████████████████████████████████████████▏                                | 12075/20117 [7:42:18<5:40:42,  2.54s/it] 60%|█████████████████████████████████████████████████▏                                | 12076/20117 [7:42:20<5:40:28,  2.54s/it] 60%|█████████████████████████████████████████████████▏                                | 12077/20117 [7:42:23<5:39:03,  2.53s/it] 60%|█████████████████████████████████████████████████▏                                | 12078/20117 [7:42:25<5:38:00,  2.52s/it] 60%|█████████████████████████████████████████████████▏                                | 12079/20117 [7:42:28<5:36:51,  2.51s/it] 60%|█████████████████████████████████████████████████▏                                | 12080/20117 [7:42:30<5:37:08,  2.52s/it]                                                                                                                                 {'loss': 0.2202, 'grad_norm': 0.5412986874580383, 'learning_rate': 6.956437514705447e-05, 'memory/max_active (GiB)': 18.07, 'memory/max_allocated (GiB)': 18.07, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 333.0, 'epoch': 1.2}
 60%|█████████████████████████████████████████████████▏                                | 12080/20117 [7:42:30<5:37:08,  2.52s/it] 60%|█████████████████████████████████████████████████▏                                | 12081/20117 [7:42:33<5:35:12,  2.50s/it] 60%|█████████████████████████████████████████████████▏                                | 12082/20117 [7:42:35<5:36:43,  2.51s/it] 60%|█████████████████████████████████████████████████▎                                | 12083/20117 [7:42:38<5:37:22,  2.52s/it] 60%|█████████████████████████████████████████████████▎                                | 12084/20117 [7:42:40<5:38:08,  2.53s/it] 60%|█████████████████████████████████████████████████▎                                | 12085/20117 [7:42:43<5:39:33,  2.54s/it] 60%|█████████████████████████████████████████████████▎                                | 12086/20117 [7:42:45<5:36:49,  2.52s/it] 60%|█████████████████████████████████████████████████▎                                | 12087/20117 [7:42:48<5:36:26,  2.51s/it] 60%|█████████████████████████████████████████████████▎                                | 12088/20117 [7:42:50<5:34:32,  2.50s/it] 60%|█████████████████████████████████████████████████▎                                | 12089/20117 [7:42:53<5:32:57,  2.49s/it] 60%|█████████████████████████████████████████████████▎                                | 12090/20117 [7:42:55<5:33:10,  2.49s/it]                                                                                                                                 {'loss': 0.1899, 'grad_norm': 0.6325846314430237, 'learning_rate': 6.941491226398675e-05, 'memory/max_active (GiB)': 19.22, 'memory/max_allocated (GiB)': 19.22, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 333.08, 'epoch': 1.2}
 60%|█████████████████████████████████████████████████▎                                | 12090/20117 [7:42:55<5:33:10,  2.49s/it] 60%|█████████████████████████████████████████████████▎                                | 12091/20117 [7:42:58<5:36:17,  2.51s/it] 60%|█████████████████████████████████████████████████▎                                | 12092/20117 [7:43:00<5:34:44,  2.50s/it] 60%|█████████████████████████████████████████████████▎                                | 12093/20117 [7:43:03<5:36:54,  2.52s/it] 60%|█████████████████████████████████████████████████▎                                | 12094/20117 [7:43:05<5:38:15,  2.53s/it] 60%|█████████████████████████████████████████████████▎                                | 12095/20117 [7:43:08<5:35:08,  2.51s/it] 60%|█████████████████████████████████████████████████▎                                | 12096/20117 [7:43:10<5:34:56,  2.51s/it] 60%|█████████████████████████████████████████████████▎                                | 12097/20117 [7:43:13<5:37:56,  2.53s/it] 60%|█████████████████████████████████████████████████▎                                | 12098/20117 [7:43:16<5:55:07,  2.66s/it] 60%|█████████████████████████████████████████████████▎                                | 12099/20117 [7:43:18<5:50:07,  2.62s/it] 60%|█████████████████████████████████████████████████▎                                | 12100/20117 [7:43:21<5:46:00,  2.59s/it]                                                                                                                                 {'loss': 0.1916, 'grad_norm': 0.5769904255867004, 'learning_rate': 6.926552471845439e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 341.54, 'epoch': 1.2}
 60%|█████████████████████████████████████████████████▎                                | 12100/20117 [7:43:21<5:46:00,  2.59s/it] 60%|█████████████████████████████████████████████████▎                                | 12101/20117 [7:43:23<5:43:40,  2.57s/it] 60%|█████████████████████████████████████████████████▎                                | 12102/20117 [7:43:26<5:41:45,  2.56s/it] 60%|█████████████████████████████████████████████████▎                                | 12103/20117 [7:43:28<5:39:44,  2.54s/it] 60%|█████████████████████████████████████████████████▎                                | 12104/20117 [7:43:31<5:37:39,  2.53s/it] 60%|█████████████████████████████████████████████████▎                                | 12105/20117 [7:43:33<5:36:41,  2.52s/it] 60%|█████████████████████████████████████████████████▎                                | 12106/20117 [7:43:36<5:35:42,  2.51s/it] 60%|█████████████████████████████████████████████████▎                                | 12107/20117 [7:43:38<5:36:58,  2.52s/it] 60%|█████████████████████████████████████████████████▎                                | 12108/20117 [7:43:41<5:36:52,  2.52s/it] 60%|█████████████████████████████████████████████████▎                                | 12109/20117 [7:43:43<5:32:18,  2.49s/it] 60%|█████████████████████████████████████████████████▎                                | 12110/20117 [7:43:46<5:35:18,  2.51s/it]                                                                                                                                 {'loss': 0.1523, 'grad_norm': 0.5062241554260254, 'learning_rate': 6.911621287843058e-05, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 317.78, 'epoch': 1.2}
 60%|█████████████████████████████████████████████████▎                                | 12110/20117 [7:43:46<5:35:18,  2.51s/it] 60%|█████████████████████████████████████████████████▎                                | 12111/20117 [7:43:49<5:35:25,  2.51s/it] 60%|█████████████████████████████████████████████████▎                                | 12112/20117 [7:43:51<5:35:33,  2.52s/it] 60%|█████████████████████████████████████████████████▎                                | 12113/20117 [7:43:54<5:35:27,  2.51s/it] 60%|█████████████████████████████████████████████████▍                                | 12114/20117 [7:43:56<5:35:16,  2.51s/it] 60%|█████████████████████████████████████████████████▍                                | 12115/20117 [7:43:59<5:35:58,  2.52s/it] 60%|█████████████████████████████████████████████████▍                                | 12116/20117 [7:44:01<5:35:37,  2.52s/it] 60%|█████████████████████████████████████████████████▍                                | 12117/20117 [7:44:04<5:35:27,  2.52s/it] 60%|█████████████████████████████████████████████████▍                                | 12118/20117 [7:44:06<5:34:49,  2.51s/it] 60%|█████████████████████████████████████████████████▍                                | 12119/20117 [7:44:09<5:34:38,  2.51s/it] 60%|█████████████████████████████████████████████████▍                                | 12120/20117 [7:44:11<5:34:20,  2.51s/it]                                                                                                                                 {'loss': 0.1967, 'grad_norm': 0.6168234348297119, 'learning_rate': 6.896697711170183e-05, 'memory/max_active (GiB)': 20.63, 'memory/max_allocated (GiB)': 20.63, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 339.37, 'epoch': 1.2}
 60%|█████████████████████████████████████████████████▍                                | 12120/20117 [7:44:11<5:34:20,  2.51s/it] 60%|█████████████████████████████████████████████████▍                                | 12121/20117 [7:44:14<5:37:16,  2.53s/it] 60%|█████████████████████████████████████████████████▍                                | 12122/20117 [7:44:16<5:36:15,  2.52s/it] 60%|█████████████████████████████████████████████████▍                                | 12123/20117 [7:44:19<5:34:09,  2.51s/it] 60%|█████████████████████████████████████████████████▍                                | 12124/20117 [7:44:21<5:34:22,  2.51s/it] 60%|█████████████████████████████████████████████████▍                                | 12125/20117 [7:44:24<5:33:33,  2.50s/it] 60%|█████████████████████████████████████████████████▍                                | 12126/20117 [7:44:26<5:28:44,  2.47s/it] 60%|█████████████████████████████████████████████████▍                                | 12127/20117 [7:44:28<5:23:10,  2.43s/it] 60%|█████████████████████████████████████████████████▍                                | 12128/20117 [7:44:31<5:23:09,  2.43s/it] 60%|█████████████████████████████████████████████████▍                                | 12129/20117 [7:44:33<5:27:19,  2.46s/it] 60%|█████████████████████████████████████████████████▍                                | 12130/20117 [7:44:36<5:31:00,  2.49s/it]                                                                                                                                 {'loss': 0.1494, 'grad_norm': 0.3159475326538086, 'learning_rate': 6.881781778586745e-05, 'memory/max_active (GiB)': 18.83, 'memory/max_allocated (GiB)': 18.83, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 314.84, 'epoch': 1.21}
 60%|█████████████████████████████████████████████████▍                                | 12130/20117 [7:44:36<5:31:00,  2.49s/it] 60%|█████████████████████████████████████████████████▍                                | 12131/20117 [7:44:38<5:29:55,  2.48s/it] 60%|█████████████████████████████████████████████████▍                                | 12132/20117 [7:44:41<5:25:28,  2.45s/it] 60%|█████████████████████████████████████████████████▍                                | 12133/20117 [7:44:43<5:25:30,  2.45s/it] 60%|█████████████████████████████████████████████████▍                                | 12134/20117 [7:44:46<5:25:28,  2.45s/it] 60%|█████████████████████████████████████████████████▍                                | 12135/20117 [7:44:48<5:27:20,  2.46s/it] 60%|█████████████████████████████████████████████████▍                                | 12136/20117 [7:44:51<5:27:09,  2.46s/it] 60%|█████████████████████████████████████████████████▍                                | 12137/20117 [7:44:53<5:27:53,  2.47s/it] 60%|█████████████████████████████████████████████████▍                                | 12138/20117 [7:44:56<5:28:42,  2.47s/it] 60%|█████████████████████████████████████████████████▍                                | 12139/20117 [7:44:58<5:23:50,  2.44s/it] 60%|█████████████████████████████████████████████████▍                                | 12140/20117 [7:45:00<5:24:27,  2.44s/it]                                                                                                                                 {'loss': 0.1419, 'grad_norm': 0.697409451007843, 'learning_rate': 6.866873526833838e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 286.26, 'epoch': 1.21}
 60%|█████████████████████████████████████████████████▍                                | 12140/20117 [7:45:00<5:24:27,  2.44s/it] 60%|█████████████████████████████████████████████████▍                                | 12141/20117 [7:45:03<5:29:11,  2.48s/it] 60%|█████████████████████████████████████████████████▍                                | 12142/20117 [7:45:05<5:32:07,  2.50s/it] 60%|█████████████████████████████████████████████████▍                                | 12143/20117 [7:45:08<5:35:01,  2.52s/it] 60%|█████████████████████████████████████████████████▌                                | 12144/20117 [7:45:11<5:37:14,  2.54s/it] 60%|█████████████████████████████████████████████████▌                                | 12145/20117 [7:45:13<5:36:26,  2.53s/it] 60%|█████████████████████████████████████████████████▌                                | 12146/20117 [7:45:16<5:33:27,  2.51s/it] 60%|█████████████████████████████████████████████████▌                                | 12147/20117 [7:45:18<5:32:39,  2.50s/it] 60%|█████████████████████████████████████████████████▌                                | 12148/20117 [7:45:21<5:33:41,  2.51s/it] 60%|█████████████████████████████████████████████████▌                                | 12149/20117 [7:45:23<5:35:20,  2.53s/it] 60%|█████████████████████████████████████████████████▌                                | 12150/20117 [7:45:26<5:51:45,  2.65s/it]                                                                                                                                 {'loss': 0.1646, 'grad_norm': 0.4400809407234192, 'learning_rate': 6.851972992633636e-05, 'memory/max_active (GiB)': 19.22, 'memory/max_allocated (GiB)': 19.22, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 276.17, 'epoch': 1.21}
 60%|█████████████████████████████████████████████████▌                                | 12150/20117 [7:45:26<5:51:45,  2.65s/it] 60%|█████████████████████████████████████████████████▌                                | 12151/20117 [7:45:29<5:46:31,  2.61s/it] 60%|█████████████████████████████████████████████████▌                                | 12152/20117 [7:45:31<5:43:30,  2.59s/it] 60%|█████████████████████████████████████████████████▌                                | 12153/20117 [7:45:34<5:43:32,  2.59s/it] 60%|█████████████████████████████████████████████████▌                                | 12154/20117 [7:45:36<5:40:39,  2.57s/it] 60%|█████████████████████████████████████████████████▌                                | 12155/20117 [7:45:39<5:37:59,  2.55s/it] 60%|█████████████████████████████████████████████████▌                                | 12156/20117 [7:45:41<5:37:57,  2.55s/it] 60%|█████████████████████████████████████████████████▌                                | 12157/20117 [7:45:44<5:38:25,  2.55s/it] 60%|█████████████████████████████████████████████████▌                                | 12158/20117 [7:45:46<5:35:46,  2.53s/it] 60%|█████████████████████████████████████████████████▌                                | 12159/20117 [7:45:49<5:33:29,  2.51s/it] 60%|█████████████████████████████████████████████████▌                                | 12160/20117 [7:45:51<5:32:32,  2.51s/it]                                                                                                                                 {'loss': 0.1717, 'grad_norm': 0.4484960436820984, 'learning_rate': 6.837080212689302e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 275.23, 'epoch': 1.21}
 60%|█████████████████████████████████████████████████▌                                | 12160/20117 [7:45:51<5:32:32,  2.51s/it] 60%|█████████████████████████████████████████████████▌                                | 12161/20117 [7:45:54<5:34:36,  2.52s/it] 60%|█████████████████████████████████████████████████▌                                | 12162/20117 [7:45:56<5:36:20,  2.54s/it] 60%|█████████████████████████████████████████████████▌                                | 12163/20117 [7:45:59<5:37:24,  2.55s/it] 60%|█████████████████████████████████████████████████▌                                | 12164/20117 [7:46:02<5:38:00,  2.55s/it] 60%|█████████████████████████████████████████████████▌                                | 12165/20117 [7:46:04<5:38:36,  2.55s/it] 60%|█████████████████████████████████████████████████▌                                | 12166/20117 [7:46:07<5:36:45,  2.54s/it] 60%|█████████████████████████████████████████████████▌                                | 12167/20117 [7:46:09<5:37:57,  2.55s/it] 60%|█████████████████████████████████████████████████▌                                | 12168/20117 [7:46:12<5:38:47,  2.56s/it] 60%|█████████████████████████████████████████████████▌                                | 12169/20117 [7:46:14<5:38:27,  2.56s/it] 60%|█████████████████████████████████████████████████▌                                | 12170/20117 [7:46:17<5:37:56,  2.55s/it]                                                                                                                                 {'loss': 0.1457, 'grad_norm': 0.24361641705036163, 'learning_rate': 6.822195223684906e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 309.09, 'epoch': 1.21}
 60%|█████████████████████████████████████████████████▌                                | 12170/20117 [7:46:17<5:37:56,  2.55s/it] 61%|█████████████████████████████████████████████████▌                                | 12171/20117 [7:46:19<5:33:26,  2.52s/it] 61%|█████████████████████████████████████████████████▌                                | 12172/20117 [7:46:22<5:34:31,  2.53s/it] 61%|█████████████████████████████████████████████████▌                                | 12173/20117 [7:46:24<5:34:55,  2.53s/it] 61%|█████████████████████████████████████████████████▌                                | 12174/20117 [7:46:27<5:34:18,  2.53s/it] 61%|█████████████████████████████████████████████████▋                                | 12175/20117 [7:46:29<5:33:12,  2.52s/it] 61%|█████████████████████████████████████████████████▋                                | 12176/20117 [7:46:32<5:31:50,  2.51s/it] 61%|█████████████████████████████████████████████████▋                                | 12177/20117 [7:46:34<5:32:49,  2.52s/it] 61%|█████████████████████████████████████████████████▋                                | 12178/20117 [7:46:37<5:32:43,  2.51s/it] 61%|█████████████████████████████████████████████████▋                                | 12179/20117 [7:46:40<5:33:44,  2.52s/it] 61%|█████████████████████████████████████████████████▋                                | 12180/20117 [7:46:42<5:32:14,  2.51s/it]                                                                                                                                 {'loss': 0.1669, 'grad_norm': 0.4752653241157532, 'learning_rate': 6.807318062285314e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 310.15, 'epoch': 1.21}
 61%|█████████████████████████████████████████████████▋                                | 12180/20117 [7:46:42<5:32:14,  2.51s/it] 61%|█████████████████████████████████████████████████▋                                | 12181/20117 [7:46:45<5:34:59,  2.53s/it] 61%|█████████████████████████████████████████████████▋                                | 12182/20117 [7:46:47<5:34:45,  2.53s/it] 61%|█████████████████████████████████████████████████▋                                | 12183/20117 [7:46:50<5:39:08,  2.56s/it] 61%|█████████████████████████████████████████████████▋                                | 12184/20117 [7:46:52<5:40:26,  2.57s/it] 61%|█████████████████████████████████████████████████▋                                | 12185/20117 [7:46:55<5:35:44,  2.54s/it] 61%|█████████████████████████████████████████████████▋                                | 12186/20117 [7:46:57<5:35:03,  2.53s/it] 61%|█████████████████████████████████████████████████▋                                | 12187/20117 [7:47:00<5:34:10,  2.53s/it] 61%|█████████████████████████████████████████████████▋                                | 12188/20117 [7:47:02<5:32:57,  2.52s/it] 61%|█████████████████████████████████████████████████▋                                | 12189/20117 [7:47:05<5:33:22,  2.52s/it] 61%|█████████████████████████████████████████████████▋                                | 12190/20117 [7:47:07<5:32:57,  2.52s/it]                                                                                                                                 {'loss': 0.1572, 'grad_norm': 0.437656968832016, 'learning_rate': 6.792448765136124e-05, 'memory/max_active (GiB)': 18.84, 'memory/max_allocated (GiB)': 18.84, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 283.06, 'epoch': 1.21}
 61%|█████████████████████████████████████████████████▋                                | 12190/20117 [7:47:07<5:32:57,  2.52s/it] 61%|█████████████████████████████████████████████████▋                                | 12191/20117 [7:47:10<5:33:33,  2.53s/it] 61%|█████████████████████████████████████████████████▋                                | 12192/20117 [7:47:12<5:34:58,  2.54s/it] 61%|█████████████████████████████████████████████████▋                                | 12193/20117 [7:47:15<5:35:30,  2.54s/it] 61%|█████████████████████████████████████████████████▋                                | 12194/20117 [7:47:18<5:34:47,  2.54s/it] 61%|█████████████████████████████████████████████████▋                                | 12195/20117 [7:47:20<5:35:49,  2.54s/it] 61%|█████████████████████████████████████████████████▋                                | 12196/20117 [7:47:23<5:32:45,  2.52s/it] 61%|█████████████████████████████████████████████████▋                                | 12197/20117 [7:47:25<5:25:12,  2.46s/it] 61%|█████████████████████████████████████████████████▋                                | 12198/20117 [7:47:27<5:24:29,  2.46s/it] 61%|█████████████████████████████████████████████████▋                                | 12199/20117 [7:47:30<5:25:05,  2.46s/it] 61%|█████████████████████████████████████████████████▋                                | 12200/20117 [7:47:32<5:15:11,  2.39s/it]                                                                                                                                 {'loss': 0.1974, 'grad_norm': 0.8095329999923706, 'learning_rate': 6.777587368863558e-05, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 345.42, 'epoch': 1.21}
 61%|█████████████████████████████████████████████████▋                                | 12200/20117 [7:47:32<5:15:11,  2.39s/it] 61%|█████████████████████████████████████████████████▋                                | 12201/20117 [7:47:35<5:21:24,  2.44s/it] 61%|█████████████████████████████████████████████████▋                                | 12202/20117 [7:47:37<5:23:05,  2.45s/it] 61%|█████████████████████████████████████████████████▋                                | 12203/20117 [7:47:40<5:25:12,  2.47s/it] 61%|█████████████████████████████████████████████████▋                                | 12204/20117 [7:47:42<5:26:30,  2.48s/it] 61%|█████████████████████████████████████████████████▋                                | 12205/20117 [7:47:44<5:21:34,  2.44s/it] 61%|█████████████████████████████████████████████████▊                                | 12206/20117 [7:47:47<5:24:16,  2.46s/it] 61%|█████████████████████████████████████████████████▊                                | 12207/20117 [7:47:49<5:23:43,  2.46s/it] 61%|█████████████████████████████████████████████████▊                                | 12208/20117 [7:47:52<5:24:28,  2.46s/it] 61%|█████████████████████████████████████████████████▊                                | 12209/20117 [7:47:54<5:23:59,  2.46s/it] 61%|█████████████████████████████████████████████████▊                                | 12210/20117 [7:47:57<5:25:37,  2.47s/it]                                                                                                                                 {'loss': 0.1423, 'grad_norm': 0.48734039068222046, 'learning_rate': 6.762733910074372e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 289.48, 'epoch': 1.21}
 61%|█████████████████████████████████████████████████▊                                | 12210/20117 [7:47:57<5:25:37,  2.47s/it] 61%|█████████████████████████████████████████████████▊                                | 12211/20117 [7:47:59<5:21:38,  2.44s/it] 61%|█████████████████████████████████████████████████▊                                | 12212/20117 [7:48:02<5:18:42,  2.42s/it] 61%|█████████████████████████████████████████████████▊                                | 12213/20117 [7:48:04<5:20:08,  2.43s/it] 61%|█████████████████████████████████████████████████▊                                | 12214/20117 [7:48:07<5:23:55,  2.46s/it] 61%|█████████████████████████████████████████████████▊                                | 12215/20117 [7:48:09<5:25:32,  2.47s/it] 61%|█████████████████████████████████████████████████▊                                | 12216/20117 [7:48:12<5:28:00,  2.49s/it] 61%|█████████████████████████████████████████████████▊                                | 12217/20117 [7:48:14<5:44:08,  2.61s/it] 61%|█████████████████████████████████████████████████▊                                | 12218/20117 [7:48:17<5:49:20,  2.65s/it] 61%|█████████████████████████████████████████████████▊                                | 12219/20117 [7:48:20<5:43:09,  2.61s/it] 61%|█████████████████████████████████████████████████▊                                | 12220/20117 [7:48:22<5:38:40,  2.57s/it]                                                                                                                                 {'loss': 0.1729, 'grad_norm': 0.6078053712844849, 'learning_rate': 6.747888425355783e-05, 'memory/max_active (GiB)': 19.22, 'memory/max_allocated (GiB)': 19.22, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 326.08, 'epoch': 1.21}
 61%|█████████████████████████████████████████████████▊                                | 12220/20117 [7:48:22<5:38:40,  2.57s/it] 61%|█████████████████████████████████████████████████▊                                | 12221/20117 [7:48:25<5:35:32,  2.55s/it] 61%|█████████████████████████████████████████████████▊                                | 12222/20117 [7:48:27<5:33:27,  2.53s/it] 61%|█████████████████████████████████████████████████▊                                | 12223/20117 [7:48:30<5:33:16,  2.53s/it] 61%|█████████████████████████████████████████████████▊                                | 12224/20117 [7:48:32<5:33:33,  2.54s/it] 61%|█████████████████████████████████████████████████▊                                | 12225/20117 [7:48:35<5:33:47,  2.54s/it] 61%|█████████████████████████████████████████████████▊                                | 12226/20117 [7:48:37<5:33:08,  2.53s/it] 61%|█████████████████████████████████████████████████▊                                | 12227/20117 [7:48:40<5:34:17,  2.54s/it] 61%|█████████████████████████████████████████████████▊                                | 12228/20117 [7:48:42<5:33:32,  2.54s/it] 61%|█████████████████████████████████████████████████▊                                | 12229/20117 [7:48:45<5:31:01,  2.52s/it] 61%|█████████████████████████████████████████████████▊                                | 12230/20117 [7:48:47<5:31:24,  2.52s/it]                                                                                                                                 {'loss': 0.1939, 'grad_norm': 0.4170425832271576, 'learning_rate': 6.733050951275347e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 326.45, 'epoch': 1.22}
 61%|█████████████████████████████████████████████████▊                                | 12230/20117 [7:48:47<5:31:24,  2.52s/it] 61%|█████████████████████████████████████████████████▊                                | 12231/20117 [7:48:50<5:32:47,  2.53s/it] 61%|█████████████████████████████████████████████████▊                                | 12232/20117 [7:48:53<5:33:10,  2.54s/it] 61%|█████████████████████████████████████████████████▊                                | 12233/20117 [7:48:55<5:35:00,  2.55s/it] 61%|█████████████████████████████████████████████████▊                                | 12234/20117 [7:48:58<5:35:23,  2.55s/it] 61%|█████████████████████████████████████████████████▊                                | 12235/20117 [7:49:00<5:34:21,  2.55s/it] 61%|█████████████████████████████████████████████████▉                                | 12236/20117 [7:49:03<5:32:13,  2.53s/it] 61%|█████████████████████████████████████████████████▉                                | 12237/20117 [7:49:05<5:31:44,  2.53s/it] 61%|█████████████████████████████████████████████████▉                                | 12238/20117 [7:49:08<5:31:55,  2.53s/it] 61%|█████████████████████████████████████████████████▉                                | 12239/20117 [7:49:10<5:32:41,  2.53s/it] 61%|█████████████████████████████████████████████████▉                                | 12240/20117 [7:49:13<5:32:59,  2.54s/it]                                                                                                                                 {'loss': 0.1701, 'grad_norm': 0.6094039678573608, 'learning_rate': 6.71822152438091e-05, 'memory/max_active (GiB)': 20.77, 'memory/max_allocated (GiB)': 20.77, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 344.33, 'epoch': 1.22}
 61%|█████████████████████████████████████████████████▉                                | 12240/20117 [7:49:13<5:32:59,  2.54s/it] 61%|█████████████████████████████████████████████████▉                                | 12241/20117 [7:49:15<5:32:18,  2.53s/it] 61%|█████████████████████████████████████████████████▉                                | 12242/20117 [7:49:18<5:32:12,  2.53s/it] 61%|█████████████████████████████████████████████████▉                                | 12243/20117 [7:49:20<5:33:58,  2.54s/it] 61%|█████████████████████████████████████████████████▉                                | 12244/20117 [7:49:23<5:32:44,  2.54s/it] 61%|█████████████████████████████████████████████████▉                                | 12245/20117 [7:49:26<5:33:06,  2.54s/it] 61%|█████████████████████████████████████████████████▉                                | 12246/20117 [7:49:28<5:31:47,  2.53s/it] 61%|█████████████████████████████████████████████████▉                                | 12247/20117 [7:49:31<5:32:27,  2.53s/it] 61%|█████████████████████████████████████████████████▉                                | 12248/20117 [7:49:33<5:37:52,  2.58s/it] 61%|█████████████████████████████████████████████████▉                                | 12249/20117 [7:49:36<5:34:38,  2.55s/it] 61%|█████████████████████████████████████████████████▉                                | 12250/20117 [7:49:38<5:34:14,  2.55s/it]                                                                                                                                 {'loss': 0.168, 'grad_norm': 0.5931222438812256, 'learning_rate': 6.703400181200472e-05, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 343.43, 'epoch': 1.22}
 61%|█████████████████████████████████████████████████▉                                | 12250/20117 [7:49:38<5:34:14,  2.55s/it] 61%|█████████████████████████████████████████████████▉                                | 12251/20117 [7:49:41<5:32:33,  2.54s/it] 61%|█████████████████████████████████████████████████▉                                | 12252/20117 [7:49:43<5:31:56,  2.53s/it] 61%|█████████████████████████████████████████████████▉                                | 12253/20117 [7:49:46<5:30:26,  2.52s/it] 61%|█████████████████████████████████████████████████▉                                | 12254/20117 [7:49:49<5:47:00,  2.65s/it] 61%|█████████████████████████████████████████████████▉                                | 12255/20117 [7:49:51<5:42:18,  2.61s/it] 61%|█████████████████████████████████████████████████▉                                | 12256/20117 [7:49:54<5:37:31,  2.58s/it] 61%|█████████████████████████████████████████████████▉                                | 12257/20117 [7:49:56<5:34:34,  2.55s/it] 61%|█████████████████████████████████████████████████▉                                | 12258/20117 [7:49:59<5:32:28,  2.54s/it] 61%|█████████████████████████████████████████████████▉                                | 12259/20117 [7:50:01<5:31:32,  2.53s/it] 61%|█████████████████████████████████████████████████▉                                | 12260/20117 [7:50:04<5:31:03,  2.53s/it]                                                                                                                                 {'loss': 0.1449, 'grad_norm': 0.5588046908378601, 'learning_rate': 6.688586958242144e-05, 'memory/max_active (GiB)': 19.22, 'memory/max_allocated (GiB)': 19.22, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 323.04, 'epoch': 1.22}
 61%|█████████████████████████████████████████████████▉                                | 12260/20117 [7:50:04<5:31:03,  2.53s/it] 61%|█████████████████████████████████████████████████▉                                | 12261/20117 [7:50:06<5:32:35,  2.54s/it] 61%|█████████████████████████████████████████████████▉                                | 12262/20117 [7:50:09<5:29:50,  2.52s/it] 61%|█████████████████████████████████████████████████▉                                | 12263/20117 [7:50:11<5:32:21,  2.54s/it] 61%|█████████████████████████████████████████████████▉                                | 12264/20117 [7:50:14<5:32:55,  2.54s/it] 61%|█████████████████████████████████████████████████▉                                | 12265/20117 [7:50:17<5:31:33,  2.53s/it] 61%|█████████████████████████████████████████████████▉                                | 12266/20117 [7:50:19<5:32:08,  2.54s/it] 61%|██████████████████████████████████████████████████                                | 12267/20117 [7:50:22<5:32:04,  2.54s/it] 61%|██████████████████████████████████████████████████                                | 12268/20117 [7:50:24<5:30:17,  2.52s/it] 61%|██████████████████████████████████████████████████                                | 12269/20117 [7:50:27<5:27:30,  2.50s/it] 61%|██████████████████████████████████████████████████                                | 12270/20117 [7:50:29<5:21:06,  2.46s/it]                                                                                                                                 {'loss': 0.1925, 'grad_norm': 0.7040362358093262, 'learning_rate': 6.673781891994018e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 280.89, 'epoch': 1.22}
 61%|██████████████████████████████████████████████████                                | 12270/20117 [7:50:29<5:21:06,  2.46s/it] 61%|██████████████████████████████████████████████████                                | 12271/20117 [7:50:31<5:16:31,  2.42s/it] 61%|██████████████████████████████████████████████████                                | 12272/20117 [7:50:34<5:17:53,  2.43s/it] 61%|██████████████████████████████████████████████████                                | 12273/20117 [7:50:36<5:19:10,  2.44s/it] 61%|██████████████████████████████████████████████████                                | 12274/20117 [7:50:39<5:19:17,  2.44s/it] 61%|██████████████████████████████████████████████████                                | 12275/20117 [7:50:41<5:15:35,  2.41s/it] 61%|██████████████████████████████████████████████████                                | 12276/20117 [7:50:43<5:16:45,  2.42s/it] 61%|██████████████████████████████████████████████████                                | 12277/20117 [7:50:46<5:19:02,  2.44s/it] 61%|██████████████████████████████████████████████████                                | 12278/20117 [7:50:48<5:20:37,  2.45s/it] 61%|██████████████████████████████████████████████████                                | 12279/20117 [7:50:51<5:23:39,  2.48s/it] 61%|██████████████████████████████████████████████████                                | 12280/20117 [7:50:53<5:19:48,  2.45s/it]                                                                                                                                 {'loss': 0.1817, 'grad_norm': 0.46226373314857483, 'learning_rate': 6.658985018924104e-05, 'memory/max_active (GiB)': 20.58, 'memory/max_allocated (GiB)': 20.58, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 363.62, 'epoch': 1.22}
 61%|██████████████████████████████████████████████████                                | 12280/20117 [7:50:53<5:19:48,  2.45s/it] 61%|██████████████████████████████████████████████████                                | 12281/20117 [7:50:56<5:15:07,  2.41s/it] 61%|██████████████████████████████████████████████████                                | 12282/20117 [7:50:58<5:19:03,  2.44s/it] 61%|██████████████████████████████████████████████████                                | 12283/20117 [7:51:01<5:21:40,  2.46s/it] 61%|██████████████████████████████████████████████████                                | 12284/20117 [7:51:03<5:22:20,  2.47s/it] 61%|██████████████████████████████████████████████████                                | 12285/20117 [7:51:06<5:24:39,  2.49s/it] 61%|██████████████████████████████████████████████████                                | 12286/20117 [7:51:08<5:25:08,  2.49s/it] 61%|██████████████████████████████████████████████████                                | 12287/20117 [7:51:11<5:27:27,  2.51s/it] 61%|██████████████████████████████████████████████████                                | 12288/20117 [7:51:14<5:40:27,  2.61s/it] 61%|██████████████████████████████████████████████████                                | 12289/20117 [7:51:16<5:36:39,  2.58s/it] 61%|██████████████████████████████████████████████████                                | 12290/20117 [7:51:19<5:33:19,  2.56s/it]                                                                                                                                 {'loss': 0.1602, 'grad_norm': 0.6106900572776794, 'learning_rate': 6.644196375480228e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 242.36, 'epoch': 1.22}
 61%|██████████████████████████████████████████████████                                | 12290/20117 [7:51:19<5:33:19,  2.56s/it] 61%|██████████████████████████████████████████████████                                | 12291/20117 [7:51:21<5:31:26,  2.54s/it] 61%|██████████████████████████████████████████████████                                | 12292/20117 [7:51:24<5:31:14,  2.54s/it] 61%|██████████████████████████████████████████████████                                | 12293/20117 [7:51:26<5:28:59,  2.52s/it] 61%|██████████████████████████████████████████████████                                | 12294/20117 [7:51:29<5:30:57,  2.54s/it] 61%|██████████████████████████████████████████████████                                | 12295/20117 [7:51:31<5:30:22,  2.53s/it] 61%|██████████████████████████████████████████████████                                | 12296/20117 [7:51:34<5:29:25,  2.53s/it] 61%|██████████████████████████████████████████████████                                | 12297/20117 [7:51:36<5:29:34,  2.53s/it] 61%|██████████████████████████████████████████████████▏                               | 12298/20117 [7:51:39<5:30:29,  2.54s/it] 61%|██████████████████████████████████████████████████▏                               | 12299/20117 [7:51:41<5:32:05,  2.55s/it] 61%|██████████████████████████████████████████████████▏                               | 12300/20117 [7:51:44<5:32:42,  2.55s/it]                                                                                                                                 {'loss': 0.2175, 'grad_norm': 0.3931345045566559, 'learning_rate': 6.629415998089947e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 319.03, 'epoch': 1.22}
 61%|██████████████████████████████████████████████████▏                               | 12300/20117 [7:51:44<5:32:42,  2.55s/it] 61%|██████████████████████████████████████████████████▏                               | 12301/20117 [7:51:46<5:31:13,  2.54s/it] 61%|██████████████████████████████████████████████████▏                               | 12302/20117 [7:51:49<5:32:28,  2.55s/it] 61%|██████████████████████████████████████████████████▏                               | 12303/20117 [7:51:52<5:32:32,  2.55s/it] 61%|██████████████████████████████████████████████████▏                               | 12304/20117 [7:51:54<5:31:59,  2.55s/it] 61%|██████████████████████████████████████████████████▏                               | 12305/20117 [7:51:57<5:32:42,  2.56s/it] 61%|██████████████████████████████████████████████████▏                               | 12306/20117 [7:51:59<5:33:06,  2.56s/it] 61%|██████████████████████████████████████████████████▏                               | 12307/20117 [7:52:02<5:47:02,  2.67s/it] 61%|██████████████████████████████████████████████████▏                               | 12308/20117 [7:52:05<5:44:08,  2.64s/it] 61%|██████████████████████████████████████████████████▏                               | 12309/20117 [7:52:07<5:40:18,  2.62s/it] 61%|██████████████████████████████████████████████████▏                               | 12310/20117 [7:52:10<5:36:23,  2.59s/it]                                                                                                                                 {'loss': 0.1477, 'grad_norm': 0.5207247734069824, 'learning_rate': 6.61464392316045e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 324.44, 'epoch': 1.22}
 61%|██████████████████████████████████████████████████▏                               | 12310/20117 [7:52:10<5:36:23,  2.59s/it] 61%|██████████████████████████████████████████████████▏                               | 12311/20117 [7:52:12<5:31:59,  2.55s/it] 61%|██████████████████████████████████████████████████▏                               | 12312/20117 [7:52:15<5:31:08,  2.55s/it] 61%|██████████████████████████████████████████████████▏                               | 12313/20117 [7:52:17<5:29:50,  2.54s/it] 61%|██████████████████████████████████████████████████▏                               | 12314/20117 [7:52:20<5:33:01,  2.56s/it] 61%|██████████████████████████████████████████████████▏                               | 12315/20117 [7:52:22<5:32:37,  2.56s/it] 61%|██████████████████████████████████████████████████▏                               | 12316/20117 [7:52:25<5:32:49,  2.56s/it] 61%|██████████████████████████████████████████████████▏                               | 12317/20117 [7:52:28<5:34:01,  2.57s/it] 61%|██████████████████████████████████████████████████▏                               | 12318/20117 [7:52:30<5:32:47,  2.56s/it] 61%|██████████████████████████████████████████████████▏                               | 12319/20117 [7:52:33<5:33:01,  2.56s/it] 61%|██████████████████████████████████████████████████▏                               | 12320/20117 [7:52:35<5:32:59,  2.56s/it]                                                                                                                                 {'loss': 0.1891, 'grad_norm': 0.709812343120575, 'learning_rate': 6.599880187078479e-05, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 357.92, 'epoch': 1.22}
 61%|██████████████████████████████████████████████████▏                               | 12320/20117 [7:52:35<5:32:59,  2.56s/it] 61%|██████████████████████████████████████████████████▏                               | 12321/20117 [7:52:38<5:32:04,  2.56s/it] 61%|██████████████████████████████████████████████████▏                               | 12322/20117 [7:52:40<5:33:24,  2.57s/it] 61%|██████████████████████████████████████████████████▏                               | 12323/20117 [7:52:43<5:33:20,  2.57s/it] 61%|██████████████████████████████████████████████████▏                               | 12324/20117 [7:52:46<5:31:57,  2.56s/it] 61%|██████████████████████████████████████████████████▏                               | 12325/20117 [7:52:48<5:30:41,  2.55s/it] 61%|██████████████████████████████████████████████████▏                               | 12326/20117 [7:52:51<5:31:12,  2.55s/it] 61%|██████████████████████████████████████████████████▏                               | 12327/20117 [7:52:53<5:30:19,  2.54s/it] 61%|██████████████████████████████████████████████████▎                               | 12328/20117 [7:52:56<5:39:16,  2.61s/it] 61%|██████████████████████████████████████████████████▎                               | 12329/20117 [7:52:59<5:39:15,  2.61s/it] 61%|██████████████████████████████████████████████████▎                               | 12330/20117 [7:53:01<5:35:43,  2.59s/it]                                                                                                                                 {'loss': 0.1395, 'grad_norm': 0.7825199961662292, 'learning_rate': 6.585124826210245e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 265.35, 'epoch': 1.23}
 61%|██████████████████████████████████████████████████▎                               | 12330/20117 [7:53:01<5:35:43,  2.59s/it] 61%|██████████████████████████████████████████████████▎                               | 12331/20117 [7:53:04<5:32:43,  2.56s/it] 61%|██████████████████████████████████████████████████▎                               | 12332/20117 [7:53:06<5:30:30,  2.55s/it] 61%|██████████████████████████████████████████████████▎                               | 12333/20117 [7:53:09<5:27:37,  2.53s/it] 61%|██████████████████████████████████████████████████▎                               | 12334/20117 [7:53:11<5:29:57,  2.54s/it] 61%|██████████████████████████████████████████████████▎                               | 12335/20117 [7:53:14<5:30:52,  2.55s/it] 61%|██████████████████████████████████████████████████▎                               | 12336/20117 [7:53:16<5:29:03,  2.54s/it] 61%|██████████████████████████████████████████████████▎                               | 12337/20117 [7:53:19<5:21:22,  2.48s/it] 61%|██████████████████████████████████████████████████▎                               | 12338/20117 [7:53:21<5:19:57,  2.47s/it] 61%|██████████████████████████████████████████████████▎                               | 12339/20117 [7:53:24<5:25:05,  2.51s/it] 61%|██████████████████████████████████████████████████▎                               | 12340/20117 [7:53:26<5:27:35,  2.53s/it]                                                                                                                                 {'loss': 0.1813, 'grad_norm': 0.6053150296211243, 'learning_rate': 6.570377876901311e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 374.3, 'epoch': 1.23}
 61%|██████████████████████████████████████████████████▎                               | 12340/20117 [7:53:26<5:27:35,  2.53s/it] 61%|██████████████████████████████████████████████████▎                               | 12341/20117 [7:53:29<5:25:25,  2.51s/it] 61%|██████████████████████████████████████████████████▎                               | 12342/20117 [7:53:31<5:26:41,  2.52s/it] 61%|██████████████████████████████████████████████████▎                               | 12343/20117 [7:53:34<5:23:28,  2.50s/it] 61%|██████████████████████████████████████████████████▎                               | 12344/20117 [7:53:36<5:25:49,  2.52s/it] 61%|██████████████████████████████████████████████████▎                               | 12345/20117 [7:53:39<5:25:10,  2.51s/it] 61%|██████████████████████████████████████████████████▎                               | 12346/20117 [7:53:41<5:23:52,  2.50s/it] 61%|██████████████████████████████████████████████████▎                               | 12347/20117 [7:53:44<5:20:18,  2.47s/it] 61%|██████████████████████████████████████████████████▎                               | 12348/20117 [7:53:46<5:09:50,  2.39s/it] 61%|██████████████████████████████████████████████████▎                               | 12349/20117 [7:53:48<5:13:51,  2.42s/it] 61%|██████████████████████████████████████████████████▎                               | 12350/20117 [7:53:51<5:19:13,  2.47s/it]                                                                                                                                 {'loss': 0.2049, 'grad_norm': 0.6844688653945923, 'learning_rate': 6.555639375476532e-05, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 318.45, 'epoch': 1.23}
 61%|██████████████████████████████████████████████████▎                               | 12350/20117 [7:53:51<5:19:13,  2.47s/it] 61%|██████████████████████████████████████████████████▎                               | 12351/20117 [7:53:53<5:23:14,  2.50s/it] 61%|██████████████████████████████████████████████████▎                               | 12352/20117 [7:53:56<5:25:35,  2.52s/it] 61%|██████████████████████████████████████████████████▎                               | 12353/20117 [7:53:58<5:25:26,  2.51s/it] 61%|██████████████████████████████████████████████████▎                               | 12354/20117 [7:54:01<5:25:31,  2.52s/it] 61%|██████████████████████████████████████████████████▎                               | 12355/20117 [7:54:03<5:22:45,  2.49s/it] 61%|██████████████████████████████████████████████████▎                               | 12356/20117 [7:54:06<5:19:57,  2.47s/it] 61%|██████████████████████████████████████████████████▎                               | 12357/20117 [7:54:08<5:16:56,  2.45s/it] 61%|██████████████████████████████████████████████████▎                               | 12358/20117 [7:54:11<5:22:12,  2.49s/it] 61%|██████████████████████████████████████████████████▍                               | 12359/20117 [7:54:13<5:23:12,  2.50s/it] 61%|██████████████████████████████████████████████████▍                               | 12360/20117 [7:54:16<5:37:38,  2.61s/it]                                                                                                                                 {'loss': 0.1821, 'grad_norm': 0.5875769853591919, 'learning_rate': 6.540909358239954e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 312.63, 'epoch': 1.23}
 61%|██████████████████████████████████████████████████▍                               | 12360/20117 [7:54:16<5:37:38,  2.61s/it] 61%|██████████████████████████████████████████████████▍                               | 12361/20117 [7:54:19<5:33:08,  2.58s/it] 61%|██████████████████████████████████████████████████▍                               | 12362/20117 [7:54:21<5:26:18,  2.52s/it] 61%|██████████████████████████████████████████████████▍                               | 12363/20117 [7:54:24<5:22:03,  2.49s/it] 61%|██████████████████████████████████████████████████▍                               | 12364/20117 [7:54:26<5:23:39,  2.50s/it] 61%|██████████████████████████████████████████████████▍                               | 12365/20117 [7:54:29<5:21:50,  2.49s/it] 61%|██████████████████████████████████████████████████▍                               | 12366/20117 [7:54:31<5:17:54,  2.46s/it] 61%|██████████████████████████████████████████████████▍                               | 12367/20117 [7:54:33<5:17:09,  2.46s/it] 61%|██████████████████████████████████████████████████▍                               | 12368/20117 [7:54:36<5:15:16,  2.44s/it] 61%|██████████████████████████████████████████████████▍                               | 12369/20117 [7:54:38<5:15:55,  2.45s/it] 61%|██████████████████████████████████████████████████▍                               | 12370/20117 [7:54:41<5:12:51,  2.42s/it]                                                                                                                                 {'loss': 0.1581, 'grad_norm': 0.4794519543647766, 'learning_rate': 6.526187861474727e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 344.97, 'epoch': 1.23}
 61%|██████████████████████████████████████████████████▍                               | 12370/20117 [7:54:41<5:12:51,  2.42s/it] 61%|██████████████████████████████████████████████████▍                               | 12371/20117 [7:54:43<5:12:24,  2.42s/it] 62%|██████████████████████████████████████████████████▍                               | 12372/20117 [7:54:45<5:10:29,  2.41s/it] 62%|██████████████████████████████████████████████████▍                               | 12373/20117 [7:54:48<5:10:29,  2.41s/it] 62%|██████████████████████████████████████████████████▍                               | 12374/20117 [7:54:50<5:11:17,  2.41s/it] 62%|██████████████████████████████████████████████████▍                               | 12375/20117 [7:54:53<5:11:38,  2.42s/it] 62%|██████████████████████████████████████████████████▍                               | 12376/20117 [7:54:55<5:13:21,  2.43s/it] 62%|██████████████████████████████████████████████████▍                               | 12377/20117 [7:54:58<5:13:44,  2.43s/it] 62%|██████████████████████████████████████████████████▍                               | 12378/20117 [7:55:00<5:11:28,  2.41s/it] 62%|██████████████████████████████████████████████████▍                               | 12379/20117 [7:55:02<5:13:29,  2.43s/it] 62%|██████████████████████████████████████████████████▍                               | 12380/20117 [7:55:05<5:14:16,  2.44s/it]                                                                                                                                 {'loss': 0.1482, 'grad_norm': 0.4545304775238037, 'learning_rate': 6.511474921442997e-05, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 307.3, 'epoch': 1.23}
 62%|██████████████████████████████████████████████████▍                               | 12380/20117 [7:55:05<5:14:16,  2.44s/it] 62%|██████████████████████████████████████████████████▍                               | 12381/20117 [7:55:07<5:14:02,  2.44s/it] 62%|██████████████████████████████████████████████████▍                               | 12382/20117 [7:55:10<5:13:55,  2.44s/it] 62%|██████████████████████████████████████████████████▍                               | 12383/20117 [7:55:12<5:14:05,  2.44s/it] 62%|██████████████████████████████████████████████████▍                               | 12384/20117 [7:55:15<5:14:19,  2.44s/it] 62%|██████████████████████████████████████████████████▍                               | 12385/20117 [7:55:17<5:14:06,  2.44s/it] 62%|██████████████████████████████████████████████████▍                               | 12386/20117 [7:55:19<5:13:34,  2.43s/it] 62%|██████████████████████████████████████████████████▍                               | 12387/20117 [7:55:22<5:16:11,  2.45s/it] 62%|██████████████████████████████████████████████████▍                               | 12388/20117 [7:55:24<5:15:29,  2.45s/it] 62%|██████████████████████████████████████████████████▍                               | 12389/20117 [7:55:27<5:13:51,  2.44s/it] 62%|██████████████████████████████████████████████████▌                               | 12390/20117 [7:55:29<5:13:53,  2.44s/it]                                                                                                                                 {'loss': 0.1688, 'grad_norm': 0.4457061290740967, 'learning_rate': 6.496770574385858e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 357.22, 'epoch': 1.23}
 62%|██████████████████████████████████████████████████▌                               | 12390/20117 [7:55:29<5:13:53,  2.44s/it] 62%|██████████████████████████████████████████████████▌                               | 12391/20117 [7:55:32<5:12:36,  2.43s/it] 62%|██████████████████████████████████████████████████▌                               | 12392/20117 [7:55:34<5:09:14,  2.40s/it] 62%|██████████████████████████████████████████████████▌                               | 12393/20117 [7:55:36<5:09:06,  2.40s/it] 62%|██████████████████████████████████████████████████▌                               | 12394/20117 [7:55:39<5:11:55,  2.42s/it] 62%|██████████████████████████████████████████████████▌                               | 12395/20117 [7:55:41<5:11:18,  2.42s/it] 62%|██████████████████████████████████████████████████▌                               | 12396/20117 [7:55:44<5:10:16,  2.41s/it] 62%|██████████████████████████████████████████████████▌                               | 12397/20117 [7:55:46<5:07:32,  2.39s/it] 62%|██████████████████████████████████████████████████▌                               | 12398/20117 [7:55:48<5:08:00,  2.39s/it] 62%|██████████████████████████████████████████████████▌                               | 12399/20117 [7:55:51<5:06:36,  2.38s/it] 62%|██████████████████████████████████████████████████▌                               | 12400/20117 [7:55:53<5:09:20,  2.41s/it]                                                                                                                                 {'loss': 0.1487, 'grad_norm': 0.5178716778755188, 'learning_rate': 6.482074856523215e-05, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 317.6, 'epoch': 1.23}
 62%|██████████████████████████████████████████████████▌                               | 12400/20117 [7:55:53<5:09:20,  2.41s/it] 62%|██████████████████████████████████████████████████▌                               | 12401/20117 [7:55:56<5:09:55,  2.41s/it] 62%|██████████████████████████████████████████████████▌                               | 12402/20117 [7:55:58<5:09:54,  2.41s/it] 62%|██████████████████████████████████████████████████▌                               | 12403/20117 [7:56:01<5:10:06,  2.41s/it] 62%|██████████████████████████████████████████████████▌                               | 12404/20117 [7:56:03<5:09:07,  2.40s/it] 62%|██████████████████████████████████████████████████▌                               | 12405/20117 [7:56:05<5:15:45,  2.46s/it] 62%|██████████████████████████████████████████████████▌                               | 12406/20117 [7:56:08<5:13:27,  2.44s/it] 62%|██████████████████████████████████████████████████▌                               | 12407/20117 [7:56:10<5:13:40,  2.44s/it] 62%|██████████████████████████████████████████████████▌                               | 12408/20117 [7:56:13<5:07:31,  2.39s/it] 62%|██████████████████████████████████████████████████▌                               | 12409/20117 [7:56:15<5:00:49,  2.34s/it] 62%|██████████████████████████████████████████████████▌                               | 12410/20117 [7:56:17<4:59:01,  2.33s/it]                                                                                                                                 {'loss': 0.208, 'grad_norm': 0.2689170837402344, 'learning_rate': 6.467387804053731e-05, 'memory/max_active (GiB)': 18.84, 'memory/max_allocated (GiB)': 18.84, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 330.76, 'epoch': 1.23}
 62%|██████████████████████████████████████████████████▌                               | 12410/20117 [7:56:17<4:59:01,  2.33s/it] 62%|██████████████████████████████████████████████████▌                               | 12411/20117 [7:56:20<5:01:44,  2.35s/it] 62%|██████████████████████████████████████████████████▌                               | 12412/20117 [7:56:22<5:19:21,  2.49s/it] 62%|██████████████████████████████████████████████████▌                               | 12413/20117 [7:56:25<5:11:02,  2.42s/it] 62%|██████████████████████████████████████████████████▌                               | 12414/20117 [7:56:27<5:10:31,  2.42s/it] 62%|██████████████████████████████████████████████████▌                               | 12415/20117 [7:56:29<5:06:09,  2.39s/it] 62%|██████████████████████████████████████████████████▌                               | 12416/20117 [7:56:32<5:06:31,  2.39s/it] 62%|██████████████████████████████████████████████████▌                               | 12417/20117 [7:56:34<5:03:42,  2.37s/it] 62%|██████████████████████████████████████████████████▌                               | 12418/20117 [7:56:36<5:04:09,  2.37s/it] 62%|██████████████████████████████████████████████████▌                               | 12419/20117 [7:56:39<5:00:59,  2.35s/it] 62%|██████████████████████████████████████████████████▋                               | 12420/20117 [7:56:41<5:03:01,  2.36s/it]                                                                                                                                 {'loss': 0.1667, 'grad_norm': 0.5642197132110596, 'learning_rate': 6.45270945315472e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 299.73, 'epoch': 1.23}
 62%|██████████████████████████████████████████████████▋                               | 12420/20117 [7:56:41<5:03:01,  2.36s/it] 62%|██████████████████████████████████████████████████▋                               | 12421/20117 [7:56:44<5:05:23,  2.38s/it] 62%|██████████████████████████████████████████████████▋                               | 12422/20117 [7:56:46<5:09:00,  2.41s/it] 62%|██████████████████████████████████████████████████▋                               | 12423/20117 [7:56:48<5:10:28,  2.42s/it] 62%|██████████████████████████████████████████████████▋                               | 12424/20117 [7:56:51<5:11:22,  2.43s/it] 62%|██████████████████████████████████████████████████▋                               | 12425/20117 [7:56:53<5:10:33,  2.42s/it] 62%|██████████████████████████████████████████████████▋                               | 12426/20117 [7:56:56<5:11:03,  2.43s/it] 62%|██████████████████████████████████████████████████▋                               | 12427/20117 [7:56:58<5:11:31,  2.43s/it] 62%|██████████████████████████████████████████████████▋                               | 12428/20117 [7:57:01<5:10:48,  2.43s/it] 62%|██████████████████████████████████████████████████▋                               | 12429/20117 [7:57:03<5:10:21,  2.42s/it] 62%|██████████████████████████████████████████████████▋                               | 12430/20117 [7:57:05<5:10:13,  2.42s/it]                                                                                                                                 {'loss': 0.1317, 'grad_norm': 0.38734033703804016, 'learning_rate': 6.438039839982066e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 298.98, 'epoch': 1.24}
 62%|██████████████████████████████████████████████████▋                               | 12430/20117 [7:57:05<5:10:13,  2.42s/it] 62%|██████████████████████████████████████████████████▋                               | 12431/20117 [7:57:08<5:10:01,  2.42s/it] 62%|██████████████████████████████████████████████████▋                               | 12432/20117 [7:57:10<5:12:31,  2.44s/it] 62%|██████████████████████████████████████████████████▋                               | 12433/20117 [7:57:13<5:12:57,  2.44s/it] 62%|██████████████████████████████████████████████████▋                               | 12434/20117 [7:57:15<5:14:51,  2.46s/it] 62%|██████████████████████████████████████████████████▋                               | 12435/20117 [7:57:18<5:13:04,  2.45s/it] 62%|██████████████████████████████████████████████████▋                               | 12436/20117 [7:57:20<5:11:00,  2.43s/it] 62%|██████████████████████████████████████████████████▋                               | 12437/20117 [7:57:22<5:10:45,  2.43s/it] 62%|██████████████████████████████████████████████████▋                               | 12438/20117 [7:57:25<5:10:16,  2.42s/it] 62%|██████████████████████████████████████████████████▋                               | 12439/20117 [7:57:27<5:12:31,  2.44s/it] 62%|██████████████████████████████████████████████████▋                               | 12440/20117 [7:57:30<5:13:14,  2.45s/it]                                                                                                                                 {'loss': 0.1357, 'grad_norm': 0.39504683017730713, 'learning_rate': 6.42337900067012e-05, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 359.51, 'epoch': 1.24}
 62%|██████████████████████████████████████████████████▋                               | 12440/20117 [7:57:30<5:13:14,  2.45s/it] 62%|██████████████████████████████████████████████████▋                               | 12441/20117 [7:57:32<5:12:20,  2.44s/it] 62%|██████████████████████████████████████████████████▋                               | 12442/20117 [7:57:35<5:12:02,  2.44s/it] 62%|██████████████████████████████████████████████████▋                               | 12443/20117 [7:57:37<5:12:03,  2.44s/it] 62%|██████████████████████████████████████████████████▋                               | 12444/20117 [7:57:40<5:13:12,  2.45s/it] 62%|██████████████████████████████████████████████████▋                               | 12445/20117 [7:57:42<5:11:09,  2.43s/it] 62%|██████████████████████████████████████████████████▋                               | 12446/20117 [7:57:44<5:10:27,  2.43s/it] 62%|██████████████████████████████████████████████████▋                               | 12447/20117 [7:57:47<5:06:56,  2.40s/it] 62%|██████████████████████████████████████████████████▋                               | 12448/20117 [7:57:49<5:06:51,  2.40s/it] 62%|██████████████████████████████████████████████████▋                               | 12449/20117 [7:57:52<5:07:07,  2.40s/it] 62%|██████████████████████████████████████████████████▋                               | 12450/20117 [7:57:54<5:08:34,  2.41s/it]                                                                                                                                 {'loss': 0.1919, 'grad_norm': 0.38826897740364075, 'learning_rate': 6.408726971331631e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 327.89, 'epoch': 1.24}
 62%|██████████████████████████████████████████████████▋                               | 12450/20117 [7:57:54<5:08:34,  2.41s/it] 62%|██████████████████████████████████████████████████▊                               | 12451/20117 [7:57:56<5:08:11,  2.41s/it] 62%|██████████████████████████████████████████████████▊                               | 12452/20117 [7:57:59<5:06:49,  2.40s/it] 62%|██████████████████████████████████████████████████▊                               | 12453/20117 [7:58:01<5:07:24,  2.41s/it] 62%|██████████████████████████████████████████████████▊                               | 12454/20117 [7:58:04<5:06:10,  2.40s/it] 62%|██████████████████████████████████████████████████▊                               | 12455/20117 [7:58:06<5:06:45,  2.40s/it] 62%|██████████████████████████████████████████████████▊                               | 12456/20117 [7:58:08<5:09:30,  2.42s/it] 62%|██████████████████████████████████████████████████▊                               | 12457/20117 [7:58:11<5:09:45,  2.43s/it] 62%|██████████████████████████████████████████████████▊                               | 12458/20117 [7:58:13<5:06:31,  2.40s/it] 62%|██████████████████████████████████████████████████▊                               | 12459/20117 [7:58:16<5:06:06,  2.40s/it] 62%|██████████████████████████████████████████████████▊                               | 12460/20117 [7:58:18<5:07:16,  2.41s/it]                                                                                                                                 {'loss': 0.1613, 'grad_norm': 0.6274026036262512, 'learning_rate': 6.39408378805765e-05, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 325.25, 'epoch': 1.24}
 62%|██████████████████████████████████████████████████▊                               | 12460/20117 [7:58:18<5:07:16,  2.41s/it] 62%|██████████████████████████████████████████████████▊                               | 12461/20117 [7:58:20<5:04:49,  2.39s/it] 62%|██████████████████████████████████████████████████▊                               | 12462/20117 [7:58:23<5:04:49,  2.39s/it] 62%|██████████████████████████████████████████████████▊                               | 12463/20117 [7:58:25<5:05:34,  2.40s/it] 62%|██████████████████████████████████████████████████▊                               | 12464/20117 [7:58:28<5:21:33,  2.52s/it] 62%|██████████████████████████████████████████████████▊                               | 12465/20117 [7:58:30<5:18:01,  2.49s/it] 62%|██████████████████████████████████████████████████▊                               | 12466/20117 [7:58:33<5:15:37,  2.48s/it] 62%|██████████████████████████████████████████████████▊                               | 12467/20117 [7:58:35<5:11:44,  2.45s/it] 62%|██████████████████████████████████████████████████▊                               | 12468/20117 [7:58:38<5:08:16,  2.42s/it] 62%|██████████████████████████████████████████████████▊                               | 12469/20117 [7:58:40<5:08:20,  2.42s/it] 62%|██████████████████████████████████████████████████▊                               | 12470/20117 [7:58:42<5:06:59,  2.41s/it]                                                                                                                                 {'loss': 0.1976, 'grad_norm': 0.5444367527961731, 'learning_rate': 6.379449486917421e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 398.28, 'epoch': 1.24}
 62%|██████████████████████████████████████████████████▊                               | 12470/20117 [7:58:42<5:06:59,  2.41s/it] 62%|██████████████████████████████████████████████████▊                               | 12471/20117 [7:58:45<5:06:49,  2.41s/it] 62%|██████████████████████████████████████████████████▊                               | 12472/20117 [7:58:47<5:07:46,  2.42s/it] 62%|██████████████████████████████████████████████████▊                               | 12473/20117 [7:58:50<5:05:23,  2.40s/it] 62%|██████████████████████████████████████████████████▊                               | 12474/20117 [7:58:52<5:05:20,  2.40s/it] 62%|██████████████████████████████████████████████████▊                               | 12475/20117 [7:58:54<5:04:34,  2.39s/it] 62%|██████████████████████████████████████████████████▊                               | 12476/20117 [7:58:57<5:06:19,  2.41s/it] 62%|██████████████████████████████████████████████████▊                               | 12477/20117 [7:58:59<5:05:25,  2.40s/it] 62%|██████████████████████████████████████████████████▊                               | 12478/20117 [7:59:02<5:06:13,  2.41s/it] 62%|██████████████████████████████████████████████████▊                               | 12479/20117 [7:59:04<5:06:39,  2.41s/it] 62%|██████████████████████████████████████████████████▊                               | 12480/20117 [7:59:06<5:05:39,  2.40s/it]                                                                                                                                 {'loss': 0.1205, 'grad_norm': 0.40487316250801086, 'learning_rate': 6.364824103958331e-05, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 326.23, 'epoch': 1.24}
 62%|██████████████████████████████████████████████████▊                               | 12480/20117 [7:59:06<5:05:39,  2.40s/it] 62%|██████████████████████████████████████████████████▊                               | 12481/20117 [7:59:09<5:06:15,  2.41s/it] 62%|██████████████████████████████████████████████████▉                               | 12482/20117 [7:59:11<5:09:43,  2.43s/it] 62%|██████████████████████████████████████████████████▉                               | 12483/20117 [7:59:14<5:09:15,  2.43s/it] 62%|██████████████████████████████████████████████████▉                               | 12484/20117 [7:59:16<5:06:29,  2.41s/it] 62%|██████████████████████████████████████████████████▉                               | 12485/20117 [7:59:18<4:59:49,  2.36s/it] 62%|██████████████████████████████████████████████████▉                               | 12486/20117 [7:59:21<4:56:14,  2.33s/it] 62%|██████████████████████████████████████████████████▉                               | 12487/20117 [7:59:23<4:56:38,  2.33s/it] 62%|██████████████████████████████████████████████████▉                               | 12488/20117 [7:59:25<4:58:51,  2.35s/it] 62%|██████████████████████████████████████████████████▉                               | 12489/20117 [7:59:28<4:59:18,  2.35s/it] 62%|██████████████████████████████████████████████████▉                               | 12490/20117 [7:59:30<4:55:09,  2.32s/it]                                                                                                                                 {'loss': 0.1472, 'grad_norm': 0.55726557970047, 'learning_rate': 6.350207675205781e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 319.36, 'epoch': 1.24}
 62%|██████████████████████████████████████████████████▉                               | 12490/20117 [7:59:30<4:55:09,  2.32s/it] 62%|██████████████████████████████████████████████████▉                               | 12491/20117 [7:59:32<4:56:08,  2.33s/it] 62%|██████████████████████████████████████████████████▉                               | 12492/20117 [7:59:35<4:57:55,  2.34s/it] 62%|██████████████████████████████████████████████████▉                               | 12493/20117 [7:59:37<5:00:14,  2.36s/it] 62%|██████████████████████████████████████████████████▉                               | 12494/20117 [7:59:39<4:59:38,  2.36s/it] 62%|██████████████████████████████████████████████████▉                               | 12495/20117 [7:59:42<5:01:59,  2.38s/it] 62%|██████████████████████████████████████████████████▉                               | 12496/20117 [7:59:44<4:58:07,  2.35s/it] 62%|██████████████████████████████████████████████████▉                               | 12497/20117 [7:59:46<4:52:06,  2.30s/it] 62%|██████████████████████████████████████████████████▉                               | 12498/20117 [7:59:49<4:50:29,  2.29s/it] 62%|██████████████████████████████████████████████████▉                               | 12499/20117 [7:59:51<4:54:44,  2.32s/it] 62%|██████████████████████████████████████████████████▉                               | 12500/20117 [7:59:53<4:59:02,  2.36s/it]                                                                                                                                 {'loss': 0.173, 'grad_norm': 0.5732585787773132, 'learning_rate': 6.335600236663131e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 323.39, 'epoch': 1.24}
 62%|██████████████████████████████████████████████████▉                               | 12500/20117 [7:59:53<4:59:02,  2.36s/it] 62%|██████████████████████████████████████████████████▉                               | 12501/20117 [7:59:56<5:04:58,  2.40s/it] 62%|██████████████████████████████████████████████████▉                               | 12502/20117 [7:59:58<5:04:40,  2.40s/it] 62%|██████████████████████████████████████████████████▉                               | 12503/20117 [8:00:01<5:05:28,  2.41s/it] 62%|██████████████████████████████████████████████████▉                               | 12504/20117 [8:00:03<5:05:46,  2.41s/it] 62%|██████████████████████████████████████████████████▉                               | 12505/20117 [8:00:06<5:04:05,  2.40s/it] 62%|██████████████████████████████████████████████████▉                               | 12506/20117 [8:00:08<5:03:18,  2.39s/it] 62%|██████████████████████████████████████████████████▉                               | 12507/20117 [8:00:10<5:05:55,  2.41s/it] 62%|██████████████████████████████████████████████████▉                               | 12508/20117 [8:00:13<5:07:05,  2.42s/it] 62%|██████████████████████████████████████████████████▉                               | 12509/20117 [8:00:15<5:04:12,  2.40s/it] 62%|██████████████████████████████████████████████████▉                               | 12510/20117 [8:00:18<5:06:57,  2.42s/it]                                                                                                                                 {'loss': 0.1712, 'grad_norm': 0.47033485770225525, 'learning_rate': 6.321001824311583e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 300.06, 'epoch': 1.24}
 62%|██████████████████████████████████████████████████▉                               | 12510/20117 [8:00:18<5:06:57,  2.42s/it] 62%|██████████████████████████████████████████████████▉                               | 12511/20117 [8:00:20<5:06:54,  2.42s/it] 62%|███████████████████████████████████████████████████                               | 12512/20117 [8:00:23<5:08:29,  2.43s/it] 62%|███████████████████████████████████████████████████                               | 12513/20117 [8:00:25<5:09:01,  2.44s/it] 62%|███████████████████████████████████████████████████                               | 12514/20117 [8:00:27<5:09:16,  2.44s/it] 62%|███████████████████████████████████████████████████                               | 12515/20117 [8:00:30<5:10:30,  2.45s/it] 62%|███████████████████████████████████████████████████                               | 12516/20117 [8:00:32<5:08:57,  2.44s/it] 62%|███████████████████████████████████████████████████                               | 12517/20117 [8:00:35<5:25:30,  2.57s/it] 62%|███████████████████████████████████████████████████                               | 12518/20117 [8:00:38<5:21:54,  2.54s/it] 62%|███████████████████████████████████████████████████                               | 12519/20117 [8:00:40<5:19:26,  2.52s/it] 62%|███████████████████████████████████████████████████                               | 12520/20117 [8:00:43<5:14:04,  2.48s/it]                                                                                                                                 {'loss': 0.143, 'grad_norm': 0.44385817646980286, 'learning_rate': 6.306412474110122e-05, 'memory/max_active (GiB)': 21.39, 'memory/max_allocated (GiB)': 21.39, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 312.49, 'epoch': 1.24}
 62%|███████████████████████████████████████████████████                               | 12520/20117 [8:00:43<5:14:04,  2.48s/it] 62%|███████████████████████████████████████████████████                               | 12521/20117 [8:00:45<5:13:31,  2.48s/it] 62%|███████████████████████████████████████████████████                               | 12522/20117 [8:00:48<5:14:44,  2.49s/it] 62%|███████████████████████████████████████████████████                               | 12523/20117 [8:00:50<5:16:17,  2.50s/it] 62%|███████████████████████████████████████████████████                               | 12524/20117 [8:00:53<5:16:47,  2.50s/it] 62%|███████████████████████████████████████████████████                               | 12525/20117 [8:00:55<5:14:01,  2.48s/it] 62%|███████████████████████████████████████████████████                               | 12526/20117 [8:00:58<5:28:57,  2.60s/it] 62%|███████████████████████████████████████████████████                               | 12527/20117 [8:01:00<5:27:56,  2.59s/it] 62%|███████████████████████████████████████████████████                               | 12528/20117 [8:01:03<5:25:37,  2.57s/it] 62%|███████████████████████████████████████████████████                               | 12529/20117 [8:01:05<5:22:29,  2.55s/it] 62%|███████████████████████████████████████████████████                               | 12530/20117 [8:01:08<5:21:05,  2.54s/it]                                                                                                                                 {'loss': 0.1772, 'grad_norm': 0.5871978402137756, 'learning_rate': 6.291832221995388e-05, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 289.93, 'epoch': 1.25}
 62%|███████████████████████████████████████████████████                               | 12530/20117 [8:01:08<5:21:05,  2.54s/it] 62%|███████████████████████████████████████████████████                               | 12531/20117 [8:01:10<5:18:34,  2.52s/it] 62%|███████████████████████████████████████████████████                               | 12532/20117 [8:01:13<5:12:46,  2.47s/it] 62%|███████████████████████████████████████████████████                               | 12533/20117 [8:01:15<5:11:47,  2.47s/it] 62%|███████████████████████████████████████████████████                               | 12534/20117 [8:01:18<5:09:41,  2.45s/it] 62%|███████████████████████████████████████████████████                               | 12535/20117 [8:01:20<5:08:09,  2.44s/it] 62%|███████████████████████████████████████████████████                               | 12536/20117 [8:01:23<5:09:10,  2.45s/it] 62%|███████████████████████████████████████████████████                               | 12537/20117 [8:01:25<5:07:55,  2.44s/it] 62%|███████████████████████████████████████████████████                               | 12538/20117 [8:01:27<5:03:52,  2.41s/it] 62%|███████████████████████████████████████████████████                               | 12539/20117 [8:01:30<5:06:31,  2.43s/it] 62%|███████████████████████████████████████████████████                               | 12540/20117 [8:01:32<5:07:15,  2.43s/it]                                                                                                                                 {'loss': 0.134, 'grad_norm': 0.6150623559951782, 'learning_rate': 6.277261103881638e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 312.62, 'epoch': 1.25}
 62%|███████████████████████████████████████████████████                               | 12540/20117 [8:01:32<5:07:15,  2.43s/it] 62%|███████████████████████████████████████████████████                               | 12541/20117 [8:01:35<5:08:02,  2.44s/it] 62%|███████████████████████████████████████████████████                               | 12542/20117 [8:01:37<5:07:36,  2.44s/it] 62%|███████████████████████████████████████████████████▏                              | 12543/20117 [8:01:39<5:01:26,  2.39s/it] 62%|███████████████████████████████████████████████████▏                              | 12544/20117 [8:01:42<5:00:57,  2.38s/it] 62%|███████████████████████████████████████████████████▏                              | 12545/20117 [8:01:44<5:01:23,  2.39s/it] 62%|███████████████████████████████████████████████████▏                              | 12546/20117 [8:01:47<5:01:27,  2.39s/it] 62%|███████████████████████████████████████████████████▏                              | 12547/20117 [8:01:49<5:03:03,  2.40s/it] 62%|███████████████████████████████████████████████████▏                              | 12548/20117 [8:01:51<5:03:55,  2.41s/it] 62%|███████████████████████████████████████████████████▏                              | 12549/20117 [8:01:54<5:06:42,  2.43s/it] 62%|███████████████████████████████████████████████████▏                              | 12550/20117 [8:01:56<5:05:51,  2.43s/it]                                                                                                                                 {'loss': 0.2192, 'grad_norm': 0.6367026567459106, 'learning_rate': 6.262699155660601e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 334.54, 'epoch': 1.25}
 62%|███████████████████████████████████████████████████▏                              | 12550/20117 [8:01:56<5:05:51,  2.43s/it] 62%|███████████████████████████████████████████████████▏                              | 12551/20117 [8:01:59<5:04:57,  2.42s/it] 62%|███████████████████████████████████████████████████▏                              | 12552/20117 [8:02:01<5:00:52,  2.39s/it] 62%|███████████████████████████████████████████████████▏                              | 12553/20117 [8:02:03<5:01:27,  2.39s/it] 62%|███████████████████████████████████████████████████▏                              | 12554/20117 [8:02:06<5:00:54,  2.39s/it] 62%|███████████████████████████████████████████████████▏                              | 12555/20117 [8:02:08<5:02:21,  2.40s/it] 62%|███████████████████████████████████████████████████▏                              | 12556/20117 [8:02:11<5:03:04,  2.41s/it] 62%|███████████████████████████████████████████████████▏                              | 12557/20117 [8:02:13<5:03:50,  2.41s/it] 62%|███████████████████████████████████████████████████▏                              | 12558/20117 [8:02:16<5:04:44,  2.42s/it] 62%|███████████████████████████████████████████████████▏                              | 12559/20117 [8:02:18<5:05:02,  2.42s/it] 62%|███████████████████████████████████████████████████▏                              | 12560/20117 [8:02:20<4:59:32,  2.38s/it]                                                                                                                                 {'loss': 0.1669, 'grad_norm': 0.29298296570777893, 'learning_rate': 6.248146413201444e-05, 'memory/max_active (GiB)': 20.77, 'memory/max_allocated (GiB)': 20.77, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 313.23, 'epoch': 1.25}
 62%|███████████████████████████████████████████████████▏                              | 12560/20117 [8:02:20<4:59:32,  2.38s/it] 62%|███████████████████████████████████████████████████▏                              | 12561/20117 [8:02:23<5:00:56,  2.39s/it] 62%|███████████████████████████████████████████████████▏                              | 12562/20117 [8:02:25<4:58:03,  2.37s/it] 62%|███████████████████████████████████████████████████▏                              | 12563/20117 [8:02:27<5:00:31,  2.39s/it] 62%|███████████████████████████████████████████████████▏                              | 12564/20117 [8:02:30<5:04:17,  2.42s/it] 62%|███████████████████████████████████████████████████▏                              | 12565/20117 [8:02:32<5:03:48,  2.41s/it] 62%|███████████████████████████████████████████████████▏                              | 12566/20117 [8:02:35<5:06:02,  2.43s/it] 62%|███████████████████████████████████████████████████▏                              | 12567/20117 [8:02:37<5:07:58,  2.45s/it] 62%|███████████████████████████████████████████████████▏                              | 12568/20117 [8:02:40<5:06:25,  2.44s/it] 62%|███████████████████████████████████████████████████▏                              | 12569/20117 [8:02:42<4:58:51,  2.38s/it] 62%|███████████████████████████████████████████████████▏                              | 12570/20117 [8:02:44<4:52:33,  2.33s/it]                                                                                                                                 {'loss': 0.1134, 'grad_norm': 0.27840226888656616, 'learning_rate': 6.233602912350639e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 333.22, 'epoch': 1.25}
 62%|███████████████████████████████████████████████████▏                              | 12570/20117 [8:02:44<4:52:33,  2.33s/it] 62%|███████████████████████████████████████████████████▏                              | 12571/20117 [8:02:47<5:08:35,  2.45s/it] 62%|███████████████████████████████████████████████████▏                              | 12572/20117 [8:02:49<5:05:44,  2.43s/it] 62%|███████████████████████████████████████████████████▏                              | 12573/20117 [8:02:52<5:03:57,  2.42s/it] 63%|███████████████████████████████████████████████████▎                              | 12574/20117 [8:02:54<5:02:14,  2.40s/it] 63%|███████████████████████████████████████████████████▎                              | 12575/20117 [8:02:56<4:58:43,  2.38s/it] 63%|███████████████████████████████████████████████████▎                              | 12576/20117 [8:02:59<4:58:43,  2.38s/it] 63%|███████████████████████████████████████████████████▎                              | 12577/20117 [8:03:01<4:59:20,  2.38s/it] 63%|███████████████████████████████████████████████████▎                              | 12578/20117 [8:03:04<5:01:16,  2.40s/it] 63%|███████████████████████████████████████████████████▎                              | 12579/20117 [8:03:06<5:02:15,  2.41s/it] 63%|███████████████████████████████████████████████████▎                              | 12580/20117 [8:03:08<4:58:10,  2.37s/it]                                                                                                                                 {'loss': 0.1726, 'grad_norm': 0.49214980006217957, 'learning_rate': 6.219068688931908e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 394.03, 'epoch': 1.25}
 63%|███████████████████████████████████████████████████▎                              | 12580/20117 [8:03:08<4:58:10,  2.37s/it] 63%|███████████████████████████████████████████████████▎                              | 12581/20117 [8:03:11<4:55:38,  2.35s/it] 63%|███████████████████████████████████████████████████▎                              | 12582/20117 [8:03:13<4:52:19,  2.33s/it] 63%|███████████████████████████████████████████████████▎                              | 12583/20117 [8:03:15<4:48:24,  2.30s/it] 63%|███████████████████████████████████████████████████▎                              | 12584/20117 [8:03:18<4:55:40,  2.36s/it] 63%|███████████████████████████████████████████████████▎                              | 12585/20117 [8:03:20<5:00:14,  2.39s/it] 63%|███████████████████████████████████████████████████▎                              | 12586/20117 [8:03:22<5:00:33,  2.39s/it] 63%|███████████████████████████████████████████████████▎                              | 12587/20117 [8:03:25<5:01:13,  2.40s/it] 63%|███████████████████████████████████████████████████▎                              | 12588/20117 [8:03:27<5:00:34,  2.40s/it] 63%|███████████████████████████████████████████████████▎                              | 12589/20117 [8:03:30<4:58:50,  2.38s/it] 63%|███████████████████████████████████████████████████▎                              | 12590/20117 [8:03:32<5:00:11,  2.39s/it]                                                                                                                                 {'loss': 0.1014, 'grad_norm': 0.5485401153564453, 'learning_rate': 6.20454377874612e-05, 'memory/max_active (GiB)': 21.53, 'memory/max_allocated (GiB)': 21.53, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 262.31, 'epoch': 1.25}
 63%|███████████████████████████████████████████████████▎                              | 12590/20117 [8:03:32<5:00:11,  2.39s/it] 63%|███████████████████████████████████████████████████▎                              | 12591/20117 [8:03:34<5:00:04,  2.39s/it] 63%|███████████████████████████████████████████████████▎                              | 12592/20117 [8:03:37<5:00:45,  2.40s/it] 63%|███████████████████████████████████████████████████▎                              | 12593/20117 [8:03:39<5:02:37,  2.41s/it] 63%|███████████████████████████████████████████████████▎                              | 12594/20117 [8:03:42<5:05:12,  2.43s/it] 63%|███████████████████████████████████████████████████▎                              | 12595/20117 [8:03:44<5:09:05,  2.47s/it] 63%|███████████████████████████████████████████████████▎                              | 12596/20117 [8:03:47<5:08:32,  2.46s/it] 63%|███████████████████████████████████████████████████▎                              | 12597/20117 [8:03:49<5:09:44,  2.47s/it] 63%|███████████████████████████████████████████████████▎                              | 12598/20117 [8:03:52<5:12:14,  2.49s/it] 63%|███████████████████████████████████████████████████▎                              | 12599/20117 [8:03:54<5:07:16,  2.45s/it] 63%|███████████████████████████████████████████████████▎                              | 12600/20117 [8:03:57<5:08:10,  2.46s/it]                                                                                                                                 {'loss': 0.1883, 'grad_norm': 0.5006467700004578, 'learning_rate': 6.190028217571186e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 350.5, 'epoch': 1.25}
 63%|███████████████████████████████████████████████████▎                              | 12600/20117 [8:03:57<5:08:10,  2.46s/it] 63%|███████████████████████████████████████████████████▎                              | 12601/20117 [8:03:59<5:05:08,  2.44s/it] 63%|███████████████████████████████████████████████████▎                              | 12602/20117 [8:04:01<5:02:44,  2.42s/it] 63%|███████████████████████████████████████████████████▎                              | 12603/20117 [8:04:04<5:02:09,  2.41s/it] 63%|███████████████████████████████████████████████████▍                              | 12604/20117 [8:04:06<5:03:37,  2.42s/it] 63%|███████████████████████████████████████████████████▍                              | 12605/20117 [8:04:09<5:03:42,  2.43s/it] 63%|███████████████████████████████████████████████████▍                              | 12606/20117 [8:04:11<5:04:10,  2.43s/it] 63%|███████████████████████████████████████████████████▍                              | 12607/20117 [8:04:13<5:02:14,  2.41s/it] 63%|███████████████████████████████████████████████████▍                              | 12608/20117 [8:04:16<5:00:43,  2.40s/it] 63%|███████████████████████████████████████████████████▍                              | 12609/20117 [8:04:18<5:00:49,  2.40s/it] 63%|███████████████████████████████████████████████████▍                              | 12610/20117 [8:04:20<4:56:12,  2.37s/it]                                                                                                                                 {'loss': 0.1559, 'grad_norm': 0.4242802560329437, 'learning_rate': 6.175522041162016e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 319.2, 'epoch': 1.25}
 63%|███████████████████████████████████████████████████▍                              | 12610/20117 [8:04:20<4:56:12,  2.37s/it] 63%|███████████████████████████████████████████████████▍                              | 12611/20117 [8:04:23<4:57:10,  2.38s/it] 63%|███████████████████████████████████████████████████▍                              | 12612/20117 [8:04:26<5:07:52,  2.46s/it] 63%|███████████████████████████████████████████████████▍                              | 12613/20117 [8:04:28<5:24:24,  2.59s/it] 63%|███████████████████████████████████████████████████▍                              | 12614/20117 [8:04:31<5:22:14,  2.58s/it] 63%|███████████████████████████████████████████████████▍                              | 12615/20117 [8:04:33<5:15:58,  2.53s/it] 63%|███████████████████████████████████████████████████▍                              | 12616/20117 [8:04:36<5:08:51,  2.47s/it] 63%|███████████████████████████████████████████████████▍                              | 12617/20117 [8:04:38<5:06:46,  2.45s/it] 63%|███████████████████████████████████████████████████▍                              | 12618/20117 [8:04:41<5:06:23,  2.45s/it] 63%|███████████████████████████████████████████████████▍                              | 12619/20117 [8:04:43<5:04:46,  2.44s/it] 63%|███████████████████████████████████████████████████▍                              | 12620/20117 [8:04:45<5:01:22,  2.41s/it]                                                                                                                                 {'loss': 0.1551, 'grad_norm': 0.532721996307373, 'learning_rate': 6.161025285250373e-05, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 342.02, 'epoch': 1.25}
 63%|███████████████████████████████████████████████████▍                              | 12620/20117 [8:04:45<5:01:22,  2.41s/it] 63%|███████████████████████████████████████████████████▍                              | 12621/20117 [8:04:48<5:00:24,  2.40s/it] 63%|███████████████████████████████████████████████████▍                              | 12622/20117 [8:04:50<5:02:45,  2.42s/it] 63%|███████████████████████████████████████████████████▍                              | 12623/20117 [8:04:53<5:16:56,  2.54s/it] 63%|███████████████████████████████████████████████████▍                              | 12624/20117 [8:04:55<5:13:24,  2.51s/it] 63%|███████████████████████████████████████████████████▍                              | 12625/20117 [8:04:58<5:08:48,  2.47s/it] 63%|███████████████████████████████████████████████████▍                              | 12626/20117 [8:05:00<5:05:44,  2.45s/it] 63%|███████████████████████████████████████████████████▍                              | 12627/20117 [8:05:03<5:03:17,  2.43s/it] 63%|███████████████████████████████████████████████████▍                              | 12628/20117 [8:05:05<5:01:54,  2.42s/it] 63%|███████████████████████████████████████████████████▍                              | 12629/20117 [8:05:07<4:58:28,  2.39s/it] 63%|███████████████████████████████████████████████████▍                              | 12630/20117 [8:05:10<4:59:11,  2.40s/it]                                                                                                                                 {'loss': 0.1887, 'grad_norm': 0.6736690998077393, 'learning_rate': 6.146537985544843e-05, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 312.69, 'epoch': 1.26}
 63%|███████████████████████████████████████████████████▍                              | 12630/20117 [8:05:10<4:59:11,  2.40s/it] 63%|███████████████████████████████████████████████████▍                              | 12631/20117 [8:05:12<4:56:11,  2.37s/it] 63%|███████████████████████████████████████████████████▍                              | 12632/20117 [8:05:15<5:01:16,  2.42s/it] 63%|███████████████████████████████████████████████████▍                              | 12633/20117 [8:05:17<5:01:49,  2.42s/it] 63%|███████████████████████████████████████████████████▍                              | 12634/20117 [8:05:19<5:03:05,  2.43s/it] 63%|███████████████████████████████████████████████████▌                              | 12635/20117 [8:05:22<5:06:01,  2.45s/it] 63%|███████████████████████████████████████████████████▌                              | 12636/20117 [8:05:24<5:05:08,  2.45s/it] 63%|███████████████████████████████████████████████████▌                              | 12637/20117 [8:05:27<5:02:52,  2.43s/it] 63%|███████████████████████████████████████████████████▌                              | 12638/20117 [8:05:29<5:00:30,  2.41s/it] 63%|███████████████████████████████████████████████████▌                              | 12639/20117 [8:05:32<5:03:38,  2.44s/it] 63%|███████████████████████████████████████████████████▌                              | 12640/20117 [8:05:34<5:00:34,  2.41s/it]                                                                                                                                 {'loss': 0.1409, 'grad_norm': 0.4985780417919159, 'learning_rate': 6.132060177730698e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 330.02, 'epoch': 1.26}
 63%|███████████████████████████████████████████████████▌                              | 12640/20117 [8:05:34<5:00:34,  2.41s/it] 63%|███████████████████████████████████████████████████▌                              | 12641/20117 [8:05:36<5:00:21,  2.41s/it] 63%|███████████████████████████████████████████████████▌                              | 12642/20117 [8:05:39<5:01:01,  2.42s/it] 63%|███████████████████████████████████████████████████▌                              | 12643/20117 [8:05:41<4:59:55,  2.41s/it] 63%|███████████████████████████████████████████████████▌                              | 12644/20117 [8:05:44<4:58:58,  2.40s/it] 63%|███████████████████████████████████████████████████▌                              | 12645/20117 [8:05:46<4:55:14,  2.37s/it] 63%|███████████████████████████████████████████████████▌                              | 12646/20117 [8:05:48<4:57:13,  2.39s/it] 63%|███████████████████████████████████████████████████▌                              | 12647/20117 [8:05:51<4:58:01,  2.39s/it] 63%|███████████████████████████████████████████████████▌                              | 12648/20117 [8:05:53<4:54:57,  2.37s/it] 63%|███████████████████████████████████████████████████▌                              | 12649/20117 [8:05:55<4:49:45,  2.33s/it] 63%|███████████████████████████████████████████████████▌                              | 12650/20117 [8:05:58<4:48:11,  2.32s/it]                                                                                                                                 {'loss': 0.1823, 'grad_norm': 49.2108268737793, 'learning_rate': 6.117591897469847e-05, 'memory/max_active (GiB)': 20.77, 'memory/max_allocated (GiB)': 20.77, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 319.61, 'epoch': 1.26}
 63%|███████████████████████████████████████████████████▌                              | 12650/20117 [8:05:58<4:48:11,  2.32s/it] 63%|███████████████████████████████████████████████████▌                              | 12651/20117 [8:06:00<4:48:08,  2.32s/it] 63%|███████████████████████████████████████████████████▌                              | 12652/20117 [8:06:02<4:52:10,  2.35s/it] 63%|███████████████████████████████████████████████████▌                              | 12653/20117 [8:06:05<4:53:41,  2.36s/it] 63%|███████████████████████████████████████████████████▌                              | 12654/20117 [8:06:07<4:53:36,  2.36s/it] 63%|███████████████████████████████████████████████████▌                              | 12655/20117 [8:06:09<4:51:49,  2.35s/it] 63%|███████████████████████████████████████████████████▌                              | 12656/20117 [8:06:12<4:53:42,  2.36s/it] 63%|███████████████████████████████████████████████████▌                              | 12657/20117 [8:06:14<4:58:27,  2.40s/it] 63%|███████████████████████████████████████████████████▌                              | 12658/20117 [8:06:17<4:52:42,  2.35s/it] 63%|███████████████████████████████████████████████████▌                              | 12659/20117 [8:06:19<4:53:44,  2.36s/it] 63%|███████████████████████████████████████████████████▌                              | 12660/20117 [8:06:21<4:50:53,  2.34s/it]                                                                                                                                 {'loss': 0.2102, 'grad_norm': 0.4028538465499878, 'learning_rate': 6.1031331804007154e-05, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 355.74, 'epoch': 1.26}
 63%|███████████████████████████████████████████████████▌                              | 12660/20117 [8:06:21<4:50:53,  2.34s/it] 63%|███████████████████████████████████████████████████▌                              | 12661/20117 [8:06:23<4:49:32,  2.33s/it] 63%|███████████████████████████████████████████████████▌                              | 12662/20117 [8:06:26<4:45:18,  2.30s/it] 63%|███████████████████████████████████████████████████▌                              | 12663/20117 [8:06:28<4:42:56,  2.28s/it] 63%|███████████████████████████████████████████████████▌                              | 12664/20117 [8:06:30<4:48:24,  2.32s/it] 63%|███████████████████████████████████████████████████▌                              | 12665/20117 [8:06:33<4:54:18,  2.37s/it] 63%|███████████████████████████████████████████████████▋                              | 12666/20117 [8:06:35<4:54:38,  2.37s/it] 63%|███████████████████████████████████████████████████▋                              | 12667/20117 [8:06:38<4:58:03,  2.40s/it] 63%|███████████████████████████████████████████████████▋                              | 12668/20117 [8:06:40<4:57:24,  2.40s/it] 63%|███████████████████████████████████████████████████▋                              | 12669/20117 [8:06:43<4:58:39,  2.41s/it] 63%|███████████████████████████████████████████████████▋                              | 12670/20117 [8:06:45<4:59:59,  2.42s/it]                                                                                                                                 {'loss': 0.153, 'grad_norm': 0.47767624258995056, 'learning_rate': 6.0886840621381856e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 381.38, 'epoch': 1.26}
 63%|███████████████████████████████████████████████████▋                              | 12670/20117 [8:06:45<4:59:59,  2.42s/it] 63%|███████████████████████████████████████████████████▋                              | 12671/20117 [8:06:47<4:57:48,  2.40s/it] 63%|███████████████████████████████████████████████████▋                              | 12672/20117 [8:06:50<4:59:20,  2.41s/it] 63%|███████████████████████████████████████████████████▋                              | 12673/20117 [8:06:52<4:59:02,  2.41s/it] 63%|███████████████████████████████████████████████████▋                              | 12674/20117 [8:06:55<4:59:18,  2.41s/it] 63%|███████████████████████████████████████████████████▋                              | 12675/20117 [8:06:57<4:59:43,  2.42s/it] 63%|███████████████████████████████████████████████████▋                              | 12676/20117 [8:07:00<5:15:37,  2.55s/it] 63%|███████████████████████████████████████████████████▋                              | 12677/20117 [8:07:02<5:03:47,  2.45s/it] 63%|███████████████████████████████████████████████████▋                              | 12678/20117 [8:07:05<5:03:47,  2.45s/it] 63%|███████████████████████████████████████████████████▋                              | 12679/20117 [8:07:07<5:04:15,  2.45s/it] 63%|███████████████████████████████████████████████████▋                              | 12680/20117 [8:07:09<5:03:10,  2.45s/it]                                                                                                                                 {'loss': 0.15, 'grad_norm': 0.3719147741794586, 'learning_rate': 6.0742445782734825e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 351.13, 'epoch': 1.26}
 63%|███████████████████████████████████████████████████▋                              | 12680/20117 [8:07:09<5:03:10,  2.45s/it] 63%|███████████████████████████████████████████████████▋                              | 12681/20117 [8:07:12<5:00:53,  2.43s/it] 63%|███████████████████████████████████████████████████▋                              | 12682/20117 [8:07:14<4:58:29,  2.41s/it] 63%|███████████████████████████████████████████████████▋                              | 12683/20117 [8:07:16<4:54:23,  2.38s/it] 63%|███████████████████████████████████████████████████▋                              | 12684/20117 [8:07:19<4:55:24,  2.38s/it] 63%|███████████████████████████████████████████████████▋                              | 12685/20117 [8:07:21<4:56:15,  2.39s/it] 63%|███████████████████████████████████████████████████▋                              | 12686/20117 [8:07:24<4:54:12,  2.38s/it] 63%|███████████████████████████████████████████████████▋                              | 12687/20117 [8:07:26<5:00:03,  2.42s/it] 63%|███████████████████████████████████████████████████▋                              | 12688/20117 [8:07:29<5:00:38,  2.43s/it] 63%|███████████████████████████████████████████████████▋                              | 12689/20117 [8:07:31<4:59:38,  2.42s/it] 63%|███████████████████████████████████████████████████▋                              | 12690/20117 [8:07:33<4:59:38,  2.42s/it]                                                                                                                                 {'loss': 0.144, 'grad_norm': 0.32970771193504333, 'learning_rate': 6.0598147643741124e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 285.76, 'epoch': 1.26}
 63%|███████████████████████████████████████████████████▋                              | 12690/20117 [8:07:33<4:59:38,  2.42s/it] 63%|███████████████████████████████████████████████████▋                              | 12691/20117 [8:07:36<4:59:32,  2.42s/it] 63%|███████████████████████████████████████████████████▋                              | 12692/20117 [8:07:38<4:58:47,  2.41s/it] 63%|███████████████████████████████████████████████████▋                              | 12693/20117 [8:07:41<5:00:35,  2.43s/it] 63%|███████████████████████████████████████████████████▋                              | 12694/20117 [8:07:43<4:55:28,  2.39s/it] 63%|███████████████████████████████████████████████████▋                              | 12695/20117 [8:07:45<4:54:15,  2.38s/it] 63%|███████████████████████████████████████████████████▊                              | 12696/20117 [8:07:48<4:55:16,  2.39s/it] 63%|███████████████████████████████████████████████████▊                              | 12697/20117 [8:07:50<4:55:56,  2.39s/it] 63%|███████████████████████████████████████████████████▊                              | 12698/20117 [8:07:53<4:58:45,  2.42s/it] 63%|███████████████████████████████████████████████████▊                              | 12699/20117 [8:07:55<5:00:29,  2.43s/it] 63%|███████████████████████████████████████████████████▊                              | 12700/20117 [8:07:58<5:03:08,  2.45s/it]                                                                                                                                 {'loss': 0.149, 'grad_norm': 0.40551966428756714, 'learning_rate': 6.045394655983753e-05, 'memory/max_active (GiB)': 19.8, 'memory/max_allocated (GiB)': 19.8, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 275.71, 'epoch': 1.26}
 63%|███████████████████████████████████████████████████▊                              | 12700/20117 [8:07:58<5:03:08,  2.45s/it] 63%|███████████████████████████████████████████████████▊                              | 12701/20117 [8:08:00<5:02:47,  2.45s/it] 63%|███████████████████████████████████████████████████▊                              | 12702/20117 [8:08:02<5:02:19,  2.45s/it] 63%|███████████████████████████████████████████████████▊                              | 12703/20117 [8:08:05<5:04:04,  2.46s/it] 63%|███████████████████████████████████████████████████▊                              | 12704/20117 [8:08:07<5:02:35,  2.45s/it] 63%|███████████████████████████████████████████████████▊                              | 12705/20117 [8:08:10<5:02:09,  2.45s/it] 63%|███████████████████████████████████████████████████▊                              | 12706/20117 [8:08:12<4:58:47,  2.42s/it] 63%|███████████████████████████████████████████████████▊                              | 12707/20117 [8:08:15<4:58:46,  2.42s/it] 63%|███████████████████████████████████████████████████▊                              | 12708/20117 [8:08:17<4:57:33,  2.41s/it] 63%|███████████████████████████████████████████████████▊                              | 12709/20117 [8:08:19<4:55:23,  2.39s/it] 63%|███████████████████████████████████████████████████▊                              | 12710/20117 [8:08:22<4:55:27,  2.39s/it]                                                                                                                                 {'loss': 0.1615, 'grad_norm': 0.40349385142326355, 'learning_rate': 6.0309842886221826e-05, 'memory/max_active (GiB)': 18.16, 'memory/max_allocated (GiB)': 18.16, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 374.56, 'epoch': 1.26}
 63%|███████████████████████████████████████████████████▊                              | 12710/20117 [8:08:22<4:55:27,  2.39s/it] 63%|███████████████████████████████████████████████████▊                              | 12711/20117 [8:08:24<4:57:01,  2.41s/it] 63%|███████████████████████████████████████████████████▊                              | 12712/20117 [8:08:27<4:54:26,  2.39s/it] 63%|███████████████████████████████████████████████████▊                              | 12713/20117 [8:08:29<4:55:08,  2.39s/it] 63%|███████████████████████████████████████████████████▊                              | 12714/20117 [8:08:31<4:55:36,  2.40s/it] 63%|███████████████████████████████████████████████████▊                              | 12715/20117 [8:08:34<4:52:53,  2.37s/it] 63%|███████████████████████████████████████████████████▊                              | 12716/20117 [8:08:36<4:56:00,  2.40s/it] 63%|███████████████████████████████████████████████████▊                              | 12717/20117 [8:08:39<4:58:55,  2.42s/it] 63%|███████████████████████████████████████████████████▊                              | 12718/20117 [8:08:41<5:01:02,  2.44s/it] 63%|███████████████████████████████████████████████████▊                              | 12719/20117 [8:08:43<4:58:51,  2.42s/it] 63%|███████████████████████████████████████████████████▊                              | 12720/20117 [8:08:46<4:59:49,  2.43s/it]                                                                                                                                 {'loss': 0.1828, 'grad_norm': 0.5158276557922363, 'learning_rate': 6.0165836977851796e-05, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 404.51, 'epoch': 1.26}
 63%|███████████████████████████████████████████████████▊                              | 12720/20117 [8:08:46<4:59:49,  2.43s/it] 63%|███████████████████████████████████████████████████▊                              | 12721/20117 [8:08:48<4:59:19,  2.43s/it] 63%|███████████████████████████████████████████████████▊                              | 12722/20117 [8:08:51<4:57:08,  2.41s/it] 63%|███████████████████████████████████████████████████▊                              | 12723/20117 [8:08:53<4:56:08,  2.40s/it] 63%|███████████████████████████████████████████████████▊                              | 12724/20117 [8:08:55<4:50:40,  2.36s/it] 63%|███████████████████████████████████████████████████▊                              | 12725/20117 [8:08:58<4:52:33,  2.37s/it] 63%|███████████████████████████████████████████████████▊                              | 12726/20117 [8:09:00<4:48:05,  2.34s/it] 63%|███████████████████████████████████████████████████▉                              | 12727/20117 [8:09:02<4:50:47,  2.36s/it] 63%|███████████████████████████████████████████████████▉                              | 12728/20117 [8:09:05<4:51:18,  2.37s/it] 63%|███████████████████████████████████████████████████▉                              | 12729/20117 [8:09:07<4:54:31,  2.39s/it] 63%|███████████████████████████████████████████████████▉                              | 12730/20117 [8:09:10<5:11:46,  2.53s/it]                                                                                                                                 {'loss': 0.1535, 'grad_norm': 0.42637962102890015, 'learning_rate': 6.0021929189444416e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 268.86, 'epoch': 1.27}
 63%|███████████████████████████████████████████████████▉                              | 12730/20117 [8:09:10<5:11:46,  2.53s/it] 63%|███████████████████████████████████████████████████▉                              | 12731/20117 [8:09:13<5:07:56,  2.50s/it] 63%|███████████████████████████████████████████████████▉                              | 12732/20117 [8:09:15<5:06:56,  2.49s/it] 63%|███████████████████████████████████████████████████▉                              | 12733/20117 [8:09:17<5:00:06,  2.44s/it] 63%|███████████████████████████████████████████████████▉                              | 12734/20117 [8:09:20<4:52:24,  2.38s/it] 63%|███████████████████████████████████████████████████▉                              | 12735/20117 [8:09:22<4:49:49,  2.36s/it] 63%|███████████████████████████████████████████████████▉                              | 12736/20117 [8:09:24<4:46:13,  2.33s/it] 63%|███████████████████████████████████████████████████▉                              | 12737/20117 [8:09:27<4:49:55,  2.36s/it] 63%|███████████████████████████████████████████████████▉                              | 12738/20117 [8:09:29<4:53:10,  2.38s/it] 63%|███████████████████████████████████████████████████▉                              | 12739/20117 [8:09:31<4:56:23,  2.41s/it] 63%|███████████████████████████████████████████████████▉                              | 12740/20117 [8:09:34<4:51:23,  2.37s/it]                                                                                                                                 {'loss': 0.1774, 'grad_norm': 0.3736035227775574, 'learning_rate': 5.987811987547504e-05, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 410.2, 'epoch': 1.27}
 63%|███████████████████████████████████████████████████▉                              | 12740/20117 [8:09:34<4:51:23,  2.37s/it] 63%|███████████████████████████████████████████████████▉                              | 12741/20117 [8:09:36<4:53:00,  2.38s/it] 63%|███████████████████████████████████████████████████▉                              | 12742/20117 [8:09:39<4:51:14,  2.37s/it] 63%|███████████████████████████████████████████████████▉                              | 12743/20117 [8:09:41<4:52:47,  2.38s/it] 63%|███████████████████████████████████████████████████▉                              | 12744/20117 [8:09:43<4:52:15,  2.38s/it] 63%|███████████████████████████████████████████████████▉                              | 12745/20117 [8:09:46<4:58:59,  2.43s/it] 63%|███████████████████████████████████████████████████▉                              | 12746/20117 [8:09:48<4:52:58,  2.38s/it] 63%|███████████████████████████████████████████████████▉                              | 12747/20117 [8:09:50<4:49:13,  2.35s/it] 63%|███████████████████████████████████████████████████▉                              | 12748/20117 [8:09:53<4:44:19,  2.32s/it] 63%|███████████████████████████████████████████████████▉                              | 12749/20117 [8:09:55<4:45:08,  2.32s/it] 63%|███████████████████████████████████████████████████▉                              | 12750/20117 [8:09:57<4:50:56,  2.37s/it]                                                                                                                                 {'loss': 0.1869, 'grad_norm': 0.479159414768219, 'learning_rate': 5.9734409390176315e-05, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 290.14, 'epoch': 1.27}
 63%|███████████████████████████████████████████████████▉                              | 12750/20117 [8:09:57<4:50:56,  2.37s/it] 63%|███████████████████████████████████████████████████▉                              | 12751/20117 [8:10:00<4:53:39,  2.39s/it] 63%|███████████████████████████████████████████████████▉                              | 12752/20117 [8:10:02<4:55:13,  2.41s/it] 63%|███████████████████████████████████████████████████▉                              | 12753/20117 [8:10:05<4:57:14,  2.42s/it] 63%|███████████████████████████████████████████████████▉                              | 12754/20117 [8:10:07<4:57:58,  2.43s/it] 63%|███████████████████████████████████████████████████▉                              | 12755/20117 [8:10:10<4:57:27,  2.42s/it] 63%|███████████████████████████████████████████████████▉                              | 12756/20117 [8:10:12<4:57:01,  2.42s/it] 63%|███████████████████████████████████████████████████▉                              | 12757/20117 [8:10:14<4:56:16,  2.42s/it] 63%|████████████████████████████████████████████████████                              | 12758/20117 [8:10:17<4:54:33,  2.40s/it] 63%|████████████████████████████████████████████████████                              | 12759/20117 [8:10:19<4:56:57,  2.42s/it] 63%|████████████████████████████████████████████████████                              | 12760/20117 [8:10:22<4:54:40,  2.40s/it]                                                                                                                                 {'loss': 0.1731, 'grad_norm': 0.4126437306404114, 'learning_rate': 5.959079808753765e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 335.34, 'epoch': 1.27}
 63%|████████████████████████████████████████████████████                              | 12760/20117 [8:10:22<4:54:40,  2.40s/it] 63%|████████████████████████████████████████████████████                              | 12761/20117 [8:10:24<4:55:51,  2.41s/it] 63%|████████████████████████████████████████████████████                              | 12762/20117 [8:10:26<4:51:29,  2.38s/it] 63%|████████████████████████████████████████████████████                              | 12763/20117 [8:10:29<4:53:01,  2.39s/it] 63%|████████████████████████████████████████████████████                              | 12764/20117 [8:10:31<4:51:54,  2.38s/it] 63%|████████████████████████████████████████████████████                              | 12765/20117 [8:10:34<4:52:54,  2.39s/it] 63%|████████████████████████████████████████████████████                              | 12766/20117 [8:10:36<4:53:24,  2.39s/it] 63%|████████████████████████████████████████████████████                              | 12767/20117 [8:10:38<4:49:39,  2.36s/it] 63%|████████████████████████████████████████████████████                              | 12768/20117 [8:10:41<4:51:12,  2.38s/it] 63%|████████████████████████████████████████████████████                              | 12769/20117 [8:10:43<4:51:40,  2.38s/it] 63%|████████████████████████████████████████████████████                              | 12770/20117 [8:10:45<4:52:30,  2.39s/it]                                                                                                                                 {'loss': 0.1612, 'grad_norm': 0.5344352722167969, 'learning_rate': 5.944728632130392e-05, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 304.8, 'epoch': 1.27}
 63%|████████████████████████████████████████████████████                              | 12770/20117 [8:10:45<4:52:30,  2.39s/it] 63%|████████████████████████████████████████████████████                              | 12771/20117 [8:10:48<4:54:44,  2.41s/it] 63%|████████████████████████████████████████████████████                              | 12772/20117 [8:10:50<4:51:22,  2.38s/it] 63%|████████████████████████████████████████████████████                              | 12773/20117 [8:10:53<4:52:33,  2.39s/it] 63%|████████████████████████████████████████████████████                              | 12774/20117 [8:10:55<4:51:39,  2.38s/it] 64%|████████████████████████████████████████████████████                              | 12775/20117 [8:10:57<4:52:01,  2.39s/it] 64%|████████████████████████████████████████████████████                              | 12776/20117 [8:11:00<4:55:05,  2.41s/it] 64%|████████████████████████████████████████████████████                              | 12777/20117 [8:11:02<4:54:38,  2.41s/it] 64%|████████████████████████████████████████████████████                              | 12778/20117 [8:11:05<4:57:40,  2.43s/it] 64%|████████████████████████████████████████████████████                              | 12779/20117 [8:11:07<4:56:04,  2.42s/it] 64%|████████████████████████████████████████████████████                              | 12780/20117 [8:11:10<4:57:01,  2.43s/it]                                                                                                                                 {'loss': 0.1296, 'grad_norm': 0.4766976833343506, 'learning_rate': 5.9303874444975005e-05, 'memory/max_active (GiB)': 21.38, 'memory/max_allocated (GiB)': 21.38, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 292.47, 'epoch': 1.27}
 64%|████████████████████████████████████████████████████                              | 12780/20117 [8:11:10<4:57:01,  2.43s/it] 64%|████████████████████████████████████████████████████                              | 12781/20117 [8:11:13<5:13:58,  2.57s/it] 64%|████████████████████████████████████████████████████                              | 12782/20117 [8:11:15<5:09:22,  2.53s/it] 64%|████████████████████████████████████████████████████                              | 12783/20117 [8:11:17<5:08:19,  2.52s/it] 64%|████████████████████████████████████████████████████                              | 12784/20117 [8:11:20<5:06:01,  2.50s/it] 64%|████████████████████████████████████████████████████                              | 12785/20117 [8:11:22<5:04:39,  2.49s/it] 64%|████████████████████████████████████████████████████                              | 12786/20117 [8:11:25<5:01:24,  2.47s/it] 64%|████████████████████████████████████████████████████                              | 12787/20117 [8:11:27<4:59:05,  2.45s/it] 64%|████████████████████████████████████████████████████▏                             | 12788/20117 [8:11:30<4:59:32,  2.45s/it] 64%|████████████████████████████████████████████████████▏                             | 12789/20117 [8:11:32<5:02:53,  2.48s/it] 64%|████████████████████████████████████████████████████▏                             | 12790/20117 [8:11:35<5:04:11,  2.49s/it]                                                                                                                                 {'loss': 0.181, 'grad_norm': 0.42096179723739624, 'learning_rate': 5.9160562811804644e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 368.43, 'epoch': 1.27}
 64%|████████████████████████████████████████████████████▏                             | 12790/20117 [8:11:35<5:04:11,  2.49s/it] 64%|████████████████████████████████████████████████████▏                             | 12791/20117 [8:11:37<5:01:24,  2.47s/it] 64%|████████████████████████████████████████████████████▏                             | 12792/20117 [8:11:40<5:01:28,  2.47s/it] 64%|████████████████████████████████████████████████████▏                             | 12793/20117 [8:11:42<4:59:40,  2.46s/it] 64%|████████████████████████████████████████████████████▏                             | 12794/20117 [8:11:44<4:57:50,  2.44s/it] 64%|████████████████████████████████████████████████████▏                             | 12795/20117 [8:11:47<4:59:03,  2.45s/it] 64%|████████████████████████████████████████████████████▏                             | 12796/20117 [8:11:49<4:56:43,  2.43s/it] 64%|████████████████████████████████████████████████████▏                             | 12797/20117 [8:11:52<4:57:23,  2.44s/it] 64%|████████████████████████████████████████████████████▏                             | 12798/20117 [8:11:54<4:55:16,  2.42s/it] 64%|████████████████████████████████████████████████████▏                             | 12799/20117 [8:11:57<4:56:05,  2.43s/it] 64%|████████████████████████████████████████████████████▏                             | 12800/20117 [8:11:59<4:56:44,  2.43s/it]                                                                                                                                 {'loss': 0.1565, 'grad_norm': 0.41255730390548706, 'learning_rate': 5.901735177479972e-05, 'memory/max_active (GiB)': 19.23, 'memory/max_allocated (GiB)': 19.23, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 291.39, 'epoch': 1.27}
 64%|████████████████████████████████████████████████████▏                             | 12800/20117 [8:11:59<4:56:44,  2.43s/it] 64%|████████████████████████████████████████████████████▏                             | 12801/20117 [8:12:01<4:55:16,  2.42s/it] 64%|████████████████████████████████████████████████████▏                             | 12802/20117 [8:12:04<4:54:38,  2.42s/it] 64%|████████████████████████████████████████████████████▏                             | 12803/20117 [8:12:06<4:54:54,  2.42s/it] 64%|████████████████████████████████████████████████████▏                             | 12804/20117 [8:12:09<4:53:21,  2.41s/it] 64%|████████████████████████████████████████████████████▏                             | 12805/20117 [8:12:11<4:54:26,  2.42s/it] 64%|████████████████████████████████████████████████████▏                             | 12806/20117 [8:12:13<4:54:09,  2.41s/it] 64%|████████████████████████████████████████████████████▏                             | 12807/20117 [8:12:16<4:54:25,  2.42s/it] 64%|████████████████████████████████████████████████████▏                             | 12808/20117 [8:12:18<4:54:22,  2.42s/it] 64%|████████████████████████████████████████████████████▏                             | 12809/20117 [8:12:21<4:55:01,  2.42s/it] 64%|████████████████████████████████████████████████████▏                             | 12810/20117 [8:12:23<4:54:45,  2.42s/it]                                                                                                                                 {'loss': 0.1288, 'grad_norm': 0.4876402020454407, 'learning_rate': 5.8874241686719234e-05, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 297.94, 'epoch': 1.27}
 64%|████████████████████████████████████████████████████▏                             | 12810/20117 [8:12:23<4:54:45,  2.42s/it] 64%|████████████████████████████████████████████████████▏                             | 12811/20117 [8:12:26<4:52:39,  2.40s/it] 64%|████████████████████████████████████████████████████▏                             | 12812/20117 [8:12:28<4:55:10,  2.42s/it] 64%|████████████████████████████████████████████████████▏                             | 12813/20117 [8:12:30<4:55:28,  2.43s/it] 64%|████████████████████████████████████████████████████▏                             | 12814/20117 [8:12:33<4:53:52,  2.41s/it] 64%|████████████████████████████████████████████████████▏                             | 12815/20117 [8:12:35<4:55:32,  2.43s/it] 64%|████████████████████████████████████████████████████▏                             | 12816/20117 [8:12:38<4:54:20,  2.42s/it] 64%|████████████████████████████████████████████████████▏                             | 12817/20117 [8:12:40<4:52:03,  2.40s/it] 64%|████████████████████████████████████████████████████▏                             | 12818/20117 [8:12:42<4:47:28,  2.36s/it] 64%|████████████████████████████████████████████████████▎                             | 12819/20117 [8:12:45<4:45:56,  2.35s/it] 64%|████████████████████████████████████████████████████▎                             | 12820/20117 [8:12:47<4:51:10,  2.39s/it]                                                                                                                                 {'loss': 0.1144, 'grad_norm': 0.47068750858306885, 'learning_rate': 5.873123290007363e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 293.08, 'epoch': 1.27}
 64%|████████████████████████████████████████████████████▎                             | 12820/20117 [8:12:47<4:51:10,  2.39s/it] 64%|████████████████████████████████████████████████████▎                             | 12821/20117 [8:12:50<4:55:29,  2.43s/it] 64%|████████████████████████████████████████████████████▎                             | 12822/20117 [8:12:52<4:55:26,  2.43s/it] 64%|████████████████████████████████████████████████████▎                             | 12823/20117 [8:12:55<5:00:46,  2.47s/it] 64%|████████████████████████████████████████████████████▎                             | 12824/20117 [8:12:57<4:58:16,  2.45s/it] 64%|████████████████████████████████████████████████████▎                             | 12825/20117 [8:12:59<4:52:53,  2.41s/it] 64%|████████████████████████████████████████████████████▎                             | 12826/20117 [8:13:02<4:52:10,  2.40s/it] 64%|████████████████████████████████████████████████████▎                             | 12827/20117 [8:13:04<4:53:14,  2.41s/it] 64%|████████████████████████████████████████████████████▎                             | 12828/20117 [8:13:07<4:56:02,  2.44s/it] 64%|████████████████████████████████████████████████████▎                             | 12829/20117 [8:13:09<4:59:10,  2.46s/it] 64%|████████████████████████████████████████████████████▎                             | 12830/20117 [8:13:12<5:00:43,  2.48s/it]                                                                                                                                 {'loss': 0.1776, 'grad_norm': 0.5882211327552795, 'learning_rate': 5.8588325767123694e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 338.75, 'epoch': 1.28}
 64%|████████████████████████████████████████████████████▎                             | 12830/20117 [8:13:12<5:00:43,  2.48s/it] 64%|████████████████████████████████████████████████████▎                             | 12831/20117 [8:13:14<5:03:03,  2.50s/it] 64%|████████████████████████████████████████████████████▎                             | 12832/20117 [8:13:17<5:03:20,  2.50s/it] 64%|████████████████████████████████████████████████████▎                             | 12833/20117 [8:13:20<5:19:51,  2.63s/it] 64%|████████████████████████████████████████████████████▎                             | 12834/20117 [8:13:22<5:12:05,  2.57s/it] 64%|████████████████████████████████████████████████████▎                             | 12835/20117 [8:13:25<5:07:03,  2.53s/it] 64%|████████████████████████████████████████████████████▎                             | 12836/20117 [8:13:27<5:05:14,  2.52s/it] 64%|████████████████████████████████████████████████████▎                             | 12837/20117 [8:13:30<5:03:09,  2.50s/it] 64%|████████████████████████████████████████████████████▎                             | 12838/20117 [8:13:32<5:05:58,  2.52s/it] 64%|████████████████████████████████████████████████████▎                             | 12839/20117 [8:13:35<5:06:03,  2.52s/it] 64%|████████████████████████████████████████████████████▎                             | 12840/20117 [8:13:37<5:08:39,  2.54s/it]                                                                                                                                 {'loss': 0.1555, 'grad_norm': 0.43064388632774353, 'learning_rate': 5.844552063987997e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 342.44, 'epoch': 1.28}
 64%|████████████████████████████████████████████████████▎                             | 12840/20117 [8:13:37<5:08:39,  2.54s/it] 64%|████████████████████████████████████████████████████▎                             | 12841/20117 [8:13:40<5:07:43,  2.54s/it] 64%|████████████████████████████████████████████████████▎                             | 12842/20117 [8:13:42<5:06:26,  2.53s/it] 64%|████████████████████████████████████████████████████▎                             | 12843/20117 [8:13:45<5:04:55,  2.52s/it] 64%|████████████████████████████████████████████████████▎                             | 12844/20117 [8:13:47<5:02:39,  2.50s/it] 64%|████████████████████████████████████████████████████▎                             | 12845/20117 [8:13:50<4:59:11,  2.47s/it] 64%|████████████████████████████████████████████████████▎                             | 12846/20117 [8:13:52<4:57:36,  2.46s/it] 64%|████████████████████████████████████████████████████▎                             | 12847/20117 [8:13:54<4:54:34,  2.43s/it] 64%|████████████████████████████████████████████████████▎                             | 12848/20117 [8:13:57<4:53:36,  2.42s/it] 64%|████████████████████████████████████████████████████▎                             | 12849/20117 [8:13:59<4:48:42,  2.38s/it] 64%|████████████████████████████████████████████████████▍                             | 12850/20117 [8:14:02<4:51:05,  2.40s/it]                                                                                                                                 {'loss': 0.1722, 'grad_norm': 0.6673071980476379, 'learning_rate': 5.830281787010166e-05, 'memory/max_active (GiB)': 20.58, 'memory/max_allocated (GiB)': 20.58, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 338.48, 'epoch': 1.28}
 64%|████████████████████████████████████████████████████▍                             | 12850/20117 [8:14:02<4:51:05,  2.40s/it] 64%|████████████████████████████████████████████████████▍                             | 12851/20117 [8:14:04<4:52:28,  2.42s/it] 64%|████████████████████████████████████████████████████▍                             | 12852/20117 [8:14:06<4:53:21,  2.42s/it] 64%|████████████████████████████████████████████████████▍                             | 12853/20117 [8:14:09<4:52:48,  2.42s/it] 64%|████████████████████████████████████████████████████▍                             | 12854/20117 [8:14:11<4:51:42,  2.41s/it] 64%|████████████████████████████████████████████████████▍                             | 12855/20117 [8:14:14<4:56:43,  2.45s/it] 64%|████████████████████████████████████████████████████▍                             | 12856/20117 [8:14:16<4:54:57,  2.44s/it] 64%|████████████████████████████████████████████████████▍                             | 12857/20117 [8:14:19<4:55:58,  2.45s/it] 64%|████████████████████████████████████████████████████▍                             | 12858/20117 [8:14:21<4:51:56,  2.41s/it] 64%|████████████████████████████████████████████████████▍                             | 12859/20117 [8:14:23<4:55:07,  2.44s/it] 64%|████████████████████████████████████████████████████▍                             | 12860/20117 [8:14:26<4:55:26,  2.44s/it]                                                                                                                                 {'loss': 0.1647, 'grad_norm': 0.6424853801727295, 'learning_rate': 5.8160217809295826e-05, 'memory/max_active (GiB)': 20.57, 'memory/max_allocated (GiB)': 20.57, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 354.7, 'epoch': 1.28}
 64%|████████████████████████████████████████████████████▍                             | 12860/20117 [8:14:26<4:55:26,  2.44s/it] 64%|████████████████████████████████████████████████████▍                             | 12861/20117 [8:14:28<4:53:46,  2.43s/it] 64%|████████████████████████████████████████████████████▍                             | 12862/20117 [8:14:31<4:50:39,  2.40s/it] 64%|████████████████████████████████████████████████████▍                             | 12863/20117 [8:14:33<4:51:39,  2.41s/it] 64%|████████████████████████████████████████████████████▍                             | 12864/20117 [8:14:35<4:48:22,  2.39s/it] 64%|████████████████████████████████████████████████████▍                             | 12865/20117 [8:14:38<4:51:01,  2.41s/it] 64%|████████████████████████████████████████████████████▍                             | 12866/20117 [8:14:40<4:50:19,  2.40s/it] 64%|████████████████████████████████████████████████████▍                             | 12867/20117 [8:14:43<4:51:02,  2.41s/it] 64%|████████████████████████████████████████████████████▍                             | 12868/20117 [8:14:45<4:47:53,  2.38s/it] 64%|████████████████████████████████████████████████████▍                             | 12869/20117 [8:14:47<4:48:31,  2.39s/it] 64%|████████████████████████████████████████████████████▍                             | 12870/20117 [8:14:50<4:47:33,  2.38s/it]                                                                                                                                 {'loss': 0.1304, 'grad_norm': 0.5545419454574585, 'learning_rate': 5.801772080871659e-05, 'memory/max_active (GiB)': 20.57, 'memory/max_allocated (GiB)': 20.57, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 326.79, 'epoch': 1.28}
 64%|████████████████████████████████████████████████████▍                             | 12870/20117 [8:14:50<4:47:33,  2.38s/it] 64%|████████████████████████████████████████████████████▍                             | 12871/20117 [8:14:52<4:48:05,  2.39s/it] 64%|████████████████████████████████████████████████████▍                             | 12872/20117 [8:14:55<4:49:56,  2.40s/it] 64%|████████████████████████████████████████████████████▍                             | 12873/20117 [8:14:57<4:48:58,  2.39s/it] 64%|████████████████████████████████████████████████████▍                             | 12874/20117 [8:14:59<4:47:45,  2.38s/it] 64%|████████████████████████████████████████████████████▍                             | 12875/20117 [8:15:02<4:48:29,  2.39s/it] 64%|████████████████████████████████████████████████████▍                             | 12876/20117 [8:15:04<4:48:40,  2.39s/it] 64%|████████████████████████████████████████████████████▍                             | 12877/20117 [8:15:07<4:50:06,  2.40s/it] 64%|████████████████████████████████████████████████████▍                             | 12878/20117 [8:15:09<4:49:59,  2.40s/it] 64%|████████████████████████████████████████████████████▍                             | 12879/20117 [8:15:11<4:52:18,  2.42s/it] 64%|████████████████████████████████████████████████████▌                             | 12880/20117 [8:15:14<4:50:32,  2.41s/it]                                                                                                                                 {'loss': 0.1984, 'grad_norm': 0.5145922899246216, 'learning_rate': 5.787532721936413e-05, 'memory/max_active (GiB)': 20.72, 'memory/max_allocated (GiB)': 20.72, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 360.98, 'epoch': 1.28}
 64%|████████████████████████████████████████████████████▌                             | 12880/20117 [8:15:14<4:50:32,  2.41s/it] 64%|████████████████████████████████████████████████████▌                             | 12881/20117 [8:15:16<4:49:13,  2.40s/it] 64%|████████████████████████████████████████████████████▌                             | 12882/20117 [8:15:19<4:48:40,  2.39s/it] 64%|████████████████████████████████████████████████████▌                             | 12883/20117 [8:15:21<4:47:17,  2.38s/it] 64%|████████████████████████████████████████████████████▌                             | 12884/20117 [8:15:23<4:48:43,  2.40s/it] 64%|████████████████████████████████████████████████████▌                             | 12885/20117 [8:15:26<5:01:33,  2.50s/it] 64%|████████████████████████████████████████████████████▌                             | 12886/20117 [8:15:29<4:58:21,  2.48s/it] 64%|████████████████████████████████████████████████████▌                             | 12887/20117 [8:15:31<4:56:43,  2.46s/it] 64%|████████████████████████████████████████████████████▌                             | 12888/20117 [8:15:33<4:53:15,  2.43s/it] 64%|████████████████████████████████████████████████████▌                             | 12889/20117 [8:15:36<4:52:34,  2.43s/it] 64%|████████████████████████████████████████████████████▌                             | 12890/20117 [8:15:38<4:49:01,  2.40s/it]                                                                                                                                 {'loss': 0.1239, 'grad_norm': 0.4928934574127197, 'learning_rate': 5.7733037391984024e-05, 'memory/max_active (GiB)': 18.18, 'memory/max_allocated (GiB)': 18.18, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 291.25, 'epoch': 1.28}
 64%|████████████████████████████████████████████████████▌                             | 12890/20117 [8:15:38<4:49:01,  2.40s/it] 64%|████████████████████████████████████████████████████▌                             | 12891/20117 [8:15:40<4:48:58,  2.40s/it] 64%|████████████████████████████████████████████████████▌                             | 12892/20117 [8:15:43<4:42:28,  2.35s/it] 64%|████████████████████████████████████████████████████▌                             | 12893/20117 [8:15:45<4:37:37,  2.31s/it] 64%|████████████████████████████████████████████████████▌                             | 12894/20117 [8:15:47<4:35:37,  2.29s/it] 64%|████████████████████████████████████████████████████▌                             | 12895/20117 [8:15:50<4:37:41,  2.31s/it] 64%|████████████████████████████████████████████████████▌                             | 12896/20117 [8:15:52<4:42:32,  2.35s/it] 64%|████████████████████████████████████████████████████▌                             | 12897/20117 [8:15:54<4:43:49,  2.36s/it] 64%|████████████████████████████████████████████████████▌                             | 12898/20117 [8:15:57<4:45:23,  2.37s/it] 64%|████████████████████████████████████████████████████▌                             | 12899/20117 [8:15:59<4:43:18,  2.35s/it] 64%|████████████████████████████████████████████████████▌                             | 12900/20117 [8:16:01<4:44:10,  2.36s/it]                                                                                                                                 {'loss': 0.1584, 'grad_norm': 0.5820671916007996, 'learning_rate': 5.759085167706611e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 324.13, 'epoch': 1.28}
 64%|████████████████████████████████████████████████████▌                             | 12900/20117 [8:16:01<4:44:10,  2.36s/it] 64%|████████████████████████████████████████████████████▌                             | 12901/20117 [8:16:04<4:43:54,  2.36s/it] 64%|████████████████████████████████████████████████████▌                             | 12902/20117 [8:16:06<4:44:15,  2.36s/it] 64%|████████████████████████████████████████████████████▌                             | 12903/20117 [8:16:08<4:42:25,  2.35s/it] 64%|████████████████████████████████████████████████████▌                             | 12904/20117 [8:16:11<4:45:13,  2.37s/it] 64%|████████████████████████████████████████████████████▌                             | 12905/20117 [8:16:13<4:45:12,  2.37s/it] 64%|████████████████████████████████████████████████████▌                             | 12906/20117 [8:16:16<4:40:10,  2.33s/it] 64%|████████████████████████████████████████████████████▌                             | 12907/20117 [8:16:18<4:40:26,  2.33s/it] 64%|████████████████████████████████████████████████████▌                             | 12908/20117 [8:16:20<4:44:07,  2.36s/it] 64%|████████████████████████████████████████████████████▌                             | 12909/20117 [8:16:23<4:45:31,  2.38s/it] 64%|████████████████████████████████████████████████████▌                             | 12910/20117 [8:16:25<4:49:49,  2.41s/it]                                                                                                                                 {'loss': 0.1653, 'grad_norm': 0.42122918367385864, 'learning_rate': 5.7448770424843926e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 316.56, 'epoch': 1.28}
 64%|████████████████████████████████████████████████████▌                             | 12910/20117 [8:16:25<4:49:49,  2.41s/it] 64%|████████████████████████████████████████████████████▋                             | 12911/20117 [8:16:28<4:49:03,  2.41s/it] 64%|████████████████████████████████████████████████████▋                             | 12912/20117 [8:16:30<4:50:13,  2.42s/it] 64%|████████████████████████████████████████████████████▋                             | 12913/20117 [8:16:32<4:46:56,  2.39s/it] 64%|████████████████████████████████████████████████████▋                             | 12914/20117 [8:16:35<4:48:17,  2.40s/it] 64%|████████████████████████████████████████████████████▋                             | 12915/20117 [8:16:37<4:45:03,  2.37s/it] 64%|████████████████████████████████████████████████████▋                             | 12916/20117 [8:16:39<4:45:21,  2.38s/it] 64%|████████████████████████████████████████████████████▋                             | 12917/20117 [8:16:42<4:44:58,  2.37s/it] 64%|████████████████████████████████████████████████████▋                             | 12918/20117 [8:16:44<4:45:14,  2.38s/it] 64%|████████████████████████████████████████████████████▋                             | 12919/20117 [8:16:47<4:48:27,  2.40s/it] 64%|████████████████████████████████████████████████████▋                             | 12920/20117 [8:16:49<4:52:28,  2.44s/it]                                                                                                                                 {'loss': 0.1152, 'grad_norm': 0.30348268151283264, 'learning_rate': 5.730679398529355e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 338.17, 'epoch': 1.28}
 64%|████████████████████████████████████████████████████▋                             | 12920/20117 [8:16:49<4:52:28,  2.44s/it] 64%|████████████████████████████████████████████████████▋                             | 12921/20117 [8:16:52<4:52:21,  2.44s/it] 64%|████████████████████████████████████████████████████▋                             | 12922/20117 [8:16:54<4:50:32,  2.42s/it] 64%|████████████████████████████████████████████████████▋                             | 12923/20117 [8:16:56<4:49:41,  2.42s/it] 64%|████████████████████████████████████████████████████▋                             | 12924/20117 [8:16:59<4:48:10,  2.40s/it] 64%|████████████████████████████████████████████████████▋                             | 12925/20117 [8:17:01<4:44:14,  2.37s/it] 64%|████████████████████████████████████████████████████▋                             | 12926/20117 [8:17:04<4:48:21,  2.41s/it] 64%|████████████████████████████████████████████████████▋                             | 12927/20117 [8:17:06<4:49:02,  2.41s/it] 64%|████████████████████████████████████████████████████▋                             | 12928/20117 [8:17:08<4:47:06,  2.40s/it] 64%|████████████████████████████████████████████████████▋                             | 12929/20117 [8:17:11<4:49:36,  2.42s/it] 64%|████████████████████████████████████████████████████▋                             | 12930/20117 [8:17:13<4:50:31,  2.43s/it]                                                                                                                                 {'loss': 0.1603, 'grad_norm': 0.4230218529701233, 'learning_rate': 5.716492270813305e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 371.96, 'epoch': 1.29}
 64%|████████████████████████████████████████████████████▋                             | 12930/20117 [8:17:13<4:50:31,  2.43s/it] 64%|████████████████████████████████████████████████████▋                             | 12931/20117 [8:17:16<4:51:04,  2.43s/it] 64%|████████████████████████████████████████████████████▋                             | 12932/20117 [8:17:18<4:48:19,  2.41s/it] 64%|████████████████████████████████████████████████████▋                             | 12933/20117 [8:17:20<4:48:25,  2.41s/it] 64%|████████████████████████████████████████████████████▋                             | 12934/20117 [8:17:23<4:41:46,  2.35s/it] 64%|████████████████████████████████████████████████████▋                             | 12935/20117 [8:17:25<4:43:48,  2.37s/it] 64%|████████████████████████████████████████████████████▋                             | 12936/20117 [8:17:28<5:02:42,  2.53s/it] 64%|████████████████████████████████████████████████████▋                             | 12937/20117 [8:17:30<4:58:06,  2.49s/it] 64%|████████████████████████████████████████████████████▋                             | 12938/20117 [8:17:33<4:55:07,  2.47s/it] 64%|████████████████████████████████████████████████████▋                             | 12939/20117 [8:17:35<4:52:25,  2.44s/it] 64%|████████████████████████████████████████████████████▋                             | 12940/20117 [8:17:38<4:51:57,  2.44s/it]                                                                                                                                 {'loss': 0.1618, 'grad_norm': 0.4649568200111389, 'learning_rate': 5.7023156942821274e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 365.8, 'epoch': 1.29}
 64%|████████████████████████████████████████████████████▋                             | 12940/20117 [8:17:38<4:51:57,  2.44s/it] 64%|████████████████████████████████████████████████████▋                             | 12941/20117 [8:17:40<4:51:33,  2.44s/it] 64%|████████████████████████████████████████████████████▊                             | 12942/20117 [8:17:42<4:46:26,  2.40s/it] 64%|████████████████████████████████████████████████████▊                             | 12943/20117 [8:17:45<4:47:03,  2.40s/it] 64%|████████████████████████████████████████████████████▊                             | 12944/20117 [8:17:47<4:44:55,  2.38s/it] 64%|████████████████████████████████████████████████████▊                             | 12945/20117 [8:17:50<4:44:25,  2.38s/it] 64%|████████████████████████████████████████████████████▊                             | 12946/20117 [8:17:52<4:43:20,  2.37s/it] 64%|████████████████████████████████████████████████████▊                             | 12947/20117 [8:17:54<4:44:51,  2.38s/it] 64%|████████████████████████████████████████████████████▊                             | 12948/20117 [8:17:57<4:44:11,  2.38s/it] 64%|████████████████████████████████████████████████████▊                             | 12949/20117 [8:17:59<4:43:59,  2.38s/it] 64%|████████████████████████████████████████████████████▊                             | 12950/20117 [8:18:01<4:42:26,  2.36s/it]                                                                                                                                 {'loss': 0.1442, 'grad_norm': 0.7270090579986572, 'learning_rate': 5.688149703855732e-05, 'memory/max_active (GiB)': 18.84, 'memory/max_allocated (GiB)': 18.84, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 326.17, 'epoch': 1.29}
 64%|████████████████████████████████████████████████████▊                             | 12950/20117 [8:18:01<4:42:26,  2.36s/it] 64%|████████████████████████████████████████████████████▊                             | 12951/20117 [8:18:04<4:45:41,  2.39s/it] 64%|████████████████████████████████████████████████████▊                             | 12952/20117 [8:18:06<4:45:56,  2.39s/it] 64%|████████████████████████████████████████████████████▊                             | 12953/20117 [8:18:09<4:46:07,  2.40s/it] 64%|████████████████████████████████████████████████████▊                             | 12954/20117 [8:18:11<4:48:24,  2.42s/it] 64%|████████████████████████████████████████████████████▊                             | 12955/20117 [8:18:13<4:45:51,  2.39s/it] 64%|████████████████████████████████████████████████████▊                             | 12956/20117 [8:18:16<4:46:36,  2.40s/it] 64%|████████████████████████████████████████████████████▊                             | 12957/20117 [8:18:18<4:42:12,  2.36s/it] 64%|████████████████████████████████████████████████████▊                             | 12958/20117 [8:18:21<4:43:56,  2.38s/it] 64%|████████████████████████████████████████████████████▊                             | 12959/20117 [8:18:23<4:46:17,  2.40s/it] 64%|████████████████████████████████████████████████████▊                             | 12960/20117 [8:18:25<4:47:14,  2.41s/it]                                                                                                                                 {'loss': 0.2068, 'grad_norm': 0.4988739490509033, 'learning_rate': 5.6739943344279455e-05, 'memory/max_active (GiB)': 19.77, 'memory/max_allocated (GiB)': 19.77, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 347.14, 'epoch': 1.29}
 64%|████████████████████████████████████████████████████▊                             | 12960/20117 [8:18:25<4:47:14,  2.41s/it] 64%|████████████████████████████████████████████████████▊                             | 12961/20117 [8:18:28<4:50:57,  2.44s/it] 64%|████████████████████████████████████████████████████▊                             | 12962/20117 [8:18:30<4:47:18,  2.41s/it] 64%|████████████████████████████████████████████████████▊                             | 12963/20117 [8:18:33<4:45:06,  2.39s/it] 64%|████████████████████████████████████████████████████▊                             | 12964/20117 [8:18:35<4:43:56,  2.38s/it] 64%|████████████████████████████████████████████████████▊                             | 12965/20117 [8:18:37<4:47:31,  2.41s/it] 64%|████████████████████████████████████████████████████▊                             | 12966/20117 [8:18:40<4:45:35,  2.40s/it] 64%|████████████████████████████████████████████████████▊                             | 12967/20117 [8:18:42<4:47:44,  2.41s/it] 64%|████████████████████████████████████████████████████▊                             | 12968/20117 [8:18:45<4:46:34,  2.41s/it] 64%|████████████████████████████████████████████████████▊                             | 12969/20117 [8:18:47<4:49:18,  2.43s/it] 64%|████████████████████████████████████████████████████▊                             | 12970/20117 [8:18:49<4:43:54,  2.38s/it]                                                                                                                                 {'loss': 0.2245, 'grad_norm': 0.5430411100387573, 'learning_rate': 5.65984962086644e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 428.47, 'epoch': 1.29}
 64%|████████████████████████████████████████████████████▊                             | 12970/20117 [8:18:49<4:43:54,  2.38s/it] 64%|████████████████████████████████████████████████████▊                             | 12971/20117 [8:18:52<4:45:57,  2.40s/it] 64%|████████████████████████████████████████████████████▉                             | 12972/20117 [8:18:54<4:45:29,  2.40s/it] 64%|████████████████████████████████████████████████████▉                             | 12973/20117 [8:18:57<4:47:06,  2.41s/it] 64%|████████████████████████████████████████████████████▉                             | 12974/20117 [8:18:59<4:45:22,  2.40s/it] 64%|████████████████████████████████████████████████████▉                             | 12975/20117 [8:19:01<4:44:37,  2.39s/it] 65%|████████████████████████████████████████████████████▉                             | 12976/20117 [8:19:04<4:43:29,  2.38s/it] 65%|████████████████████████████████████████████████████▉                             | 12977/20117 [8:19:06<4:43:41,  2.38s/it] 65%|████████████████████████████████████████████████████▉                             | 12978/20117 [8:19:09<4:44:37,  2.39s/it] 65%|████████████████████████████████████████████████████▉                             | 12979/20117 [8:19:11<4:44:31,  2.39s/it] 65%|████████████████████████████████████████████████████▉                             | 12980/20117 [8:19:13<4:45:36,  2.40s/it]                                                                                                                                 {'loss': 0.1454, 'grad_norm': 0.6165452003479004, 'learning_rate': 5.645715598012626e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 316.19, 'epoch': 1.29}
 65%|████████████████████████████████████████████████████▉                             | 12980/20117 [8:19:13<4:45:36,  2.40s/it] 65%|████████████████████████████████████████████████████▉                             | 12981/20117 [8:19:16<4:48:13,  2.42s/it] 65%|████████████████████████████████████████████████████▉                             | 12982/20117 [8:19:18<4:48:04,  2.42s/it] 65%|████████████████████████████████████████████████████▉                             | 12983/20117 [8:19:21<4:49:01,  2.43s/it] 65%|████████████████████████████████████████████████████▉                             | 12984/20117 [8:19:23<4:49:48,  2.44s/it] 65%|████████████████████████████████████████████████████▉                             | 12985/20117 [8:19:26<4:49:23,  2.43s/it] 65%|████████████████████████████████████████████████████▉                             | 12986/20117 [8:19:28<4:47:00,  2.41s/it] 65%|████████████████████████████████████████████████████▉                             | 12987/20117 [8:19:30<4:48:20,  2.43s/it] 65%|████████████████████████████████████████████████████▉                             | 12988/20117 [8:19:33<4:43:31,  2.39s/it] 65%|████████████████████████████████████████████████████▉                             | 12989/20117 [8:19:35<4:37:16,  2.33s/it] 65%|████████████████████████████████████████████████████▉                             | 12990/20117 [8:19:38<4:44:47,  2.40s/it]                                                                                                                                 {'loss': 0.186, 'grad_norm': 0.525731086730957, 'learning_rate': 5.631592300681593e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 324.42, 'epoch': 1.29}
 65%|████████████████████████████████████████████████████▉                             | 12990/20117 [8:19:38<4:44:47,  2.40s/it] 65%|████████████████████████████████████████████████████▉                             | 12991/20117 [8:19:40<4:37:59,  2.34s/it] 65%|████████████████████████████████████████████████████▉                             | 12992/20117 [8:19:42<4:37:29,  2.34s/it] 65%|████████████████████████████████████████████████████▉                             | 12993/20117 [8:19:44<4:41:41,  2.37s/it] 65%|████████████████████████████████████████████████████▉                             | 12994/20117 [8:19:47<4:41:01,  2.37s/it] 65%|████████████████████████████████████████████████████▉                             | 12995/20117 [8:19:49<4:39:53,  2.36s/it] 65%|████████████████████████████████████████████████████▉                             | 12996/20117 [8:19:52<4:41:12,  2.37s/it] 65%|████████████████████████████████████████████████████▉                             | 12997/20117 [8:19:54<4:42:40,  2.38s/it] 65%|████████████████████████████████████████████████████▉                             | 12998/20117 [8:19:56<4:43:07,  2.39s/it] 65%|████████████████████████████████████████████████████▉                             | 12999/20117 [8:19:59<4:39:20,  2.35s/it] 65%|████████████████████████████████████████████████████▉                             | 13000/20117 [8:20:01<4:39:59,  2.36s/it]                                                                                                                                 {'loss': 0.1512, 'grad_norm': 0.843223512172699, 'learning_rate': 5.617479763662011e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 288.5, 'epoch': 1.29}
 65%|████████████████████████████████████████████████████▉                             | 13000/20117 [8:20:01<4:39:59,  2.36s/it] 65%|████████████████████████████████████████████████████▉                             | 13001/20117 [8:20:03<4:40:03,  2.36s/it] 65%|████████████████████████████████████████████████████▉                             | 13002/20117 [8:20:06<4:39:18,  2.36s/it] 65%|█████████████████████████████████████████████████████                             | 13003/20117 [8:20:08<4:34:43,  2.32s/it] 65%|█████████████████████████████████████████████████████                             | 13004/20117 [8:20:10<4:32:32,  2.30s/it] 65%|█████████████████████████████████████████████████████                             | 13005/20117 [8:20:13<4:37:17,  2.34s/it] 65%|█████████████████████████████████████████████████████                             | 13006/20117 [8:20:15<4:39:28,  2.36s/it] 65%|█████████████████████████████████████████████████████                             | 13007/20117 [8:20:18<4:44:10,  2.40s/it] 65%|█████████████████████████████████████████████████████                             | 13008/20117 [8:20:20<4:43:50,  2.40s/it] 65%|█████████████████████████████████████████████████████                             | 13009/20117 [8:20:22<4:44:05,  2.40s/it] 65%|█████████████████████████████████████████████████████                             | 13010/20117 [8:20:25<4:47:56,  2.43s/it]                                                                                                                                 {'loss': 0.1475, 'grad_norm': 0.557815432548523, 'learning_rate': 5.6033780217160346e-05, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 306.45, 'epoch': 1.29}
 65%|█████████████████████████████████████████████████████                             | 13010/20117 [8:20:25<4:47:56,  2.43s/it] 65%|█████████████████████████████████████████████████████                             | 13011/20117 [8:20:27<4:46:55,  2.42s/it] 65%|█████████████████████████████████████████████████████                             | 13012/20117 [8:20:30<4:48:35,  2.44s/it] 65%|█████████████████████████████████████████████████████                             | 13013/20117 [8:20:32<4:49:17,  2.44s/it] 65%|█████████████████████████████████████████████████████                             | 13014/20117 [8:20:35<4:45:17,  2.41s/it] 65%|█████████████████████████████████████████████████████                             | 13015/20117 [8:20:37<4:47:29,  2.43s/it] 65%|█████████████████████████████████████████████████████                             | 13016/20117 [8:20:39<4:47:49,  2.43s/it] 65%|█████████████████████████████████████████████████████                             | 13017/20117 [8:20:42<4:47:20,  2.43s/it] 65%|█████████████████████████████████████████████████████                             | 13018/20117 [8:20:44<4:47:44,  2.43s/it] 65%|█████████████████████████████████████████████████████                             | 13019/20117 [8:20:47<4:45:30,  2.41s/it] 65%|█████████████████████████████████████████████████████                             | 13020/20117 [8:20:49<4:43:50,  2.40s/it]                                                                                                                                 {'loss': 0.1323, 'grad_norm': 0.4504510462284088, 'learning_rate': 5.589287109579242e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 353.31, 'epoch': 1.29}
 65%|█████████████████████████████████████████████████████                             | 13020/20117 [8:20:49<4:43:50,  2.40s/it] 65%|█████████████████████████████████████████████████████                             | 13021/20117 [8:20:51<4:43:48,  2.40s/it] 65%|█████████████████████████████████████████████████████                             | 13022/20117 [8:20:54<4:40:28,  2.37s/it] 65%|█████████████████████████████████████████████████████                             | 13023/20117 [8:20:56<4:43:46,  2.40s/it] 65%|█████████████████████████████████████████████████████                             | 13024/20117 [8:20:59<4:43:48,  2.40s/it] 65%|█████████████████████████████████████████████████████                             | 13025/20117 [8:21:01<4:46:36,  2.42s/it] 65%|█████████████████████████████████████████████████████                             | 13026/20117 [8:21:03<4:44:56,  2.41s/it] 65%|█████████████████████████████████████████████████████                             | 13027/20117 [8:21:06<4:44:41,  2.41s/it] 65%|█████████████████████████████████████████████████████                             | 13028/20117 [8:21:08<4:44:47,  2.41s/it] 65%|█████████████████████████████████████████████████████                             | 13029/20117 [8:21:11<4:42:18,  2.39s/it] 65%|█████████████████████████████████████████████████████                             | 13030/20117 [8:21:13<4:42:02,  2.39s/it]                                                                                                                                 {'loss': 0.1525, 'grad_norm': 0.6401469707489014, 'learning_rate': 5.575207061960519e-05, 'memory/max_active (GiB)': 19.67, 'memory/max_allocated (GiB)': 19.67, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 341.06, 'epoch': 1.3}
 65%|█████████████████████████████████████████████████████                             | 13030/20117 [8:21:13<4:42:02,  2.39s/it] 65%|█████████████████████████████████████████████████████                             | 13031/20117 [8:21:15<4:38:40,  2.36s/it] 65%|█████████████████████████████████████████████████████                             | 13032/20117 [8:21:18<4:42:34,  2.39s/it] 65%|█████████████████████████████████████████████████████                             | 13033/20117 [8:21:20<4:39:43,  2.37s/it] 65%|█████████████████████████████████████████████████████▏                            | 13034/20117 [8:21:23<4:42:22,  2.39s/it] 65%|█████████████████████████████████████████████████████▏                            | 13035/20117 [8:21:25<4:42:13,  2.39s/it] 65%|█████████████████████████████████████████████████████▏                            | 13036/20117 [8:21:27<4:41:43,  2.39s/it] 65%|█████████████████████████████████████████████████████▏                            | 13037/20117 [8:21:30<4:42:44,  2.40s/it] 65%|█████████████████████████████████████████████████████▏                            | 13038/20117 [8:21:32<4:40:19,  2.38s/it] 65%|█████████████████████████████████████████████████████▏                            | 13039/20117 [8:21:34<4:42:44,  2.40s/it] 65%|█████████████████████████████████████████████████████▏                            | 13040/20117 [8:21:37<4:39:23,  2.37s/it]                                                                                                                                 {'loss': 0.2351, 'grad_norm': 0.603769838809967, 'learning_rate': 5.561137913542008e-05, 'memory/max_active (GiB)': 20.77, 'memory/max_allocated (GiB)': 20.77, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 391.65, 'epoch': 1.3}
 65%|█████████████████████████████████████████████████████▏                            | 13040/20117 [8:21:37<4:39:23,  2.37s/it] 65%|█████████████████████████████████████████████████████▏                            | 13041/20117 [8:21:40<4:53:33,  2.49s/it] 65%|█████████████████████████████████████████████████████▏                            | 13042/20117 [8:21:42<4:51:01,  2.47s/it] 65%|█████████████████████████████████████████████████████▏                            | 13043/20117 [8:21:44<4:49:38,  2.46s/it] 65%|█████████████████████████████████████████████████████▏                            | 13044/20117 [8:21:47<4:50:00,  2.46s/it] 65%|█████████████████████████████████████████████████████▏                            | 13045/20117 [8:21:49<4:49:50,  2.46s/it] 65%|█████████████████████████████████████████████████████▏                            | 13046/20117 [8:21:52<4:48:27,  2.45s/it] 65%|█████████████████████████████████████████████████████▏                            | 13047/20117 [8:21:54<4:46:05,  2.43s/it] 65%|█████████████████████████████████████████████████████▏                            | 13048/20117 [8:21:57<4:47:35,  2.44s/it] 65%|█████████████████████████████████████████████████████▏                            | 13049/20117 [8:21:59<4:44:50,  2.42s/it] 65%|█████████████████████████████████████████████████████▏                            | 13050/20117 [8:22:01<4:47:56,  2.44s/it]                                                                                                                                 {'loss': 0.1351, 'grad_norm': 0.2735394239425659, 'learning_rate': 5.5470796989789874e-05, 'memory/max_active (GiB)': 21.39, 'memory/max_allocated (GiB)': 21.39, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 321.21, 'epoch': 1.3}
 65%|█████████████████████████████████████████████████████▏                            | 13050/20117 [8:22:01<4:47:56,  2.44s/it] 65%|█████████████████████████████████████████████████████▏                            | 13051/20117 [8:22:04<4:48:56,  2.45s/it] 65%|█████████████████████████████████████████████████████▏                            | 13052/20117 [8:22:06<4:49:18,  2.46s/it] 65%|█████████████████████████████████████████████████████▏                            | 13053/20117 [8:22:09<4:50:04,  2.46s/it] 65%|█████████████████████████████████████████████████████▏                            | 13054/20117 [8:22:11<4:49:54,  2.46s/it] 65%|█████████████████████████████████████████████████████▏                            | 13055/20117 [8:22:14<4:49:08,  2.46s/it] 65%|█████████████████████████████████████████████████████▏                            | 13056/20117 [8:22:16<4:49:50,  2.46s/it] 65%|█████████████████████████████████████████████████████▏                            | 13057/20117 [8:22:19<4:48:34,  2.45s/it] 65%|█████████████████████████████████████████████████████▏                            | 13058/20117 [8:22:21<4:47:14,  2.44s/it] 65%|█████████████████████████████████████████████████████▏                            | 13059/20117 [8:22:24<4:45:57,  2.43s/it] 65%|█████████████████████████████████████████████████████▏                            | 13060/20117 [8:22:26<4:46:35,  2.44s/it]                                                                                                                                 {'loss': 0.204, 'grad_norm': 0.46087950468063354, 'learning_rate': 5.533032452899818e-05, 'memory/max_active (GiB)': 20.62, 'memory/max_allocated (GiB)': 20.62, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 353.8, 'epoch': 1.3}
 65%|█████████████████████████████████████████████████████▏                            | 13060/20117 [8:22:26<4:46:35,  2.44s/it] 65%|█████████████████████████████████████████████████████▏                            | 13061/20117 [8:22:28<4:45:29,  2.43s/it] 65%|█████████████████████████████████████████████████████▏                            | 13062/20117 [8:22:31<4:45:37,  2.43s/it] 65%|█████████████████████████████████████████████████████▏                            | 13063/20117 [8:22:33<4:45:13,  2.43s/it] 65%|█████████████████████████████████████████████████████▎                            | 13064/20117 [8:22:36<4:41:52,  2.40s/it] 65%|█████████████████████████████████████████████████████▎                            | 13065/20117 [8:22:38<4:42:02,  2.40s/it] 65%|█████████████████████████████████████████████████████▎                            | 13066/20117 [8:22:40<4:44:16,  2.42s/it] 65%|█████████████████████████████████████████████████████▎                            | 13067/20117 [8:22:43<4:44:36,  2.42s/it] 65%|█████████████████████████████████████████████████████▎                            | 13068/20117 [8:22:45<4:45:24,  2.43s/it] 65%|█████████████████████████████████████████████████████▎                            | 13069/20117 [8:22:48<4:42:04,  2.40s/it] 65%|█████████████████████████████████████████████████████▎                            | 13070/20117 [8:22:50<4:43:34,  2.41s/it]                                                                                                                                 {'loss': 0.1539, 'grad_norm': 0.48193469643592834, 'learning_rate': 5.518996209905829e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 293.56, 'epoch': 1.3}
 65%|█████████████████████████████████████████████████████▎                            | 13070/20117 [8:22:50<4:43:34,  2.41s/it] 65%|█████████████████████████████████████████████████████▎                            | 13071/20117 [8:22:52<4:42:27,  2.41s/it] 65%|█████████████████████████████████████████████████████▎                            | 13072/20117 [8:22:55<4:41:33,  2.40s/it] 65%|█████████████████████████████████████████████████████▎                            | 13073/20117 [8:22:57<4:45:56,  2.44s/it] 65%|█████████████████████████████████████████████████████▎                            | 13074/20117 [8:23:00<4:41:50,  2.40s/it] 65%|█████████████████████████████████████████████████████▎                            | 13075/20117 [8:23:02<4:42:44,  2.41s/it] 65%|█████████████████████████████████████████████████████▎                            | 13076/20117 [8:23:04<4:37:19,  2.36s/it] 65%|█████████████████████████████████████████████████████▎                            | 13077/20117 [8:23:07<4:32:32,  2.32s/it] 65%|█████████████████████████████████████████████████████▎                            | 13078/20117 [8:23:09<4:29:29,  2.30s/it] 65%|█████████████████████████████████████████████████████▎                            | 13079/20117 [8:23:11<4:29:03,  2.29s/it] 65%|█████████████████████████████████████████████████████▎                            | 13080/20117 [8:23:14<4:34:09,  2.34s/it]                                                                                                                                 {'loss': 0.1775, 'grad_norm': 0.4664101302623749, 'learning_rate': 5.5049710045712596e-05, 'memory/max_active (GiB)': 19.2, 'memory/max_allocated (GiB)': 19.2, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 283.89, 'epoch': 1.3}
 65%|█████████████████████████████████████████████████████▎                            | 13080/20117 [8:23:14<4:34:09,  2.34s/it] 65%|█████████████████████████████████████████████████████▎                            | 13081/20117 [8:23:16<4:37:18,  2.36s/it] 65%|█████████████████████████████████████████████████████▎                            | 13082/20117 [8:23:19<4:44:03,  2.42s/it] 65%|█████████████████████████████████████████████████████▎                            | 13083/20117 [8:23:21<4:38:28,  2.38s/it] 65%|█████████████████████████████████████████████████████▎                            | 13084/20117 [8:23:23<4:41:29,  2.40s/it] 65%|█████████████████████████████████████████████████████▎                            | 13085/20117 [8:23:26<4:40:54,  2.40s/it] 65%|█████████████████████████████████████████████████████▎                            | 13086/20117 [8:23:28<4:40:51,  2.40s/it] 65%|█████████████████████████████████████████████████████▎                            | 13087/20117 [8:23:31<4:42:16,  2.41s/it] 65%|█████████████████████████████████████████████████████▎                            | 13088/20117 [8:23:33<4:38:40,  2.38s/it] 65%|█████████████████████████████████████████████████████▎                            | 13089/20117 [8:23:35<4:33:56,  2.34s/it] 65%|█████████████████████████████████████████████████████▎                            | 13090/20117 [8:23:37<4:31:51,  2.32s/it]                                                                                                                                 {'loss': 0.1428, 'grad_norm': 0.4421910047531128, 'learning_rate': 5.490956871443149e-05, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 332.54, 'epoch': 1.3}
 65%|█████████████████████████████████████████████████████▎                            | 13090/20117 [8:23:37<4:31:51,  2.32s/it] 65%|█████████████████████████████████████████████████████▎                            | 13091/20117 [8:23:40<4:28:58,  2.30s/it] 65%|█████████████████████████████████████████████████████▎                            | 13092/20117 [8:23:42<4:30:26,  2.31s/it] 65%|█████████████████████████████████████████████████████▎                            | 13093/20117 [8:23:45<4:50:12,  2.48s/it] 65%|█████████████████████████████████████████████████████▎                            | 13094/20117 [8:23:47<4:48:36,  2.47s/it] 65%|█████████████████████████████████████████████████████▍                            | 13095/20117 [8:23:50<4:43:00,  2.42s/it] 65%|█████████████████████████████████████████████████████▍                            | 13096/20117 [8:23:52<4:43:22,  2.42s/it] 65%|█████████████████████████████████████████████████████▍                            | 13097/20117 [8:23:54<4:41:27,  2.41s/it] 65%|█████████████████████████████████████████████████████▍                            | 13098/20117 [8:23:57<4:42:48,  2.42s/it] 65%|█████████████████████████████████████████████████████▍                            | 13099/20117 [8:23:59<4:44:08,  2.43s/it] 65%|█████████████████████████████████████████████████████▍                            | 13100/20117 [8:24:02<4:46:51,  2.45s/it]                                                                                                                                 {'loss': 0.2018, 'grad_norm': 0.5099435448646545, 'learning_rate': 5.4769538450412706e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 366.63, 'epoch': 1.3}
 65%|█████████████████████████████████████████████████████▍                            | 13100/20117 [8:24:02<4:46:51,  2.45s/it] 65%|█████████████████████████████████████████████████████▍                            | 13101/20117 [8:24:04<4:46:13,  2.45s/it] 65%|█████████████████████████████████████████████████████▍                            | 13102/20117 [8:24:07<4:46:22,  2.45s/it] 65%|█████████████████████████████████████████████████████▍                            | 13103/20117 [8:24:09<4:43:22,  2.42s/it] 65%|█████████████████████████████████████████████████████▍                            | 13104/20117 [8:24:11<4:43:05,  2.42s/it] 65%|█████████████████████████████████████████████████████▍                            | 13105/20117 [8:24:14<4:42:49,  2.42s/it] 65%|█████████████████████████████████████████████████████▍                            | 13106/20117 [8:24:16<4:42:01,  2.41s/it] 65%|█████████████████████████████████████████████████████▍                            | 13107/20117 [8:24:19<4:40:54,  2.40s/it] 65%|█████████████████████████████████████████████████████▍                            | 13108/20117 [8:24:21<4:41:12,  2.41s/it] 65%|█████████████████████████████████████████████████████▍                            | 13109/20117 [8:24:23<4:42:31,  2.42s/it] 65%|█████████████████████████████████████████████████████▍                            | 13110/20117 [8:24:26<4:37:24,  2.38s/it]                                                                                                                                 {'loss': 0.1445, 'grad_norm': 0.4641667604446411, 'learning_rate': 5.462961959858042e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 363.8, 'epoch': 1.3}
 65%|█████████████████████████████████████████████████████▍                            | 13110/20117 [8:24:26<4:37:24,  2.38s/it] 65%|█████████████████████████████████████████████████████▍                            | 13111/20117 [8:24:28<4:42:27,  2.42s/it] 65%|█████████████████████████████████████████████████████▍                            | 13112/20117 [8:24:31<4:41:22,  2.41s/it] 65%|█████████████████████████████████████████████████████▍                            | 13113/20117 [8:24:33<4:41:58,  2.42s/it] 65%|█████████████████████████████████████████████████████▍                            | 13114/20117 [8:24:36<4:45:11,  2.44s/it] 65%|█████████████████████████████████████████████████████▍                            | 13115/20117 [8:24:38<4:44:40,  2.44s/it] 65%|█████████████████████████████████████████████████████▍                            | 13116/20117 [8:24:40<4:41:49,  2.42s/it] 65%|█████████████████████████████████████████████████████▍                            | 13117/20117 [8:24:43<4:41:40,  2.41s/it] 65%|█████████████████████████████████████████████████████▍                            | 13118/20117 [8:24:45<4:42:55,  2.43s/it] 65%|█████████████████████████████████████████████████████▍                            | 13119/20117 [8:24:48<4:42:31,  2.42s/it] 65%|█████████████████████████████████████████████████████▍                            | 13120/20117 [8:24:50<4:43:09,  2.43s/it]                                                                                                                                 {'loss': 0.1464, 'grad_norm': 0.3798070251941681, 'learning_rate': 5.448981250358429e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 301.06, 'epoch': 1.3}
 65%|█████████████████████████████████████████████████████▍                            | 13120/20117 [8:24:50<4:43:09,  2.43s/it] 65%|█████████████████████████████████████████████████████▍                            | 13121/20117 [8:24:53<4:42:56,  2.43s/it] 65%|█████████████████████████████████████████████████████▍                            | 13122/20117 [8:24:55<4:42:50,  2.43s/it] 65%|█████████████████████████████████████████████████████▍                            | 13123/20117 [8:24:57<4:43:40,  2.43s/it] 65%|█████████████████████████████████████████████████████▍                            | 13124/20117 [8:25:00<4:44:37,  2.44s/it] 65%|█████████████████████████████████████████████████████▍                            | 13125/20117 [8:25:02<4:43:03,  2.43s/it] 65%|█████████████████████████████████████████████████████▌                            | 13126/20117 [8:25:05<4:40:20,  2.41s/it] 65%|█████████████████████████████████████████████████████▌                            | 13127/20117 [8:25:07<4:41:25,  2.42s/it] 65%|█████████████████████████████████████████████████████▌                            | 13128/20117 [8:25:10<4:43:13,  2.43s/it] 65%|█████████████████████████████████████████████████████▌                            | 13129/20117 [8:25:12<4:38:41,  2.39s/it] 65%|█████████████████████████████████████████████████████▌                            | 13130/20117 [8:25:14<4:40:12,  2.41s/it]                                                                                                                                 {'loss': 0.2212, 'grad_norm': 0.49534204602241516, 'learning_rate': 5.435011750979881e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 337.64, 'epoch': 1.31}
 65%|█████████████████████████████████████████████████████▌                            | 13130/20117 [8:25:14<4:40:12,  2.41s/it] 65%|█████████████████████████████████████████████████████▌                            | 13131/20117 [8:25:17<4:42:12,  2.42s/it] 65%|█████████████████████████████████████████████████████▌                            | 13132/20117 [8:25:19<4:45:53,  2.46s/it] 65%|█████████████████████████████████████████████████████▌                            | 13133/20117 [8:25:22<4:44:55,  2.45s/it] 65%|█████████████████████████████████████████████████████▌                            | 13134/20117 [8:25:24<4:44:16,  2.44s/it] 65%|█████████████████████████████████████████████████████▌                            | 13135/20117 [8:25:27<4:46:11,  2.46s/it] 65%|█████████████████████████████████████████████████████▌                            | 13136/20117 [8:25:29<4:44:57,  2.45s/it] 65%|█████████████████████████████████████████████████████▌                            | 13137/20117 [8:25:31<4:40:23,  2.41s/it] 65%|█████████████████████████████████████████████████████▌                            | 13138/20117 [8:25:34<4:41:27,  2.42s/it] 65%|█████████████████████████████████████████████████████▌                            | 13139/20117 [8:25:36<4:41:55,  2.42s/it] 65%|█████████████████████████████████████████████████████▌                            | 13140/20117 [8:25:39<4:40:54,  2.42s/it]                                                                                                                                 {'loss': 0.1626, 'grad_norm': 0.586558997631073, 'learning_rate': 5.421053496132218e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 334.07, 'epoch': 1.31}
 65%|█████████████████████████████████████████████████████▌                            | 13140/20117 [8:25:39<4:40:54,  2.42s/it] 65%|█████████████████████████████████████████████████████▌                            | 13141/20117 [8:25:41<4:40:49,  2.42s/it] 65%|█████████████████████████████████████████████████████▌                            | 13142/20117 [8:25:43<4:41:30,  2.42s/it] 65%|█████████████████████████████████████████████████████▌                            | 13143/20117 [8:25:46<4:48:02,  2.48s/it] 65%|█████████████████████████████████████████████████████▌                            | 13144/20117 [8:25:49<5:06:17,  2.64s/it] 65%|█████████████████████████████████████████████████████▌                            | 13145/20117 [8:25:52<5:06:47,  2.64s/it] 65%|█████████████████████████████████████████████████████▌                            | 13146/20117 [8:25:54<5:03:26,  2.61s/it] 65%|█████████████████████████████████████████████████████▌                            | 13147/20117 [8:25:57<5:02:50,  2.61s/it] 65%|█████████████████████████████████████████████████████▌                            | 13148/20117 [8:25:59<4:58:18,  2.57s/it] 65%|█████████████████████████████████████████████████████▌                            | 13149/20117 [8:26:02<4:57:07,  2.56s/it] 65%|█████████████████████████████████████████████████████▌                            | 13150/20117 [8:26:05<4:58:39,  2.57s/it]                                                                                                                                 {'loss': 0.1488, 'grad_norm': 0.2836264967918396, 'learning_rate': 5.40710652019758e-05, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 335.48, 'epoch': 1.31}
 65%|█████████████████████████████████████████████████████▌                            | 13150/20117 [8:26:05<4:58:39,  2.57s/it] 65%|█████████████████████████████████████████████████████▌                            | 13151/20117 [8:26:07<4:59:43,  2.58s/it] 65%|█████████████████████████████████████████████████████▌                            | 13152/20117 [8:26:10<4:54:12,  2.53s/it] 65%|█████████████████████████████████████████████████████▌                            | 13153/20117 [8:26:12<4:49:03,  2.49s/it] 65%|█████████████████████████████████████████████████████▌                            | 13154/20117 [8:26:14<4:49:50,  2.50s/it] 65%|█████████████████████████████████████████████████████▌                            | 13155/20117 [8:26:17<4:44:01,  2.45s/it] 65%|█████████████████████████████████████████████████████▋                            | 13156/20117 [8:26:19<4:43:23,  2.44s/it] 65%|█████████████████████████████████████████████████████▋                            | 13157/20117 [8:26:22<4:42:54,  2.44s/it] 65%|█████████████████████████████████████████████████████▋                            | 13158/20117 [8:26:24<4:45:14,  2.46s/it] 65%|█████████████████████████████████████████████████████▋                            | 13159/20117 [8:26:27<4:49:38,  2.50s/it] 65%|█████████████████████████████████████████████████████▋                            | 13160/20117 [8:26:29<4:46:25,  2.47s/it]                                                                                                                                 {'loss': 0.1667, 'grad_norm': 0.6480442881584167, 'learning_rate': 5.3931708575303096e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 322.57, 'epoch': 1.31}
 65%|█████████████████████████████████████████████████████▋                            | 13160/20117 [8:26:29<4:46:25,  2.47s/it] 65%|█████████████████████████████████████████████████████▋                            | 13161/20117 [8:26:32<4:48:44,  2.49s/it] 65%|█████████████████████████████████████████████████████▋                            | 13162/20117 [8:26:34<4:49:51,  2.50s/it] 65%|█████████████████████████████████████████████████████▋                            | 13163/20117 [8:26:37<4:44:26,  2.45s/it] 65%|█████████████████████████████████████████████████████▋                            | 13164/20117 [8:26:39<4:44:07,  2.45s/it] 65%|█████████████████████████████████████████████████████▋                            | 13165/20117 [8:26:41<4:43:19,  2.45s/it] 65%|█████████████████████████████████████████████████████▋                            | 13166/20117 [8:26:44<4:39:29,  2.41s/it] 65%|█████████████████████████████████████████████████████▋                            | 13167/20117 [8:26:46<4:35:05,  2.37s/it] 65%|█████████████████████████████████████████████████████▋                            | 13168/20117 [8:26:48<4:33:45,  2.36s/it] 65%|█████████████████████████████████████████████████████▋                            | 13169/20117 [8:26:51<4:32:15,  2.35s/it] 65%|█████████████████████████████████████████████████████▋                            | 13170/20117 [8:26:53<4:37:11,  2.39s/it]                                                                                                                                 {'loss': 0.1342, 'grad_norm': 0.5337100625038147, 'learning_rate': 5.379246542456897e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 349.13, 'epoch': 1.31}
 65%|█████████████████████████████████████████████████████▋                            | 13170/20117 [8:26:53<4:37:11,  2.39s/it] 65%|█████████████████████████████████████████████████████▋                            | 13171/20117 [8:26:56<4:41:13,  2.43s/it] 65%|█████████████████████████████████████████████████████▋                            | 13172/20117 [8:26:58<4:45:04,  2.46s/it] 65%|█████████████████████████████████████████████████████▋                            | 13173/20117 [8:27:01<4:43:16,  2.45s/it] 65%|█████████████████████████████████████████████████████▋                            | 13174/20117 [8:27:03<4:42:11,  2.44s/it] 65%|█████████████████████████████████████████████████████▋                            | 13175/20117 [8:27:05<4:37:58,  2.40s/it] 65%|█████████████████████████████████████████████████████▋                            | 13176/20117 [8:27:08<4:39:48,  2.42s/it] 66%|█████████████████████████████████████████████████████▋                            | 13177/20117 [8:27:10<4:38:06,  2.40s/it] 66%|█████████████████████████████████████████████████████▋                            | 13178/20117 [8:27:13<4:38:28,  2.41s/it] 66%|█████████████████████████████████████████████████████▋                            | 13179/20117 [8:27:15<4:33:47,  2.37s/it] 66%|█████████████████████████████████████████████████████▋                            | 13180/20117 [8:27:17<4:32:02,  2.35s/it]                                                                                                                                 {'loss': 0.1469, 'grad_norm': 0.4983210265636444, 'learning_rate': 5.365333609275864e-05, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 329.8, 'epoch': 1.31}
 66%|█████████████████████████████████████████████████████▋                            | 13180/20117 [8:27:17<4:32:02,  2.35s/it] 66%|█████████████████████████████████████████████████████▋                            | 13181/20117 [8:27:20<4:29:04,  2.33s/it] 66%|█████████████████████████████████████████████████████▋                            | 13182/20117 [8:27:22<4:26:24,  2.30s/it] 66%|█████████████████████████████████████████████████████▋                            | 13183/20117 [8:27:24<4:30:18,  2.34s/it] 66%|█████████████████████████████████████████████████████▋                            | 13184/20117 [8:27:27<4:33:03,  2.36s/it] 66%|█████████████████████████████████████████████████████▋                            | 13185/20117 [8:27:29<4:34:32,  2.38s/it] 66%|█████████████████████████████████████████████████████▋                            | 13186/20117 [8:27:31<4:37:26,  2.40s/it] 66%|█████████████████████████████████████████████████████▊                            | 13187/20117 [8:27:34<4:35:57,  2.39s/it] 66%|█████████████████████████████████████████████████████▊                            | 13188/20117 [8:27:36<4:36:25,  2.39s/it] 66%|█████████████████████████████████████████████████████▊                            | 13189/20117 [8:27:39<4:33:38,  2.37s/it] 66%|█████████████████████████████████████████████████████▊                            | 13190/20117 [8:27:41<4:36:08,  2.39s/it]                                                                                                                                 {'loss': 0.1507, 'grad_norm': 0.4759822487831116, 'learning_rate': 5.351432092257716e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 282.24, 'epoch': 1.31}
 66%|█████████████████████████████████████████████████████▊                            | 13190/20117 [8:27:41<4:36:08,  2.39s/it] 66%|█████████████████████████████████████████████████████▊                            | 13191/20117 [8:27:44<4:40:48,  2.43s/it] 66%|█████████████████████████████████████████████████████▊                            | 13192/20117 [8:27:46<4:43:04,  2.45s/it] 66%|█████████████████████████████████████████████████████▊                            | 13193/20117 [8:27:48<4:42:34,  2.45s/it] 66%|█████████████████████████████████████████████████████▊                            | 13194/20117 [8:27:51<4:43:28,  2.46s/it] 66%|█████████████████████████████████████████████████████▊                            | 13195/20117 [8:27:53<4:42:43,  2.45s/it] 66%|█████████████████████████████████████████████████████▊                            | 13196/20117 [8:27:56<4:43:11,  2.46s/it] 66%|█████████████████████████████████████████████████████▊                            | 13197/20117 [8:27:59<4:56:31,  2.57s/it] 66%|█████████████████████████████████████████████████████▊                            | 13198/20117 [8:28:01<4:50:53,  2.52s/it] 66%|█████████████████████████████████████████████████████▊                            | 13199/20117 [8:28:04<4:50:15,  2.52s/it] 66%|█████████████████████████████████████████████████████▊                            | 13200/20117 [8:28:06<4:47:47,  2.50s/it]                                                                                                                                 {'loss': 0.1901, 'grad_norm': 0.2128848135471344, 'learning_rate': 5.3375420256448175e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 351.2, 'epoch': 1.31}
 66%|█████████████████████████████████████████████████████▊                            | 13200/20117 [8:28:06<4:47:47,  2.50s/it] 66%|█████████████████████████████████████████████████████▊                            | 13201/20117 [8:28:08<4:44:13,  2.47s/it] 66%|█████████████████████████████████████████████████████▊                            | 13202/20117 [8:28:11<4:41:33,  2.44s/it] 66%|█████████████████████████████████████████████████████▊                            | 13203/20117 [8:28:13<4:39:25,  2.42s/it] 66%|█████████████████████████████████████████████████████▊                            | 13204/20117 [8:28:16<4:40:54,  2.44s/it] 66%|█████████████████████████████████████████████████████▊                            | 13205/20117 [8:28:18<4:36:41,  2.40s/it] 66%|█████████████████████████████████████████████████████▊                            | 13206/20117 [8:28:20<4:38:48,  2.42s/it] 66%|█████████████████████████████████████████████████████▊                            | 13207/20117 [8:28:23<4:38:55,  2.42s/it] 66%|█████████████████████████████████████████████████████▊                            | 13208/20117 [8:28:25<4:37:17,  2.41s/it] 66%|█████████████████████████████████████████████████████▊                            | 13209/20117 [8:28:28<4:38:51,  2.42s/it] 66%|█████████████████████████████████████████████████████▊                            | 13210/20117 [8:28:30<4:39:56,  2.43s/it]                                                                                                                                 {'loss': 0.1259, 'grad_norm': 0.23509669303894043, 'learning_rate': 5.323663443651345e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 364.3, 'epoch': 1.31}
 66%|█████████████████████████████████████████████████████▊                            | 13210/20117 [8:28:30<4:39:56,  2.43s/it] 66%|█████████████████████████████████████████████████████▊                            | 13211/20117 [8:28:33<4:39:47,  2.43s/it] 66%|█████████████████████████████████████████████████████▊                            | 13212/20117 [8:28:35<4:38:08,  2.42s/it] 66%|█████████████████████████████████████████████████████▊                            | 13213/20117 [8:28:37<4:38:35,  2.42s/it] 66%|█████████████████████████████████████████████████████▊                            | 13214/20117 [8:28:40<4:36:32,  2.40s/it] 66%|█████████████████████████████████████████████████████▊                            | 13215/20117 [8:28:42<4:36:07,  2.40s/it] 66%|█████████████████████████████████████████████████████▊                            | 13216/20117 [8:28:45<4:35:14,  2.39s/it] 66%|█████████████████████████████████████████████████████▊                            | 13217/20117 [8:28:47<4:33:41,  2.38s/it] 66%|█████████████████████████████████████████████████████▉                            | 13218/20117 [8:28:49<4:33:16,  2.38s/it] 66%|█████████████████████████████████████████████████████▉                            | 13219/20117 [8:28:52<4:32:57,  2.37s/it] 66%|█████████████████████████████████████████████████████▉                            | 13220/20117 [8:28:54<4:35:16,  2.39s/it]                                                                                                                                 {'loss': 0.1589, 'grad_norm': 0.4961312711238861, 'learning_rate': 5.309796380463174e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 288.96, 'epoch': 1.31}
 66%|█████████████████████████████████████████████████████▉                            | 13220/20117 [8:28:54<4:35:16,  2.39s/it] 66%|█████████████████████████████████████████████████████▉                            | 13221/20117 [8:28:56<4:34:29,  2.39s/it] 66%|█████████████████████████████████████████████████████▉                            | 13222/20117 [8:28:59<4:33:01,  2.38s/it] 66%|█████████████████████████████████████████████████████▉                            | 13223/20117 [8:29:01<4:33:31,  2.38s/it] 66%|█████████████████████████████████████████████████████▉                            | 13224/20117 [8:29:04<4:36:50,  2.41s/it] 66%|█████████████████████████████████████████████████████▉                            | 13225/20117 [8:29:06<4:34:40,  2.39s/it] 66%|█████████████████████████████████████████████████████▉                            | 13226/20117 [8:29:08<4:36:43,  2.41s/it] 66%|█████████████████████████████████████████████████████▉                            | 13227/20117 [8:29:11<4:38:50,  2.43s/it] 66%|█████████████████████████████████████████████████████▉                            | 13228/20117 [8:29:13<4:33:28,  2.38s/it] 66%|█████████████████████████████████████████████████████▉                            | 13229/20117 [8:29:16<4:37:26,  2.42s/it] 66%|█████████████████████████████████████████████████████▉                            | 13230/20117 [8:29:18<4:35:03,  2.40s/it]                                                                                                                                 {'loss': 0.2004, 'grad_norm': 0.6080997586250305, 'learning_rate': 5.295940870237817e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 350.85, 'epoch': 1.32}
 66%|█████████████████████████████████████████████████████▉                            | 13230/20117 [8:29:18<4:35:03,  2.40s/it] 66%|█████████████████████████████████████████████████████▉                            | 13231/20117 [8:29:20<4:34:33,  2.39s/it] 66%|█████████████████████████████████████████████████████▉                            | 13232/20117 [8:29:23<4:34:47,  2.39s/it] 66%|█████████████████████████████████████████████████████▉                            | 13233/20117 [8:29:25<4:31:03,  2.36s/it] 66%|█████████████████████████████████████████████████████▉                            | 13234/20117 [8:29:28<4:35:26,  2.40s/it] 66%|█████████████████████████████████████████████████████▉                            | 13235/20117 [8:29:30<4:31:23,  2.37s/it] 66%|█████████████████████████████████████████████████████▉                            | 13236/20117 [8:29:32<4:33:08,  2.38s/it] 66%|█████████████████████████████████████████████████████▉                            | 13237/20117 [8:29:35<4:31:42,  2.37s/it] 66%|█████████████████████████████████████████████████████▉                            | 13238/20117 [8:29:37<4:33:13,  2.38s/it] 66%|█████████████████████████████████████████████████████▉                            | 13239/20117 [8:29:40<4:36:06,  2.41s/it] 66%|█████████████████████████████████████████████████████▉                            | 13240/20117 [8:29:42<4:33:49,  2.39s/it]                                                                                                                                 {'loss': 0.1845, 'grad_norm': 0.4995987117290497, 'learning_rate': 5.2820969471043204e-05, 'memory/max_active (GiB)': 20.57, 'memory/max_allocated (GiB)': 20.57, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 406.17, 'epoch': 1.32}
 66%|█████████████████████████████████████████████████████▉                            | 13240/20117 [8:29:42<4:33:49,  2.39s/it] 66%|█████████████████████████████████████████████████████▉                            | 13241/20117 [8:29:44<4:33:58,  2.39s/it] 66%|█████████████████████████████████████████████████████▉                            | 13242/20117 [8:29:47<4:33:58,  2.39s/it] 66%|█████████████████████████████████████████████████████▉                            | 13243/20117 [8:29:49<4:37:05,  2.42s/it] 66%|█████████████████████████████████████████████████████▉                            | 13244/20117 [8:29:51<4:34:09,  2.39s/it] 66%|█████████████████████████████████████████████████████▉                            | 13245/20117 [8:29:54<4:41:32,  2.46s/it] 66%|█████████████████████████████████████████████████████▉                            | 13246/20117 [8:29:57<4:41:30,  2.46s/it] 66%|█████████████████████████████████████████████████████▉                            | 13247/20117 [8:29:59<4:49:21,  2.53s/it] 66%|██████████████████████████████████████████████████████                            | 13248/20117 [8:30:02<4:46:02,  2.50s/it] 66%|██████████████████████████████████████████████████████                            | 13249/20117 [8:30:04<4:42:06,  2.46s/it] 66%|██████████████████████████████████████████████████████                            | 13250/20117 [8:30:06<4:40:51,  2.45s/it]                                                                                                                                 {'loss': 0.1338, 'grad_norm': 0.4047294557094574, 'learning_rate': 5.2682646451631945e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 330.72, 'epoch': 1.32}
 66%|██████████████████████████████████████████████████████                            | 13250/20117 [8:30:06<4:40:51,  2.45s/it] 66%|██████████████████████████████████████████████████████                            | 13251/20117 [8:30:09<4:40:24,  2.45s/it] 66%|██████████████████████████████████████████████████████                            | 13252/20117 [8:30:12<4:53:12,  2.56s/it] 66%|██████████████████████████████████████████████████████                            | 13253/20117 [8:30:14<4:48:32,  2.52s/it] 66%|██████████████████████████████████████████████████████                            | 13254/20117 [8:30:17<4:47:06,  2.51s/it] 66%|██████████████████████████████████████████████████████                            | 13255/20117 [8:30:19<4:45:03,  2.49s/it] 66%|██████████████████████████████████████████████████████                            | 13256/20117 [8:30:22<4:42:12,  2.47s/it] 66%|██████████████████████████████████████████████████████                            | 13257/20117 [8:30:24<4:36:04,  2.41s/it] 66%|██████████████████████████████████████████████████████                            | 13258/20117 [8:30:26<4:31:03,  2.37s/it] 66%|██████████████████████████████████████████████████████                            | 13259/20117 [8:30:28<4:25:38,  2.32s/it] 66%|██████████████████████████████████████████████████████                            | 13260/20117 [8:30:31<4:22:50,  2.30s/it]                                                                                                                                 {'loss': 0.2004, 'grad_norm': 0.6713951826095581, 'learning_rate': 5.254443998486327e-05, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 366.71, 'epoch': 1.32}
 66%|██████████████████████████████████████████████████████                            | 13260/20117 [8:30:31<4:22:50,  2.30s/it] 66%|██████████████████████████████████████████████████████                            | 13261/20117 [8:30:33<4:26:28,  2.33s/it] 66%|██████████████████████████████████████████████████████                            | 13262/20117 [8:30:35<4:29:32,  2.36s/it] 66%|██████████████████████████████████████████████████████                            | 13263/20117 [8:30:38<4:31:22,  2.38s/it] 66%|██████████████████████████████████████████████████████                            | 13264/20117 [8:30:40<4:30:26,  2.37s/it] 66%|██████████████████████████████████████████████████████                            | 13265/20117 [8:30:43<4:30:30,  2.37s/it] 66%|██████████████████████████████████████████████████████                            | 13266/20117 [8:30:45<4:27:00,  2.34s/it] 66%|██████████████████████████████████████████████████████                            | 13267/20117 [8:30:47<4:27:10,  2.34s/it] 66%|██████████████████████████████████████████████████████                            | 13268/20117 [8:30:49<4:24:40,  2.32s/it] 66%|██████████████████████████████████████████████████████                            | 13269/20117 [8:30:52<4:27:07,  2.34s/it] 66%|██████████████████████████████████████████████████████                            | 13270/20117 [8:30:54<4:28:06,  2.35s/it]                                                                                                                                 {'loss': 0.1612, 'grad_norm': 0.3636043667793274, 'learning_rate': 5.240635041116884e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 379.85, 'epoch': 1.32}
 66%|██████████████████████████████████████████████████████                            | 13270/20117 [8:30:54<4:28:06,  2.35s/it] 66%|██████████████████████████████████████████████████████                            | 13271/20117 [8:30:56<4:27:17,  2.34s/it] 66%|██████████████████████████████████████████████████████                            | 13272/20117 [8:30:59<4:23:23,  2.31s/it] 66%|██████████████████████████████████████████████████████                            | 13273/20117 [8:31:01<4:25:33,  2.33s/it] 66%|██████████████████████████████████████████████████████                            | 13274/20117 [8:31:04<4:30:25,  2.37s/it] 66%|██████████████████████████████████████████████████████                            | 13275/20117 [8:31:06<4:35:09,  2.41s/it] 66%|██████████████████████████████████████████████████████                            | 13276/20117 [8:31:09<4:37:49,  2.44s/it] 66%|██████████████████████████████████████████████████████                            | 13277/20117 [8:31:11<4:37:50,  2.44s/it] 66%|██████████████████████████████████████████████████████                            | 13278/20117 [8:31:13<4:38:50,  2.45s/it] 66%|██████████████████████████████████████████████████████▏                           | 13279/20117 [8:31:16<4:37:18,  2.43s/it] 66%|██████████████████████████████████████████████████████▏                           | 13280/20117 [8:31:18<4:38:55,  2.45s/it]                                                                                                                                 {'loss': 0.1599, 'grad_norm': 0.6122763156890869, 'learning_rate': 5.226837807069251e-05, 'memory/max_active (GiB)': 20.63, 'memory/max_allocated (GiB)': 20.63, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 301.88, 'epoch': 1.32}
 66%|██████████████████████████████████████████████████████▏                           | 13280/20117 [8:31:18<4:38:55,  2.45s/it] 66%|██████████████████████████████████████████████████████▏                           | 13281/20117 [8:31:21<4:39:47,  2.46s/it] 66%|██████████████████████████████████████████████████████▏                           | 13282/20117 [8:31:23<4:38:15,  2.44s/it] 66%|██████████████████████████████████████████████████████▏                           | 13283/20117 [8:31:26<4:36:33,  2.43s/it] 66%|██████████████████████████████████████████████████████▏                           | 13284/20117 [8:31:28<4:34:19,  2.41s/it] 66%|██████████████████████████████████████████████████████▏                           | 13285/20117 [8:31:30<4:36:07,  2.42s/it] 66%|██████████████████████████████████████████████████████▏                           | 13286/20117 [8:31:33<4:36:33,  2.43s/it] 66%|██████████████████████████████████████████████████████▏                           | 13287/20117 [8:31:35<4:34:25,  2.41s/it] 66%|██████████████████████████████████████████████████████▏                           | 13288/20117 [8:31:38<4:40:35,  2.47s/it] 66%|██████████████████████████████████████████████████████▏                           | 13289/20117 [8:31:40<4:39:48,  2.46s/it] 66%|██████████████████████████████████████████████████████▏                           | 13290/20117 [8:31:43<4:37:57,  2.44s/it]                                                                                                                                 {'loss': 0.152, 'grad_norm': 0.3268527686595917, 'learning_rate': 5.213052330328929e-05, 'memory/max_active (GiB)': 19.8, 'memory/max_allocated (GiB)': 19.8, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 316.71, 'epoch': 1.32}
 66%|██████████████████████████████████████████████████████▏                           | 13290/20117 [8:31:43<4:37:57,  2.44s/it] 66%|██████████████████████████████████████████████████████▏                           | 13291/20117 [8:31:45<4:38:26,  2.45s/it] 66%|██████████████████████████████████████████████████████▏                           | 13292/20117 [8:31:48<4:42:42,  2.49s/it] 66%|██████████████████████████████████████████████████████▏                           | 13293/20117 [8:31:50<4:39:05,  2.45s/it] 66%|██████████████████████████████████████████████████████▏                           | 13294/20117 [8:31:53<4:38:24,  2.45s/it] 66%|██████████████████████████████████████████████████████▏                           | 13295/20117 [8:31:55<4:38:16,  2.45s/it] 66%|██████████████████████████████████████████████████████▏                           | 13296/20117 [8:31:57<4:36:34,  2.43s/it] 66%|██████████████████████████████████████████████████████▏                           | 13297/20117 [8:32:00<4:35:24,  2.42s/it] 66%|██████████████████████████████████████████████████████▏                           | 13298/20117 [8:32:02<4:36:47,  2.44s/it] 66%|██████████████████████████████████████████████████████▏                           | 13299/20117 [8:32:05<4:35:49,  2.43s/it] 66%|██████████████████████████████████████████████████████▏                           | 13300/20117 [8:32:07<4:34:56,  2.42s/it]                                                                                                                                 {'loss': 0.2408, 'grad_norm': 0.27024024724960327, 'learning_rate': 5.199278644852464e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 338.21, 'epoch': 1.32}
 66%|██████████████████████████████████████████████████████▏                           | 13300/20117 [8:32:07<4:34:56,  2.42s/it] 66%|██████████████████████████████████████████████████████▏                           | 13301/20117 [8:32:09<4:32:23,  2.40s/it] 66%|██████████████████████████████████████████████████████▏                           | 13302/20117 [8:32:12<4:32:03,  2.40s/it] 66%|██████████████████████████████████████████████████████▏                           | 13303/20117 [8:32:14<4:40:31,  2.47s/it] 66%|██████████████████████████████████████████████████████▏                           | 13304/20117 [8:32:17<4:40:29,  2.47s/it] 66%|██████████████████████████████████████████████████████▏                           | 13305/20117 [8:32:19<4:35:13,  2.42s/it] 66%|██████████████████████████████████████████████████████▏                           | 13306/20117 [8:32:22<4:35:12,  2.42s/it] 66%|██████████████████████████████████████████████████████▏                           | 13307/20117 [8:32:24<4:33:50,  2.41s/it] 66%|██████████████████████████████████████████████████████▏                           | 13308/20117 [8:32:26<4:34:51,  2.42s/it] 66%|██████████████████████████████████████████████████████▏                           | 13309/20117 [8:32:29<4:33:51,  2.41s/it] 66%|██████████████████████████████████████████████████████▎                           | 13310/20117 [8:32:31<4:36:11,  2.43s/it]                                                                                                                                 {'loss': 0.1638, 'grad_norm': 0.34981921315193176, 'learning_rate': 5.18551678456735e-05, 'memory/max_active (GiB)': 19.8, 'memory/max_allocated (GiB)': 19.8, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 320.81, 'epoch': 1.32}
 66%|██████████████████████████████████████████████████████▎                           | 13310/20117 [8:32:31<4:36:11,  2.43s/it] 66%|██████████████████████████████████████████████████████▎                           | 13311/20117 [8:32:34<4:35:53,  2.43s/it] 66%|██████████████████████████████████████████████████████▎                           | 13312/20117 [8:32:36<4:31:20,  2.39s/it] 66%|██████████████████████████████████████████████████████▎                           | 13313/20117 [8:32:38<4:31:38,  2.40s/it] 66%|██████████████████████████████████████████████████████▎                           | 13314/20117 [8:32:41<4:32:52,  2.41s/it] 66%|██████████████████████████████████████████████████████▎                           | 13315/20117 [8:32:43<4:32:09,  2.40s/it] 66%|██████████████████████████████████████████████████████▎                           | 13316/20117 [8:32:46<4:34:40,  2.42s/it] 66%|██████████████████████████████████████████████████████▎                           | 13317/20117 [8:32:48<4:33:56,  2.42s/it] 66%|██████████████████████████████████████████████████████▎                           | 13318/20117 [8:32:51<4:34:06,  2.42s/it] 66%|██████████████████████████████████████████████████████▎                           | 13319/20117 [8:32:53<4:32:02,  2.40s/it] 66%|██████████████████████████████████████████████████████▎                           | 13320/20117 [8:32:55<4:33:15,  2.41s/it]                                                                                                                                 {'loss': 0.1738, 'grad_norm': 0.44818684458732605, 'learning_rate': 5.1717667833719627e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 352.42, 'epoch': 1.32}
 66%|██████████████████████████████████████████████████████▎                           | 13320/20117 [8:32:55<4:33:15,  2.41s/it] 66%|██████████████████████████████████████████████████████▎                           | 13321/20117 [8:32:58<4:33:10,  2.41s/it] 66%|██████████████████████████████████████████████████████▎                           | 13322/20117 [8:33:00<4:33:09,  2.41s/it] 66%|██████████████████████████████████████████████████████▎                           | 13323/20117 [8:33:03<4:34:57,  2.43s/it] 66%|██████████████████████████████████████████████████████▎                           | 13324/20117 [8:33:05<4:36:06,  2.44s/it] 66%|██████████████████████████████████████████████████████▎                           | 13325/20117 [8:33:08<4:35:12,  2.43s/it] 66%|██████████████████████████████████████████████████████▎                           | 13326/20117 [8:33:10<4:37:46,  2.45s/it] 66%|██████████████████████████████████████████████████████▎                           | 13327/20117 [8:33:12<4:33:53,  2.42s/it] 66%|██████████████████████████████████████████████████████▎                           | 13328/20117 [8:33:15<4:33:21,  2.42s/it] 66%|██████████████████████████████████████████████████████▎                           | 13329/20117 [8:33:17<4:29:59,  2.39s/it] 66%|██████████████████████████████████████████████████████▎                           | 13330/20117 [8:33:20<4:29:52,  2.39s/it]                                                                                                                                 {'loss': 0.1738, 'grad_norm': 0.37409475445747375, 'learning_rate': 5.1580286751354545e-05, 'memory/max_active (GiB)': 18.84, 'memory/max_allocated (GiB)': 18.84, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 319.43, 'epoch': 1.33}
 66%|██████████████████████████████████████████████████████▎                           | 13330/20117 [8:33:20<4:29:52,  2.39s/it] 66%|██████████████████████████████████████████████████████▎                           | 13331/20117 [8:33:22<4:30:43,  2.39s/it] 66%|██████████████████████████████████████████████████████▎                           | 13332/20117 [8:33:24<4:31:27,  2.40s/it] 66%|██████████████████████████████████████████████████████▎                           | 13333/20117 [8:33:27<4:29:19,  2.38s/it] 66%|██████████████████████████████████████████████████████▎                           | 13334/20117 [8:33:29<4:31:05,  2.40s/it] 66%|██████████████████████████████████████████████████████▎                           | 13335/20117 [8:33:31<4:28:19,  2.37s/it] 66%|██████████████████████████████████████████████████████▎                           | 13336/20117 [8:33:34<4:28:31,  2.38s/it] 66%|██████████████████████████████████████████████████████▎                           | 13337/20117 [8:33:36<4:26:47,  2.36s/it] 66%|██████████████████████████████████████████████████████▎                           | 13338/20117 [8:33:39<4:29:56,  2.39s/it] 66%|██████████████████████████████████████████████████████▎                           | 13339/20117 [8:33:41<4:29:37,  2.39s/it] 66%|██████████████████████████████████████████████████████▍                           | 13340/20117 [8:33:43<4:28:46,  2.38s/it]                                                                                                                                 {'loss': 0.1665, 'grad_norm': 0.48389047384262085, 'learning_rate': 5.144302493697697e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 402.78, 'epoch': 1.33}
 66%|██████████████████████████████████████████████████████▍                           | 13340/20117 [8:33:43<4:28:46,  2.38s/it] 66%|██████████████████████████████████████████████████████▍                           | 13341/20117 [8:33:46<4:30:00,  2.39s/it] 66%|██████████████████████████████████████████████████████▍                           | 13342/20117 [8:33:48<4:33:55,  2.43s/it] 66%|██████████████████████████████████████████████████████▍                           | 13343/20117 [8:33:51<4:32:41,  2.42s/it] 66%|██████████████████████████████████████████████████████▍                           | 13344/20117 [8:33:53<4:30:51,  2.40s/it] 66%|██████████████████████████████████████████████████████▍                           | 13345/20117 [8:33:55<4:28:56,  2.38s/it] 66%|██████████████████████████████████████████████████████▍                           | 13346/20117 [8:33:58<4:23:56,  2.34s/it] 66%|██████████████████████████████████████████████████████▍                           | 13347/20117 [8:34:00<4:26:06,  2.36s/it] 66%|██████████████████████████████████████████████████████▍                           | 13348/20117 [8:34:02<4:27:34,  2.37s/it] 66%|██████████████████████████████████████████████████████▍                           | 13349/20117 [8:34:05<4:26:20,  2.36s/it] 66%|██████████████████████████████████████████████████████▍                           | 13350/20117 [8:34:07<4:28:23,  2.38s/it]                                                                                                                                 {'loss': 0.131, 'grad_norm': 0.3536114990711212, 'learning_rate': 5.13058827286917e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 343.79, 'epoch': 1.33}
 66%|██████████████████████████████████████████████████████▍                           | 13350/20117 [8:34:07<4:28:23,  2.38s/it] 66%|██████████████████████████████████████████████████████▍                           | 13351/20117 [8:34:10<4:28:19,  2.38s/it] 66%|██████████████████████████████████████████████████████▍                           | 13352/20117 [8:34:12<4:25:06,  2.35s/it] 66%|██████████████████████████████████████████████████████▍                           | 13353/20117 [8:34:14<4:21:47,  2.32s/it] 66%|██████████████████████████████████████████████████████▍                           | 13354/20117 [8:34:16<4:19:31,  2.30s/it] 66%|██████████████████████████████████████████████████████▍                           | 13355/20117 [8:34:19<4:20:39,  2.31s/it] 66%|██████████████████████████████████████████████████████▍                           | 13356/20117 [8:34:21<4:23:25,  2.34s/it] 66%|██████████████████████████████████████████████████████▍                           | 13357/20117 [8:34:24<4:41:05,  2.49s/it] 66%|██████████████████████████████████████████████████████▍                           | 13358/20117 [8:34:26<4:38:17,  2.47s/it] 66%|██████████████████████████████████████████████████████▍                           | 13359/20117 [8:34:29<4:36:37,  2.46s/it] 66%|██████████████████████████████████████████████████████▍                           | 13360/20117 [8:34:31<4:33:45,  2.43s/it]                                                                                                                                 {'loss': 0.1576, 'grad_norm': 0.40804916620254517, 'learning_rate': 5.116886046430903e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 297.51, 'epoch': 1.33}
 66%|██████████████████████████████████████████████████████▍                           | 13360/20117 [8:34:31<4:33:45,  2.43s/it] 66%|██████████████████████████████████████████████████████▍                           | 13361/20117 [8:34:34<4:33:20,  2.43s/it] 66%|██████████████████████████████████████████████████████▍                           | 13362/20117 [8:34:36<4:28:58,  2.39s/it] 66%|██████████████████████████████████████████████████████▍                           | 13363/20117 [8:34:38<4:30:49,  2.41s/it] 66%|██████████████████████████████████████████████████████▍                           | 13364/20117 [8:34:41<4:23:38,  2.34s/it] 66%|██████████████████████████████████████████████████████▍                           | 13365/20117 [8:34:43<4:21:18,  2.32s/it] 66%|██████████████████████████████████████████████████████▍                           | 13366/20117 [8:34:45<4:16:45,  2.28s/it] 66%|██████████████████████████████████████████████████████▍                           | 13367/20117 [8:34:47<4:19:52,  2.31s/it] 66%|██████████████████████████████████████████████████████▍                           | 13368/20117 [8:34:50<4:25:26,  2.36s/it] 66%|██████████████████████████████████████████████████████▍                           | 13369/20117 [8:34:52<4:27:15,  2.38s/it] 66%|██████████████████████████████████████████████████████▍                           | 13370/20117 [8:34:55<4:28:55,  2.39s/it]                                                                                                                                 {'loss': 0.1632, 'grad_norm': 0.5419421792030334, 'learning_rate': 5.10319584813437e-05, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 309.55, 'epoch': 1.33}
 66%|██████████████████████████████████████████████████████▍                           | 13370/20117 [8:34:55<4:28:55,  2.39s/it] 66%|██████████████████████████████████████████████████████▌                           | 13371/20117 [8:34:57<4:27:44,  2.38s/it] 66%|██████████████████████████████████████████████████████▌                           | 13372/20117 [8:34:59<4:28:25,  2.39s/it] 66%|██████████████████████████████████████████████████████▌                           | 13373/20117 [8:35:02<4:30:33,  2.41s/it] 66%|██████████████████████████████████████████████████████▌                           | 13374/20117 [8:35:04<4:33:06,  2.43s/it] 66%|██████████████████████████████████████████████████████▌                           | 13375/20117 [8:35:07<4:35:45,  2.45s/it] 66%|██████████████████████████████████████████████████████▌                           | 13376/20117 [8:35:09<4:33:34,  2.44s/it] 66%|██████████████████████████████████████████████████████▌                           | 13377/20117 [8:35:12<4:33:14,  2.43s/it] 67%|██████████████████████████████████████████████████████▌                           | 13378/20117 [8:35:14<4:30:41,  2.41s/it] 67%|██████████████████████████████████████████████████████▌                           | 13379/20117 [8:35:17<4:32:25,  2.43s/it] 67%|██████████████████████████████████████████████████████▌                           | 13380/20117 [8:35:19<4:32:15,  2.42s/it]                                                                                                                                 {'loss': 0.1439, 'grad_norm': 0.3658435642719269, 'learning_rate': 5.089517711701426e-05, 'memory/max_active (GiB)': 21.53, 'memory/max_allocated (GiB)': 21.53, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 300.73, 'epoch': 1.33}
 67%|██████████████████████████████████████████████████████▌                           | 13380/20117 [8:35:19<4:32:15,  2.42s/it] 67%|██████████████████████████████████████████████████████▌                           | 13381/20117 [8:35:21<4:31:07,  2.42s/it] 67%|██████████████████████████████████████████████████████▌                           | 13382/20117 [8:35:24<4:31:07,  2.42s/it] 67%|██████████████████████████████████████████████████████▌                           | 13383/20117 [8:35:26<4:34:20,  2.44s/it] 67%|██████████████████████████████████████████████████████▌                           | 13384/20117 [8:35:29<4:34:30,  2.45s/it] 67%|██████████████████████████████████████████████████████▌                           | 13385/20117 [8:35:31<4:35:35,  2.46s/it] 67%|██████████████████████████████████████████████████████▌                           | 13386/20117 [8:35:34<4:34:59,  2.45s/it] 67%|██████████████████████████████████████████████████████▌                           | 13387/20117 [8:35:36<4:32:05,  2.43s/it] 67%|██████████████████████████████████████████████████████▌                           | 13388/20117 [8:35:38<4:31:52,  2.42s/it] 67%|██████████████████████████████████████████████████████▌                           | 13389/20117 [8:35:41<4:32:23,  2.43s/it] 67%|██████████████████████████████████████████████████████▌                           | 13390/20117 [8:35:43<4:32:18,  2.43s/it]                                                                                                                                 {'loss': 0.1575, 'grad_norm': 0.3239598572254181, 'learning_rate': 5.075851670824212e-05, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 334.4, 'epoch': 1.33}
 67%|██████████████████████████████████████████████████████▌                           | 13390/20117 [8:35:43<4:32:18,  2.43s/it] 67%|██████████████████████████████████████████████████████▌                           | 13391/20117 [8:35:46<4:32:01,  2.43s/it] 67%|██████████████████████████████████████████████████████▌                           | 13392/20117 [8:35:48<4:30:35,  2.41s/it] 67%|██████████████████████████████████████████████████████▌                           | 13393/20117 [8:35:51<4:31:47,  2.43s/it] 67%|██████████████████████████████████████████████████████▌                           | 13394/20117 [8:35:53<4:30:47,  2.42s/it] 67%|██████████████████████████████████████████████████████▌                           | 13395/20117 [8:35:55<4:27:46,  2.39s/it] 67%|██████████████████████████████████████████████████████▌                           | 13396/20117 [8:35:58<4:29:07,  2.40s/it] 67%|██████████████████████████████████████████████████████▌                           | 13397/20117 [8:36:00<4:29:28,  2.41s/it] 67%|██████████████████████████████████████████████████████▌                           | 13398/20117 [8:36:03<4:30:17,  2.41s/it] 67%|██████████████████████████████████████████████████████▌                           | 13399/20117 [8:36:05<4:30:16,  2.41s/it] 67%|██████████████████████████████████████████████████████▌                           | 13400/20117 [8:36:07<4:32:09,  2.43s/it]                                                                                                                                 {'loss': 0.1593, 'grad_norm': 0.5001199245452881, 'learning_rate': 5.0621977591650773e-05, 'memory/max_active (GiB)': 20.43, 'memory/max_allocated (GiB)': 20.43, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 355.87, 'epoch': 1.33}
 67%|██████████████████████████████████████████████████████▌                           | 13400/20117 [8:36:07<4:32:09,  2.43s/it] 67%|██████████████████████████████████████████████████████▌                           | 13401/20117 [8:36:10<4:36:56,  2.47s/it] 67%|██████████████████████████████████████████████████████▋                           | 13402/20117 [8:36:12<4:36:29,  2.47s/it] 67%|██████████████████████████████████████████████████████▋                           | 13403/20117 [8:36:15<4:35:51,  2.47s/it] 67%|██████████████████████████████████████████████████████▋                           | 13404/20117 [8:36:17<4:33:20,  2.44s/it] 67%|██████████████████████████████████████████████████████▋                           | 13405/20117 [8:36:20<4:33:42,  2.45s/it] 67%|██████████████████████████████████████████████████████▋                           | 13406/20117 [8:36:22<4:32:49,  2.44s/it] 67%|██████████████████████████████████████████████████████▋                           | 13407/20117 [8:36:25<4:33:47,  2.45s/it] 67%|██████████████████████████████████████████████████████▋                           | 13408/20117 [8:36:27<4:39:49,  2.50s/it] 67%|██████████████████████████████████████████████████████▋                           | 13409/20117 [8:36:30<4:52:56,  2.62s/it] 67%|██████████████████████████████████████████████████████▋                           | 13410/20117 [8:36:33<4:46:26,  2.56s/it]                                                                                                                                 {'loss': 0.1502, 'grad_norm': 0.5077064633369446, 'learning_rate': 5.048556010356491e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 320.65, 'epoch': 1.33}
 67%|██████████████████████████████████████████████████████▋                           | 13410/20117 [8:36:33<4:46:26,  2.56s/it] 67%|██████████████████████████████████████████████████████▋                           | 13411/20117 [8:36:35<4:43:17,  2.53s/it] 67%|██████████████████████████████████████████████████████▋                           | 13412/20117 [8:36:38<4:40:46,  2.51s/it] 67%|██████████████████████████████████████████████████████▋                           | 13413/20117 [8:36:40<4:36:16,  2.47s/it] 67%|██████████████████████████████████████████████████████▋                           | 13414/20117 [8:36:42<4:32:11,  2.44s/it] 67%|██████████████████████████████████████████████████████▋                           | 13415/20117 [8:36:45<4:32:28,  2.44s/it] 67%|██████████████████████████████████████████████████████▋                           | 13416/20117 [8:36:47<4:30:27,  2.42s/it] 67%|██████████████████████████████████████████████████████▋                           | 13417/20117 [8:36:50<4:31:55,  2.44s/it] 67%|██████████████████████████████████████████████████████▋                           | 13418/20117 [8:36:52<4:31:23,  2.43s/it] 67%|██████████████████████████████████████████████████████▋                           | 13419/20117 [8:36:54<4:26:46,  2.39s/it] 67%|██████████████████████████████████████████████████████▋                           | 13420/20117 [8:36:57<4:26:19,  2.39s/it]                                                                                                                                 {'loss': 0.16, 'grad_norm': 0.573063313961029, 'learning_rate': 5.0349264580009616e-05, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 365.59, 'epoch': 1.33}
 67%|██████████████████████████████████████████████████████▋                           | 13420/20117 [8:36:57<4:26:19,  2.39s/it] 67%|██████████████████████████████████████████████████████▋                           | 13421/20117 [8:36:59<4:23:48,  2.36s/it] 67%|██████████████████████████████████████████████████████▋                           | 13422/20117 [8:37:01<4:25:26,  2.38s/it] 67%|██████████████████████████████████████████████████████▋                           | 13423/20117 [8:37:04<4:20:20,  2.33s/it] 67%|██████████████████████████████████████████████████████▋                           | 13424/20117 [8:37:06<4:23:38,  2.36s/it] 67%|██████████████████████████████████████████████████████▋                           | 13425/20117 [8:37:08<4:24:56,  2.38s/it] 67%|██████████████████████████████████████████████████████▋                           | 13426/20117 [8:37:11<4:26:31,  2.39s/it] 67%|██████████████████████████████████████████████████████▋                           | 13427/20117 [8:37:13<4:26:42,  2.39s/it] 67%|██████████████████████████████████████████████████████▋                           | 13428/20117 [8:37:16<4:25:50,  2.38s/it] 67%|██████████████████████████████████████████████████████▋                           | 13429/20117 [8:37:18<4:24:41,  2.37s/it] 67%|██████████████████████████████████████████████████████▋                           | 13430/20117 [8:37:20<4:23:23,  2.36s/it]                                                                                                                                 {'loss': 0.1975, 'grad_norm': 0.6947848200798035, 'learning_rate': 5.021309135670959e-05, 'memory/max_active (GiB)': 19.69, 'memory/max_allocated (GiB)': 19.69, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 366.35, 'epoch': 1.34}
 67%|██████████████████████████████████████████████████████▋                           | 13430/20117 [8:37:20<4:23:23,  2.36s/it] 67%|██████████████████████████████████████████████████████▋                           | 13431/20117 [8:37:23<4:23:02,  2.36s/it] 67%|██████████████████████████████████████████████████████▊                           | 13432/20117 [8:37:25<4:25:58,  2.39s/it] 67%|██████████████████████████████████████████████████████▊                           | 13433/20117 [8:37:28<4:27:29,  2.40s/it] 67%|██████████████████████████████████████████████████████▊                           | 13434/20117 [8:37:30<4:25:52,  2.39s/it] 67%|██████████████████████████████████████████████████████▊                           | 13435/20117 [8:37:32<4:26:37,  2.39s/it] 67%|██████████████████████████████████████████████████████▊                           | 13436/20117 [8:37:35<4:24:21,  2.37s/it] 67%|██████████████████████████████████████████████████████▊                           | 13437/20117 [8:37:37<4:27:27,  2.40s/it] 67%|██████████████████████████████████████████████████████▊                           | 13438/20117 [8:37:39<4:24:53,  2.38s/it] 67%|██████████████████████████████████████████████████████▊                           | 13439/20117 [8:37:42<4:26:22,  2.39s/it] 67%|██████████████████████████████████████████████████████▊                           | 13440/20117 [8:37:44<4:26:00,  2.39s/it]                                                                                                                                 {'loss': 0.1626, 'grad_norm': 0.6331676244735718, 'learning_rate': 5.007704076908825e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 377.99, 'epoch': 1.34}
 67%|██████████████████████████████████████████████████████▊                           | 13450/20117 [8:38:08<4:22:20,  2.36s/it] {'loss': 0.1846, 'grad_norm': 0.6537986397743225, 'learning_rate': 4.994111315226697e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 409.47, 'epoch': 1.34}
 67%|██████████████████████████████████████████████████████▊                           | 13460/20117 [8:38:31<4:21:11,  2.35s/it] {'loss': 0.1483, 'grad_norm': 0.27798891067504883, 'learning_rate': 4.980530884106416e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 297.57, 'epoch': 1.34}
 67%|██████████████████████████████████████████████████████▉                           | 13470/20117 [8:38:56<4:23:59,  2.38s/it] {'loss': 0.1544, 'grad_norm': 0.455538272857666, 'learning_rate': 4.9669628169994586e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 331.23, 'epoch': 1.34}
 67%|██████████████████████████████████████████████████████▉                           | 13480/20117 [8:39:20<4:25:23,  2.40s/it] {'loss': 0.1629, 'grad_norm': 0.8112931251525879, 'learning_rate': 4.9534071473268375e-05, 'memory/max_active (GiB)': 20.43, 'memory/max_allocated (GiB)': 20.43, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 326.59, 'epoch': 1.34}
 67%|██████████████████████████████████████████████████████▉                           | 13490/20117 [8:39:43<4:22:07,  2.37s/it] {'loss': 0.1238, 'grad_norm': 0.541522204875946, 'learning_rate': 4.939863908479037e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 304.61, 'epoch': 1.34}
 67%|███████████████████████████████████████████████████████                           | 13500/20117 [8:40:07<4:23:13,  2.39s/it] {'loss': 0.1664, 'grad_norm': 0.574373185634613, 'learning_rate': 4.9263331338159105e-05, 'memory/max_active (GiB)': 19.8, 'memory/max_allocated (GiB)': 19.8, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 341.93, 'epoch': 1.34}
 67%|███████████████████████████████████████████████████████                           | 13510/20117 [8:40:31<4:26:04,  2.42s/it] {'loss': 0.1402, 'grad_norm': 0.35874566435813904, 'learning_rate': 4.9128148566666186e-05, 'memory/max_active (GiB)': 21.4, 'memory/max_allocated (GiB)': 21.4, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 328.7, 'epoch': 1.34}
 67%|███████████████████████████████████████████████████████                           | 13520/20117 [8:40:56<4:27:22,  2.43s/it] {'loss': 0.1482, 'grad_norm': 0.2937113046646118, 'learning_rate': 4.899309110329541e-05, 'memory/max_active (GiB)': 20.57, 'memory/max_allocated (GiB)': 20.57, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 328.58, 'epoch': 1.34}
 67%|███████████████████████████████████████████████████████▏                          | 13530/20117 [8:41:20<4:22:35,  2.39s/it] {'loss': 0.14, 'grad_norm': 0.6201068162918091, 'learning_rate': 4.885815928072176e-05, 'memory/max_active (GiB)': 18.84, 'memory/max_allocated (GiB)': 18.84, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 336.28, 'epoch': 1.35}
 67%|███████████████████████████████████████████████████████▏                          | 13540/20117 [8:41:44<4:18:25,  2.36s/it] {'loss': 0.1536, 'grad_norm': 0.5328193306922913, 'learning_rate': 4.872335343131088e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 337.94, 'epoch': 1.35}
 67%|███████████████████████████████████████████████████████▏                          | 13550/20117 [8:42:07<4:12:50,  2.31s/it] {'loss': 0.1956, 'grad_norm': 0.6596990823745728, 'learning_rate': 4.8588673887118054e-05, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 333.18, 'epoch': 1.35}
 67%|███████████████████████████████████████████████████████▎                          | 13560/20117 [8:42:29<4:03:04,  2.22s/it] {'loss': 0.1762, 'grad_norm': 0.7473700642585754, 'learning_rate': 4.845412097988752e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 345.86, 'epoch': 1.35}
 67%|███████████████████████████████████████████████████████▎                          | 13570/20117 [8:42:52<4:03:20,  2.23s/it] {'loss': 0.169, 'grad_norm': 0.544845700263977, 'learning_rate': 4.831969504105145e-05, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 366.86, 'epoch': 1.35}
 68%|███████████████████████████████████████████████████████▎                          | 13580/20117 [8:43:14<4:01:50,  2.22s/it] {'loss': 0.1178, 'grad_norm': 0.48342615365982056, 'learning_rate': 4.818539640172941e-05, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 348.38, 'epoch': 1.35}
 68%|███████████████████████████████████████████████████████▍                          | 13590/20117 [8:43:36<4:04:59,  2.25s/it] {'loss': 0.1605, 'grad_norm': 0.5576662421226501, 'learning_rate': 4.805122539272725e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 341.46, 'epoch': 1.35}
 68%|███████████████████████████████████████████████████████▍                          | 13600/20117 [8:43:59<4:07:13,  2.28s/it] {'loss': 0.2546, 'grad_norm': 0.6246042847633362, 'learning_rate': 4.791718234453663e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 406.86, 'epoch': 1.35}
 68%|███████████████████████████████████████████████████████▍                          | 13610/20117 [8:44:21<3:58:53,  2.20s/it] {'loss': 0.1727, 'grad_norm': 0.31434324383735657, 'learning_rate': 4.7783267587333794e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 382.72, 'epoch': 1.35}
 68%|███████████████████████████████████████████████████████▌                          | 13620/20117 [8:44:44<4:12:35,  2.33s/it] {'loss': 0.1753, 'grad_norm': 0.38281211256980896, 'learning_rate': 4.764948145097919e-05, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 302.55, 'epoch': 1.35}
 68%|███████████████████████████████████████████████████████▌                          | 13620/20117 [8:44:44<4:12:35,  2.33s/it] 68%|███████████████████████████████████████████████████████▌                          | 13621/20117 [8:44:47<4:09:48,  2.31s/it] 68%|███████████████████████████████████████████████████████▌                          | 13622/20117 [8:44:49<4:08:49,  2.30s/it] 68%|███████████████████████████████████████████████████████▌                          | 13623/20117 [8:44:51<4:10:46,  2.32s/it] 68%|███████████████████████████████████████████████████████▌                          | 13624/20117 [8:44:53<4:06:11,  2.28s/it] 68%|███████████████████████████████████████████████████████▌                          | 13625/20117 [8:44:56<4:06:02,  2.27s/it] 68%|███████████████████████████████████████████████████████▌                          | 13626/20117 [8:44:58<4:03:15,  2.25s/it] 68%|███████████████████████████████████████████████████████▌                          | 13627/20117 [8:45:00<4:01:33,  2.23s/it] 68%|███████████████████████████████████████████████████████▌                          | 13628/20117 [8:45:02<4:00:11,  2.22s/it] 68%|███████████████████████████████████████████████████████▌                          | 13629/20117 [8:45:04<4:00:49,  2.23s/it] 68%|███████████████████████████████████████████████████████▌                          | 13630/20117 [8:45:07<4:01:56,  2.24s/it]                                                                                                                                 {'loss': 0.1443, 'grad_norm': 0.3556180000305176, 'learning_rate': 4.7515824265016276e-05, 'memory/max_active (GiB)': 20.66, 'memory/max_allocated (GiB)': 20.66, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 341.41, 'epoch': 1.36}
 68%|███████████████████████████████████████████████████████▌                          | 13630/20117 [8:45:07<4:01:56,  2.24s/it] 68%|███████████████████████████████████████████████████████▌                          | 13631/20117 [8:45:09<4:00:24,  2.22s/it] 68%|███████████████████████████████████████████████████████▌                          | 13632/20117 [8:45:11<3:59:58,  2.22s/it] 68%|███████████████████████████████████████████████████████▌                          | 13633/20117 [8:45:13<3:58:09,  2.20s/it] 68%|███████████████████████████████████████████████████████▌                          | 13634/20117 [8:45:15<3:59:07,  2.21s/it] 68%|███████████████████████████████████████████████████████▌                          | 13635/20117 [8:45:18<3:59:27,  2.22s/it] 68%|███████████████████████████████████████████████████████▌                          | 13636/20117 [8:45:20<4:04:22,  2.26s/it] 68%|███████████████████████████████████████████████████████▌                          | 13637/20117 [8:45:22<4:04:21,  2.26s/it] 68%|███████████████████████████████████████████████████████▌                          | 13638/20117 [8:45:25<4:02:02,  2.24s/it] 68%|███████████████████████████████████████████████████████▌                          | 13639/20117 [8:45:27<4:00:07,  2.22s/it] 68%|███████████████████████████████████████████████████████▌                          | 13640/20117 [8:45:29<4:00:42,  2.23s/it]                                                                                                                                 {'loss': 0.1323, 'grad_norm': 0.29845160245895386, 'learning_rate': 4.7382296358670976e-05, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 281.42, 'epoch': 1.36}
 68%|███████████████████████████████████████████████████████▌                          | 13640/20117 [8:45:29<4:00:42,  2.23s/it] 68%|███████████████████████████████████████████████████████▌                          | 13641/20117 [8:45:31<4:02:16,  2.24s/it] 68%|███████████████████████████████████████████████████████▌                          | 13642/20117 [8:45:34<4:02:32,  2.25s/it] 68%|███████████████████████████████████████████████████████▌                          | 13643/20117 [8:45:36<4:01:42,  2.24s/it] 68%|███████████████████████████████████████████████████████▌                          | 13644/20117 [8:45:38<4:00:36,  2.23s/it] 68%|███████████████████████████████████████████████████████▌                          | 13645/20117 [8:45:40<3:59:36,  2.22s/it] 68%|███████████████████████████████████████████████████████▌                          | 13646/20117 [8:45:42<3:58:46,  2.21s/it] 68%|███████████████████████████████████████████████████████▋                          | 13647/20117 [8:45:45<3:58:30,  2.21s/it] 68%|███████████████████████████████████████████████████████▋                          | 13648/20117 [8:45:47<3:59:59,  2.23s/it] 68%|███████████████████████████████████████████████████████▋                          | 13649/20117 [8:45:49<4:04:32,  2.27s/it] 68%|███████████████████████████████████████████████████████▋                          | 13650/20117 [8:45:52<4:08:54,  2.31s/it]                                                                                                                                 {'loss': 0.1176, 'grad_norm': 0.4179192781448364, 'learning_rate': 4.724889806085079e-05, 'memory/max_active (GiB)': 19.68, 'memory/max_allocated (GiB)': 19.68, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 337.01, 'epoch': 1.36}
 68%|███████████████████████████████████████████████████████▋                          | 13650/20117 [8:45:52<4:08:54,  2.31s/it] 68%|███████████████████████████████████████████████████████▋                          | 13651/20117 [8:45:54<4:05:39,  2.28s/it] 68%|███████████████████████████████████████████████████████▋                          | 13652/20117 [8:45:56<4:02:56,  2.25s/it] 68%|███████████████████████████████████████████████████████▋                          | 13653/20117 [8:45:58<4:04:51,  2.27s/it] 68%|███████████████████████████████████████████████████████▋                          | 13654/20117 [8:46:01<4:03:17,  2.26s/it] 68%|███████████████████████████████████████████████████████▋                          | 13655/20117 [8:46:03<4:03:12,  2.26s/it] 68%|███████████████████████████████████████████████████████▋                          | 13656/20117 [8:46:05<4:02:31,  2.25s/it] 68%|███████████████████████████████████████████████████████▋                          | 13657/20117 [8:46:07<4:03:10,  2.26s/it] 68%|███████████████████████████████████████████████████████▋                          | 13658/20117 [8:46:10<4:04:19,  2.27s/it] 68%|███████████████████████████████████████████████████████▋                          | 13659/20117 [8:46:12<4:03:01,  2.26s/it] 68%|███████████████████████████████████████████████████████▋                          | 13660/20117 [8:46:14<4:03:01,  2.26s/it]                                                                                                                                 {'loss': 0.1159, 'grad_norm': 0.46176135540008545, 'learning_rate': 4.711562970014384e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 366.9, 'epoch': 1.36}
 68%|███████████████████████████████████████████████████████▋                          | 13660/20117 [8:46:14<4:03:01,  2.26s/it] 68%|███████████████████████████████████████████████████████▋                          | 13661/20117 [8:46:16<4:01:01,  2.24s/it] 68%|███████████████████████████████████████████████████████▋                          | 13662/20117 [8:46:18<3:59:23,  2.23s/it] 68%|███████████████████████████████████████████████████████▋                          | 13663/20117 [8:46:21<3:57:41,  2.21s/it] 68%|███████████████████████████████████████████████████████▋                          | 13664/20117 [8:46:23<3:56:14,  2.20s/it] 68%|███████████████████████████████████████████████████████▋                          | 13665/20117 [8:46:25<3:56:27,  2.20s/it] 68%|███████████████████████████████████████████████████████▋                          | 13666/20117 [8:46:27<3:58:23,  2.22s/it] 68%|███████████████████████████████████████████████████████▋                          | 13667/20117 [8:46:29<3:58:40,  2.22s/it] 68%|███████████████████████████████████████████████████████▋                          | 13668/20117 [8:46:32<3:58:45,  2.22s/it] 68%|███████████████████████████████████████████████████████▋                          | 13669/20117 [8:46:34<3:56:52,  2.20s/it] 68%|███████████████████████████████████████████████████████▋                          | 13670/20117 [8:46:36<3:57:35,  2.21s/it]                                                                                                                                 {'loss': 0.1194, 'grad_norm': 0.34104305505752563, 'learning_rate': 4.6982491604818314e-05, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 306.48, 'epoch': 1.36}
 68%|███████████████████████████████████████████████████████▋                          | 13670/20117 [8:46:36<3:57:35,  2.21s/it] 68%|███████████████████████████████████████████████████████▋                          | 13671/20117 [8:46:39<4:06:44,  2.30s/it] 68%|███████████████████████████████████████████████████████▋                          | 13672/20117 [8:46:41<4:03:54,  2.27s/it] 68%|███████████████████████████████████████████████████████▋                          | 13673/20117 [8:46:43<4:02:02,  2.25s/it] 68%|███████████████████████████████████████████████████████▋                          | 13674/20117 [8:46:45<3:59:57,  2.23s/it] 68%|███████████████████████████████████████████████████████▋                          | 13675/20117 [8:46:48<4:02:56,  2.26s/it] 68%|███████████████████████████████████████████████████████▋                          | 13676/20117 [8:46:50<4:03:45,  2.27s/it] 68%|███████████████████████████████████████████████████████▋                          | 13677/20117 [8:46:52<4:00:34,  2.24s/it] 68%|███████████████████████████████████████████████████████▊                          | 13678/20117 [8:46:54<4:00:29,  2.24s/it] 68%|███████████████████████████████████████████████████████▊                          | 13679/20117 [8:46:56<3:57:39,  2.21s/it] 68%|███████████████████████████████████████████████████████▊                          | 13680/20117 [8:46:59<3:56:54,  2.21s/it]                                                                                                                                 {'loss': 0.2221, 'grad_norm': 0.8108484148979187, 'learning_rate': 4.684948410282146e-05, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 379.12, 'epoch': 1.36}
 68%|███████████████████████████████████████████████████████▊                          | 13680/20117 [8:46:59<3:56:54,  2.21s/it] 68%|███████████████████████████████████████████████████████▊                          | 13681/20117 [8:47:01<3:55:23,  2.19s/it] 68%|███████████████████████████████████████████████████████▊                          | 13682/20117 [8:47:03<3:55:38,  2.20s/it] 68%|███████████████████████████████████████████████████████▊                          | 13683/20117 [8:47:05<3:56:51,  2.21s/it] 68%|███████████████████████████████████████████████████████▊                          | 13684/20117 [8:47:07<3:56:54,  2.21s/it] 68%|███████████████████████████████████████████████████████▊                          | 13685/20117 [8:47:10<3:57:31,  2.22s/it] 68%|███████████████████████████████████████████████████████▊                          | 13686/20117 [8:47:12<3:56:06,  2.20s/it] 68%|███████████████████████████████████████████████████████▊                          | 13687/20117 [8:47:14<3:55:33,  2.20s/it] 68%|███████████████████████████████████████████████████████▊                          | 13688/20117 [8:47:16<3:57:19,  2.21s/it] 68%|███████████████████████████████████████████████████████▊                          | 13689/20117 [8:47:18<3:57:12,  2.21s/it] 68%|███████████████████████████████████████████████████████▊                          | 13690/20117 [8:47:21<3:56:24,  2.21s/it]                                                                                                                                 {'loss': 0.1656, 'grad_norm': 0.5723569393157959, 'learning_rate': 4.671660752177892e-05, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 380.88, 'epoch': 1.36}
 68%|███████████████████████████████████████████████████████▊                          | 13690/20117 [8:47:21<3:56:24,  2.21s/it] 68%|███████████████████████████████████████████████████████▊                          | 13691/20117 [8:47:23<3:58:53,  2.23s/it] 68%|███████████████████████████████████████████████████████▊                          | 13692/20117 [8:47:25<3:58:51,  2.23s/it] 68%|███████████████████████████████████████████████████████▊                          | 13693/20117 [8:47:27<3:59:15,  2.23s/it] 68%|███████████████████████████████████████████████████████▊                          | 13694/20117 [8:47:30<3:58:02,  2.22s/it] 68%|███████████████████████████████████████████████████████▊                          | 13695/20117 [8:47:32<4:00:01,  2.24s/it] 68%|███████████████████████████████████████████████████████▊                          | 13696/20117 [8:47:34<4:00:46,  2.25s/it] 68%|███████████████████████████████████████████████████████▊                          | 13697/20117 [8:47:36<4:02:55,  2.27s/it] 68%|███████████████████████████████████████████████████████▊                          | 13698/20117 [8:47:39<4:01:39,  2.26s/it] 68%|███████████████████████████████████████████████████████▊                          | 13699/20117 [8:47:41<4:00:57,  2.25s/it] 68%|███████████████████████████████████████████████████████▊                          | 13700/20117 [8:47:43<3:59:36,  2.24s/it]                                                                                                                                 {'loss': 0.145, 'grad_norm': 0.5756139755249023, 'learning_rate': 4.658386218899371e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 406.13, 'epoch': 1.36}
 68%|███████████████████████████████████████████████████████▊                          | 13700/20117 [8:47:43<3:59:36,  2.24s/it] 68%|███████████████████████████████████████████████████████▊                          | 13701/20117 [8:47:46<4:04:17,  2.28s/it] 68%|███████████████████████████████████████████████████████▊                          | 13702/20117 [8:47:48<4:04:47,  2.29s/it] 68%|███████████████████████████████████████████████████████▊                          | 13703/20117 [8:47:50<4:01:37,  2.26s/it] 68%|███████████████████████████████████████████████████████▊                          | 13704/20117 [8:47:52<3:59:25,  2.24s/it] 68%|███████████████████████████████████████████████████████▊                          | 13705/20117 [8:47:54<3:58:08,  2.23s/it] 68%|███████████████████████████████████████████████████████▊                          | 13706/20117 [8:47:57<3:56:51,  2.22s/it] 68%|███████████████████████████████████████████████████████▊                          | 13707/20117 [8:47:59<3:57:33,  2.22s/it] 68%|███████████████████████████████████████████████████████▉                          | 13708/20117 [8:48:01<3:56:34,  2.21s/it] 68%|███████████████████████████████████████████████████████▉                          | 13709/20117 [8:48:03<3:56:22,  2.21s/it] 68%|███████████████████████████████████████████████████████▉                          | 13710/20117 [8:48:06<3:56:53,  2.22s/it]                                                                                                                                 {'loss': 0.1011, 'grad_norm': 0.3543594777584076, 'learning_rate': 4.645124843144574e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 308.42, 'epoch': 1.36}
 68%|███████████████████████████████████████████████████████▉                          | 13710/20117 [8:48:06<3:56:53,  2.22s/it] 68%|███████████████████████████████████████████████████████▉                          | 13711/20117 [8:48:08<3:56:45,  2.22s/it] 68%|███████████████████████████████████████████████████████▉                          | 13712/20117 [8:48:10<3:57:23,  2.22s/it] 68%|███████████████████████████████████████████████████████▉                          | 13713/20117 [8:48:12<3:56:15,  2.21s/it] 68%|███████████████████████████████████████████████████████▉                          | 13714/20117 [8:48:14<3:58:01,  2.23s/it] 68%|███████████████████████████████████████████████████████▉                          | 13715/20117 [8:48:17<3:58:36,  2.24s/it] 68%|███████████████████████████████████████████████████████▉                          | 13716/20117 [8:48:19<3:58:47,  2.24s/it] 68%|███████████████████████████████████████████████████████▉                          | 13717/20117 [8:48:21<3:58:47,  2.24s/it] 68%|███████████████████████████████████████████████████████▉                          | 13718/20117 [8:48:23<4:00:10,  2.25s/it] 68%|███████████████████████████████████████████████████████▉                          | 13719/20117 [8:48:26<3:58:38,  2.24s/it] 68%|███████████████████████████████████████████████████████▉                          | 13720/20117 [8:48:28<3:57:55,  2.23s/it]                                                                                                                                 {'loss': 0.1649, 'grad_norm': 0.210982084274292, 'learning_rate': 4.631876657579062e-05, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 339.26, 'epoch': 1.36}
 68%|███████████████████████████████████████████████████████▉                          | 13720/20117 [8:48:28<3:57:55,  2.23s/it] 68%|███████████████████████████████████████████████████████▉                          | 13721/20117 [8:48:30<3:58:14,  2.23s/it] 68%|███████████████████████████████████████████████████████▉                          | 13722/20117 [8:48:32<3:59:45,  2.25s/it] 68%|███████████████████████████████████████████████████████▉                          | 13723/20117 [8:48:35<4:12:14,  2.37s/it] 68%|███████████████████████████████████████████████████████▉                          | 13724/20117 [8:48:37<4:09:08,  2.34s/it] 68%|███████████████████████████████████████████████████████▉                          | 13725/20117 [8:48:40<4:09:00,  2.34s/it] 68%|███████████████████████████████████████████████████████▉                          | 13726/20117 [8:48:42<4:07:21,  2.32s/it] 68%|███████████████████████████████████████████████████████▉                          | 13727/20117 [8:48:44<4:06:53,  2.32s/it] 68%|███████████████████████████████████████████████████████▉                          | 13728/20117 [8:48:46<4:03:44,  2.29s/it] 68%|███████████████████████████████████████████████████████▉                          | 13729/20117 [8:48:49<4:01:40,  2.27s/it] 68%|███████████████████████████████████████████████████████▉                          | 13730/20117 [8:48:51<4:01:05,  2.26s/it]                                                                                                                                 {'loss': 0.1152, 'grad_norm': 0.4334322214126587, 'learning_rate': 4.6186416948359256e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 384.14, 'epoch': 1.36}
 68%|███████████████████████████████████████████████████████▉                          | 13730/20117 [8:48:51<4:01:05,  2.26s/it] 68%|███████████████████████████████████████████████████████▉                          | 13731/20117 [8:48:53<4:00:58,  2.26s/it] 68%|███████████████████████████████████████████████████████▉                          | 13732/20117 [8:48:55<3:59:33,  2.25s/it] 68%|███████████████████████████████████████████████████████▉                          | 13733/20117 [8:48:58<3:59:40,  2.25s/it] 68%|███████████████████████████████████████████████████████▉                          | 13734/20117 [8:49:00<4:01:19,  2.27s/it] 68%|███████████████████████████████████████████████████████▉                          | 13735/20117 [8:49:02<4:02:41,  2.28s/it] 68%|███████████████████████████████████████████████████████▉                          | 13736/20117 [8:49:05<4:01:19,  2.27s/it] 68%|███████████████████████████████████████████████████████▉                          | 13737/20117 [8:49:07<4:01:52,  2.27s/it] 68%|███████████████████████████████████████████████████████▉                          | 13738/20117 [8:49:09<4:00:34,  2.26s/it] 68%|████████████████████████████████████████████████████████                          | 13739/20117 [8:49:11<4:02:41,  2.28s/it] 68%|████████████████████████████████████████████████████████                          | 13740/20117 [8:49:14<4:01:15,  2.27s/it]                                                                                                                                 {'loss': 0.1034, 'grad_norm': 0.7411201000213623, 'learning_rate': 4.6054199875156665e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 318.13, 'epoch': 1.37}
 68%|████████████████████████████████████████████████████████                          | 13740/20117 [8:49:14<4:01:15,  2.27s/it] 68%|████████████████████████████████████████████████████████                          | 13741/20117 [8:49:16<4:00:37,  2.26s/it] 68%|████████████████████████████████████████████████████████                          | 13742/20117 [8:49:18<4:00:02,  2.26s/it] 68%|████████████████████████████████████████████████████████                          | 13743/20117 [8:49:20<4:00:15,  2.26s/it] 68%|████████████████████████████████████████████████████████                          | 13744/20117 [8:49:23<4:00:41,  2.27s/it] 68%|████████████████████████████████████████████████████████                          | 13745/20117 [8:49:25<4:00:13,  2.26s/it] 68%|████████████████████████████████████████████████████████                          | 13746/20117 [8:49:27<3:59:23,  2.25s/it] 68%|████████████████████████████████████████████████████████                          | 13747/20117 [8:49:29<3:59:01,  2.25s/it] 68%|████████████████████████████████████████████████████████                          | 13748/20117 [8:49:32<3:58:12,  2.24s/it] 68%|████████████████████████████████████████████████████████                          | 13749/20117 [8:49:34<3:59:56,  2.26s/it] 68%|████████████████████████████████████████████████████████                          | 13750/20117 [8:49:36<3:59:32,  2.26s/it]                                                                                                                                 {'loss': 0.1494, 'grad_norm': 0.6421767473220825, 'learning_rate': 4.5922115681861536e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 343.47, 'epoch': 1.37}
 68%|████████████████████████████████████████████████████████                          | 13750/20117 [8:49:36<3:59:32,  2.26s/it] 68%|████████████████████████████████████████████████████████                          | 13751/20117 [8:49:38<3:58:29,  2.25s/it] 68%|████████████████████████████████████████████████████████                          | 13752/20117 [8:49:41<3:58:38,  2.25s/it] 68%|████████████████████████████████████████████████████████                          | 13753/20117 [8:49:43<3:57:23,  2.24s/it] 68%|████████████████████████████████████████████████████████                          | 13754/20117 [8:49:45<3:56:13,  2.23s/it] 68%|████████████████████████████████████████████████████████                          | 13755/20117 [8:49:47<3:56:15,  2.23s/it] 68%|████████████████████████████████████████████████████████                          | 13756/20117 [8:49:50<3:57:16,  2.24s/it] 68%|████████████████████████████████████████████████████████                          | 13757/20117 [8:49:52<3:58:01,  2.25s/it] 68%|████████████████████████████████████████████████████████                          | 13758/20117 [8:49:54<3:57:37,  2.24s/it] 68%|████████████████████████████████████████████████████████                          | 13759/20117 [8:49:56<3:57:36,  2.24s/it] 68%|████████████████████████████████████████████████████████                          | 13760/20117 [8:49:58<3:56:04,  2.23s/it]                                                                                                                                 {'loss': 0.1698, 'grad_norm': 0.6301234364509583, 'learning_rate': 4.579016469382505e-05, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 351.88, 'epoch': 1.37}
 68%|████████████████████████████████████████████████████████                          | 13760/20117 [8:49:58<3:56:04,  2.23s/it] 68%|████████████████████████████████████████████████████████                          | 13761/20117 [8:50:01<3:56:40,  2.23s/it] 68%|████████████████████████████████████████████████████████                          | 13762/20117 [8:50:03<3:56:54,  2.24s/it] 68%|████████████████████████████████████████████████████████                          | 13763/20117 [8:50:05<3:57:56,  2.25s/it] 68%|████████████████████████████████████████████████████████                          | 13764/20117 [8:50:08<3:58:08,  2.25s/it] 68%|████████████████████████████████████████████████████████                          | 13765/20117 [8:50:10<4:00:09,  2.27s/it] 68%|████████████████████████████████████████████████████████                          | 13766/20117 [8:50:12<4:00:36,  2.27s/it] 68%|████████████████████████████████████████████████████████                          | 13767/20117 [8:50:14<4:00:47,  2.28s/it] 68%|████████████████████████████████████████████████████████                          | 13768/20117 [8:50:17<4:01:40,  2.28s/it] 68%|████████████████████████████████████████████████████████                          | 13769/20117 [8:50:19<4:01:29,  2.28s/it] 68%|████████████████████████████████████████████████████████▏                         | 13770/20117 [8:50:21<3:59:10,  2.26s/it]                                                                                                                                 {'loss': 0.1712, 'grad_norm': 0.5952703952789307, 'learning_rate': 4.5658347236070445e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 380.8, 'epoch': 1.37}
 68%|████████████████████████████████████████████████████████▏                         | 13770/20117 [8:50:21<3:59:10,  2.26s/it] 68%|████████████████████████████████████████████████████████▏                         | 13771/20117 [8:50:23<3:57:53,  2.25s/it] 68%|████████████████████████████████████████████████████████▏                         | 13772/20117 [8:50:26<3:56:56,  2.24s/it] 68%|████████████████████████████████████████████████████████▏                         | 13773/20117 [8:50:28<3:56:49,  2.24s/it] 68%|████████████████████████████████████████████████████████▏                         | 13774/20117 [8:50:30<3:55:25,  2.23s/it] 68%|████████████████████████████████████████████████████████▏                         | 13775/20117 [8:50:33<4:06:19,  2.33s/it] 68%|████████████████████████████████████████████████████████▏                         | 13776/20117 [8:50:35<4:03:09,  2.30s/it] 68%|████████████████████████████████████████████████████████▏                         | 13777/20117 [8:50:37<4:00:55,  2.28s/it] 68%|████████████████████████████████████████████████████████▏                         | 13778/20117 [8:50:39<4:00:23,  2.28s/it] 68%|████████████████████████████████████████████████████████▏                         | 13779/20117 [8:50:42<4:00:34,  2.28s/it] 68%|████████████████████████████████████████████████████████▏                         | 13780/20117 [8:50:44<3:59:13,  2.26s/it]                                                                                                                                 {'loss': 0.1618, 'grad_norm': 0.37993893027305603, 'learning_rate': 4.5526663633292e-05, 'memory/max_active (GiB)': 19.22, 'memory/max_allocated (GiB)': 19.22, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 278.0, 'epoch': 1.37}
 68%|████████████████████████████████████████████████████████▏                         | 13780/20117 [8:50:44<3:59:13,  2.26s/it] 69%|████████████████████████████████████████████████████████▏                         | 13781/20117 [8:50:46<3:59:19,  2.27s/it] 69%|████████████████████████████████████████████████████████▏                         | 13782/20117 [8:50:48<3:58:10,  2.26s/it] 69%|████████████████████████████████████████████████████████▏                         | 13783/20117 [8:50:51<4:00:07,  2.27s/it] 69%|████████████████████████████████████████████████████████▏                         | 13784/20117 [8:50:53<4:00:35,  2.28s/it] 69%|████████████████████████████████████████████████████████▏                         | 13785/20117 [8:50:55<3:58:24,  2.26s/it] 69%|████████████████████████████████████████████████████████▏                         | 13786/20117 [8:50:57<3:58:19,  2.26s/it] 69%|████████████████████████████████████████████████████████▏                         | 13787/20117 [8:51:00<3:57:25,  2.25s/it] 69%|████████████████████████████████████████████████████████▏                         | 13788/20117 [8:51:02<3:56:25,  2.24s/it] 69%|████████████████████████████████████████████████████████▏                         | 13789/20117 [8:51:04<3:57:45,  2.25s/it] 69%|████████████████████████████████████████████████████████▏                         | 13790/20117 [8:51:06<3:56:03,  2.24s/it]                                                                                                                                 {'loss': 0.1806, 'grad_norm': 0.5828644633293152, 'learning_rate': 4.5395114209854195e-05, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 371.91, 'epoch': 1.37}
 69%|████████████████████████████████████████████████████████▏                         | 13790/20117 [8:51:06<3:56:03,  2.24s/it] 69%|████████████████████████████████████████████████████████▏                         | 13791/20117 [8:51:09<3:54:51,  2.23s/it] 69%|████████████████████████████████████████████████████████▏                         | 13792/20117 [8:51:11<4:00:00,  2.28s/it] 69%|████████████████████████████████████████████████████████▏                         | 13793/20117 [8:51:13<4:00:25,  2.28s/it] 69%|████████████████████████████████████████████████████████▏                         | 13794/20117 [8:51:16<4:00:51,  2.29s/it] 69%|████████████████████████████████████████████████████████▏                         | 13795/20117 [8:51:18<3:59:52,  2.28s/it] 69%|████████████████████████████████████████████████████████▏                         | 13796/20117 [8:51:20<3:58:31,  2.26s/it] 69%|████████████████████████████████████████████████████████▏                         | 13797/20117 [8:51:22<3:59:41,  2.28s/it] 69%|████████████████████████████████████████████████████████▏                         | 13798/20117 [8:51:25<3:57:43,  2.26s/it] 69%|████████████████████████████████████████████████████████▏                         | 13799/20117 [8:51:27<3:56:50,  2.25s/it] 69%|████████████████████████████████████████████████████████▎                         | 13800/20117 [8:51:29<3:55:07,  2.23s/it]                                                                                                                                 {'loss': 0.1802, 'grad_norm': 0.6922226548194885, 'learning_rate': 4.526369928979113e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 392.71, 'epoch': 1.37}
 69%|████████████████████████████████████████████████████████▎                         | 13800/20117 [8:51:29<3:55:07,  2.23s/it] 69%|████████████████████████████████████████████████████████▎                         | 13801/20117 [8:51:31<3:53:33,  2.22s/it] 69%|████████████████████████████████████████████████████████▎                         | 13802/20117 [8:51:33<3:53:31,  2.22s/it] 69%|████████████████████████████████████████████████████████▎                         | 13803/20117 [8:51:36<3:51:47,  2.20s/it] 69%|████████████████████████████████████████████████████████▎                         | 13804/20117 [8:51:38<3:51:21,  2.20s/it] 69%|████████████████████████████████████████████████████████▎                         | 13805/20117 [8:51:40<3:50:58,  2.20s/it] 69%|████████████████████████████████████████████████████████▎                         | 13806/20117 [8:51:42<3:49:57,  2.19s/it] 69%|████████████████████████████████████████████████████████▎                         | 13807/20117 [8:51:44<3:52:11,  2.21s/it] 69%|████████████████████████████████████████████████████████▎                         | 13808/20117 [8:51:47<3:51:33,  2.20s/it] 69%|████████████████████████████████████████████████████████▎                         | 13809/20117 [8:51:49<3:53:49,  2.22s/it] 69%|████████████████████████████████████████████████████████▎                         | 13810/20117 [8:51:51<3:54:27,  2.23s/it]                                                                                                                                 {'loss': 0.1622, 'grad_norm': 0.5355440378189087, 'learning_rate': 4.513241919680546e-05, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 361.78, 'epoch': 1.37}
 69%|████████████████████████████████████████████████████████▎                         | 13810/20117 [8:51:51<3:54:27,  2.23s/it] 69%|████████████████████████████████████████████████████████▎                         | 13811/20117 [8:51:53<3:55:46,  2.24s/it] 69%|████████████████████████████████████████████████████████▎                         | 13812/20117 [8:51:56<3:54:26,  2.23s/it] 69%|████████████████████████████████████████████████████████▎                         | 13813/20117 [8:51:58<3:53:59,  2.23s/it] 69%|████████████████████████████████████████████████████████▎                         | 13814/20117 [8:52:00<3:53:22,  2.22s/it] 69%|████████████████████████████████████████████████████████▎                         | 13815/20117 [8:52:02<3:53:59,  2.23s/it] 69%|████████████████████████████████████████████████████████▎                         | 13816/20117 [8:52:04<3:53:11,  2.22s/it] 69%|████████████████████████████████████████████████████████▎                         | 13817/20117 [8:52:07<3:52:18,  2.21s/it] 69%|████████████████████████████████████████████████████████▎                         | 13818/20117 [8:52:09<3:54:55,  2.24s/it] 69%|████████████████████████████████████████████████████████▎                         | 13819/20117 [8:52:11<3:55:04,  2.24s/it] 69%|████████████████████████████████████████████████████████▎                         | 13820/20117 [8:52:13<3:57:23,  2.26s/it]                                                                                                                                 {'loss': 0.1239, 'grad_norm': 0.3850402534008026, 'learning_rate': 4.500127425426783e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 314.59, 'epoch': 1.37}
 69%|████████████████████████████████████████████████████████▎                         | 13820/20117 [8:52:13<3:57:23,  2.26s/it] 69%|████████████████████████████████████████████████████████▎                         | 13821/20117 [8:52:16<3:54:52,  2.24s/it] 69%|████████████████████████████████████████████████████████▎                         | 13822/20117 [8:52:18<3:53:51,  2.23s/it] 69%|████████████████████████████████████████████████████████▎                         | 13823/20117 [8:52:20<3:54:48,  2.24s/it] 69%|████████████████████████████████████████████████████████▎                         | 13824/20117 [8:52:22<3:53:38,  2.23s/it] 69%|████████████████████████████████████████████████████████▎                         | 13825/20117 [8:52:25<3:53:19,  2.22s/it] 69%|████████████████████████████████████████████████████████▎                         | 13826/20117 [8:52:27<4:02:39,  2.31s/it] 69%|████████████████████████████████████████████████████████▎                         | 13827/20117 [8:52:29<4:00:05,  2.29s/it] 69%|████████████████████████████████████████████████████████▎                         | 13828/20117 [8:52:32<3:57:58,  2.27s/it] 69%|████████████████████████████████████████████████████████▎                         | 13829/20117 [8:52:34<3:57:47,  2.27s/it] 69%|████████████████████████████████████████████████████████▎                         | 13830/20117 [8:52:36<3:55:10,  2.24s/it]                                                                                                                                 {'loss': 0.1422, 'grad_norm': 0.5045945644378662, 'learning_rate': 4.4870264785215966e-05, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 329.11, 'epoch': 1.37}
 69%|████████████████████████████████████████████████████████▎                         | 13830/20117 [8:52:36<3:55:10,  2.24s/it] 69%|████████████████████████████████████████████████████████▍                         | 13831/20117 [8:52:38<3:55:00,  2.24s/it] 69%|████████████████████████████████████████████████████████▍                         | 13832/20117 [8:52:40<3:54:26,  2.24s/it] 69%|████████████████████████████████████████████████████████▍                         | 13833/20117 [8:52:43<3:55:28,  2.25s/it] 69%|████████████████████████████████████████████████████████▍                         | 13834/20117 [8:52:45<3:54:44,  2.24s/it] 69%|████████████████████████████████████████████████████████▍                         | 13835/20117 [8:52:47<3:54:01,  2.24s/it] 69%|████████████████████████████████████████████████████████▍                         | 13836/20117 [8:52:49<3:53:02,  2.23s/it] 69%|████████████████████████████████████████████████████████▍                         | 13837/20117 [8:52:52<3:52:49,  2.22s/it] 69%|████████████████████████████████████████████████████████▍                         | 13838/20117 [8:52:54<3:52:21,  2.22s/it] 69%|████████████████████████████████████████████████████████▍                         | 13839/20117 [8:52:56<3:51:46,  2.22s/it] 69%|████████████████████████████████████████████████████████▍                         | 13840/20117 [8:52:58<3:52:26,  2.22s/it]                                                                                                                                 {'loss': 0.1577, 'grad_norm': 0.552185595035553, 'learning_rate': 4.4739391112353915e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 368.68, 'epoch': 1.38}
 69%|████████████████████████████████████████████████████████▍                         | 13840/20117 [8:52:58<3:52:26,  2.22s/it] 69%|████████████████████████████████████████████████████████▍                         | 13841/20117 [8:53:00<3:51:38,  2.21s/it] 69%|████████████████████████████████████████████████████████▍                         | 13842/20117 [8:53:03<3:50:45,  2.21s/it] 69%|████████████████████████████████████████████████████████▍                         | 13843/20117 [8:53:05<3:51:52,  2.22s/it] 69%|████████████████████████████████████████████████████████▍                         | 13844/20117 [8:53:07<3:52:33,  2.22s/it] 69%|████████████████████████████████████████████████████████▍                         | 13845/20117 [8:53:09<3:52:47,  2.23s/it] 69%|████████████████████████████████████████████████████████▍                         | 13846/20117 [8:53:12<3:52:56,  2.23s/it] 69%|████████████████████████████████████████████████████████▍                         | 13847/20117 [8:53:14<3:52:09,  2.22s/it] 69%|████████████████████████████████████████████████████████▍                         | 13848/20117 [8:53:16<3:51:01,  2.21s/it] 69%|████████████████████████████████████████████████████████▍                         | 13849/20117 [8:53:18<3:52:07,  2.22s/it] 69%|████████████████████████████████████████████████████████▍                         | 13850/20117 [8:53:20<3:52:59,  2.23s/it]                                                                                                                                 {'loss': 0.1855, 'grad_norm': 0.5017216205596924, 'learning_rate': 4.460865355805109e-05, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 331.39, 'epoch': 1.38}
 69%|████████████████████████████████████████████████████████▍                         | 13850/20117 [8:53:20<3:52:59,  2.23s/it] 69%|████████████████████████████████████████████████████████▍                         | 13851/20117 [8:53:23<3:54:19,  2.24s/it] 69%|████████████████████████████████████████████████████████▍                         | 13852/20117 [8:53:25<3:53:40,  2.24s/it] 69%|████████████████████████████████████████████████████████▍                         | 13853/20117 [8:53:27<3:54:39,  2.25s/it] 69%|████████████████████████████████████████████████████████▍                         | 13854/20117 [8:53:29<3:54:43,  2.25s/it] 69%|████████████████████████████████████████████████████████▍                         | 13855/20117 [8:53:32<3:55:06,  2.25s/it] 69%|████████████████████████████████████████████████████████▍                         | 13856/20117 [8:53:34<3:54:19,  2.25s/it] 69%|████████████████████████████████████████████████████████▍                         | 13857/20117 [8:53:36<3:53:09,  2.23s/it] 69%|████████████████████████████████████████████████████████▍                         | 13858/20117 [8:53:38<3:54:40,  2.25s/it] 69%|████████████████████████████████████████████████████████▍                         | 13859/20117 [8:53:41<3:54:50,  2.25s/it] 69%|████████████████████████████████████████████████████████▍                         | 13860/20117 [8:53:43<3:53:04,  2.24s/it]                                                                                                                                 {'loss': 0.1298, 'grad_norm': 0.33180108666419983, 'learning_rate': 4.447805244434184e-05, 'memory/max_active (GiB)': 21.41, 'memory/max_allocated (GiB)': 21.41, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 410.56, 'epoch': 1.38}
 69%|████████████████████████████████████████████████████████▍                         | 13860/20117 [8:53:43<3:53:04,  2.24s/it] 69%|████████████████████████████████████████████████████████▍                         | 13861/20117 [8:53:45<3:54:14,  2.25s/it] 69%|████████████████████████████████████████████████████████▌                         | 13862/20117 [8:53:47<3:55:16,  2.26s/it] 69%|████████████████████████████████████████████████████████▌                         | 13863/20117 [8:53:50<3:53:34,  2.24s/it] 69%|████████████████████████████████████████████████████████▌                         | 13864/20117 [8:53:52<3:55:07,  2.26s/it] 69%|████████████████████████████████████████████████████████▌                         | 13865/20117 [8:53:54<3:57:27,  2.28s/it] 69%|████████████████████████████████████████████████████████▌                         | 13866/20117 [8:53:57<3:56:59,  2.27s/it] 69%|████████████████████████████████████████████████████████▌                         | 13867/20117 [8:53:59<3:54:49,  2.25s/it] 69%|████████████████████████████████████████████████████████▌                         | 13868/20117 [8:54:01<3:56:22,  2.27s/it] 69%|████████████████████████████████████████████████████████▌                         | 13869/20117 [8:54:03<3:55:43,  2.26s/it] 69%|████████████████████████████████████████████████████████▌                         | 13870/20117 [8:54:06<3:54:32,  2.25s/it]                                                                                                                                 {'loss': 0.1828, 'grad_norm': 0.5970960855484009, 'learning_rate': 4.4347588092924206e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 367.8, 'epoch': 1.38}
 69%|████████████████████████████████████████████████████████▌                         | 13870/20117 [8:54:06<3:54:32,  2.25s/it] 69%|████████████████████████████████████████████████████████▌                         | 13871/20117 [8:54:08<3:54:50,  2.26s/it] 69%|████████████████████████████████████████████████████████▌                         | 13872/20117 [8:54:10<3:54:47,  2.26s/it] 69%|████████████████████████████████████████████████████████▌                         | 13873/20117 [8:54:12<3:54:51,  2.26s/it] 69%|████████████████████████████████████████████████████████▌                         | 13874/20117 [8:54:15<3:55:48,  2.27s/it] 69%|████████████████████████████████████████████████████████▌                         | 13875/20117 [8:54:17<3:55:36,  2.26s/it] 69%|████████████████████████████████████████████████████████▌                         | 13876/20117 [8:54:19<3:55:42,  2.27s/it] 69%|████████████████████████████████████████████████████████▌                         | 13877/20117 [8:54:21<3:54:36,  2.26s/it] 69%|████████████████████████████████████████████████████████▌                         | 13878/20117 [8:54:24<3:54:57,  2.26s/it] 69%|████████████████████████████████████████████████████████▌                         | 13879/20117 [8:54:26<3:54:45,  2.26s/it] 69%|████████████████████████████████████████████████████████▌                         | 13880/20117 [8:54:28<3:55:24,  2.26s/it]                                                                                                                                 {'loss': 0.1518, 'grad_norm': 0.588447630405426, 'learning_rate': 4.421726082515953e-05, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 333.81, 'epoch': 1.38}
 69%|████████████████████████████████████████████████████████▌                         | 13880/20117 [8:54:28<3:55:24,  2.26s/it] 69%|████████████████████████████████████████████████████████▌                         | 13881/20117 [8:54:31<4:02:55,  2.34s/it] 69%|████████████████████████████████████████████████████████▌                         | 13882/20117 [8:54:33<3:59:48,  2.31s/it] 69%|████████████████████████████████████████████████████████▌                         | 13883/20117 [8:54:35<3:57:54,  2.29s/it] 69%|████████████████████████████████████████████████████████▌                         | 13884/20117 [8:54:37<3:55:34,  2.27s/it] 69%|████████████████████████████████████████████████████████▌                         | 13885/20117 [8:54:40<3:55:34,  2.27s/it] 69%|████████████████████████████████████████████████████████▌                         | 13886/20117 [8:54:42<3:55:53,  2.27s/it] 69%|████████████████████████████████████████████████████████▌                         | 13887/20117 [8:54:44<3:53:18,  2.25s/it] 69%|████████████████████████████████████████████████████████▌                         | 13888/20117 [8:54:46<3:51:53,  2.23s/it] 69%|████████████████████████████████████████████████████████▌                         | 13889/20117 [8:54:49<3:50:31,  2.22s/it] 69%|████████████████████████████████████████████████████████▌                         | 13890/20117 [8:54:51<3:50:53,  2.22s/it]                                                                                                                                 {'loss': 0.2154, 'grad_norm': 0.5242084860801697, 'learning_rate': 4.4087070962071377e-05, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 424.02, 'epoch': 1.38}
 69%|████████████████████████████████████████████████████████▌                         | 13890/20117 [8:54:51<3:50:53,  2.22s/it] 69%|████████████████████████████████████████████████████████▌                         | 13891/20117 [8:54:53<3:50:49,  2.22s/it] 69%|████████████████████████████████████████████████████████▋                         | 13892/20117 [8:54:55<3:50:18,  2.22s/it] 69%|████████████████████████████████████████████████████████▋                         | 13893/20117 [8:54:57<3:51:04,  2.23s/it] 69%|████████████████████████████████████████████████████████▋                         | 13894/20117 [8:55:00<3:52:17,  2.24s/it] 69%|████████████████████████████████████████████████████████▋                         | 13895/20117 [8:55:02<3:51:56,  2.24s/it] 69%|████████████████████████████████████████████████████████▋                         | 13896/20117 [8:55:04<3:52:24,  2.24s/it] 69%|████████████████████████████████████████████████████████▋                         | 13897/20117 [8:55:06<3:52:10,  2.24s/it] 69%|████████████████████████████████████████████████████████▋                         | 13898/20117 [8:55:09<3:50:05,  2.22s/it] 69%|████████████████████████████████████████████████████████▋                         | 13899/20117 [8:55:11<3:50:37,  2.23s/it] 69%|████████████████████████████████████████████████████████▋                         | 13900/20117 [8:55:13<3:50:07,  2.22s/it]                                                                                                                                 {'loss': 0.1594, 'grad_norm': 0.4128514528274536, 'learning_rate': 4.395701882434493e-05, 'memory/max_active (GiB)': 21.39, 'memory/max_allocated (GiB)': 21.39, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 361.98, 'epoch': 1.38}
 69%|████████████████████████████████████████████████████████▋                         | 13900/20117 [8:55:13<3:50:07,  2.22s/it] 69%|████████████████████████████████████████████████████████▋                         | 13901/20117 [8:55:15<3:49:15,  2.21s/it] 69%|████████████████████████████████████████████████████████▋                         | 13902/20117 [8:55:17<3:49:04,  2.21s/it] 69%|████████████████████████████████████████████████████████▋                         | 13903/20117 [8:55:20<3:51:07,  2.23s/it] 69%|████████████████████████████████████████████████████████▋                         | 13904/20117 [8:55:22<3:51:09,  2.23s/it] 69%|████████████████████████████████████████████████████████▋                         | 13905/20117 [8:55:24<3:52:42,  2.25s/it] 69%|████████████████████████████████████████████████████████▋                         | 13906/20117 [8:55:26<3:52:23,  2.24s/it] 69%|████████████████████████████████████████████████████████▋                         | 13907/20117 [8:55:29<3:54:07,  2.26s/it] 69%|████████████████████████████████████████████████████████▋                         | 13908/20117 [8:55:31<3:55:17,  2.27s/it] 69%|████████████████████████████████████████████████████████▋                         | 13909/20117 [8:55:33<3:54:47,  2.27s/it] 69%|████████████████████████████████████████████████████████▋                         | 13910/20117 [8:55:36<3:54:41,  2.27s/it]                                                                                                                                 {'loss': 0.1143, 'grad_norm': 0.3702273964881897, 'learning_rate': 4.3827104732326055e-05, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 339.1, 'epoch': 1.38}
 69%|████████████████████████████████████████████████████████▋                         | 13910/20117 [8:55:36<3:54:41,  2.27s/it] 69%|████████████████████████████████████████████████████████▋                         | 13911/20117 [8:55:38<3:54:34,  2.27s/it] 69%|████████████████████████████████████████████████████████▋                         | 13912/20117 [8:55:40<3:52:18,  2.25s/it] 69%|████████████████████████████████████████████████████████▋                         | 13913/20117 [8:55:42<3:50:36,  2.23s/it] 69%|████████████████████████████████████████████████████████▋                         | 13914/20117 [8:55:45<3:51:08,  2.24s/it] 69%|████████████████████████████████████████████████████████▋                         | 13915/20117 [8:55:47<3:50:35,  2.23s/it] 69%|████████████████████████████████████████████████████████▋                         | 13916/20117 [8:55:49<3:50:21,  2.23s/it] 69%|████████████████████████████████████████████████████████▋                         | 13917/20117 [8:55:51<3:50:01,  2.23s/it] 69%|████████████████████████████████████████████████████████▋                         | 13918/20117 [8:55:53<3:49:30,  2.22s/it] 69%|████████████████████████████████████████████████████████▋                         | 13919/20117 [8:55:56<3:51:20,  2.24s/it] 69%|████████████████████████████████████████████████████████▋                         | 13920/20117 [8:55:58<3:51:06,  2.24s/it]                                                                                                                                 {'loss': 0.1272, 'grad_norm': 0.5153867602348328, 'learning_rate': 4.3697329006020614e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 296.93, 'epoch': 1.38}
 69%|████████████████████████████████████████████████████████▋                         | 13920/20117 [8:55:58<3:51:06,  2.24s/it] 69%|████████████████████████████████████████████████████████▋                         | 13921/20117 [8:56:00<3:51:17,  2.24s/it] 69%|████████████████████████████████████████████████████████▋                         | 13922/20117 [8:56:02<3:50:41,  2.23s/it] 69%|████████████████████████████████████████████████████████▊                         | 13923/20117 [8:56:05<3:49:58,  2.23s/it] 69%|████████████████████████████████████████████████████████▊                         | 13924/20117 [8:56:07<3:50:22,  2.23s/it] 69%|████████████████████████████████████████████████████████▊                         | 13925/20117 [8:56:09<3:53:40,  2.26s/it] 69%|████████████████████████████████████████████████████████▊                         | 13926/20117 [8:56:11<3:53:23,  2.26s/it] 69%|████████████████████████████████████████████████████████▊                         | 13927/20117 [8:56:14<3:52:09,  2.25s/it] 69%|████████████████████████████████████████████████████████▊                         | 13928/20117 [8:56:16<3:52:03,  2.25s/it] 69%|████████████████████████████████████████████████████████▊                         | 13929/20117 [8:56:18<3:50:19,  2.23s/it] 69%|████████████████████████████████████████████████████████▊                         | 13930/20117 [8:56:20<3:48:57,  2.22s/it]                                                                                                                                 {'loss': 0.1551, 'grad_norm': 0.45272836089134216, 'learning_rate': 4.356769196509373e-05, 'memory/max_active (GiB)': 20.62, 'memory/max_allocated (GiB)': 20.62, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 372.63, 'epoch': 1.38}
 69%|████████████████████████████████████████████████████████▊                         | 13930/20117 [8:56:20<3:48:57,  2.22s/it] 69%|████████████████████████████████████████████████████████▊                         | 13931/20117 [8:56:23<3:49:13,  2.22s/it] 69%|████████████████████████████████████████████████████████▊                         | 13932/20117 [8:56:25<3:49:04,  2.22s/it] 69%|████████████████████████████████████████████████████████▊                         | 13933/20117 [8:56:27<3:48:40,  2.22s/it] 69%|████████████████████████████████████████████████████████▊                         | 13934/20117 [8:56:29<3:47:40,  2.21s/it] 69%|████████████████████████████████████████████████████████▊                         | 13935/20117 [8:56:32<3:56:54,  2.30s/it] 69%|████████████████████████████████████████████████████████▊                         | 13936/20117 [8:56:34<3:55:25,  2.29s/it] 69%|████████████████████████████████████████████████████████▊                         | 13937/20117 [8:56:36<3:54:39,  2.28s/it] 69%|████████████████████████████████████████████████████████▊                         | 13938/20117 [8:56:38<3:53:58,  2.27s/it] 69%|████████████████████████████████████████████████████████▊                         | 13939/20117 [8:56:41<3:52:29,  2.26s/it] 69%|████████████████████████████████████████████████████████▊                         | 13940/20117 [8:56:43<3:52:37,  2.26s/it]                                                                                                                                 {'loss': 0.1434, 'grad_norm': 0.44970136880874634, 'learning_rate': 4.343819392886873e-05, 'memory/max_active (GiB)': 20.57, 'memory/max_allocated (GiB)': 20.57, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 340.91, 'epoch': 1.39}
 69%|████████████████████████████████████████████████████████▊                         | 13940/20117 [8:56:43<3:52:37,  2.26s/it] 69%|████████████████████████████████████████████████████████▊                         | 13941/20117 [8:56:45<3:51:47,  2.25s/it] 69%|████████████████████████████████████████████████████████▊                         | 13942/20117 [8:56:51<5:46:05,  3.36s/it] 69%|████████████████████████████████████████████████████████▊                         | 13943/20117 [8:56:53<5:10:59,  3.02s/it] 69%|████████████████████████████████████████████████████████▊                         | 13944/20117 [8:56:56<4:48:59,  2.81s/it] 69%|████████████████████████████████████████████████████████▊                         | 13945/20117 [8:56:58<4:32:27,  2.65s/it] 69%|████████████████████████████████████████████████████████▊                         | 13946/20117 [8:57:00<4:26:52,  2.59s/it] 69%|████████████████████████████████████████████████████████▊                         | 13947/20117 [8:57:03<4:17:35,  2.51s/it] 69%|████████████████████████████████████████████████████████▊                         | 13948/20117 [8:57:05<4:09:40,  2.43s/it] 69%|████████████████████████████████████████████████████████▊                         | 13949/20117 [8:57:07<4:03:56,  2.37s/it] 69%|████████████████████████████████████████████████████████▊                         | 13950/20117 [8:57:09<3:58:11,  2.32s/it]                                                                                                                                 {'loss': 0.1548, 'grad_norm': 0.5090652108192444, 'learning_rate': 4.3308835216326696e-05, 'memory/max_active (GiB)': 20.57, 'memory/max_allocated (GiB)': 20.57, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 350.01, 'epoch': 1.39}
 69%|████████████████████████████████████████████████████████▊                         | 13950/20117 [8:57:09<3:58:11,  2.32s/it] 69%|████████████████████████████████████████████████████████▊                         | 13951/20117 [8:57:12<3:53:59,  2.28s/it] 69%|████████████████████████████████████████████████████████▊                         | 13952/20117 [8:57:14<3:52:02,  2.26s/it] 69%|████████████████████████████████████████████████████████▊                         | 13953/20117 [8:57:16<3:50:48,  2.25s/it] 69%|████████████████████████████████████████████████████████▉                         | 13954/20117 [8:57:18<3:49:22,  2.23s/it] 69%|████████████████████████████████████████████████████████▉                         | 13955/20117 [8:57:20<3:48:16,  2.22s/it] 69%|████████████████████████████████████████████████████████▉                         | 13956/20117 [8:57:23<3:49:03,  2.23s/it] 69%|████████████████████████████████████████████████████████▉                         | 13957/20117 [8:57:25<3:47:43,  2.22s/it] 69%|████████████████████████████████████████████████████████▉                         | 13958/20117 [8:57:27<3:48:07,  2.22s/it] 69%|████████████████████████████████████████████████████████▉                         | 13959/20117 [8:57:29<3:48:39,  2.23s/it] 69%|████████████████████████████████████████████████████████▉                         | 13960/20117 [8:57:32<3:52:17,  2.26s/it]                                                                                                                                 {'loss': 0.121, 'grad_norm': 0.4184592366218567, 'learning_rate': 4.3179616146105465e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 348.76, 'epoch': 1.39}
 69%|████████████████████████████████████████████████████████▉                         | 13960/20117 [8:57:32<3:52:17,  2.26s/it] 69%|████████████████████████████████████████████████████████▉                         | 13961/20117 [8:57:34<3:49:53,  2.24s/it] 69%|████████████████████████████████████████████████████████▉                         | 13962/20117 [8:57:36<3:50:37,  2.25s/it] 69%|████████████████████████████████████████████████████████▉                         | 13963/20117 [8:57:38<3:50:27,  2.25s/it] 69%|████████████████████████████████████████████████████████▉                         | 13964/20117 [8:57:41<3:50:15,  2.25s/it] 69%|████████████████████████████████████████████████████████▉                         | 13965/20117 [8:57:43<3:49:42,  2.24s/it] 69%|████████████████████████████████████████████████████████▉                         | 13966/20117 [8:57:45<3:49:16,  2.24s/it] 69%|████████████████████████████████████████████████████████▉                         | 13967/20117 [8:57:47<3:48:55,  2.23s/it] 69%|████████████████████████████████████████████████████████▉                         | 13968/20117 [8:57:49<3:47:27,  2.22s/it] 69%|████████████████████████████████████████████████████████▉                         | 13969/20117 [8:57:52<3:47:45,  2.22s/it] 69%|████████████████████████████████████████████████████████▉                         | 13970/20117 [8:57:54<3:49:07,  2.24s/it]                                                                                                                                 {'loss': 0.1996, 'grad_norm': 0.6222097277641296, 'learning_rate': 4.305053703649897e-05, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 380.55, 'epoch': 1.39}
 69%|████████████████████████████████████████████████████████▉                         | 13970/20117 [8:57:54<3:49:07,  2.24s/it] 69%|████████████████████████████████████████████████████████▉                         | 13971/20117 [8:57:56<3:49:42,  2.24s/it] 69%|████████████████████████████████████████████████████████▉                         | 13972/20117 [8:57:58<3:49:52,  2.24s/it] 69%|████████████████████████████████████████████████████████▉                         | 13973/20117 [8:58:01<3:50:31,  2.25s/it] 69%|████████████████████████████████████████████████████████▉                         | 13974/20117 [8:58:03<3:48:25,  2.23s/it] 69%|████████████████████████████████████████████████████████▉                         | 13975/20117 [8:58:05<3:49:35,  2.24s/it] 69%|████████████████████████████████████████████████████████▉                         | 13976/20117 [8:58:07<3:49:39,  2.24s/it] 69%|████████████████████████████████████████████████████████▉                         | 13977/20117 [8:58:10<3:49:34,  2.24s/it] 69%|████████████████████████████████████████████████████████▉                         | 13978/20117 [8:58:12<3:49:29,  2.24s/it] 69%|████████████████████████████████████████████████████████▉                         | 13979/20117 [8:58:14<3:50:46,  2.26s/it] 69%|████████████████████████████████████████████████████████▉                         | 13980/20117 [8:58:16<3:51:31,  2.26s/it]                                                                                                                                 {'loss': 0.1481, 'grad_norm': 0.34182408452033997, 'learning_rate': 4.292159820545627e-05, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 368.88, 'epoch': 1.39}
 69%|████████████████████████████████████████████████████████▉                         | 13980/20117 [8:58:16<3:51:31,  2.26s/it] 69%|████████████████████████████████████████████████████████▉                         | 13981/20117 [8:58:19<3:50:07,  2.25s/it] 70%|████████████████████████████████████████████████████████▉                         | 13982/20117 [8:58:21<3:51:46,  2.27s/it] 70%|████████████████████████████████████████████████████████▉                         | 13983/20117 [8:58:23<3:49:50,  2.25s/it] 70%|█████████████████████████████████████████████████████████                         | 13984/20117 [8:58:25<3:47:59,  2.23s/it] 70%|█████████████████████████████████████████████████████████                         | 13985/20117 [8:58:28<3:46:43,  2.22s/it] 70%|█████████████████████████████████████████████████████████                         | 13986/20117 [8:58:30<3:56:10,  2.31s/it] 70%|█████████████████████████████████████████████████████████                         | 13987/20117 [8:58:32<3:51:58,  2.27s/it] 70%|█████████████████████████████████████████████████████████                         | 13988/20117 [8:58:34<3:49:55,  2.25s/it] 70%|█████████████████████████████████████████████████████████                         | 13989/20117 [8:58:37<3:50:00,  2.25s/it] 70%|█████████████████████████████████████████████████████████                         | 13990/20117 [8:58:39<3:48:14,  2.24s/it]                                                                                                                                 {'loss': 0.1954, 'grad_norm': 0.551923930644989, 'learning_rate': 4.279279997058101e-05, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 327.45, 'epoch': 1.39}
 70%|█████████████████████████████████████████████████████████                         | 13990/20117 [8:58:39<3:48:14,  2.24s/it] 70%|█████████████████████████████████████████████████████████                         | 13991/20117 [8:58:41<3:49:31,  2.25s/it] 70%|█████████████████████████████████████████████████████████                         | 13992/20117 [8:58:43<3:50:08,  2.25s/it] 70%|█████████████████████████████████████████████████████████                         | 13993/20117 [8:58:46<3:49:11,  2.25s/it] 70%|█████████████████████████████████████████████████████████                         | 13994/20117 [8:58:48<3:48:23,  2.24s/it] 70%|█████████████████████████████████████████████████████████                         | 13995/20117 [8:58:50<3:46:27,  2.22s/it] 70%|█████████████████████████████████████████████████████████                         | 13996/20117 [8:58:52<3:46:03,  2.22s/it] 70%|█████████████████████████████████████████████████████████                         | 13997/20117 [8:58:55<3:45:36,  2.21s/it] 70%|█████████████████████████████████████████████████████████                         | 13998/20117 [8:58:57<3:45:30,  2.21s/it] 70%|█████████████████████████████████████████████████████████                         | 13999/20117 [8:58:59<3:47:23,  2.23s/it] 70%|█████████████████████████████████████████████████████████                         | 14000/20117 [8:59:01<3:51:39,  2.27s/it]                                                                                                                                 {'loss': 0.1153, 'grad_norm': 0.43246060609817505, 'learning_rate': 4.266414264913041e-05, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 301.74, 'epoch': 1.39}
 70%|█████████████████████████████████████████████████████████                         | 14000/20117 [8:59:01<3:51:39,  2.27s/it] 70%|█████████████████████████████████████████████████████████                         | 14001/20117 [8:59:04<3:51:02,  2.27s/it] 70%|█████████████████████████████████████████████████████████                         | 14002/20117 [8:59:06<3:50:07,  2.26s/it] 70%|█████████████████████████████████████████████████████████                         | 14003/20117 [8:59:08<3:49:20,  2.25s/it] 70%|█████████████████████████████████████████████████████████                         | 14004/20117 [8:59:10<3:48:34,  2.24s/it] 70%|█████████████████████████████████████████████████████████                         | 14005/20117 [8:59:13<3:48:12,  2.24s/it] 70%|█████████████████████████████████████████████████████████                         | 14006/20117 [8:59:15<3:46:40,  2.23s/it] 70%|█████████████████████████████████████████████████████████                         | 14007/20117 [8:59:17<3:46:14,  2.22s/it] 70%|█████████████████████████████████████████████████████████                         | 14008/20117 [8:59:19<3:46:47,  2.23s/it] 70%|█████████████████████████████████████████████████████████                         | 14009/20117 [8:59:21<3:47:34,  2.24s/it] 70%|█████████████████████████████████████████████████████████                         | 14010/20117 [8:59:24<3:46:52,  2.23s/it]                                                                                                                                 {'loss': 0.1414, 'grad_norm': 0.3509690761566162, 'learning_rate': 4.2535626558014705e-05, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 367.19, 'epoch': 1.39}
 70%|█████████████████████████████████████████████████████████                         | 14010/20117 [8:59:24<3:46:52,  2.23s/it] 70%|█████████████████████████████████████████████████████████                         | 14011/20117 [8:59:26<3:45:46,  2.22s/it] 70%|█████████████████████████████████████████████████████████                         | 14012/20117 [8:59:28<3:44:28,  2.21s/it] 70%|█████████████████████████████████████████████████████████                         | 14013/20117 [8:59:30<3:44:01,  2.20s/it] 70%|█████████████████████████████████████████████████████████                         | 14014/20117 [8:59:32<3:45:24,  2.22s/it] 70%|█████████████████████████████████████████████████████████▏                        | 14015/20117 [8:59:35<3:45:00,  2.21s/it] 70%|█████████████████████████████████████████████████████████▏                        | 14016/20117 [8:59:37<3:45:20,  2.22s/it] 70%|█████████████████████████████████████████████████████████▏                        | 14017/20117 [8:59:39<3:45:17,  2.22s/it] 70%|█████████████████████████████████████████████████████████▏                        | 14018/20117 [8:59:41<3:45:08,  2.21s/it] 70%|█████████████████████████████████████████████████████████▏                        | 14019/20117 [8:59:44<3:48:27,  2.25s/it] 70%|█████████████████████████████████████████████████████████▏                        | 14020/20117 [8:59:46<3:47:01,  2.23s/it]                                                                                                                                 {'loss': 0.1538, 'grad_norm': 0.5196961164474487, 'learning_rate': 4.240725201379614e-05, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 339.78, 'epoch': 1.39}
 70%|█████████████████████████████████████████████████████████▏                        | 14020/20117 [8:59:46<3:47:01,  2.23s/it] 70%|█████████████████████████████████████████████████████████▏                        | 14021/20117 [8:59:48<3:46:12,  2.23s/it] 70%|█████████████████████████████████████████████████████████▏                        | 14022/20117 [8:59:50<3:46:02,  2.23s/it] 70%|█████████████████████████████████████████████████████████▏                        | 14023/20117 [8:59:53<3:47:50,  2.24s/it] 70%|█████████████████████████████████████████████████████████▏                        | 14024/20117 [8:59:55<3:49:04,  2.26s/it] 70%|█████████████████████████████████████████████████████████▏                        | 14025/20117 [8:59:57<3:47:38,  2.24s/it] 70%|█████████████████████████████████████████████████████████▏                        | 14026/20117 [8:59:59<3:46:40,  2.23s/it] 70%|█████████████████████████████████████████████████████████▏                        | 14027/20117 [9:00:02<3:47:03,  2.24s/it] 70%|█████████████████████████████████████████████████████████▏                        | 14028/20117 [9:00:04<3:47:15,  2.24s/it] 70%|█████████████████████████████████████████████████████████▏                        | 14029/20117 [9:00:06<3:47:34,  2.24s/it] 70%|█████████████████████████████████████████████████████████▏                        | 14030/20117 [9:00:08<3:48:12,  2.25s/it]                                                                                                                                 {'loss': 0.1549, 'grad_norm': 0.46282872557640076, 'learning_rate': 4.22790193326884e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 375.57, 'epoch': 1.39}
 70%|█████████████████████████████████████████████████████████▏                        | 14030/20117 [9:00:08<3:48:12,  2.25s/it] 70%|█████████████████████████████████████████████████████████▏                        | 14031/20117 [9:00:10<3:47:04,  2.24s/it] 70%|█████████████████████████████████████████████████████████▏                        | 14032/20117 [9:00:13<3:46:32,  2.23s/it] 70%|█████████████████████████████████████████████████████████▏                        | 14033/20117 [9:00:15<3:48:23,  2.25s/it] 70%|█████████████████████████████████████████████████████████▏                        | 14034/20117 [9:00:17<3:47:13,  2.24s/it] 70%|█████████████████████████████████████████████████████████▏                        | 14035/20117 [9:00:19<3:47:44,  2.25s/it] 70%|█████████████████████████████████████████████████████████▏                        | 14036/20117 [9:00:22<3:48:02,  2.25s/it] 70%|█████████████████████████████████████████████████████████▏                        | 14037/20117 [9:00:24<3:49:02,  2.26s/it] 70%|█████████████████████████████████████████████████████████▏                        | 14038/20117 [9:00:27<3:56:31,  2.33s/it] 70%|█████████████████████████████████████████████████████████▏                        | 14039/20117 [9:00:29<3:52:52,  2.30s/it] 70%|█████████████████████████████████████████████████████████▏                        | 14040/20117 [9:00:31<3:50:41,  2.28s/it]                                                                                                                                 {'loss': 0.1359, 'grad_norm': 0.3771131932735443, 'learning_rate': 4.21509288305556e-05, 'memory/max_active (GiB)': 20.77, 'memory/max_allocated (GiB)': 20.77, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 377.38, 'epoch': 1.4}
 70%|█████████████████████████████████████████████████████████▏                        | 14040/20117 [9:00:31<3:50:41,  2.28s/it] 70%|█████████████████████████████████████████████████████████▏                        | 14041/20117 [9:00:33<3:47:55,  2.25s/it] 70%|█████████████████████████████████████████████████████████▏                        | 14042/20117 [9:00:35<3:49:03,  2.26s/it] 70%|█████████████████████████████████████████████████████████▏                        | 14043/20117 [9:00:38<3:47:48,  2.25s/it] 70%|█████████████████████████████████████████████████████████▏                        | 14044/20117 [9:00:40<3:46:43,  2.24s/it] 70%|█████████████████████████████████████████████████████████▏                        | 14045/20117 [9:00:42<3:45:02,  2.22s/it] 70%|█████████████████████████████████████████████████████████▎                        | 14046/20117 [9:00:44<3:47:33,  2.25s/it] 70%|█████████████████████████████████████████████████████████▎                        | 14047/20117 [9:00:47<3:44:53,  2.22s/it] 70%|█████████████████████████████████████████████████████████▎                        | 14048/20117 [9:00:49<3:43:14,  2.21s/it] 70%|█████████████████████████████████████████████████████████▎                        | 14049/20117 [9:00:51<3:42:21,  2.20s/it] 70%|█████████████████████████████████████████████████████████▎                        | 14050/20117 [9:00:53<3:43:08,  2.21s/it]                                                                                                                                 {'loss': 0.1673, 'grad_norm': 0.639408528804779, 'learning_rate': 4.2022980822911786e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 333.3, 'epoch': 1.4}
 70%|█████████████████████████████████████████████████████████▎                        | 14050/20117 [9:00:53<3:43:08,  2.21s/it] 70%|█████████████████████████████████████████████████████████▎                        | 14051/20117 [9:00:55<3:42:52,  2.20s/it] 70%|█████████████████████████████████████████████████████████▎                        | 14052/20117 [9:00:58<3:43:47,  2.21s/it] 70%|█████████████████████████████████████████████████████████▎                        | 14053/20117 [9:01:00<3:42:50,  2.20s/it] 70%|█████████████████████████████████████████████████████████▎                        | 14054/20117 [9:01:02<3:43:06,  2.21s/it] 70%|█████████████████████████████████████████████████████████▎                        | 14055/20117 [9:01:04<3:44:02,  2.22s/it] 70%|█████████████████████████████████████████████████████████▎                        | 14056/20117 [9:01:06<3:44:52,  2.23s/it] 70%|█████████████████████████████████████████████████████████▎                        | 14057/20117 [9:01:09<3:44:01,  2.22s/it] 70%|█████████████████████████████████████████████████████████▎                        | 14058/20117 [9:01:11<3:44:12,  2.22s/it] 70%|█████████████████████████████████████████████████████████▎                        | 14059/20117 [9:01:13<3:45:06,  2.23s/it] 70%|█████████████████████████████████████████████████████████▎                        | 14060/20117 [9:01:15<3:44:05,  2.22s/it]                                                                                                                                 {'loss': 0.1855, 'grad_norm': 0.44624921679496765, 'learning_rate': 4.189517562491996e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 378.56, 'epoch': 1.4}
 70%|█████████████████████████████████████████████████████████▎                        | 14060/20117 [9:01:15<3:44:05,  2.22s/it] 70%|█████████████████████████████████████████████████████████▎                        | 14061/20117 [9:01:18<3:43:46,  2.22s/it] 70%|█████████████████████████████████████████████████████████▎                        | 14062/20117 [9:01:20<3:42:43,  2.21s/it] 70%|█████████████████████████████████████████████████████████▎                        | 14063/20117 [9:01:22<3:44:02,  2.22s/it] 70%|█████████████████████████████████████████████████████████▎                        | 14064/20117 [9:01:24<3:44:31,  2.23s/it] 70%|█████████████████████████████████████████████████████████▎                        | 14065/20117 [9:01:26<3:43:47,  2.22s/it] 70%|█████████████████████████████████████████████████████████▎                        | 14066/20117 [9:01:29<3:42:56,  2.21s/it] 70%|█████████████████████████████████████████████████████████▎                        | 14067/20117 [9:01:31<3:42:29,  2.21s/it] 70%|█████████████████████████████████████████████████████████▎                        | 14068/20117 [9:01:33<3:44:06,  2.22s/it] 70%|█████████████████████████████████████████████████████████▎                        | 14069/20117 [9:01:35<3:44:04,  2.22s/it] 70%|█████████████████████████████████████████████████████████▎                        | 14070/20117 [9:01:38<3:44:56,  2.23s/it]                                                                                                                                 {'loss': 0.1335, 'grad_norm': 0.5301280617713928, 'learning_rate': 4.176751355139126e-05, 'memory/max_active (GiB)': 20.57, 'memory/max_allocated (GiB)': 20.57, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 256.61, 'epoch': 1.4}
 70%|█████████████████████████████████████████████████████████▎                        | 14070/20117 [9:01:38<3:44:56,  2.23s/it] 70%|█████████████████████████████████████████████████████████▎                        | 14071/20117 [9:01:40<3:43:46,  2.22s/it] 70%|█████████████████████████████████████████████████████████▎                        | 14072/20117 [9:01:42<3:44:55,  2.23s/it] 70%|█████████████████████████████████████████████████████████▎                        | 14073/20117 [9:01:44<3:44:10,  2.23s/it] 70%|█████████████████████████████████████████████████████████▎                        | 14074/20117 [9:01:46<3:42:47,  2.21s/it] 70%|█████████████████████████████████████████████████████████▎                        | 14075/20117 [9:01:49<3:42:43,  2.21s/it] 70%|█████████████████████████████████████████████████████████▍                        | 14076/20117 [9:01:51<3:42:58,  2.21s/it] 70%|█████████████████████████████████████████████████████████▍                        | 14077/20117 [9:01:53<3:41:55,  2.20s/it] 70%|█████████████████████████████████████████████████████████▍                        | 14078/20117 [9:01:55<3:41:35,  2.20s/it] 70%|█████████████████████████████████████████████████████████▍                        | 14079/20117 [9:01:57<3:41:11,  2.20s/it] 70%|█████████████████████████████████████████████████████████▍                        | 14080/20117 [9:02:00<3:42:03,  2.21s/it]                                                                                                                                 {'loss': 0.1674, 'grad_norm': 0.6008658409118652, 'learning_rate': 4.163999491678444e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 375.31, 'epoch': 1.4}
 70%|█████████████████████████████████████████████████████████▍                        | 14080/20117 [9:02:00<3:42:03,  2.21s/it] 70%|█████████████████████████████████████████████████████████▍                        | 14081/20117 [9:02:02<3:42:32,  2.21s/it] 70%|█████████████████████████████████████████████████████████▍                        | 14082/20117 [9:02:04<3:43:29,  2.22s/it] 70%|█████████████████████████████████████████████████████████▍                        | 14083/20117 [9:02:06<3:45:06,  2.24s/it] 70%|█████████████████████████████████████████████████████████▍                        | 14084/20117 [9:02:09<3:45:10,  2.24s/it] 70%|█████████████████████████████████████████████████████████▍                        | 14085/20117 [9:02:11<3:45:29,  2.24s/it] 70%|█████████████████████████████████████████████████████████▍                        | 14086/20117 [9:02:13<3:43:42,  2.23s/it] 70%|█████████████████████████████████████████████████████████▍                        | 14087/20117 [9:02:15<3:42:22,  2.21s/it] 70%|█████████████████████████████████████████████████████████▍                        | 14088/20117 [9:02:17<3:41:48,  2.21s/it] 70%|█████████████████████████████████████████████████████████▍                        | 14089/20117 [9:02:20<3:41:44,  2.21s/it] 70%|█████████████████████████████████████████████████████████▍                        | 14090/20117 [9:02:22<3:41:34,  2.21s/it]                                                                                                                                 {'loss': 0.1309, 'grad_norm': 0.569706916809082, 'learning_rate': 4.1512620035204784e-05, 'memory/max_active (GiB)': 20.63, 'memory/max_allocated (GiB)': 20.63, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 311.74, 'epoch': 1.4}
 70%|█████████████████████████████████████████████████████████▍                        | 14090/20117 [9:02:22<3:41:34,  2.21s/it] 70%|█████████████████████████████████████████████████████████▍                        | 14091/20117 [9:02:24<3:49:50,  2.29s/it] 70%|█████████████████████████████████████████████████████████▍                        | 14092/20117 [9:02:26<3:46:35,  2.26s/it] 70%|█████████████████████████████████████████████████████████▍                        | 14093/20117 [9:02:29<3:46:21,  2.25s/it] 70%|█████████████████████████████████████████████████████████▍                        | 14094/20117 [9:02:31<3:44:17,  2.23s/it] 70%|█████████████████████████████████████████████████████████▍                        | 14095/20117 [9:02:33<3:44:27,  2.24s/it] 70%|█████████████████████████████████████████████████████████▍                        | 14096/20117 [9:02:35<3:44:31,  2.24s/it] 70%|█████████████████████████████████████████████████████████▍                        | 14097/20117 [9:02:38<3:43:05,  2.22s/it] 70%|█████████████████████████████████████████████████████████▍                        | 14098/20117 [9:02:40<3:42:13,  2.22s/it] 70%|█████████████████████████████████████████████████████████▍                        | 14099/20117 [9:02:42<3:43:48,  2.23s/it] 70%|█████████████████████████████████████████████████████████▍                        | 14100/20117 [9:02:44<3:45:44,  2.25s/it]                                                                                                                                 {'loss': 0.2239, 'grad_norm': 0.5985649228096008, 'learning_rate': 4.138538922040356e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 431.67, 'epoch': 1.4}
 70%|█████████████████████████████████████████████████████████▍                        | 14100/20117 [9:02:44<3:45:44,  2.25s/it] 70%|█████████████████████████████████████████████████████████▍                        | 14101/20117 [9:02:47<3:43:39,  2.23s/it] 70%|█████████████████████████████████████████████████████████▍                        | 14102/20117 [9:02:49<3:43:49,  2.23s/it] 70%|█████████████████████████████████████████████████████████▍                        | 14103/20117 [9:02:51<3:43:38,  2.23s/it] 70%|█████████████████████████████████████████████████████████▍                        | 14104/20117 [9:02:53<3:42:25,  2.22s/it] 70%|█████████████████████████████████████████████████████████▍                        | 14105/20117 [9:02:55<3:41:34,  2.21s/it] 70%|█████████████████████████████████████████████████████████▍                        | 14106/20117 [9:02:58<3:42:57,  2.23s/it] 70%|█████████████████████████████████████████████████████████▌                        | 14107/20117 [9:03:00<3:41:44,  2.21s/it] 70%|█████████████████████████████████████████████████████████▌                        | 14108/20117 [9:03:02<3:41:48,  2.21s/it] 70%|█████████████████████████████████████████████████████████▌                        | 14109/20117 [9:03:04<3:40:41,  2.20s/it] 70%|█████████████████████████████████████████████████████████▌                        | 14110/20117 [9:03:07<3:43:49,  2.24s/it]                                                                                                                                 {'loss': 0.1663, 'grad_norm': 0.18843001127243042, 'learning_rate': 4.125830278577717e-05, 'memory/max_active (GiB)': 20.45, 'memory/max_allocated (GiB)': 20.45, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 388.4, 'epoch': 1.4}
 70%|█████████████████████████████████████████████████████████▌                        | 14110/20117 [9:03:07<3:43:49,  2.24s/it] 70%|█████████████████████████████████████████████████████████▌                        | 14111/20117 [9:03:09<3:42:13,  2.22s/it] 70%|█████████████████████████████████████████████████████████▌                        | 14112/20117 [9:03:11<3:41:58,  2.22s/it] 70%|█████████████████████████████████████████████████████████▌                        | 14113/20117 [9:03:13<3:44:09,  2.24s/it] 70%|█████████████████████████████████████████████████████████▌                        | 14114/20117 [9:03:15<3:43:03,  2.23s/it] 70%|█████████████████████████████████████████████████████████▌                        | 14115/20117 [9:03:18<3:44:33,  2.24s/it] 70%|█████████████████████████████████████████████████████████▌                        | 14116/20117 [9:03:20<3:43:34,  2.24s/it] 70%|█████████████████████████████████████████████████████████▌                        | 14117/20117 [9:03:22<3:43:06,  2.23s/it] 70%|█████████████████████████████████████████████████████████▌                        | 14118/20117 [9:03:24<3:43:31,  2.24s/it] 70%|█████████████████████████████████████████████████████████▌                        | 14119/20117 [9:03:27<3:44:42,  2.25s/it] 70%|█████████████████████████████████████████████████████████▌                        | 14120/20117 [9:03:29<3:44:52,  2.25s/it]                                                                                                                                 {'loss': 0.1385, 'grad_norm': 0.4820663332939148, 'learning_rate': 4.113136104436639e-05, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 375.91, 'epoch': 1.4}
 70%|█████████████████████████████████████████████████████████▌                        | 14120/20117 [9:03:29<3:44:52,  2.25s/it] 70%|█████████████████████████████████████████████████████████▌                        | 14121/20117 [9:03:31<3:46:44,  2.27s/it] 70%|█████████████████████████████████████████████████████████▌                        | 14122/20117 [9:03:33<3:45:57,  2.26s/it] 70%|█████████████████████████████████████████████████████████▌                        | 14123/20117 [9:03:36<3:44:28,  2.25s/it] 70%|█████████████████████████████████████████████████████████▌                        | 14124/20117 [9:03:38<3:42:16,  2.23s/it] 70%|█████████████████████████████████████████████████████████▌                        | 14125/20117 [9:03:40<3:42:55,  2.23s/it] 70%|█████████████████████████████████████████████████████████▌                        | 14126/20117 [9:03:42<3:41:09,  2.21s/it] 70%|█████████████████████████████████████████████████████████▌                        | 14127/20117 [9:03:45<3:43:05,  2.23s/it] 70%|█████████████████████████████████████████████████████████▌                        | 14128/20117 [9:03:47<3:44:01,  2.24s/it] 70%|█████████████████████████████████████████████████████████▌                        | 14129/20117 [9:03:49<3:42:09,  2.23s/it] 70%|█████████████████████████████████████████████████████████▌                        | 14130/20117 [9:03:51<3:41:15,  2.22s/it]                                                                                                                                 {'loss': 0.188, 'grad_norm': 0.47296223044395447, 'learning_rate': 4.10045643088555e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 364.31, 'epoch': 1.4}
 70%|█████████████████████████████████████████████████████████▌                        | 14130/20117 [9:03:51<3:41:15,  2.22s/it] 70%|█████████████████████████████████████████████████████████▌                        | 14131/20117 [9:03:53<3:40:14,  2.21s/it] 70%|█████████████████████████████████████████████████████████▌                        | 14132/20117 [9:03:56<3:40:50,  2.21s/it] 70%|█████████████████████████████████████████████████████████▌                        | 14133/20117 [9:03:58<3:43:09,  2.24s/it] 70%|█████████████████████████████████████████████████████████▌                        | 14134/20117 [9:04:00<3:43:19,  2.24s/it] 70%|█████████████████████████████████████████████████████████▌                        | 14135/20117 [9:04:02<3:43:44,  2.24s/it] 70%|█████████████████████████████████████████████████████████▌                        | 14136/20117 [9:04:05<3:42:57,  2.24s/it] 70%|█████████████████████████████████████████████████████████▌                        | 14137/20117 [9:04:07<3:42:44,  2.23s/it] 70%|█████████████████████████████████████████████████████████▋                        | 14138/20117 [9:04:09<3:42:13,  2.23s/it] 70%|█████████████████████████████████████████████████████████▋                        | 14139/20117 [9:04:11<3:42:37,  2.23s/it] 70%|█████████████████████████████████████████████████████████▋                        | 14140/20117 [9:04:14<3:44:47,  2.26s/it]                                                                                                                                 {'loss': 0.1479, 'grad_norm': 0.5604919791221619, 'learning_rate': 4.0877912891571725e-05, 'memory/max_active (GiB)': 21.37, 'memory/max_allocated (GiB)': 21.37, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 354.18, 'epoch': 1.41}
 70%|█████████████████████████████████████████████████████████▋                        | 14140/20117 [9:04:14<3:44:47,  2.26s/it] 70%|█████████████████████████████████████████████████████████▋                        | 14141/20117 [9:04:16<3:43:51,  2.25s/it] 70%|█████████████████████████████████████████████████████████▋                        | 14142/20117 [9:04:18<3:43:02,  2.24s/it] 70%|█████████████████████████████████████████████████████████▋                        | 14143/20117 [9:04:21<3:50:38,  2.32s/it] 70%|█████████████████████████████████████████████████████████▋                        | 14144/20117 [9:04:23<3:47:50,  2.29s/it] 70%|█████████████████████████████████████████████████████████▋                        | 14145/20117 [9:04:25<3:45:20,  2.26s/it] 70%|█████████████████████████████████████████████████████████▋                        | 14146/20117 [9:04:27<3:42:57,  2.24s/it] 70%|█████████████████████████████████████████████████████████▋                        | 14147/20117 [9:04:29<3:44:28,  2.26s/it] 70%|█████████████████████████████████████████████████████████▋                        | 14148/20117 [9:04:32<3:43:56,  2.25s/it] 70%|█████████████████████████████████████████████████████████▋                        | 14149/20117 [9:04:34<3:43:32,  2.25s/it] 70%|█████████████████████████████████████████████████████████▋                        | 14150/20117 [9:04:36<3:43:35,  2.25s/it]                                                                                                                                 {'loss': 0.1566, 'grad_norm': 0.529105544090271, 'learning_rate': 4.075140710448419e-05, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 364.28, 'epoch': 1.41}
 70%|█████████████████████████████████████████████████████████▋                        | 14150/20117 [9:04:36<3:43:35,  2.25s/it] 70%|█████████████████████████████████████████████████████████▋                        | 14151/20117 [9:04:38<3:42:30,  2.24s/it] 70%|█████████████████████████████████████████████████████████▋                        | 14152/20117 [9:04:41<3:42:10,  2.23s/it] 70%|█████████████████████████████████████████████████████████▋                        | 14153/20117 [9:04:43<3:40:33,  2.22s/it] 70%|█████████████████████████████████████████████████████████▋                        | 14154/20117 [9:04:45<3:41:29,  2.23s/it] 70%|█████████████████████████████████████████████████████████▋                        | 14155/20117 [9:04:47<3:40:49,  2.22s/it] 70%|█████████████████████████████████████████████████████████▋                        | 14156/20117 [9:04:50<3:40:25,  2.22s/it] 70%|█████████████████████████████████████████████████████████▋                        | 14157/20117 [9:04:52<3:38:51,  2.20s/it] 70%|█████████████████████████████████████████████████████████▋                        | 14158/20117 [9:04:54<3:39:58,  2.21s/it] 70%|█████████████████████████████████████████████████████████▋                        | 14159/20117 [9:04:56<3:39:57,  2.22s/it] 70%|█████████████████████████████████████████████████████████▋                        | 14160/20117 [9:04:58<3:40:33,  2.22s/it]                                                                                                                                 {'loss': 0.1516, 'grad_norm': 0.36099550127983093, 'learning_rate': 4.062504725920347e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 393.89, 'epoch': 1.41}
 70%|█████████████████████████████████████████████████████████▊                        | 14170/20117 [9:05:21<3:41:11,  2.23s/it]
{'loss': 0.1371, 'grad_norm': 0.6695134043693542, 'learning_rate': 4.0498833666980505e-05, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 394.12, 'epoch': 1.41}
 70%|█████████████████████████████████████████████████████████▊                        | 14180/20117 [9:05:43<3:38:20,  2.21s/it]
{'loss': 0.1888, 'grad_norm': 0.4011685252189636, 'learning_rate': 4.037276663870607e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 401.14, 'epoch': 1.41}
 71%|█████████████████████████████████████████████████████████▊                        | 14190/20117 [9:06:05<3:38:33,  2.21s/it]
{'loss': 0.1467, 'grad_norm': 0.5134690403938293, 'learning_rate': 4.024684648490995e-05, 'memory/max_active (GiB)': 19.22, 'memory/max_allocated (GiB)': 19.22, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 320.81, 'epoch': 1.41}
 71%|█████████████████████████████████████████████████████████▉                        | 14200/20117 [9:06:27<3:39:48,  2.23s/it]
{'loss': 0.1507, 'grad_norm': 0.4153744876384735, 'learning_rate': 4.012107351576001e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 358.65, 'epoch': 1.41}
 71%|█████████████████████████████████████████████████████████▉                        | 14210/20117 [9:06:50<3:42:57,  2.26s/it]
{'loss': 0.1785, 'grad_norm': 0.5884429216384888, 'learning_rate': 3.999544804106174e-05, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 389.13, 'epoch': 1.41}
 71%|█████████████████████████████████████████████████████████▉                        | 14220/20117 [9:07:12<3:40:51,  2.25s/it]
{'loss': 0.1133, 'grad_norm': 0.3641144931316376, 'learning_rate': 3.986997037025716e-05, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 354.06, 'epoch': 1.41}
 71%|██████████████████████████████████████████████████████████                        | 14230/20117 [9:07:34<3:39:12,  2.23s/it]
{'loss': 0.18, 'grad_norm': 0.5971251726150513, 'learning_rate': 3.974464081242437e-05, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 388.25, 'epoch': 1.41}
 71%|██████████████████████████████████████████████████████████                        | 14240/20117 [9:07:56<3:36:26,  2.21s/it]
{'loss': 0.1546, 'grad_norm': 0.4704718291759491, 'learning_rate': 3.961945967627648e-05, 'memory/max_active (GiB)': 18.85, 'memory/max_allocated (GiB)': 18.85, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 358.95, 'epoch': 1.42}
 71%|██████████████████████████████████████████████████████████                        | 14250/20117 [9:08:19<3:42:32,  2.28s/it]
{'loss': 0.1928, 'grad_norm': 0.44908052682876587, 'learning_rate': 3.9494427270161124e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 359.69, 'epoch': 1.42}
 71%|██████████████████████████████████████████████████████████▏                       | 14260/20117 [9:08:42<3:38:54,  2.24s/it]
{'loss': 0.1701, 'grad_norm': 0.4618172347545624, 'learning_rate': 3.936954390205955e-05, 'memory/max_active (GiB)': 21.4, 'memory/max_allocated (GiB)': 21.4, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 363.08, 'epoch': 1.42}
 71%|██████████████████████████████████████████████████████████▏                       | 14270/20117 [9:09:04<3:35:56,  2.22s/it]
{'loss': 0.1195, 'grad_norm': 0.4540565609931946, 'learning_rate': 3.924480987958592e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 382.37, 'epoch': 1.42}
 71%|██████████████████████████████████████████████████████████▏                       | 14280/20117 [9:09:26<3:34:44,  2.21s/it]
{'loss': 0.1577, 'grad_norm': 0.45630887150764465, 'learning_rate': 3.912022550998642e-05, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 373.77, 'epoch': 1.42}
 71%|██████████████████████████████████████████████████████████▏                       | 14290/20117 [9:09:48<3:38:28,  2.25s/it]
{'loss': 0.2043, 'grad_norm': 0.5009377002716064, 'learning_rate': 3.8995791100138755e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 436.57, 'epoch': 1.42}
 71%|██████████████████████████████████████████████████████████▎                       | 14300/20117 [9:10:11<3:39:51,  2.27s/it]
{'loss': 0.1279, 'grad_norm': 0.5434289574623108, 'learning_rate': 3.887150695655112e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 368.81, 'epoch': 1.42}
 71%|██████████████████████████████████████████████████████████▎                       | 14310/20117 [9:10:34<3:38:16,  2.26s/it]
{'loss': 0.1604, 'grad_norm': 0.4998416602611542, 'learning_rate': 3.874737338536164e-05, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 332.31, 'epoch': 1.42}
 71%|██████████████████████████████████████████████████████████▎                       | 14320/20117 [9:10:57<3:37:52,  2.26s/it]
{'loss': 0.1572, 'grad_norm': 0.7324095964431763, 'learning_rate': 3.862339069233759e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 381.52, 'epoch': 1.42}
 71%|██████████████████████████████████████████████████████████▍                       | 14330/20117 [9:11:20<3:48:34,  2.37s/it]
{'loss': 0.1947, 'grad_norm': 0.5743668675422668, 'learning_rate': 3.8499559182874475e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 399.2, 'epoch': 1.42}
 71%|██████████████████████████████████████████████████████████▍                       | 14340/20117 [9:11:43<3:42:39,  2.31s/it]
{'loss': 0.0943, 'grad_norm': 0.3965088427066803, 'learning_rate': 3.837587916199554e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 308.93, 'epoch': 1.43}
 71%|██████████████████████████████████████████████████████████▍                       | 14340/20117 [9:11:43<3:42:39,  2.31s/it] 71%|██████████████████████████████████████████████████████████▍                       | 14341/20117 [9:11:45<3:42:29,  2.31s/it] 71%|██████████████████████████████████████████████████████████▍                       | 14342/20117 [9:11:48<3:42:48,  2.31s/it] 71%|██████████████████████████████████████████████████████████▍                       | 14343/20117 [9:11:50<3:40:53,  2.30s/it] 71%|██████████████████████████████████████████████████████████▍                       | 14344/20117 [9:11:52<3:40:23,  2.29s/it] 71%|██████████████████████████████████████████████████████████▍                       | 14345/20117 [9:11:54<3:40:02,  2.29s/it] 71%|██████████████████████████████████████████████████████████▍                       | 14346/20117 [9:11:57<3:42:37,  2.31s/it] 71%|██████████████████████████████████████████████████████████▍                       | 14347/20117 [9:11:59<3:42:22,  2.31s/it] 71%|██████████████████████████████████████████████████████████▍                       | 14348/20117 [9:12:01<3:42:20,  2.31s/it] 71%|██████████████████████████████████████████████████████████▍                       | 14349/20117 [9:12:04<3:41:37,  2.31s/it] 71%|██████████████████████████████████████████████████████████▍                       | 14350/20117 [9:12:06<3:39:07,  2.28s/it]                                                                                                                                 {'loss': 0.1645, 'grad_norm': 0.4278639256954193, 'learning_rate': 3.825235093435076e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 375.76, 'epoch': 1.43}
 71%|██████████████████████████████████████████████████████████▍                       | 14350/20117 [9:12:06<3:39:07,  2.28s/it] 71%|██████████████████████████████████████████████████████████▍                       | 14351/20117 [9:12:08<3:40:46,  2.30s/it] 71%|██████████████████████████████████████████████████████████▌                       | 14352/20117 [9:12:11<3:40:19,  2.29s/it] 71%|██████████████████████████████████████████████████████████▌                       | 14353/20117 [9:12:13<3:40:18,  2.29s/it] 71%|██████████████████████████████████████████████████████████▌                       | 14354/20117 [9:12:15<3:49:24,  2.39s/it] 71%|██████████████████████████████████████████████████████████▌                       | 14355/20117 [9:12:18<3:45:44,  2.35s/it] 71%|██████████████████████████████████████████████████████████▌                       | 14356/20117 [9:12:20<3:41:02,  2.30s/it] 71%|██████████████████████████████████████████████████████████▌                       | 14357/20117 [9:12:22<3:43:49,  2.33s/it] 71%|██████████████████████████████████████████████████████████▌                       | 14358/20117 [9:12:25<3:42:52,  2.32s/it] 71%|██████████████████████████████████████████████████████████▌                       | 14359/20117 [9:12:27<3:41:04,  2.30s/it] 71%|██████████████████████████████████████████████████████████▌                       | 14360/20117 [9:12:29<3:39:57,  2.29s/it]                                                                                                                                 {'loss': 0.1323, 'grad_norm': 0.3658309876918793, 'learning_rate': 3.812897480421631e-05, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 370.0, 'epoch': 1.43}
 71%|██████████████████████████████████████████████████████████▌                       | 14360/20117 [9:12:29<3:39:57,  2.29s/it] 71%|██████████████████████████████████████████████████████████▌                       | 14361/20117 [9:12:31<3:38:47,  2.28s/it] 71%|██████████████████████████████████████████████████████████▌                       | 14362/20117 [9:12:34<3:38:25,  2.28s/it] 71%|██████████████████████████████████████████████████████████▌                       | 14363/20117 [9:12:36<3:40:29,  2.30s/it] 71%|██████████████████████████████████████████████████████████▌                       | 14364/20117 [9:12:38<3:40:19,  2.30s/it] 71%|██████████████████████████████████████████████████████████▌                       | 14365/20117 [9:12:41<3:39:33,  2.29s/it] 71%|██████████████████████████████████████████████████████████▌                       | 14366/20117 [9:12:43<3:39:01,  2.29s/it] 71%|██████████████████████████████████████████████████████████▌                       | 14367/20117 [9:12:45<3:38:19,  2.28s/it] 71%|██████████████████████████████████████████████████████████▌                       | 14368/20117 [9:12:47<3:38:30,  2.28s/it] 71%|██████████████████████████████████████████████████████████▌                       | 14369/20117 [9:12:50<3:38:40,  2.28s/it] 71%|██████████████████████████████████████████████████████████▌                       | 14370/20117 [9:12:52<3:40:56,  2.31s/it]                                                                                                                                 {'loss': 0.1764, 'grad_norm': 0.7314541339874268, 'learning_rate': 3.800575107549362e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 318.75, 'epoch': 1.43}
 71%|██████████████████████████████████████████████████████████▌                       | 14370/20117 [9:12:52<3:40:56,  2.31s/it] 71%|██████████████████████████████████████████████████████████▌                       | 14371/20117 [9:12:54<3:44:27,  2.34s/it] 71%|██████████████████████████████████████████████████████████▌                       | 14372/20117 [9:12:57<3:43:37,  2.34s/it] 71%|██████████████████████████████████████████████████████████▌                       | 14373/20117 [9:12:59<3:42:40,  2.33s/it] 71%|██████████████████████████████████████████████████████████▌                       | 14374/20117 [9:13:01<3:43:21,  2.33s/it] 71%|██████████████████████████████████████████████████████████▌                       | 14375/20117 [9:13:04<3:41:14,  2.31s/it] 71%|██████████████████████████████████████████████████████████▌                       | 14376/20117 [9:13:06<3:38:30,  2.28s/it] 71%|██████████████████████████████████████████████████████████▌                       | 14377/20117 [9:13:08<3:38:48,  2.29s/it] 71%|██████████████████████████████████████████████████████████▌                       | 14378/20117 [9:13:11<3:38:50,  2.29s/it] 71%|██████████████████████████████████████████████████████████▌                       | 14379/20117 [9:13:13<3:37:45,  2.28s/it] 71%|██████████████████████████████████████████████████████████▌                       | 14380/20117 [9:13:15<3:36:50,  2.27s/it]                                                                                                                                 {'loss': 0.1383, 'grad_norm': 0.3194984197616577, 'learning_rate': 3.788268005170883e-05, 'memory/max_active (GiB)': 20.77, 'memory/max_allocated (GiB)': 20.77, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 367.0, 'epoch': 1.43}
 71%|██████████████████████████████████████████████████████████▌                       | 14380/20117 [9:13:15<3:36:50,  2.27s/it] 71%|██████████████████████████████████████████████████████████▌                       | 14381/20117 [9:13:17<3:39:02,  2.29s/it] 71%|██████████████████████████████████████████████████████████▌                       | 14382/20117 [9:13:20<3:39:29,  2.30s/it] 71%|██████████████████████████████████████████████████████████▋                       | 14383/20117 [9:13:22<3:38:59,  2.29s/it] 72%|██████████████████████████████████████████████████████████▋                       | 14384/20117 [9:13:24<3:44:01,  2.34s/it] 72%|██████████████████████████████████████████████████████████▋                       | 14385/20117 [9:13:27<3:47:22,  2.38s/it] 72%|██████████████████████████████████████████████████████████▋                       | 14386/20117 [9:13:29<3:43:02,  2.34s/it] 72%|██████████████████████████████████████████████████████████▋                       | 14387/20117 [9:13:31<3:39:28,  2.30s/it] 72%|██████████████████████████████████████████████████████████▋                       | 14388/20117 [9:13:33<3:35:57,  2.26s/it] 72%|██████████████████████████████████████████████████████████▋                       | 14389/20117 [9:13:36<3:34:15,  2.24s/it] 72%|██████████████████████████████████████████████████████████▋                       | 14390/20117 [9:13:38<3:38:16,  2.29s/it]                                                                                                                                 {'loss': 0.171, 'grad_norm': 0.24363841116428375, 'learning_rate': 3.7759762036011856e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 350.18, 'epoch': 1.43}
 72%|██████████████████████████████████████████████████████████▋                       | 14390/20117 [9:13:38<3:38:16,  2.29s/it] 72%|██████████████████████████████████████████████████████████▋                       | 14391/20117 [9:13:40<3:36:16,  2.27s/it] 72%|██████████████████████████████████████████████████████████▋                       | 14392/20117 [9:13:43<3:38:45,  2.29s/it] 72%|██████████████████████████████████████████████████████████▋                       | 14393/20117 [9:13:45<3:37:36,  2.28s/it] 72%|██████████████████████████████████████████████████████████▋                       | 14394/20117 [9:13:47<3:38:45,  2.29s/it] 72%|██████████████████████████████████████████████████████████▋                       | 14395/20117 [9:13:49<3:37:32,  2.28s/it] 72%|██████████████████████████████████████████████████████████▋                       | 14396/20117 [9:13:52<3:35:39,  2.26s/it] 72%|██████████████████████████████████████████████████████████▋                       | 14397/20117 [9:13:54<3:36:14,  2.27s/it] 72%|██████████████████████████████████████████████████████████▋                       | 14398/20117 [9:13:56<3:39:10,  2.30s/it] 72%|██████████████████████████████████████████████████████████▋                       | 14399/20117 [9:13:59<3:38:52,  2.30s/it] 72%|██████████████████████████████████████████████████████████▋                       | 14400/20117 [9:14:01<3:37:23,  2.28s/it]                                                                                                                                 {'loss': 0.1639, 'grad_norm': 0.8045446872711182, 'learning_rate': 3.7636997331175805e-05, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 338.94, 'epoch': 1.43}
 72%|██████████████████████████████████████████████████████████▋                       | 14400/20117 [9:14:01<3:37:23,  2.28s/it] 72%|██████████████████████████████████████████████████████████▋                       | 14401/20117 [9:14:03<3:35:23,  2.26s/it] 72%|██████████████████████████████████████████████████████████▋                       | 14402/20117 [9:14:05<3:35:53,  2.27s/it] 72%|██████████████████████████████████████████████████████████▋                       | 14403/20117 [9:14:08<3:36:13,  2.27s/it] 72%|██████████████████████████████████████████████████████████▋                       | 14404/20117 [9:14:10<3:34:36,  2.25s/it] 72%|██████████████████████████████████████████████████████████▋                       | 14405/20117 [9:14:12<3:36:09,  2.27s/it] 72%|██████████████████████████████████████████████████████████▋                       | 14406/20117 [9:14:14<3:35:47,  2.27s/it] 72%|██████████████████████████████████████████████████████████▋                       | 14407/20117 [9:14:17<3:35:05,  2.26s/it] 72%|██████████████████████████████████████████████████████████▋                       | 14408/20117 [9:14:19<3:41:51,  2.33s/it] 72%|██████████████████████████████████████████████████████████▋                       | 14409/20117 [9:14:21<3:40:39,  2.32s/it] 72%|██████████████████████████████████████████████████████████▋                       | 14410/20117 [9:14:24<3:37:52,  2.29s/it]                                                                                                                                 {'loss': 0.1665, 'grad_norm': 0.45409709215164185, 'learning_rate': 3.751438623959601e-05, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 350.33, 'epoch': 1.43}
 72%|██████████████████████████████████████████████████████████▋                       | 14410/20117 [9:14:24<3:37:52,  2.29s/it] 72%|██████████████████████████████████████████████████████████▋                       | 14411/20117 [9:14:26<3:35:35,  2.27s/it] 72%|██████████████████████████████████████████████████████████▋                       | 14412/20117 [9:14:28<3:36:52,  2.28s/it] 72%|██████████████████████████████████████████████████████████▋                       | 14413/20117 [9:14:31<3:36:57,  2.28s/it] 72%|██████████████████████████████████████████████████████████▊                       | 14414/20117 [9:14:33<3:36:52,  2.28s/it] 72%|██████████████████████████████████████████████████████████▊                       | 14415/20117 [9:14:35<3:36:54,  2.28s/it] 72%|██████████████████████████████████████████████████████████▊                       | 14416/20117 [9:14:37<3:36:45,  2.28s/it] 72%|██████████████████████████████████████████████████████████▊                       | 14417/20117 [9:14:40<3:35:45,  2.27s/it] 72%|██████████████████████████████████████████████████████████▊                       | 14418/20117 [9:14:42<3:35:00,  2.26s/it] 72%|██████████████████████████████████████████████████████████▊                       | 14419/20117 [9:14:44<3:34:02,  2.25s/it] 72%|██████████████████████████████████████████████████████████▊                       | 14420/20117 [9:14:46<3:34:38,  2.26s/it]                                                                                                                                 {'loss': 0.1444, 'grad_norm': 0.5277268886566162, 'learning_rate': 3.739192906328958e-05, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 332.77, 'epoch': 1.43}
 72%|██████████████████████████████████████████████████████████▊                       | 14420/20117 [9:14:46<3:34:38,  2.26s/it] 72%|██████████████████████████████████████████████████████████▊                       | 14421/20117 [9:14:49<3:33:49,  2.25s/it] 72%|██████████████████████████████████████████████████████████▊                       | 14422/20117 [9:14:51<3:34:25,  2.26s/it] 72%|██████████████████████████████████████████████████████████▊                       | 14423/20117 [9:14:53<3:34:36,  2.26s/it] 72%|██████████████████████████████████████████████████████████▊                       | 14424/20117 [9:14:55<3:34:11,  2.26s/it] 72%|██████████████████████████████████████████████████████████▊                       | 14425/20117 [9:14:58<3:33:05,  2.25s/it] 72%|██████████████████████████████████████████████████████████▊                       | 14426/20117 [9:15:00<3:33:44,  2.25s/it] 72%|██████████████████████████████████████████████████████████▊                       | 14427/20117 [9:15:02<3:34:39,  2.26s/it] 72%|██████████████████████████████████████████████████████████▊                       | 14428/20117 [9:15:04<3:34:22,  2.26s/it] 72%|██████████████████████████████████████████████████████████▊                       | 14429/20117 [9:15:07<3:33:24,  2.25s/it] 72%|██████████████████████████████████████████████████████████▊                       | 14430/20117 [9:15:09<3:33:43,  2.25s/it]                                                                                                                                 {'loss': 0.1362, 'grad_norm': 0.5323857069015503, 'learning_rate': 3.726962610389435e-05, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 339.66, 'epoch': 1.43}
 72%|██████████████████████████████████████████████████████████▊                       | 14430/20117 [9:15:09<3:33:43,  2.25s/it] 72%|██████████████████████████████████████████████████████████▊                       | 14431/20117 [9:15:11<3:34:34,  2.26s/it] 72%|██████████████████████████████████████████████████████████▊                       | 14432/20117 [9:15:13<3:34:29,  2.26s/it] 72%|██████████████████████████████████████████████████████████▊                       | 14433/20117 [9:15:16<3:32:46,  2.25s/it] 72%|██████████████████████████████████████████████████████████▊                       | 14434/20117 [9:15:18<3:35:56,  2.28s/it] 72%|██████████████████████████████████████████████████████████▊                       | 14435/20117 [9:15:20<3:34:35,  2.27s/it] 72%|██████████████████████████████████████████████████████████▊                       | 14436/20117 [9:15:22<3:32:44,  2.25s/it] 72%|██████████████████████████████████████████████████████████▊                       | 14437/20117 [9:15:25<3:32:46,  2.25s/it] 72%|██████████████████████████████████████████████████████████▊                       | 14438/20117 [9:15:27<3:31:24,  2.23s/it] 72%|██████████████████████████████████████████████████████████▊                       | 14439/20117 [9:15:29<3:30:45,  2.23s/it] 72%|██████████████████████████████████████████████████████████▊                       | 14440/20117 [9:15:31<3:30:34,  2.23s/it]                                                                                                                                 {'loss': 0.1252, 'grad_norm': 0.7022963166236877, 'learning_rate': 3.7147477662668386e-05, 'memory/max_active (GiB)': 21.54, 'memory/max_allocated (GiB)': 21.54, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 313.51, 'epoch': 1.44}
 72%|██████████████████████████████████████████████████████████▊                       | 14440/20117 [9:15:31<3:30:34,  2.23s/it] 72%|██████████████████████████████████████████████████████████▊                       | 14441/20117 [9:15:34<3:30:22,  2.22s/it] 72%|██████████████████████████████████████████████████████████▊                       | 14442/20117 [9:15:36<3:32:07,  2.24s/it] 72%|██████████████████████████████████████████████████████████▊                       | 14443/20117 [9:15:38<3:32:45,  2.25s/it] 72%|██████████████████████████████████████████████████████████▉                       | 14444/20117 [9:15:40<3:34:17,  2.27s/it] 72%|██████████████████████████████████████████████████████████▉                       | 14445/20117 [9:15:43<3:34:20,  2.27s/it] 72%|██████████████████████████████████████████████████████████▉                       | 14446/20117 [9:15:45<3:34:07,  2.27s/it] 72%|██████████████████████████████████████████████████████████▉                       | 14447/20117 [9:15:47<3:32:52,  2.25s/it] 72%|██████████████████████████████████████████████████████████▉                       | 14448/20117 [9:15:49<3:33:18,  2.26s/it] 72%|██████████████████████████████████████████████████████████▉                       | 14449/20117 [9:15:52<3:33:45,  2.26s/it] 72%|██████████████████████████████████████████████████████████▉                       | 14450/20117 [9:15:54<3:33:10,  2.26s/it]                                                                                                                                 {'loss': 0.1617, 'grad_norm': 0.4480128586292267, 'learning_rate': 3.702548404048917e-05, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 400.86, 'epoch': 1.44}
 72%|██████████████████████████████████████████████████████████▉                       | 14450/20117 [9:15:54<3:33:10,  2.26s/it] 72%|██████████████████████████████████████████████████████████▉                       | 14451/20117 [9:15:56<3:34:52,  2.28s/it] 72%|██████████████████████████████████████████████████████████▉                       | 14452/20117 [9:15:58<3:32:47,  2.25s/it] 72%|██████████████████████████████████████████████████████████▉                       | 14453/20117 [9:16:01<3:33:58,  2.27s/it] 72%|██████████████████████████████████████████████████████████▉                       | 14454/20117 [9:16:03<3:34:34,  2.27s/it] 72%|██████████████████████████████████████████████████████████▉                       | 14455/20117 [9:16:05<3:34:13,  2.27s/it] 72%|██████████████████████████████████████████████████████████▉                       | 14456/20117 [9:16:08<3:34:42,  2.28s/it] 72%|██████████████████████████████████████████████████████████▉                       | 14457/20117 [9:16:10<3:34:38,  2.28s/it] 72%|██████████████████████████████████████████████████████████▉                       | 14458/20117 [9:16:12<3:34:15,  2.27s/it] 72%|██████████████████████████████████████████████████████████▉                       | 14459/20117 [9:16:14<3:35:02,  2.28s/it] 72%|██████████████████████████████████████████████████████████▉                       | 14460/20117 [9:16:17<3:40:12,  2.34s/it]                                                                                                                                 {'loss': 0.1598, 'grad_norm': 0.508468747138977, 'learning_rate': 3.690364553785268e-05, 'memory/max_active (GiB)': 19.81, 'memory/max_allocated (GiB)': 19.81, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 323.65, 'epoch': 1.44}
 72%|██████████████████████████████████████████████████████████▉                       | 14460/20117 [9:16:17<3:40:12,  2.34s/it] 72%|██████████████████████████████████████████████████████████▉                       | 14461/20117 [9:16:19<3:36:05,  2.29s/it] 72%|██████████████████████████████████████████████████████████▉                       | 14462/20117 [9:16:21<3:34:30,  2.28s/it] 72%|██████████████████████████████████████████████████████████▉                       | 14463/20117 [9:16:24<3:34:39,  2.28s/it] 72%|██████████████████████████████████████████████████████████▉                       | 14464/20117 [9:16:26<3:32:25,  2.25s/it] 72%|██████████████████████████████████████████████████████████▉                       | 14465/20117 [9:16:28<3:30:10,  2.23s/it] 72%|██████████████████████████████████████████████████████████▉                       | 14466/20117 [9:16:30<3:29:08,  2.22s/it] 72%|██████████████████████████████████████████████████████████▉                       | 14467/20117 [9:16:32<3:29:34,  2.23s/it] 72%|██████████████████████████████████████████████████████████▉                       | 14468/20117 [9:16:35<3:28:27,  2.21s/it] 72%|██████████████████████████████████████████████████████████▉                       | 14469/20117 [9:16:37<3:30:40,  2.24s/it] 72%|██████████████████████████████████████████████████████████▉                       | 14470/20117 [9:16:39<3:29:35,  2.23s/it]                                                                                                                                 {'loss': 0.1208, 'grad_norm': 0.3107985854148865, 'learning_rate': 3.678196245487299e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 328.38, 'epoch': 1.44}
 72%|██████████████████████████████████████████████████████████▉                       | 14470/20117 [9:16:39<3:29:35,  2.23s/it] 72%|██████████████████████████████████████████████████████████▉                       | 14471/20117 [9:16:41<3:32:37,  2.26s/it] 72%|██████████████████████████████████████████████████████████▉                       | 14472/20117 [9:16:44<3:31:33,  2.25s/it] 72%|██████████████████████████████████████████████████████████▉                       | 14473/20117 [9:16:46<3:31:37,  2.25s/it] 72%|██████████████████████████████████████████████████████████▉                       | 14474/20117 [9:16:48<3:30:21,  2.24s/it] 72%|███████████████████████████████████████████████████████████                       | 14475/20117 [9:16:50<3:30:32,  2.24s/it] 72%|███████████████████████████████████████████████████████████                       | 14476/20117 [9:16:53<3:29:31,  2.23s/it] 72%|███████████████████████████████████████████████████████████                       | 14477/20117 [9:16:55<3:29:40,  2.23s/it] 72%|███████████████████████████████████████████████████████████                       | 14478/20117 [9:16:57<3:29:55,  2.23s/it] 72%|███████████████████████████████████████████████████████████                       | 14479/20117 [9:16:59<3:29:31,  2.23s/it] 72%|███████████████████████████████████████████████████████████                       | 14480/20117 [9:17:01<3:28:19,  2.22s/it]                                                                                                                                 {'loss': 0.1969, 'grad_norm': 0.5842998027801514, 'learning_rate': 3.666043509128118e-05, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 419.05, 'epoch': 1.44}
 72%|███████████████████████████████████████████████████████████                       | 14480/20117 [9:17:01<3:28:19,  2.22s/it] 72%|███████████████████████████████████████████████████████████                       | 14481/20117 [9:17:04<3:28:03,  2.21s/it] 72%|███████████████████████████████████████████████████████████                       | 14482/20117 [9:17:06<3:26:43,  2.20s/it] 72%|███████████████████████████████████████████████████████████                       | 14483/20117 [9:17:08<3:26:05,  2.19s/it] 72%|███████████████████████████████████████████████████████████                       | 14484/20117 [9:17:10<3:26:00,  2.19s/it] 72%|███████████████████████████████████████████████████████████                       | 14485/20117 [9:17:12<3:26:53,  2.20s/it] 72%|███████████████████████████████████████████████████████████                       | 14486/20117 [9:17:15<3:28:09,  2.22s/it] 72%|███████████████████████████████████████████████████████████                       | 14487/20117 [9:17:17<3:28:57,  2.23s/it] 72%|███████████████████████████████████████████████████████████                       | 14488/20117 [9:17:19<3:28:10,  2.22s/it] 72%|███████████████████████████████████████████████████████████                       | 14489/20117 [9:17:21<3:28:40,  2.22s/it] 72%|███████████████████████████████████████████████████████████                       | 14490/20117 [9:17:24<3:29:04,  2.23s/it]                                                                                                                                 {'loss': 0.1706, 'grad_norm': 0.6047186255455017, 'learning_rate': 3.6539063746424884e-05, 'memory/max_active (GiB)': 19.82, 'memory/max_allocated (GiB)': 19.82, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 352.49, 'epoch': 1.44}
 72%|███████████████████████████████████████████████████████████                       | 14490/20117 [9:17:24<3:29:04,  2.23s/it] 72%|███████████████████████████████████████████████████████████                       | 14491/20117 [9:17:26<3:28:20,  2.22s/it] 72%|███████████████████████████████████████████████████████████                       | 14492/20117 [9:17:28<3:28:59,  2.23s/it] 72%|███████████████████████████████████████████████████████████                       | 14493/20117 [9:17:30<3:29:13,  2.23s/it] 72%|███████████████████████████████████████████████████████████                       | 14494/20117 [9:17:33<3:29:05,  2.23s/it] 72%|███████████████████████████████████████████████████████████                       | 14495/20117 [9:17:35<3:29:50,  2.24s/it] 72%|███████████████████████████████████████████████████████████                       | 14496/20117 [9:17:37<3:30:47,  2.25s/it] 72%|███████████████████████████████████████████████████████████                       | 14497/20117 [9:17:39<3:31:59,  2.26s/it] 72%|███████████████████████████████████████████████████████████                       | 14498/20117 [9:17:42<3:33:55,  2.28s/it] 72%|███████████████████████████████████████████████████████████                       | 14499/20117 [9:17:44<3:34:59,  2.30s/it] 72%|███████████████████████████████████████████████████████████                       | 14500/20117 [9:17:46<3:33:05,  2.28s/it]                                                                                                                                 {'loss': 0.1393, 'grad_norm': 0.37089505791664124, 'learning_rate': 3.641784871926733e-05, 'memory/max_active (GiB)': 21.53, 'memory/max_allocated (GiB)': 21.53, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 393.16, 'epoch': 1.44}
 72%|███████████████████████████████████████████████████████████                       | 14500/20117 [9:17:46<3:33:05,  2.28s/it] 72%|███████████████████████████████████████████████████████████                       | 14501/20117 [9:17:48<3:31:07,  2.26s/it] 72%|███████████████████████████████████████████████████████████                       | 14502/20117 [9:17:51<3:29:55,  2.24s/it] 72%|███████████████████████████████████████████████████████████                       | 14503/20117 [9:17:53<3:28:29,  2.23s/it] 72%|███████████████████████████████████████████████████████████                       | 14504/20117 [9:17:55<3:29:23,  2.24s/it] 72%|███████████████████████████████████████████████████████████                       | 14505/20117 [9:17:57<3:28:55,  2.23s/it] 72%|███████████████████████████████████████████████████████████▏                      | 14506/20117 [9:18:00<3:30:10,  2.25s/it] 72%|███████████████████████████████████████████████████████████▏                      | 14507/20117 [9:18:02<3:28:12,  2.23s/it] 72%|███████████████████████████████████████████████████████████▏                      | 14508/20117 [9:18:04<3:27:41,  2.22s/it] 72%|███████████████████████████████████████████████████████████▏                      | 14509/20117 [9:18:06<3:29:02,  2.24s/it] 72%|███████████████████████████████████████████████████████████▏                      | 14510/20117 [9:18:09<3:29:24,  2.24s/it]                                                                                                                                 {'loss': 0.1796, 'grad_norm': 0.472478985786438, 'learning_rate': 3.629679030838682e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 386.53, 'epoch': 1.44}
 72%|███████████████████████████████████████████████████████████▏                      | 14510/20117 [9:18:09<3:29:24,  2.24s/it] 72%|███████████████████████████████████████████████████████████▏                      | 14511/20117 [9:18:11<3:38:31,  2.34s/it] 72%|███████████████████████████████████████████████████████████▏                      | 14512/20117 [9:18:13<3:35:35,  2.31s/it] 72%|███████████████████████████████████████████████████████████▏                      | 14513/20117 [9:18:16<3:33:43,  2.29s/it] 72%|███████████████████████████████████████████████████████████▏                      | 14514/20117 [9:18:18<3:31:26,  2.26s/it] 72%|███████████████████████████████████████████████████████████▏                      | 14515/20117 [9:18:20<3:29:38,  2.25s/it] 72%|███████████████████████████████████████████████████████████▏                      | 14516/20117 [9:18:22<3:29:37,  2.25s/it] 72%|███████████████████████████████████████████████████████████▏                      | 14517/20117 [9:18:24<3:28:30,  2.23s/it] 72%|███████████████████████████████████████████████████████████▏                      | 14518/20117 [9:18:27<3:27:13,  2.22s/it] 72%|███████████████████████████████████████████████████████████▏                      | 14519/20117 [9:18:29<3:27:04,  2.22s/it] 72%|███████████████████████████████████████████████████████████▏                      | 14520/20117 [9:18:31<3:26:38,  2.22s/it]                                                                                                                                 {'loss': 0.1152, 'grad_norm': 0.3645547330379486, 'learning_rate': 3.617588881197571e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 360.27, 'epoch': 1.44}
 72%|███████████████████████████████████████████████████████████▏                      | 14520/20117 [9:18:31<3:26:38,  2.22s/it] 72%|███████████████████████████████████████████████████████████▏                      | 14521/20117 [9:18:33<3:27:53,  2.23s/it] 72%|███████████████████████████████████████████████████████████▏                      | 14522/20117 [9:18:36<3:27:03,  2.22s/it] 72%|███████████████████████████████████████████████████████████▏                      | 14523/20117 [9:18:38<3:27:23,  2.22s/it] 72%|███████████████████████████████████████████████████████████▏                      | 14524/20117 [9:18:40<3:28:57,  2.24s/it] 72%|███████████████████████████████████████████████████████████▏                      | 14525/20117 [9:18:42<3:29:21,  2.25s/it] 72%|███████████████████████████████████████████████████████████▏                      | 14526/20117 [9:18:45<3:30:09,  2.26s/it] 72%|███████████████████████████████████████████████████████████▏                      | 14527/20117 [9:18:47<3:30:37,  2.26s/it] 72%|███████████████████████████████████████████████████████████▏                      | 14528/20117 [9:18:49<3:30:03,  2.25s/it] 72%|███████████████████████████████████████████████████████████▏                      | 14529/20117 [9:18:51<3:30:51,  2.26s/it] 72%|███████████████████████████████████████████████████████████▏                      | 14530/20117 [9:18:54<3:32:25,  2.28s/it]                                                                                                                                 {'loss': 0.1252, 'grad_norm': 0.16537845134735107, 'learning_rate': 3.605514452784e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 358.08, 'epoch': 1.44}
 72%|███████████████████████████████████████████████████████████▏                      | 14530/20117 [9:18:54<3:32:25,  2.28s/it] 72%|███████████████████████████████████████████████████████████▏                      | 14531/20117 [9:18:56<3:33:05,  2.29s/it] 72%|███████████████████████████████████████████████████████████▏                      | 14532/20117 [9:18:58<3:32:37,  2.28s/it] 72%|███████████████████████████████████████████████████████████▏                      | 14533/20117 [9:19:01<3:32:17,  2.28s/it] 72%|███████████████████████████████████████████████████████████▏                      | 14534/20117 [9:19:03<3:32:25,  2.28s/it] 72%|███████████████████████████████████████████████████████████▏                      | 14535/20117 [9:19:05<3:32:36,  2.29s/it] 72%|███████████████████████████████████████████████████████████▎                      | 14536/20117 [9:19:08<3:39:48,  2.36s/it] 72%|███████████████████████████████████████████████████████████▎                      | 14537/20117 [9:19:10<3:39:11,  2.36s/it] 72%|███████████████████████████████████████████████████████████▎                      | 14538/20117 [9:19:12<3:36:55,  2.33s/it] 72%|███████████████████████████████████████████████████████████▎                      | 14539/20117 [9:19:15<3:33:18,  2.29s/it] 72%|███████████████████████████████████████████████████████████▎                      | 14540/20117 [9:19:17<3:31:02,  2.27s/it]                                                                                                                                 {'loss': 0.1967, 'grad_norm': 0.49066445231437683, 'learning_rate': 3.593455775339837e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 403.93, 'epoch': 1.45}
 72%|███████████████████████████████████████████████████████████▎                      | 14540/20117 [9:19:17<3:31:02,  2.27s/it] 72%|███████████████████████████████████████████████████████████▎                      | 14541/20117 [9:19:19<3:29:04,  2.25s/it] 72%|███████████████████████████████████████████████████████████▎                      | 14542/20117 [9:19:21<3:29:45,  2.26s/it] 72%|███████████████████████████████████████████████████████████▎                      | 14543/20117 [9:19:23<3:29:39,  2.26s/it] 72%|███████████████████████████████████████████████████████████▎                      | 14544/20117 [9:19:26<3:30:08,  2.26s/it] 72%|███████████████████████████████████████████████████████████▎                      | 14545/20117 [9:19:28<3:29:19,  2.25s/it] 72%|███████████████████████████████████████████████████████████▎                      | 14546/20117 [9:19:30<3:29:17,  2.25s/it] 72%|███████████████████████████████████████████████████████████▎                      | 14547/20117 [9:19:33<3:30:52,  2.27s/it] 72%|███████████████████████████████████████████████████████████▎                      | 14548/20117 [9:19:35<3:28:43,  2.25s/it] 72%|███████████████████████████████████████████████████████████▎                      | 14549/20117 [9:19:37<3:27:45,  2.24s/it] 72%|███████████████████████████████████████████████████████████▎                      | 14550/20117 [9:19:39<3:26:51,  2.23s/it]                                                                                                                                 {'loss': 0.1566, 'grad_norm': 0.424955815076828, 'learning_rate': 3.5814128785681554e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 364.0, 'epoch': 1.45}
 72%|███████████████████████████████████████████████████████████▎                      | 14550/20117 [9:19:39<3:26:51,  2.23s/it] 72%|███████████████████████████████████████████████████████████▎                      | 14551/20117 [9:19:41<3:26:36,  2.23s/it] 72%|███████████████████████████████████████████████████████████▎                      | 14552/20117 [9:19:44<3:28:29,  2.25s/it] 72%|███████████████████████████████████████████████████████████▎                      | 14553/20117 [9:19:46<3:30:55,  2.27s/it] 72%|███████████████████████████████████████████████████████████▎                      | 14554/20117 [9:19:48<3:30:52,  2.27s/it] 72%|███████████████████████████████████████████████████████████▎                      | 14555/20117 [9:19:50<3:28:51,  2.25s/it] 72%|███████████████████████████████████████████████████████████▎                      | 14556/20117 [9:19:53<3:30:18,  2.27s/it] 72%|███████████████████████████████████████████████████████████▎                      | 14557/20117 [9:19:55<3:32:19,  2.29s/it] 72%|███████████████████████████████████████████████████████████▎                      | 14558/20117 [9:19:57<3:32:10,  2.29s/it] 72%|███████████████████████████████████████████████████████████▎                      | 14559/20117 [9:20:00<3:30:53,  2.28s/it] 72%|███████████████████████████████████████████████████████████▎                      | 14560/20117 [9:20:02<3:29:41,  2.26s/it]                                                                                                                                 {'loss': 0.1663, 'grad_norm': 0.5189529657363892, 'learning_rate': 3.569385792133151e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 384.8, 'epoch': 1.45}
 72%|███████████████████████████████████████████████████████████▎                      | 14560/20117 [9:20:02<3:29:41,  2.26s/it] 72%|███████████████████████████████████████████████████████████▎                      | 14561/20117 [9:20:04<3:37:12,  2.35s/it] 72%|███████████████████████████████████████████████████████████▎                      | 14562/20117 [9:20:07<3:35:31,  2.33s/it] 72%|███████████████████████████████████████████████████████████▎                      | 14563/20117 [9:20:09<3:32:29,  2.30s/it] 72%|███████████████████████████████████████████████████████████▎                      | 14564/20117 [9:20:11<3:32:07,  2.29s/it] 72%|███████████████████████████████████████████████████████████▎                      | 14565/20117 [9:20:13<3:30:00,  2.27s/it] 72%|███████████████████████████████████████████████████████████▎                      | 14566/20117 [9:20:16<3:30:19,  2.27s/it] 72%|███████████████████████████████████████████████████████████▍                      | 14567/20117 [9:20:18<3:28:30,  2.25s/it] 72%|███████████████████████████████████████████████████████████▍                      | 14568/20117 [9:20:20<3:29:34,  2.27s/it] 72%|███████████████████████████████████████████████████████████▍                      | 14569/20117 [9:20:22<3:28:03,  2.25s/it] 72%|███████████████████████████████████████████████████████████▍                      | 14570/20117 [9:20:25<3:26:26,  2.23s/it]                                                                                                                                 {'loss': 0.1533, 'grad_norm': 0.3756466209888458, 'learning_rate': 3.5573745456600826e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 368.89, 'epoch': 1.45}
 72%|███████████████████████████████████████████████████████████▍                      | 14570/20117 [9:20:25<3:26:26,  2.23s/it] 72%|███████████████████████████████████████████████████████████▍                      | 14571/20117 [9:20:27<3:25:59,  2.23s/it] 72%|███████████████████████████████████████████████████████████▍                      | 14572/20117 [9:20:29<3:24:53,  2.22s/it] 72%|███████████████████████████████████████████████████████████▍                      | 14573/20117 [9:20:31<3:24:23,  2.21s/it] 72%|███████████████████████████████████████████████████████████▍                      | 14574/20117 [9:20:34<3:26:42,  2.24s/it] 72%|███████████████████████████████████████████████████████████▍                      | 14575/20117 [9:20:36<3:26:22,  2.23s/it] 72%|███████████████████████████████████████████████████████████▍                      | 14576/20117 [9:20:38<3:27:52,  2.25s/it] 72%|███████████████████████████████████████████████████████████▍                      | 14577/20117 [9:20:40<3:28:47,  2.26s/it] 72%|███████████████████████████████████████████████████████████▍                      | 14578/20117 [9:20:43<3:27:19,  2.25s/it] 72%|███████████████████████████████████████████████████████████▍                      | 14579/20117 [9:20:45<3:26:25,  2.24s/it] 72%|███████████████████████████████████████████████████████████▍                      | 14580/20117 [9:20:47<3:27:40,  2.25s/it]                                                                                                                                 {'loss': 0.1883, 'grad_norm': 0.5015901923179626, 'learning_rate': 3.54537916873519e-05, 'memory/max_active (GiB)': 20.57, 'memory/max_allocated (GiB)': 20.57, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 330.11, 'epoch': 1.45}
 72%|███████████████████████████████████████████████████████████▍                      | 14580/20117 [9:20:47<3:27:40,  2.25s/it] 72%|███████████████████████████████████████████████████████████▍                      | 14581/20117 [9:20:49<3:26:21,  2.24s/it] 72%|███████████████████████████████████████████████████████████▍                      | 14582/20117 [9:20:51<3:24:48,  2.22s/it] 72%|███████████████████████████████████████████████████████████▍                      | 14583/20117 [9:20:54<3:24:32,  2.22s/it] 72%|███████████████████████████████████████████████████████████▍                      | 14584/20117 [9:20:56<3:25:46,  2.23s/it] 73%|███████████████████████████████████████████████████████████▍                      | 14585/20117 [9:20:58<3:24:38,  2.22s/it] 73%|███████████████████████████████████████████████████████████▍                      | 14586/20117 [9:21:00<3:24:35,  2.22s/it] 73%|███████████████████████████████████████████████████████████▍                      | 14587/20117 [9:21:02<3:23:31,  2.21s/it] 73%|███████████████████████████████████████████████████████████▍                      | 14588/20117 [9:21:05<3:23:35,  2.21s/it] 73%|███████████████████████████████████████████████████████████▍                      | 14589/20117 [9:21:07<3:24:11,  2.22s/it] 73%|███████████████████████████████████████████████████████████▍                      | 14590/20117 [9:21:09<3:25:12,  2.23s/it]                                                                                                                                 {'loss': 0.1883, 'grad_norm': 0.49634647369384766, 'learning_rate': 3.5333996909056176e-05, 'memory/max_active (GiB)': 18.83, 'memory/max_allocated (GiB)': 18.83, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 392.27, 'epoch': 1.45}
 73%|███████████████████████████████████████████████████████████▍                      | 14590/20117 [9:21:09<3:25:12,  2.23s/it] 73%|███████████████████████████████████████████████████████████▍                      | 14591/20117 [9:21:11<3:25:04,  2.23s/it] 73%|███████████████████████████████████████████████████████████▍                      | 14592/20117 [9:21:14<3:23:32,  2.21s/it] 73%|███████████████████████████████████████████████████████████▍                      | 14593/20117 [9:21:16<3:25:38,  2.23s/it] 73%|███████████████████████████████████████████████████████████▍                      | 14594/20117 [9:21:18<3:25:29,  2.23s/it] 73%|███████████████████████████████████████████████████████████▍                      | 14595/20117 [9:21:20<3:24:46,  2.23s/it] 73%|███████████████████████████████████████████████████████████▍                      | 14596/20117 [9:21:23<3:24:32,  2.22s/it] 73%|███████████████████████████████████████████████████████████▍                      | 14597/20117 [9:21:25<3:25:56,  2.24s/it] 73%|███████████████████████████████████████████████████████████▌                      | 14598/20117 [9:21:27<3:26:56,  2.25s/it] 73%|███████████████████████████████████████████████████████████▌                      | 14599/20117 [9:21:29<3:25:17,  2.23s/it] 73%|███████████████████████████████████████████████████████████▌                      | 14600/20117 [9:21:32<3:28:20,  2.27s/it]                                                                                                                                 {'loss': 0.1499, 'grad_norm': 0.27927589416503906, 'learning_rate': 3.521436141679357e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 348.04, 'epoch': 1.45}
 73%|███████████████████████████████████████████████████████████▌                      | 14600/20117 [9:21:32<3:28:20,  2.27s/it] 73%|███████████████████████████████████████████████████████████▌                      | 14601/20117 [9:21:34<3:26:13,  2.24s/it] 73%|███████████████████████████████████████████████████████████▌                      | 14602/20117 [9:21:36<3:25:38,  2.24s/it] 73%|███████████████████████████████████████████████████████████▌                      | 14603/20117 [9:21:38<3:24:24,  2.22s/it] 73%|███████████████████████████████████████████████████████████▌                      | 14604/20117 [9:21:40<3:24:59,  2.23s/it] 73%|███████████████████████████████████████████████████████████▌                      | 14605/20117 [9:21:43<3:24:13,  2.22s/it] 73%|███████████████████████████████████████████████████████████▌                      | 14606/20117 [9:21:45<3:25:25,  2.24s/it] 73%|███████████████████████████████████████████████████████████▌                      | 14607/20117 [9:21:47<3:26:13,  2.25s/it] 73%|███████████████████████████████████████████████████████████▌                      | 14608/20117 [9:21:49<3:24:53,  2.23s/it] 73%|███████████████████████████████████████████████████████████▌                      | 14609/20117 [9:21:52<3:25:30,  2.24s/it] 73%|███████████████████████████████████████████████████████████▌                      | 14610/20117 [9:21:54<3:24:57,  2.23s/it]                                                                                                                                 {'loss': 0.1779, 'grad_norm': 0.6297810077667236, 'learning_rate': 3.5094885505251515e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 338.31, 'epoch': 1.45}
 73%|███████████████████████████████████████████████████████████▌                      | 14610/20117 [9:21:54<3:24:57,  2.23s/it] 73%|███████████████████████████████████████████████████████████▌                      | 14611/20117 [9:21:56<3:24:23,  2.23s/it] 73%|███████████████████████████████████████████████████████████▌                      | 14612/20117 [9:21:58<3:23:36,  2.22s/it] 73%|███████████████████████████████████████████████████████████▌                      | 14613/20117 [9:22:01<3:23:02,  2.21s/it] 73%|███████████████████████████████████████████████████████████▌                      | 14614/20117 [9:22:03<3:33:02,  2.32s/it] 73%|███████████████████████████████████████████████████████████▌                      | 14615/20117 [9:22:05<3:31:03,  2.30s/it] 73%|███████████████████████████████████████████████████████████▌                      | 14616/20117 [9:22:08<3:27:48,  2.27s/it] 73%|███████████████████████████████████████████████████████████▌                      | 14617/20117 [9:22:10<3:26:20,  2.25s/it] 73%|███████████████████████████████████████████████████████████▌                      | 14618/20117 [9:22:12<3:27:13,  2.26s/it] 73%|███████████████████████████████████████████████████████████▌                      | 14619/20117 [9:22:14<3:27:52,  2.27s/it] 73%|███████████████████████████████████████████████████████████▌                      | 14620/20117 [9:22:17<3:26:15,  2.25s/it]                                                                                                                                 {'loss': 0.1784, 'grad_norm': 0.6115005612373352, 'learning_rate': 3.497556946872451e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 371.67, 'epoch': 1.45}
 73%|███████████████████████████████████████████████████████████▌                      | 14620/20117 [9:22:17<3:26:15,  2.25s/it] 73%|███████████████████████████████████████████████████████████▌                      | 14621/20117 [9:22:19<3:25:15,  2.24s/it] 73%|███████████████████████████████████████████████████████████▌                      | 14622/20117 [9:22:21<3:24:03,  2.23s/it] 73%|███████████████████████████████████████████████████████████▌                      | 14623/20117 [9:22:23<3:24:10,  2.23s/it] 73%|███████████████████████████████████████████████████████████▌                      | 14624/20117 [9:22:25<3:23:18,  2.22s/it] 73%|███████████████████████████████████████████████████████████▌                      | 14625/20117 [9:22:28<3:23:02,  2.22s/it] 73%|███████████████████████████████████████████████████████████▌                      | 14626/20117 [9:22:30<3:22:13,  2.21s/it] 73%|███████████████████████████████████████████████████████████▌                      | 14627/20117 [9:22:32<3:27:10,  2.26s/it] 73%|███████████████████████████████████████████████████████████▋                      | 14628/20117 [9:22:34<3:24:51,  2.24s/it] 73%|███████████████████████████████████████████████████████████▋                      | 14629/20117 [9:22:37<3:23:03,  2.22s/it] 73%|███████████████████████████████████████████████████████████▋                      | 14630/20117 [9:22:39<3:25:01,  2.24s/it]                                                                                                                                 {'loss': 0.1118, 'grad_norm': 0.3485526442527771, 'learning_rate': 3.485641360111309e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 305.03, 'epoch': 1.45}
 73%|███████████████████████████████████████████████████████████▋                      | 14630/20117 [9:22:39<3:25:01,  2.24s/it] 73%|███████████████████████████████████████████████████████████▋                      | 14631/20117 [9:22:41<3:23:37,  2.23s/it] 73%|███████████████████████████████████████████████████████████▋                      | 14632/20117 [9:22:43<3:22:19,  2.21s/it] 73%|███████████████████████████████████████████████████████████▋                      | 14633/20117 [9:22:45<3:22:16,  2.21s/it] 73%|███████████████████████████████████████████████████████████▋                      | 14634/20117 [9:22:48<3:22:57,  2.22s/it] 73%|███████████████████████████████████████████████████████████▋                      | 14635/20117 [9:22:50<3:22:52,  2.22s/it] 73%|███████████████████████████████████████████████████████████▋                      | 14636/20117 [9:22:52<3:24:04,  2.23s/it] 73%|███████████████████████████████████████████████████████████▋                      | 14637/20117 [9:22:54<3:25:20,  2.25s/it] 73%|███████████████████████████████████████████████████████████▋                      | 14638/20117 [9:22:57<3:24:30,  2.24s/it] 73%|███████████████████████████████████████████████████████████▋                      | 14639/20117 [9:22:59<3:23:39,  2.23s/it] 73%|███████████████████████████████████████████████████████████▋                      | 14640/20117 [9:23:01<3:24:27,  2.24s/it]                                                                                                                                 {'loss': 0.1297, 'grad_norm': 0.5678386688232422, 'learning_rate': 3.473741819592341e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 340.8, 'epoch': 1.46}
 73%|███████████████████████████████████████████████████████████▋                      | 14640/20117 [9:23:01<3:24:27,  2.24s/it] 73%|███████████████████████████████████████████████████████████▋                      | 14641/20117 [9:23:03<3:24:27,  2.24s/it] 73%|███████████████████████████████████████████████████████████▋                      | 14642/20117 [9:23:06<3:25:17,  2.25s/it] 73%|███████████████████████████████████████████████████████████▋                      | 14643/20117 [9:23:08<3:23:50,  2.23s/it] 73%|███████████████████████████████████████████████████████████▋                      | 14644/20117 [9:23:10<3:23:08,  2.23s/it] 73%|███████████████████████████████████████████████████████████▋                      | 14645/20117 [9:23:12<3:22:45,  2.22s/it] 73%|███████████████████████████████████████████████████████████▋                      | 14646/20117 [9:23:14<3:23:00,  2.23s/it] 73%|███████████████████████████████████████████████████████████▋                      | 14647/20117 [9:23:17<3:24:22,  2.24s/it] 73%|███████████████████████████████████████████████████████████▋                      | 14648/20117 [9:23:19<3:25:34,  2.26s/it] 73%|███████████████████████████████████████████████████████████▋                      | 14649/20117 [9:23:21<3:26:08,  2.26s/it] 73%|███████████████████████████████████████████████████████████▋                      | 14650/20117 [9:23:24<3:26:41,  2.27s/it]                                                                                                                                 {'loss': 0.1583, 'grad_norm': 0.5769549012184143, 'learning_rate': 3.4618583546266246e-05, 'memory/max_active (GiB)': 19.8, 'memory/max_allocated (GiB)': 19.8, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 344.23, 'epoch': 1.46}
 73%|███████████████████████████████████████████████████████████▋                      | 14650/20117 [9:23:24<3:26:41,  2.27s/it] 73%|███████████████████████████████████████████████████████████▋                      | 14651/20117 [9:23:26<3:26:54,  2.27s/it] 73%|███████████████████████████████████████████████████████████▋                      | 14652/20117 [9:23:28<3:27:38,  2.28s/it] 73%|███████████████████████████████████████████████████████████▋                      | 14653/20117 [9:23:30<3:27:01,  2.27s/it] 73%|███████████████████████████████████████████████████████████▋                      | 14654/20117 [9:23:33<3:28:06,  2.29s/it] 73%|███████████████████████████████████████████████████████████▋                      | 14655/20117 [9:23:35<3:27:25,  2.28s/it] 73%|███████████████████████████████████████████████████████████▋                      | 14656/20117 [9:23:37<3:27:11,  2.28s/it] 73%|███████████████████████████████████████████████████████████▋                      | 14657/20117 [9:23:40<3:26:52,  2.27s/it] 73%|███████████████████████████████████████████████████████████▋                      | 14658/20117 [9:23:42<3:27:26,  2.28s/it] 73%|███████████████████████████████████████████████████████████▊                      | 14659/20117 [9:23:44<3:26:06,  2.27s/it] 73%|███████████████████████████████████████████████████████████▊                      | 14660/20117 [9:23:46<3:25:10,  2.26s/it]                                                                                                                                 {'loss': 0.1803, 'grad_norm': 0.48255455493927, 'learning_rate': 3.449990994485649e-05, 'memory/max_active (GiB)': 18.18, 'memory/max_allocated (GiB)': 18.18, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 325.5, 'epoch': 1.46}
 73%|███████████████████████████████████████████████████████████▊                      | 14660/20117 [9:23:46<3:25:10,  2.26s/it] 73%|███████████████████████████████████████████████████████████▊                      | 14661/20117 [9:23:48<3:23:45,  2.24s/it] 73%|███████████████████████████████████████████████████████████▊                      | 14662/20117 [9:23:51<3:22:32,  2.23s/it] 73%|███████████████████████████████████████████████████████████▊                      | 14663/20117 [9:23:53<3:22:44,  2.23s/it] 73%|███████████████████████████████████████████████████████████▊                      | 14664/20117 [9:23:55<3:24:01,  2.24s/it] 73%|███████████████████████████████████████████████████████████▊                      | 14665/20117 [9:23:57<3:24:38,  2.25s/it] 73%|███████████████████████████████████████████████████████████▊                      | 14666/20117 [9:24:00<3:25:51,  2.27s/it] 73%|███████████████████████████████████████████████████████████▊                      | 14667/20117 [9:24:02<3:25:13,  2.26s/it] 73%|███████████████████████████████████████████████████████████▊                      | 14668/20117 [9:24:05<3:32:00,  2.33s/it] 73%|███████████████████████████████████████████████████████████▊                      | 14669/20117 [9:24:07<3:29:02,  2.30s/it] 73%|███████████████████████████████████████████████████████████▊                      | 14670/20117 [9:24:09<3:26:48,  2.28s/it]                                                                                                                                 {'loss': 0.1496, 'grad_norm': 0.5541951060295105, 'learning_rate': 3.4381397684012296e-05, 'memory/max_active (GiB)': 21.4, 'memory/max_allocated (GiB)': 21.4, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 376.55, 'epoch': 1.46}
 73%|███████████████████████████████████████████████████████████▊                      | 14670/20117 [9:24:09<3:26:48,  2.28s/it] 73%|███████████████████████████████████████████████████████████▊                      | 14671/20117 [9:24:11<3:26:17,  2.27s/it] 73%|███████████████████████████████████████████████████████████▊                      | 14672/20117 [9:24:14<3:25:54,  2.27s/it] 73%|███████████████████████████████████████████████████████████▊                      | 14673/20117 [9:24:16<3:23:45,  2.25s/it] 73%|███████████████████████████████████████████████████████████▊                      | 14674/20117 [9:24:18<3:24:07,  2.25s/it] 73%|███████████████████████████████████████████████████████████▊                      | 14675/20117 [9:24:20<3:24:44,  2.26s/it] 73%|███████████████████████████████████████████████████████████▊                      | 14676/20117 [9:24:22<3:23:29,  2.24s/it] 73%|███████████████████████████████████████████████████████████▊                      | 14677/20117 [9:24:25<3:23:54,  2.25s/it] 73%|███████████████████████████████████████████████████████████▊                      | 14678/20117 [9:24:27<3:23:05,  2.24s/it] 73%|███████████████████████████████████████████████████████████▊                      | 14679/20117 [9:24:29<3:24:34,  2.26s/it] 73%|███████████████████████████████████████████████████████████▊                      | 14680/20117 [9:24:31<3:22:56,  2.24s/it]                                                                                                                                 {'loss': 0.1257, 'grad_norm': 0.4175207316875458, 'learning_rate': 3.426304705565445e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 338.73, 'epoch': 1.46}
 73%|███████████████████████████████████████████████████████████▊                      | 14680/20117 [9:24:31<3:22:56,  2.24s/it] 73%|███████████████████████████████████████████████████████████▊                      | 14681/20117 [9:24:34<3:23:25,  2.25s/it] 73%|███████████████████████████████████████████████████████████▊                      | 14682/20117 [9:24:36<3:22:30,  2.24s/it] 73%|███████████████████████████████████████████████████████████▊                      | 14683/20117 [9:24:38<3:22:03,  2.23s/it] 73%|███████████████████████████████████████████████████████████▊                      | 14684/20117 [9:24:40<3:21:15,  2.22s/it] 73%|███████████████████████████████████████████████████████████▊                      | 14685/20117 [9:24:42<3:20:06,  2.21s/it] 73%|███████████████████████████████████████████████████████████▊                      | 14686/20117 [9:24:45<3:19:59,  2.21s/it] 73%|███████████████████████████████████████████████████████████▊                      | 14687/20117 [9:24:47<3:19:50,  2.21s/it] 73%|███████████████████████████████████████████████████████████▊                      | 14688/20117 [9:24:49<3:20:59,  2.22s/it] 73%|███████████████████████████████████████████████████████████▊                      | 14689/20117 [9:24:51<3:20:33,  2.22s/it] 73%|███████████████████████████████████████████████████████████▉                      | 14690/20117 [9:24:54<3:21:54,  2.23s/it]                                                                                                                                 {'loss': 0.1461, 'grad_norm': 0.7478891611099243, 'learning_rate': 3.4144858351305496e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 313.51, 'epoch': 1.46}
 73%|███████████████████████████████████████████████████████████▉                      | 14690/20117 [9:24:54<3:21:54,  2.23s/it] 73%|███████████████████████████████████████████████████████████▉                      | 14691/20117 [9:24:56<3:22:14,  2.24s/it] 73%|███████████████████████████████████████████████████████████▉                      | 14692/20117 [9:24:58<3:22:46,  2.24s/it] 73%|███████████████████████████████████████████████████████████▉                      | 14693/20117 [9:25:00<3:22:12,  2.24s/it] 73%|███████████████████████████████████████████████████████████▉                      | 14694/20117 [9:25:03<3:21:58,  2.23s/it] 73%|███████████████████████████████████████████████████████████▉                      | 14695/20117 [9:25:05<3:20:56,  2.22s/it] 73%|███████████████████████████████████████████████████████████▉                      | 14696/20117 [9:25:07<3:21:57,  2.24s/it] 73%|███████████████████████████████████████████████████████████▉                      | 14697/20117 [9:25:09<3:25:01,  2.27s/it] 73%|███████████████████████████████████████████████████████████▉                      | 14698/20117 [9:25:12<3:23:17,  2.25s/it] 73%|███████████████████████████████████████████████████████████▉                      | 14699/20117 [9:25:14<3:25:07,  2.27s/it] 73%|███████████████████████████████████████████████████████████▉                      | 14700/20117 [9:25:16<3:24:59,  2.27s/it]                                                                                                                                 {'loss': 0.1719, 'grad_norm': 0.418258935213089, 'learning_rate': 3.402683186208922e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 372.01, 'epoch': 1.46}
 73%|███████████████████████████████████████████████████████████▉                      | 14700/20117 [9:25:16<3:24:59,  2.27s/it] 73%|███████████████████████████████████████████████████████████▉                      | 14701/20117 [9:25:18<3:23:26,  2.25s/it] 73%|███████████████████████████████████████████████████████████▉                      | 14702/20117 [9:25:21<3:22:59,  2.25s/it] 73%|███████████████████████████████████████████████████████████▉                      | 14703/20117 [9:25:23<3:22:27,  2.24s/it] 73%|███████████████████████████████████████████████████████████▉                      | 14704/20117 [9:25:25<3:22:22,  2.24s/it] 73%|███████████████████████████████████████████████████████████▉                      | 14705/20117 [9:25:27<3:22:01,  2.24s/it] 73%|███████████████████████████████████████████████████████████▉                      | 14706/20117 [9:25:30<3:21:06,  2.23s/it] 73%|███████████████████████████████████████████████████████████▉                      | 14707/20117 [9:25:32<3:22:11,  2.24s/it] 73%|███████████████████████████████████████████████████████████▉                      | 14708/20117 [9:25:34<3:23:30,  2.26s/it] 73%|███████████████████████████████████████████████████████████▉                      | 14709/20117 [9:25:36<3:21:31,  2.24s/it] 73%|███████████████████████████████████████████████████████████▉                      | 14710/20117 [9:25:39<3:22:25,  2.25s/it]                                                                                                                                 {'loss': 0.1876, 'grad_norm': 0.6298655271530151, 'learning_rate': 3.390896787872985e-05, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 384.18, 'epoch': 1.46}
 73%|███████████████████████████████████████████████████████████▉                      | 14710/20117 [9:25:39<3:22:25,  2.25s/it] 73%|███████████████████████████████████████████████████████████▉                      | 14711/20117 [9:25:41<3:23:38,  2.26s/it] 73%|███████████████████████████████████████████████████████████▉                      | 14712/20117 [9:25:43<3:23:26,  2.26s/it] 73%|███████████████████████████████████████████████████████████▉                      | 14713/20117 [9:25:45<3:21:48,  2.24s/it] 73%|███████████████████████████████████████████████████████████▉                      | 14714/20117 [9:25:48<3:23:09,  2.26s/it] 73%|███████████████████████████████████████████████████████████▉                      | 14715/20117 [9:25:50<3:22:17,  2.25s/it] 73%|███████████████████████████████████████████████████████████▉                      | 14716/20117 [9:25:52<3:23:34,  2.26s/it] 73%|███████████████████████████████████████████████████████████▉                      | 14717/20117 [9:25:55<3:28:16,  2.31s/it] 73%|███████████████████████████████████████████████████████████▉                      | 14718/20117 [9:25:57<3:29:25,  2.33s/it] 73%|███████████████████████████████████████████████████████████▉                      | 14719/20117 [9:25:59<3:28:58,  2.32s/it] 73%|████████████████████████████████████████████████████████████                      | 14720/20117 [9:26:02<3:42:07,  2.47s/it]                                                                                                                                 {'loss': 0.1502, 'grad_norm': 0.2738591134548187, 'learning_rate': 3.379126669155122e-05, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 316.12, 'epoch': 1.46}
 73%|████████████████████████████████████████████████████████████                      | 14720/20117 [9:26:02<3:42:07,  2.47s/it] 73%|████████████████████████████████████████████████████████████                      | 14721/20117 [9:26:04<3:37:17,  2.42s/it] 73%|████████████████████████████████████████████████████████████                      | 14722/20117 [9:26:07<3:37:09,  2.42s/it] 73%|████████████████████████████████████████████████████████████                      | 14723/20117 [9:26:09<3:34:55,  2.39s/it] 73%|████████████████████████████████████████████████████████████                      | 14724/20117 [9:26:11<3:31:26,  2.35s/it] 73%|████████████████████████████████████████████████████████████                      | 14725/20117 [9:26:14<3:27:49,  2.31s/it] 73%|████████████████████████████████████████████████████████████                      | 14726/20117 [9:26:16<3:24:58,  2.28s/it] 73%|████████████████████████████████████████████████████████████                      | 14727/20117 [9:26:18<3:23:16,  2.26s/it] 73%|████████████████████████████████████████████████████████████                      | 14728/20117 [9:26:20<3:21:47,  2.25s/it] 73%|████████████████████████████████████████████████████████████                      | 14729/20117 [9:26:22<3:21:48,  2.25s/it] 73%|████████████████████████████████████████████████████████████                      | 14730/20117 [9:26:25<3:20:45,  2.24s/it]                                                                                                                                 {'loss': 0.1819, 'grad_norm': 0.6010544896125793, 'learning_rate': 3.3673728590476296e-05, 'memory/max_active (GiB)': 20.44, 'memory/max_allocated (GiB)': 20.44, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 376.31, 'epoch': 1.46}
 73%|████████████████████████████████████████████████████████████                      | 14730/20117 [9:26:25<3:20:45,  2.24s/it] 73%|████████████████████████████████████████████████████████████                      | 14731/20117 [9:26:27<3:21:05,  2.24s/it] 73%|████████████████████████████████████████████████████████████                      | 14732/20117 [9:26:29<3:21:07,  2.24s/it] 73%|████████████████████████████████████████████████████████████                      | 14733/20117 [9:26:31<3:21:24,  2.24s/it] 73%|████████████████████████████████████████████████████████████                      | 14734/20117 [9:26:34<3:20:16,  2.23s/it] 73%|████████████████████████████████████████████████████████████                      | 14735/20117 [9:26:36<3:19:31,  2.22s/it] 73%|████████████████████████████████████████████████████████████                      | 14736/20117 [9:26:38<3:20:11,  2.23s/it] 73%|████████████████████████████████████████████████████████████                      | 14737/20117 [9:26:40<3:20:50,  2.24s/it] 73%|████████████████████████████████████████████████████████████                      | 14738/20117 [9:26:43<3:19:54,  2.23s/it] 73%|████████████████████████████████████████████████████████████                      | 14739/20117 [9:26:45<3:20:43,  2.24s/it] 73%|████████████████████████████████████████████████████████████                      | 14740/20117 [9:26:47<3:22:13,  2.26s/it]                                                                                                                                 {'loss': 0.1465, 'grad_norm': 0.4839765727519989, 'learning_rate': 3.355635386502619e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 339.37, 'epoch': 1.47}
 73%|████████████████████████████████████████████████████████████                      | 14740/20117 [9:26:47<3:22:13,  2.26s/it] 73%|████████████████████████████████████████████████████████████                      | 14741/20117 [9:26:49<3:23:05,  2.27s/it] 73%|████████████████████████████████████████████████████████████                      | 14742/20117 [9:26:52<3:23:42,  2.27s/it] 73%|████████████████████████████████████████████████████████████                      | 14743/20117 [9:26:54<3:22:59,  2.27s/it] 73%|████████████████████████████████████████████████████████████                      | 14744/20117 [9:26:56<3:22:24,  2.26s/it] 73%|████████████████████████████████████████████████████████████                      | 14745/20117 [9:26:58<3:23:06,  2.27s/it] 73%|████████████████████████████████████████████████████████████                      | 14746/20117 [9:27:01<3:22:43,  2.26s/it] 73%|████████████████████████████████████████████████████████████                      | 14747/20117 [9:27:03<3:21:34,  2.25s/it] 73%|████████████████████████████████████████████████████████████                      | 14748/20117 [9:27:05<3:21:18,  2.25s/it] 73%|████████████████████████████████████████████████████████████                      | 14749/20117 [9:27:07<3:20:48,  2.24s/it] 73%|████████████████████████████████████████████████████████████                      | 14750/20117 [9:27:10<3:27:16,  2.32s/it]                                                                                                                                 {'loss': 0.1178, 'grad_norm': 0.569148063659668, 'learning_rate': 3.3439142804319743e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 338.15, 'epoch': 1.47}
 73%|████████████████████████████████████████████████████████████                      | 14750/20117 [9:27:10<3:27:16,  2.32s/it] 73%|████████████████████████████████████████████████████████████▏                     | 14751/20117 [9:27:12<3:30:27,  2.35s/it] 73%|████████████████████████████████████████████████████████████▏                     | 14752/20117 [9:27:15<3:27:19,  2.32s/it] 73%|████████████████████████████████████████████████████████████▏                     | 14753/20117 [9:27:17<3:24:28,  2.29s/it] 73%|████████████████████████████████████████████████████████████▏                     | 14754/20117 [9:27:19<3:21:52,  2.26s/it] 73%|████████████████████████████████████████████████████████████▏                     | 14755/20117 [9:27:21<3:20:43,  2.25s/it] 73%|████████████████████████████████████████████████████████████▏                     | 14756/20117 [9:27:23<3:20:38,  2.25s/it] 73%|████████████████████████████████████████████████████████████▏                     | 14757/20117 [9:27:26<3:22:00,  2.26s/it] 73%|████████████████████████████████████████████████████████████▏                     | 14758/20117 [9:27:28<3:22:00,  2.26s/it] 73%|████████████████████████████████████████████████████████████▏                     | 14759/20117 [9:27:30<3:22:31,  2.27s/it] 73%|████████████████████████████████████████████████████████████▏                     | 14760/20117 [9:27:33<3:23:21,  2.28s/it]                                                                                                                                 {'loss': 0.1648, 'grad_norm': 0.7138640284538269, 'learning_rate': 3.3322095697072496e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 334.69, 'epoch': 1.47}
 73%|████████████████████████████████████████████████████████████▏                     | 14760/20117 [9:27:33<3:23:21,  2.28s/it] 73%|████████████████████████████████████████████████████████████▏                     | 14761/20117 [9:27:35<3:29:44,  2.35s/it] 73%|████████████████████████████████████████████████████████████▏                     | 14762/20117 [9:27:38<3:31:39,  2.37s/it] 73%|████████████████████████████████████████████████████████████▏                     | 14763/20117 [9:27:40<3:27:25,  2.32s/it] 73%|████████████████████████████████████████████████████████████▏                     | 14764/20117 [9:27:42<3:26:53,  2.32s/it] 73%|████████████████████████████████████████████████████████████▏                     | 14765/20117 [9:27:44<3:26:37,  2.32s/it] 73%|████████████████████████████████████████████████████████████▏                     | 14766/20117 [9:27:47<3:26:39,  2.32s/it] 73%|████████████████████████████████████████████████████████████▏                     | 14767/20117 [9:27:49<3:24:45,  2.30s/it] 73%|████████████████████████████████████████████████████████████▏                     | 14768/20117 [9:27:51<3:28:35,  2.34s/it] 73%|████████████████████████████████████████████████████████████▏                     | 14769/20117 [9:27:54<3:31:10,  2.37s/it] 73%|████████████████████████████████████████████████████████████▏                     | 14770/20117 [9:27:56<3:29:34,  2.35s/it]                                                                                                                                 {'loss': 0.1368, 'grad_norm': 0.7108574509620667, 'learning_rate': 3.3205212831596264e-05, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 386.29, 'epoch': 1.47}
 73%|████████████████████████████████████████████████████████████▏                     | 14770/20117 [9:27:56<3:29:34,  2.35s/it] 73%|████████████████████████████████████████████████████████████▏                     | 14771/20117 [9:27:58<3:29:26,  2.35s/it] 73%|████████████████████████████████████████████████████████████▏                     | 14772/20117 [9:28:01<3:29:25,  2.35s/it] 73%|████████████████████████████████████████████████████████████▏                     | 14773/20117 [9:28:04<3:40:21,  2.47s/it] 73%|████████████████████████████████████████████████████████████▏                     | 14774/20117 [9:28:06<3:37:49,  2.45s/it] 73%|████████████████████████████████████████████████████████████▏                     | 14775/20117 [9:28:08<3:36:04,  2.43s/it] 73%|████████████████████████████████████████████████████████████▏                     | 14776/20117 [9:28:11<3:36:29,  2.43s/it] 73%|████████████████████████████████████████████████████████████▏                     | 14777/20117 [9:28:13<3:35:59,  2.43s/it] 73%|████████████████████████████████████████████████████████████▏                     | 14778/20117 [9:28:16<3:35:40,  2.42s/it] 73%|████████████████████████████████████████████████████████████▏                     | 14779/20117 [9:28:18<3:36:50,  2.44s/it] 73%|████████████████████████████████████████████████████████████▏                     | 14780/20117 [9:28:21<3:37:18,  2.44s/it]                                                                                                                                 {'loss': 0.164, 'grad_norm': 0.347858190536499, 'learning_rate': 3.30884944957982e-05, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 400.72, 'epoch': 1.47}
 73%|████████████████████████████████████████████████████████████▏                     | 14780/20117 [9:28:21<3:37:18,  2.44s/it] 73%|████████████████████████████████████████████████████████████▏                     | 14781/20117 [9:28:23<3:36:16,  2.43s/it] 73%|████████████████████████████████████████████████████████████▎                     | 14782/20117 [9:28:25<3:35:28,  2.42s/it] 73%|████████████████████████████████████████████████████████████▎                     | 14783/20117 [9:28:28<3:35:27,  2.42s/it] 73%|████████████████████████████████████████████████████████████▎                     | 14784/20117 [9:28:30<3:34:59,  2.42s/it] 73%|████████████████████████████████████████████████████████████▎                     | 14785/20117 [9:28:33<3:34:33,  2.41s/it] 74%|████████████████████████████████████████████████████████████▎                     | 14786/20117 [9:28:35<3:35:05,  2.42s/it] 74%|████████████████████████████████████████████████████████████▎                     | 14787/20117 [9:28:37<3:35:14,  2.42s/it] 74%|████████████████████████████████████████████████████████████▎                     | 14788/20117 [9:28:40<3:36:29,  2.44s/it] 74%|████████████████████████████████████████████████████████████▎                     | 14789/20117 [9:28:42<3:38:02,  2.46s/it] 74%|████████████████████████████████████████████████████████████▎                     | 14790/20117 [9:28:45<3:37:43,  2.45s/it]                                                                                                                                 {'loss': 0.1748, 'grad_norm': 0.4929182827472687, 'learning_rate': 3.29719409771803e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 369.52, 'epoch': 1.47}
 74%|████████████████████████████████████████████████████████████▎                     | 14790/20117 [9:28:45<3:37:43,  2.45s/it] 74%|████████████████████████████████████████████████████████████▎                     | 14791/20117 [9:28:47<3:39:14,  2.47s/it] 74%|████████████████████████████████████████████████████████████▎                     | 14792/20117 [9:28:50<3:37:28,  2.45s/it] 74%|████████████████████████████████████████████████████████████▎                     | 14793/20117 [9:28:52<3:34:45,  2.42s/it] 74%|████████████████████████████████████████████████████████████▎                     | 14794/20117 [9:28:54<3:32:22,  2.39s/it] 74%|████████████████████████████████████████████████████████████▎                     | 14795/20117 [9:28:57<3:32:32,  2.40s/it] 74%|████████████████████████████████████████████████████████████▎                     | 14796/20117 [9:28:59<3:34:26,  2.42s/it] 74%|████████████████████████████████████████████████████████████▎                     | 14797/20117 [9:29:02<3:35:52,  2.43s/it] 74%|████████████████████████████████████████████████████████████▎                     | 14798/20117 [9:29:04<3:36:02,  2.44s/it] 74%|████████████████████████████████████████████████████████████▎                     | 14799/20117 [9:29:07<3:36:05,  2.44s/it] 74%|████████████████████████████████████████████████████████████▎                     | 14800/20117 [9:29:09<3:36:12,  2.44s/it]                                                                                                                                 {'loss': 0.1336, 'grad_norm': 0.4838101863861084, 'learning_rate': 3.2855552562838445e-05, 'memory/max_active (GiB)': 19.2, 'memory/max_allocated (GiB)': 19.2, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 353.25, 'epoch': 1.47}
 74%|████████████████████████████████████████████████████████████▎                     | 14800/20117 [9:29:09<3:36:12,  2.44s/it] 74%|████████████████████████████████████████████████████████████▎                     | 14801/20117 [9:29:12<3:37:00,  2.45s/it] 74%|████████████████████████████████████████████████████████████▎                     | 14802/20117 [9:29:14<3:39:11,  2.47s/it] 74%|████████████████████████████████████████████████████████████▎                     | 14803/20117 [9:29:16<3:32:24,  2.40s/it] 74%|████████████████████████████████████████████████████████████▎                     | 14804/20117 [9:29:19<3:28:48,  2.36s/it] 74%|████████████████████████████████████████████████████████████▎                     | 14805/20117 [9:29:21<3:28:33,  2.36s/it] 74%|████████████████████████████████████████████████████████████▎                     | 14806/20117 [9:29:23<3:27:37,  2.35s/it] 74%|████████████████████████████████████████████████████████████▎                     | 14807/20117 [9:29:26<3:26:48,  2.34s/it] 74%|████████████████████████████████████████████████████████████▎                     | 14808/20117 [9:29:28<3:28:05,  2.35s/it] 74%|████████████████████████████████████████████████████████████▎                     | 14809/20117 [9:29:30<3:28:27,  2.36s/it] 74%|████████████████████████████████████████████████████████████▎                     | 14810/20117 [9:29:33<3:24:04,  2.31s/it]                                                                                                                                 {'loss': 0.1646, 'grad_norm': 0.6003445386886597, 'learning_rate': 3.273932953946193e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 359.65, 'epoch': 1.47}
 74%|████████████████████████████████████████████████████████████▎                     | 14810/20117 [9:29:33<3:24:04,  2.31s/it] 74%|████████████████████████████████████████████████████████████▎                     | 14811/20117 [9:29:35<3:20:57,  2.27s/it] 74%|████████████████████████████████████████████████████████████▍                     | 14812/20117 [9:29:37<3:18:34,  2.25s/it] 74%|████████████████████████████████████████████████████████████▍                     | 14813/20117 [9:29:39<3:16:43,  2.23s/it] 74%|████████████████████████████████████████████████████████████▍                     | 14814/20117 [9:29:41<3:17:46,  2.24s/it] 74%|████████████████████████████████████████████████████████████▍                     | 14815/20117 [9:29:44<3:17:41,  2.24s/it] 74%|████████████████████████████████████████████████████████████▍                     | 14816/20117 [9:29:46<3:16:57,  2.23s/it] 74%|████████████████████████████████████████████████████████████▍                     | 14817/20117 [9:29:48<3:16:12,  2.22s/it] 74%|████████████████████████████████████████████████████████████▍                     | 14818/20117 [9:29:50<3:16:10,  2.22s/it] 74%|████████████████████████████████████████████████████████████▍                     | 14819/20117 [9:29:52<3:14:57,  2.21s/it] 74%|████████████████████████████████████████████████████████████▍                     | 14820/20117 [9:29:55<3:16:05,  2.22s/it]                                                                                                                                 {'loss': 0.1307, 'grad_norm': 0.40864112973213196, 'learning_rate': 3.262327219333262e-05, 'memory/max_active (GiB)': 21.4, 'memory/max_allocated (GiB)': 21.4, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 283.81, 'epoch': 1.47}
 74%|████████████████████████████████████████████████████████████▍                     | 14820/20117 [9:29:55<3:16:05,  2.22s/it] 74%|████████████████████████████████████████████████████████████▍                     | 14821/20117 [9:29:57<3:18:14,  2.25s/it] 74%|████████████████████████████████████████████████████████████▍                     | 14822/20117 [9:29:59<3:17:08,  2.23s/it] 74%|████████████████████████████████████████████████████████████▍                     | 14823/20117 [9:30:02<3:19:12,  2.26s/it] 74%|████████████████████████████████████████████████████████████▍                     | 14824/20117 [9:30:04<3:20:28,  2.27s/it] 74%|████████████████████████████████████████████████████████████▍                     | 14825/20117 [9:30:06<3:21:32,  2.28s/it] 74%|████████████████████████████████████████████████████████████▍                     | 14826/20117 [9:30:09<3:34:53,  2.44s/it] 74%|████████████████████████████████████████████████████████████▍                     | 14827/20117 [9:30:11<3:32:45,  2.41s/it] 74%|████████████████████████████████████████████████████████████▍                     | 14828/20117 [9:30:13<3:27:45,  2.36s/it] 74%|████████████████████████████████████████████████████████████▍                     | 14829/20117 [9:30:16<3:24:18,  2.32s/it] 74%|████████████████████████████████████████████████████████████▍                     | 14830/20117 [9:30:18<3:21:40,  2.29s/it]                                                                                                                                 {'loss': 0.1409, 'grad_norm': 0.5634713768959045, 'learning_rate': 3.250738081032433e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 370.86, 'epoch': 1.47}
 74%|████████████████████████████████████████████████████████████▍                     | 14830/20117 [9:30:18<3:21:40,  2.29s/it] 74%|████████████████████████████████████████████████████████████▍                     | 14831/20117 [9:30:20<3:21:41,  2.29s/it] 74%|████████████████████████████████████████████████████████████▍                     | 14832/20117 [9:30:22<3:20:47,  2.28s/it] 74%|████████████████████████████████████████████████████████████▍                     | 14833/20117 [9:30:25<3:21:25,  2.29s/it] 74%|████████████████████████████████████████████████████████████▍                     | 14834/20117 [9:30:27<3:23:35,  2.31s/it] 74%|████████████████████████████████████████████████████████████▍                     | 14835/20117 [9:30:29<3:20:55,  2.28s/it] 74%|████████████████████████████████████████████████████████████▍                     | 14836/20117 [9:30:32<3:19:03,  2.26s/it] 74%|████████████████████████████████████████████████████████████▍                     | 14837/20117 [9:30:34<3:18:43,  2.26s/it] 74%|████████████████████████████████████████████████████████████▍                     | 14838/20117 [9:30:36<3:17:45,  2.25s/it] 74%|████████████████████████████████████████████████████████████▍                     | 14839/20117 [9:30:38<3:17:26,  2.24s/it] 74%|████████████████████████████████████████████████████████████▍                     | 14840/20117 [9:30:41<3:16:59,  2.24s/it]                                                                                                                                 {'loss': 0.1392, 'grad_norm': 0.421989768743515, 'learning_rate': 3.239165567590197e-05, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 329.01, 'epoch': 1.48}
 74%|████████████████████████████████████████████████████████████▍                     | 14840/20117 [9:30:41<3:16:59,  2.24s/it] 74%|████████████████████████████████████████████████████████████▍                     | 14841/20117 [9:30:43<3:16:40,  2.24s/it] 74%|████████████████████████████████████████████████████████████▍                     | 14842/20117 [9:30:45<3:18:24,  2.26s/it] 74%|████████████████████████████████████████████████████████████▌                     | 14843/20117 [9:30:47<3:17:11,  2.24s/it] 74%|████████████████████████████████████████████████████████████▌                     | 14844/20117 [9:30:49<3:16:17,  2.23s/it] 74%|████████████████████████████████████████████████████████████▌                     | 14845/20117 [9:30:52<3:15:31,  2.23s/it] 74%|████████████████████████████████████████████████████████████▌                     | 14846/20117 [9:30:54<3:15:23,  2.22s/it] 74%|████████████████████████████████████████████████████████████▌                     | 14847/20117 [9:30:56<3:16:08,  2.23s/it] 74%|████████████████████████████████████████████████████████████▌                     | 14848/20117 [9:30:58<3:16:04,  2.23s/it] 74%|████████████████████████████████████████████████████████████▌                     | 14849/20117 [9:31:01<3:14:54,  2.22s/it] 74%|████████████████████████████████████████████████████████████▌                     | 14850/20117 [9:31:03<3:14:18,  2.21s/it]                                                                                                                                 {'loss': 0.2051, 'grad_norm': 0.8798142671585083, 'learning_rate': 3.2276097075121014e-05, 'memory/max_active (GiB)': 20.57, 'memory/max_allocated (GiB)': 20.57, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 424.3, 'epoch': 1.48}
 74%|████████████████████████████████████████████████████████████▌                     | 14850/20117 [9:31:03<3:14:18,  2.21s/it] 74%|████████████████████████████████████████████████████████████▌                     | 14851/20117 [9:31:05<3:15:06,  2.22s/it] 74%|████████████████████████████████████████████████████████████▌                     | 14852/20117 [9:31:07<3:15:47,  2.23s/it] 74%|████████████████████████████████████████████████████████████▌                     | 14853/20117 [9:31:10<3:16:49,  2.24s/it] 74%|████████████████████████████████████████████████████████████▌                     | 14854/20117 [9:31:12<3:15:13,  2.23s/it] 74%|████████████████████████████████████████████████████████████▌                     | 14855/20117 [9:31:14<3:14:26,  2.22s/it] 74%|████████████████████████████████████████████████████████████▌                     | 14856/20117 [9:31:16<3:15:21,  2.23s/it] 74%|████████████████████████████████████████████████████████████▌                     | 14857/20117 [9:31:18<3:14:58,  2.22s/it] 74%|████████████████████████████████████████████████████████████▌                     | 14858/20117 [9:31:21<3:15:37,  2.23s/it] 74%|████████████████████████████████████████████████████████████▌                     | 14859/20117 [9:31:23<3:15:30,  2.23s/it] 74%|████████████████████████████████████████████████████████████▌                     | 14860/20117 [9:31:25<3:14:29,  2.22s/it]                                                                                                                                 {'loss': 0.156, 'grad_norm': 0.4949728846549988, 'learning_rate': 3.216070529262678e-05, 'memory/max_active (GiB)': 20.65, 'memory/max_allocated (GiB)': 20.65, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 344.69, 'epoch': 1.48}
 74%|████████████████████████████████████████████████████████████▌                     | 14860/20117 [9:31:25<3:14:29,  2.22s/it] 74%|████████████████████████████████████████████████████████████▌                     | 14861/20117 [9:31:27<3:15:24,  2.23s/it] 74%|████████████████████████████████████████████████████████████▌                     | 14862/20117 [9:31:30<3:16:15,  2.24s/it] 74%|████████████████████████████████████████████████████████████▌                     | 14863/20117 [9:31:32<3:16:45,  2.25s/it] 74%|████████████████████████████████████████████████████████████▌                     | 14864/20117 [9:31:34<3:18:16,  2.26s/it] 74%|████████████████████████████████████████████████████████████▌                     | 14865/20117 [9:31:36<3:19:52,  2.28s/it] 74%|████████████████████████████████████████████████████████████▌                     | 14866/20117 [9:31:39<3:20:27,  2.29s/it] 74%|████████████████████████████████████████████████████████████▌                     | 14867/20117 [9:31:41<3:20:09,  2.29s/it] 74%|████████████████████████████████████████████████████████████▌                     | 14868/20117 [9:31:43<3:20:46,  2.30s/it] 74%|████████████████████████████████████████████████████████████▌                     | 14869/20117 [9:31:46<3:21:17,  2.30s/it] 74%|████████████████████████████████████████████████████████████▌                     | 14870/20117 [9:31:48<3:17:52,  2.26s/it]                                                                                                                                 {'loss': 0.1542, 'grad_norm': 0.44336196780204773, 'learning_rate': 3.204548061265353e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 437.39, 'epoch': 1.48}
 74%|████████████████████████████████████████████████████████████▌                     | 14870/20117 [9:31:48<3:17:52,  2.26s/it] 74%|████████████████████████████████████████████████████████████▌                     | 14871/20117 [9:31:50<3:17:06,  2.25s/it] 74%|████████████████████████████████████████████████████████████▌                     | 14872/20117 [9:31:52<3:18:17,  2.27s/it] 74%|████████████████████████████████████████████████████████████▌                     | 14873/20117 [9:31:55<3:19:26,  2.28s/it] 74%|████████████████████████████████████████████████████████████▋                     | 14874/20117 [9:31:57<3:17:14,  2.26s/it] 74%|████████████████████████████████████████████████████████████▋                     | 14875/20117 [9:31:59<3:15:24,  2.24s/it] 74%|████████████████████████████████████████████████████████████▋                     | 14876/20117 [9:32:01<3:14:54,  2.23s/it] 74%|████████████████████████████████████████████████████████████▋                     | 14877/20117 [9:32:04<3:13:39,  2.22s/it] 74%|████████████████████████████████████████████████████████████▋                     | 14878/20117 [9:32:06<3:20:41,  2.30s/it] 74%|████████████████████████████████████████████████████████████▋                     | 14879/20117 [9:32:08<3:17:52,  2.27s/it] 74%|████████████████████████████████████████████████████████████▋                     | 14880/20117 [9:32:10<3:18:09,  2.27s/it]                                                                                                                                 {'loss': 0.1638, 'grad_norm': 0.3560582399368286, 'learning_rate': 3.193042331902408e-05, 'memory/max_active (GiB)': 19.8, 'memory/max_allocated (GiB)': 19.8, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 316.98, 'epoch': 1.48}
 74%|████████████████████████████████████████████████████████████▋                     | 14880/20117 [9:32:11<3:18:09,  2.27s/it] 74%|████████████████████████████████████████████████████████████▋                     | 14881/20117 [9:32:13<3:16:56,  2.26s/it] 74%|████████████████████████████████████████████████████████████▋                     | 14882/20117 [9:32:15<3:14:54,  2.23s/it] 74%|████████████████████████████████████████████████████████████▋                     | 14883/20117 [9:32:17<3:14:39,  2.23s/it] 74%|████████████████████████████████████████████████████████████▋                     | 14884/20117 [9:32:19<3:14:51,  2.23s/it] 74%|████████████████████████████████████████████████████████████▋                     | 14885/20117 [9:32:22<3:14:40,  2.23s/it] 74%|████████████████████████████████████████████████████████████▋                     | 14886/20117 [9:32:24<3:13:12,  2.22s/it] 74%|████████████████████████████████████████████████████████████▋                     | 14887/20117 [9:32:26<3:13:50,  2.22s/it] 74%|████████████████████████████████████████████████████████████▋                     | 14888/20117 [9:32:28<3:14:23,  2.23s/it] 74%|████████████████████████████████████████████████████████████▋                     | 14889/20117 [9:32:30<3:14:39,  2.23s/it] 74%|████████████████████████████████████████████████████████████▋                     | 14890/20117 [9:32:33<3:15:13,  2.24s/it]                                                                                                                                 {'loss': 0.18, 'grad_norm': 0.5532661080360413, 'learning_rate': 3.181553369514881e-05, 'memory/max_active (GiB)': 20.77, 'memory/max_allocated (GiB)': 20.77, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 356.94, 'epoch': 1.48}
 74%|████████████████████████████████████████████████████████████▋                     | 14890/20117 [9:32:33<3:15:13,  2.24s/it] 74%|████████████████████████████████████████████████████████████▋                     | 14891/20117 [9:32:35<3:13:28,  2.22s/it] 74%|████████████████████████████████████████████████████████████▋                     | 14892/20117 [9:32:37<3:14:32,  2.23s/it] 74%|████████████████████████████████████████████████████████████▋                     | 14893/20117 [9:32:39<3:13:45,  2.23s/it] 74%|████████████████████████████████████████████████████████████▋                     | 14894/20117 [9:32:42<3:13:05,  2.22s/it] 74%|████████████████████████████████████████████████████████████▋                     | 14895/20117 [9:32:44<3:12:58,  2.22s/it] 74%|████████████████████████████████████████████████████████████▋                     | 14896/20117 [9:32:46<3:11:59,  2.21s/it] 74%|████████████████████████████████████████████████████████████▋                     | 14897/20117 [9:32:48<3:12:06,  2.21s/it] 74%|████████████████████████████████████████████████████████████▋                     | 14898/20117 [9:32:50<3:12:41,  2.22s/it] 74%|████████████████████████████████████████████████████████████▋                     | 14899/20117 [9:32:53<3:13:02,  2.22s/it] 74%|████████████████████████████████████████████████████████████▋                     | 14900/20117 [9:32:55<3:13:21,  2.22s/it]                                                                                                                                 {'loss': 0.1776, 'grad_norm': 0.5040679574012756, 'learning_rate': 3.170081202402518e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 296.93, 'epoch': 1.48}
 74%|████████████████████████████████████████████████████████████▋                     | 14900/20117 [9:32:55<3:13:21,  2.22s/it] 74%|████████████████████████████████████████████████████████████▋                     | 14901/20117 [9:32:57<3:12:16,  2.21s/it] 74%|████████████████████████████████████████████████████████████▋                     | 14902/20117 [9:32:59<3:12:39,  2.22s/it] 74%|████████████████████████████████████████████████████████████▋                     | 14903/20117 [9:33:02<3:13:32,  2.23s/it] 74%|████████████████████████████████████████████████████████████▊                     | 14904/20117 [9:33:04<3:14:32,  2.24s/it] 74%|████████████████████████████████████████████████████████████▊                     | 14905/20117 [9:33:06<3:13:46,  2.23s/it] 74%|████████████████████████████████████████████████████████████▊                     | 14906/20117 [9:33:08<3:13:33,  2.23s/it] 74%|████████████████████████████████████████████████████████████▊                     | 14907/20117 [9:33:11<3:14:20,  2.24s/it] 74%|████████████████████████████████████████████████████████████▊                     | 14908/20117 [9:33:13<3:17:13,  2.27s/it] 74%|████████████████████████████████████████████████████████████▊                     | 14909/20117 [9:33:15<3:16:30,  2.26s/it] 74%|████████████████████████████████████████████████████████████▊                     | 14910/20117 [9:33:17<3:14:56,  2.25s/it]                                                                                                                                 {'loss': 0.1919, 'grad_norm': 0.7191395163536072, 'learning_rate': 3.158625858823688e-05, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 390.26, 'epoch': 1.48}
 74%|████████████████████████████████████████████████████████████▊                     | 14910/20117 [9:33:17<3:14:56,  2.25s/it] 74%|████████████████████████████████████████████████████████████▊                     | 14911/20117 [9:33:20<3:16:27,  2.26s/it] 74%|████████████████████████████████████████████████████████████▊                     | 14912/20117 [9:33:22<3:15:22,  2.25s/it] 74%|████████████████████████████████████████████████████████████▊                     | 14913/20117 [9:33:24<3:14:15,  2.24s/it] 74%|████████████████████████████████████████████████████████████▊                     | 14914/20117 [9:33:26<3:14:41,  2.25s/it] 74%|████████████████████████████████████████████████████████████▊                     | 14915/20117 [9:33:29<3:13:54,  2.24s/it] 74%|████████████████████████████████████████████████████████████▊                     | 14916/20117 [9:33:31<3:12:55,  2.23s/it] 74%|████████████████████████████████████████████████████████████▊                     | 14917/20117 [9:33:33<3:12:54,  2.23s/it] 74%|████████████████████████████████████████████████████████████▊                     | 14918/20117 [9:33:35<3:12:00,  2.22s/it] 74%|████████████████████████████████████████████████████████████▊                     | 14919/20117 [9:33:37<3:11:21,  2.21s/it] 74%|████████████████████████████████████████████████████████████▊                     | 14920/20117 [9:33:40<3:11:43,  2.21s/it]                                                                                                                                 {'loss': 0.1501, 'grad_norm': 0.6036331653594971, 'learning_rate': 3.1471873669953275e-05, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 324.83, 'epoch': 1.48}
 74%|████████████████████████████████████████████████████████████▊                     | 14920/20117 [9:33:40<3:11:43,  2.21s/it] 74%|████████████████████████████████████████████████████████████▊                     | 14921/20117 [9:33:42<3:11:04,  2.21s/it] 74%|████████████████████████████████████████████████████████████▊                     | 14922/20117 [9:33:44<3:12:31,  2.22s/it] 74%|████████████████████████████████████████████████████████████▊                     | 14923/20117 [9:33:46<3:12:36,  2.22s/it] 74%|████████████████████████████████████████████████████████████▊                     | 14924/20117 [9:33:49<3:13:04,  2.23s/it] 74%|████████████████████████████████████████████████████████████▊                     | 14925/20117 [9:33:51<3:13:34,  2.24s/it] 74%|████████████████████████████████████████████████████████████▊                     | 14926/20117 [9:33:53<3:12:35,  2.23s/it] 74%|████████████████████████████████████████████████████████████▊                     | 14927/20117 [9:33:55<3:12:54,  2.23s/it] 74%|████████████████████████████████████████████████████████████▊                     | 14928/20117 [9:33:57<3:13:30,  2.24s/it] 74%|████████████████████████████████████████████████████████████▊                     | 14929/20117 [9:34:00<3:20:48,  2.32s/it] 74%|████████████████████████████████████████████████████████████▊                     | 14930/20117 [9:34:02<3:17:44,  2.29s/it]                                                                                                                                 {'loss': 0.1536, 'grad_norm': 0.4078894555568695, 'learning_rate': 3.135765755092854e-05, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 434.52, 'epoch': 1.48}
 74%|████████████████████████████████████████████████████████████▊                     | 14930/20117 [9:34:02<3:17:44,  2.29s/it] 74%|████████████████████████████████████████████████████████████▊                     | 14931/20117 [9:34:04<3:15:26,  2.26s/it] 74%|████████████████████████████████████████████████████████████▊                     | 14932/20117 [9:34:07<3:16:08,  2.27s/it] 74%|████████████████████████████████████████████████████████████▊                     | 14933/20117 [9:34:09<3:15:50,  2.27s/it] 74%|████████████████████████████████████████████████████████████▊                     | 14934/20117 [9:34:11<3:14:52,  2.26s/it] 74%|████████████████████████████████████████████████████████████▉                     | 14935/20117 [9:34:13<3:13:52,  2.24s/it] 74%|████████████████████████████████████████████████████████████▉                     | 14936/20117 [9:34:16<3:11:55,  2.22s/it] 74%|████████████████████████████████████████████████████████████▉                     | 14937/20117 [9:34:18<3:10:52,  2.21s/it] 74%|████████████████████████████████████████████████████████████▉                     | 14938/20117 [9:34:20<3:09:39,  2.20s/it] 74%|████████████████████████████████████████████████████████████▉                     | 14939/20117 [9:34:22<3:12:22,  2.23s/it] 74%|████████████████████████████████████████████████████████████▉                     | 14940/20117 [9:34:24<3:11:21,  2.22s/it]                                                                                                                                 {'loss': 0.1538, 'grad_norm': 0.5916171669960022, 'learning_rate': 3.1243610512501175e-05, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 439.02, 'epoch': 1.49}
 74%|████████████████████████████████████████████████████████████▉                     | 14940/20117 [9:34:24<3:11:21,  2.22s/it] 74%|████████████████████████████████████████████████████████████▉                     | 14941/20117 [9:34:27<3:11:34,  2.22s/it] 74%|████████████████████████████████████████████████████████████▉                     | 14942/20117 [9:34:29<3:12:10,  2.23s/it] 74%|████████████████████████████████████████████████████████████▉                     | 14943/20117 [9:34:31<3:12:48,  2.24s/it] 74%|████████████████████████████████████████████████████████████▉                     | 14944/20117 [9:34:33<3:10:53,  2.21s/it] 74%|████████████████████████████████████████████████████████████▉                     | 14945/20117 [9:34:35<3:09:43,  2.20s/it] 74%|████████████████████████████████████████████████████████████▉                     | 14946/20117 [9:34:38<3:09:20,  2.20s/it] 74%|████████████████████████████████████████████████████████████▉                     | 14947/20117 [9:34:40<3:09:14,  2.20s/it] 74%|████████████████████████████████████████████████████████████▉                     | 14948/20117 [9:34:42<3:10:14,  2.21s/it] 74%|████████████████████████████████████████████████████████████▉                     | 14949/20117 [9:34:44<3:12:27,  2.23s/it] 74%|████████████████████████████████████████████████████████████▉                     | 14950/20117 [9:34:47<3:11:36,  2.22s/it]                                                                                                                                 {'loss': 0.1858, 'grad_norm': 0.4087867736816406, 'learning_rate': 3.1129732835593085e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 284.99, 'epoch': 1.49}
 74%|████████████████████████████████████████████████████████████▉                     | 14950/20117 [9:34:47<3:11:36,  2.22s/it] 74%|████████████████████████████████████████████████████████████▉                     | 14951/20117 [9:34:49<3:10:07,  2.21s/it] 74%|████████████████████████████████████████████████████████████▉                     | 14952/20117 [9:34:51<3:09:55,  2.21s/it] 74%|████████████████████████████████████████████████████████████▉                     | 14953/20117 [9:34:53<3:10:58,  2.22s/it] 74%|████████████████████████████████████████████████████████████▉                     | 14954/20117 [9:34:55<3:09:57,  2.21s/it] 74%|████████████████████████████████████████████████████████████▉                     | 14955/20117 [9:34:58<3:11:14,  2.22s/it] 74%|████████████████████████████████████████████████████████████▉                     | 14956/20117 [9:35:00<3:11:30,  2.23s/it] 74%|████████████████████████████████████████████████████████████▉                     | 14957/20117 [9:35:02<3:10:17,  2.21s/it] 74%|████████████████████████████████████████████████████████████▉                     | 14958/20117 [9:35:04<3:09:32,  2.20s/it] 74%|████████████████████████████████████████████████████████████▉                     | 14959/20117 [9:35:07<3:11:16,  2.23s/it] 74%|████████████████████████████████████████████████████████████▉                     | 14960/20117 [9:35:09<3:10:26,  2.22s/it]                                                                                                                                 {'loss': 0.1251, 'grad_norm': 0.371852308511734, 'learning_rate': 3.101602480070909e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 358.19, 'epoch': 1.49}
 74%|████████████████████████████████████████████████████████████▉                     | 14960/20117 [9:35:09<3:10:26,  2.22s/it] 74%|████████████████████████████████████████████████████████████▉                     | 14961/20117 [9:35:11<3:11:38,  2.23s/it] 74%|████████████████████████████████████████████████████████████▉                     | 14962/20117 [9:35:13<3:11:32,  2.23s/it] 74%|████████████████████████████████████████████████████████████▉                     | 14963/20117 [9:35:15<3:10:36,  2.22s/it] 74%|████████████████████████████████████████████████████████████▉                     | 14964/20117 [9:35:18<3:10:39,  2.22s/it] 74%|████████████████████████████████████████████████████████████▉                     | 14965/20117 [9:35:20<3:09:05,  2.20s/it] 74%|█████████████████████████████████████████████████████████████                     | 14966/20117 [9:35:22<3:11:29,  2.23s/it] 74%|█████████████████████████████████████████████████████████████                     | 14967/20117 [9:35:24<3:12:16,  2.24s/it] 74%|█████████████████████████████████████████████████████████████                     | 14968/20117 [9:35:27<3:10:38,  2.22s/it] 74%|█████████████████████████████████████████████████████████████                     | 14969/20117 [9:35:29<3:10:48,  2.22s/it] 74%|█████████████████████████████████████████████████████████████                     | 14970/20117 [9:35:31<3:11:56,  2.24s/it]                                                                                                                                 {'loss': 0.1941, 'grad_norm': 0.529620349407196, 'learning_rate': 3.0902486687936097e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 369.15, 'epoch': 1.49}
 74%|█████████████████████████████████████████████████████████████                     | 14970/20117 [9:35:31<3:11:56,  2.24s/it] 74%|█████████████████████████████████████████████████████████████                     | 14971/20117 [9:35:33<3:12:08,  2.24s/it] 74%|█████████████████████████████████████████████████████████████                     | 14972/20117 [9:35:35<3:11:26,  2.23s/it] 74%|█████████████████████████████████████████████████████████████                     | 14973/20117 [9:35:38<3:11:35,  2.23s/it] 74%|█████████████████████████████████████████████████████████████                     | 14974/20117 [9:35:40<3:11:27,  2.23s/it] 74%|█████████████████████████████████████████████████████████████                     | 14975/20117 [9:35:42<3:11:36,  2.24s/it] 74%|█████████████████████████████████████████████████████████████                     | 14976/20117 [9:35:44<3:10:08,  2.22s/it] 74%|█████████████████████████████████████████████████████████████                     | 14977/20117 [9:35:47<3:11:49,  2.24s/it] 74%|█████████████████████████████████████████████████████████████                     | 14978/20117 [9:35:49<3:12:04,  2.24s/it] 74%|█████████████████████████████████████████████████████████████                     | 14979/20117 [9:35:51<3:11:12,  2.23s/it] 74%|█████████████████████████████████████████████████████████████                     | 14980/20117 [9:35:54<3:18:50,  2.32s/it]                                                                                                                                 {'loss': 0.1448, 'grad_norm': 0.21512271463871002, 'learning_rate': 3.0789118776942484e-05, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 350.83, 'epoch': 1.49}
 74%|█████████████████████████████████████████████████████████████                     | 14980/20117 [9:35:54<3:18:50,  2.32s/it] 74%|█████████████████████████████████████████████████████████████                     | 14981/20117 [9:35:56<3:16:26,  2.29s/it] 74%|█████████████████████████████████████████████████████████████                     | 14982/20117 [9:35:58<3:14:49,  2.28s/it] 74%|█████████████████████████████████████████████████████████████                     | 14983/20117 [9:36:00<3:12:30,  2.25s/it] 74%|█████████████████████████████████████████████████████████████                     | 14984/20117 [9:36:03<3:15:07,  2.28s/it] 74%|█████████████████████████████████████████████████████████████                     | 14985/20117 [9:36:05<3:12:54,  2.26s/it] 74%|█████████████████████████████████████████████████████████████                     | 14986/20117 [9:36:07<3:12:14,  2.25s/it] 74%|█████████████████████████████████████████████████████████████                     | 14987/20117 [9:36:09<3:11:12,  2.24s/it] 75%|█████████████████████████████████████████████████████████████                     | 14988/20117 [9:36:12<3:13:59,  2.27s/it] 75%|█████████████████████████████████████████████████████████████                     | 14989/20117 [9:36:14<3:11:38,  2.24s/it] 75%|█████████████████████████████████████████████████████████████                     | 14990/20117 [9:36:16<3:10:05,  2.22s/it]                                                                                                                                 {'loss': 0.1853, 'grad_norm': 0.6459974050521851, 'learning_rate': 3.067592134697741e-05, 'memory/max_active (GiB)': 21.53, 'memory/max_allocated (GiB)': 21.53, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 419.24, 'epoch': 1.49}
 75%|█████████████████████████████████████████████████████████████                     | 14990/20117 [9:36:16<3:10:05,  2.22s/it] 75%|█████████████████████████████████████████████████████████████                     | 14991/20117 [9:36:18<3:09:08,  2.21s/it] 75%|█████████████████████████████████████████████████████████████                     | 14992/20117 [9:36:20<3:10:20,  2.23s/it] 75%|█████████████████████████████████████████████████████████████                     | 14993/20117 [9:36:23<3:09:47,  2.22s/it] 75%|█████████████████████████████████████████████████████████████                     | 14994/20117 [9:36:25<3:10:14,  2.23s/it] 75%|█████████████████████████████████████████████████████████████                     | 14995/20117 [9:36:27<3:11:09,  2.24s/it] 75%|█████████████████████████████████████████████████████████████▏                    | 14996/20117 [9:36:29<3:10:56,  2.24s/it] 75%|█████████████████████████████████████████████████████████████▏                    | 14997/20117 [9:36:32<3:11:52,  2.25s/it] 75%|█████████████████████████████████████████████████████████████▏                    | 14998/20117 [9:36:34<3:10:34,  2.23s/it] 75%|█████████████████████████████████████████████████████████████▏                    | 14999/20117 [9:36:36<3:10:44,  2.24s/it] 75%|█████████████████████████████████████████████████████████████▏                    | 15000/20117 [9:36:38<3:09:47,  2.23s/it]                                                                                                                                 {'loss': 0.1876, 'grad_norm': 0.6449222564697266, 'learning_rate': 3.0562894676870014e-05, 'memory/max_active (GiB)': 19.81, 'memory/max_allocated (GiB)': 19.81, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 426.0, 'epoch': 1.49}
 75%|█████████████████████████████████████████████████████████████▏                    | 15000/20117 [9:36:38<3:09:47,  2.23s/it] 75%|█████████████████████████████████████████████████████████████▏                    | 15001/20117 [9:36:41<3:10:59,  2.24s/it] 75%|█████████████████████████████████████████████████████████████▏                    | 15002/20117 [9:36:43<3:10:26,  2.23s/it] 75%|█████████████████████████████████████████████████████████████▏                    | 15003/20117 [9:36:45<3:09:29,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▏                    | 15004/20117 [9:36:47<3:10:24,  2.23s/it] 75%|█████████████████████████████████████████████████████████████▏                    | 15005/20117 [9:36:49<3:10:02,  2.23s/it] 75%|█████████████████████████████████████████████████████████████▏                    | 15006/20117 [9:36:52<3:11:47,  2.25s/it] 75%|█████████████████████████████████████████████████████████████▏                    | 15007/20117 [9:36:54<3:11:14,  2.25s/it] 75%|█████████████████████████████████████████████████████████████▏                    | 15008/20117 [9:36:56<3:10:21,  2.24s/it] 75%|█████████████████████████████████████████████████████████████▏                    | 15009/20117 [9:36:59<3:12:29,  2.26s/it] 75%|█████████████████████████████████████████████████████████████▏                    | 15010/20117 [9:37:01<3:09:37,  2.23s/it]                                                                                                                                 {'loss': 0.1583, 'grad_norm': 0.5701255202293396, 'learning_rate': 3.045003904502891e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 355.73, 'epoch': 1.49}
 75%|█████████████████████████████████████████████████████████████▏                    | 15010/20117 [9:37:01<3:09:37,  2.23s/it] 75%|█████████████████████████████████████████████████████████████▏                    | 15011/20117 [9:37:03<3:08:51,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▏                    | 15012/20117 [9:37:05<3:08:58,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▏                    | 15013/20117 [9:37:07<3:07:09,  2.20s/it] 75%|█████████████████████████████████████████████████████████████▏                    | 15014/20117 [9:37:10<3:09:14,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▏                    | 15015/20117 [9:37:12<3:08:36,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▏                    | 15016/20117 [9:37:14<3:09:18,  2.23s/it] 75%|█████████████████████████████████████████████████████████████▏                    | 15017/20117 [9:37:16<3:09:02,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▏                    | 15018/20117 [9:37:18<3:08:03,  2.21s/it] 75%|█████████████████████████████████████████████████████████████▏                    | 15019/20117 [9:37:21<3:07:19,  2.20s/it] 75%|█████████████████████████████████████████████████████████████▏                    | 15020/20117 [9:37:23<3:06:35,  2.20s/it]                                                                                                                                 {'loss': 0.2089, 'grad_norm': 0.754385769367218, 'learning_rate': 3.0337354729441338e-05, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 372.69, 'epoch': 1.49}
 75%|█████████████████████████████████████████████████████████████▏                    | 15020/20117 [9:37:23<3:06:35,  2.20s/it] 75%|█████████████████████████████████████████████████████████████▏                    | 15021/20117 [9:37:25<3:09:11,  2.23s/it] 75%|█████████████████████████████████████████████████████████████▏                    | 15022/20117 [9:37:27<3:08:53,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▏                    | 15023/20117 [9:37:29<3:08:24,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▏                    | 15024/20117 [9:37:32<3:10:16,  2.24s/it] 75%|█████████████████████████████████████████████████████████████▏                    | 15025/20117 [9:37:34<3:10:17,  2.24s/it] 75%|█████████████████████████████████████████████████████████████▏                    | 15026/20117 [9:37:36<3:10:27,  2.24s/it] 75%|█████████████████████████████████████████████████████████████▎                    | 15027/20117 [9:37:39<3:11:03,  2.25s/it] 75%|█████████████████████████████████████████████████████████████▎                    | 15028/20117 [9:37:41<3:10:39,  2.25s/it] 75%|█████████████████████████████████████████████████████████████▎                    | 15029/20117 [9:37:43<3:10:36,  2.25s/it] 75%|█████████████████████████████████████████████████████████████▎                    | 15030/20117 [9:37:45<3:09:21,  2.23s/it]                                                                                                                                 {'loss': 0.1358, 'grad_norm': 0.27737346291542053, 'learning_rate': 3.022484200767264e-05, 'memory/max_active (GiB)': 21.39, 'memory/max_allocated (GiB)': 21.39, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 445.86, 'epoch': 1.49}
 75%|█████████████████████████████████████████████████████████████▎                    | 15030/20117 [9:37:45<3:09:21,  2.23s/it] 75%|█████████████████████████████████████████████████████████████▎                    | 15031/20117 [9:37:48<3:16:33,  2.32s/it] 75%|█████████████████████████████████████████████████████████████▎                    | 15032/20117 [9:37:50<3:13:06,  2.28s/it] 75%|█████████████████████████████████████████████████████████████▎                    | 15033/20117 [9:37:52<3:11:34,  2.26s/it] 75%|█████████████████████████████████████████████████████████████▎                    | 15034/20117 [9:37:54<3:12:51,  2.28s/it] 75%|█████████████████████████████████████████████████████████████▎                    | 15035/20117 [9:37:57<3:12:19,  2.27s/it] 75%|█████████████████████████████████████████████████████████████▎                    | 15036/20117 [9:37:59<3:11:02,  2.26s/it] 75%|█████████████████████████████████████████████████████████████▎                    | 15037/20117 [9:38:01<3:09:36,  2.24s/it] 75%|█████████████████████████████████████████████████████████████▎                    | 15038/20117 [9:38:03<3:09:47,  2.24s/it] 75%|█████████████████████████████████████████████████████████████▎                    | 15039/20117 [9:38:06<3:11:09,  2.26s/it] 75%|█████████████████████████████████████████████████████████████▎                    | 15040/20117 [9:38:08<3:08:57,  2.23s/it]                                                                                                                                 {'loss': 0.1361, 'grad_norm': 0.35400980710983276, 'learning_rate': 3.0112501156865348e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 382.91, 'epoch': 1.5}
 75%|█████████████████████████████████████████████████████████████▎                    | 15040/20117 [9:38:08<3:08:57,  2.23s/it] 75%|█████████████████████████████████████████████████████████████▎                    | 15041/20117 [9:38:10<3:08:01,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▎                    | 15042/20117 [9:38:12<3:08:29,  2.23s/it] 75%|█████████████████████████████████████████████████████████████▎                    | 15043/20117 [9:38:15<3:09:12,  2.24s/it] 75%|█████████████████████████████████████████████████████████████▎                    | 15044/20117 [9:38:17<3:08:53,  2.23s/it] 75%|█████████████████████████████████████████████████████████████▎                    | 15045/20117 [9:38:19<3:08:57,  2.24s/it] 75%|█████████████████████████████████████████████████████████████▎                    | 15046/20117 [9:38:21<3:08:10,  2.23s/it] 75%|█████████████████████████████████████████████████████████████▎                    | 15047/20117 [9:38:23<3:07:33,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▎                    | 15048/20117 [9:38:26<3:08:05,  2.23s/it] 75%|█████████████████████████████████████████████████████████████▎                    | 15049/20117 [9:38:28<3:07:25,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▎                    | 15050/20117 [9:38:30<3:06:56,  2.21s/it]                                                                                                                                 {'loss': 0.1716, 'grad_norm': 0.703973650932312, 'learning_rate': 3.000033245373881e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 379.21, 'epoch': 1.5}
 75%|█████████████████████████████████████████████████████████████▎                    | 15050/20117 [9:38:30<3:06:56,  2.21s/it] 75%|█████████████████████████████████████████████████████████████▎                    | 15051/20117 [9:38:32<3:07:49,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▎                    | 15052/20117 [9:38:34<3:06:20,  2.21s/it] 75%|█████████████████████████████████████████████████████████████▎                    | 15053/20117 [9:38:37<3:06:47,  2.21s/it] 75%|█████████████████████████████████████████████████████████████▎                    | 15054/20117 [9:38:39<3:08:37,  2.24s/it] 75%|█████████████████████████████████████████████████████████████▎                    | 15055/20117 [9:38:41<3:09:53,  2.25s/it] 75%|█████████████████████████████████████████████████████████████▎                    | 15056/20117 [9:38:43<3:08:36,  2.24s/it] 75%|█████████████████████████████████████████████████████████████▎                    | 15057/20117 [9:38:46<3:08:45,  2.24s/it] 75%|█████████████████████████████████████████████████████████████▍                    | 15058/20117 [9:38:48<3:09:03,  2.24s/it] 75%|█████████████████████████████████████████████████████████████▍                    | 15059/20117 [9:38:50<3:09:54,  2.25s/it] 75%|█████████████████████████████████████████████████████████████▍                    | 15060/20117 [9:38:53<3:09:22,  2.25s/it]                                                                                                                                 {'loss': 0.1148, 'grad_norm': 0.47651368379592896, 'learning_rate': 2.988833617458816e-05, 'memory/max_active (GiB)': 18.83, 'memory/max_allocated (GiB)': 18.83, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 311.35, 'epoch': 1.5}
 75%|█████████████████████████████████████████████████████████████▍                    | 15060/20117 [9:38:53<3:09:22,  2.25s/it] 75%|█████████████████████████████████████████████████████████████▍                    | 15061/20117 [9:38:55<3:09:51,  2.25s/it] 75%|█████████████████████████████████████████████████████████████▍                    | 15062/20117 [9:38:57<3:10:42,  2.26s/it] 75%|█████████████████████████████████████████████████████████████▍                    | 15063/20117 [9:38:59<3:11:58,  2.28s/it] 75%|█████████████████████████████████████████████████████████████▍                    | 15064/20117 [9:39:02<3:10:06,  2.26s/it] 75%|█████████████████████████████████████████████████████████████▍                    | 15065/20117 [9:39:04<3:08:25,  2.24s/it] 75%|█████████████████████████████████████████████████████████████▍                    | 15066/20117 [9:39:06<3:07:18,  2.23s/it] 75%|█████████████████████████████████████████████████████████████▍                    | 15067/20117 [9:39:08<3:06:52,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▍                    | 15068/20117 [9:39:10<3:07:30,  2.23s/it] 75%|█████████████████████████████████████████████████████████████▍                    | 15069/20117 [9:39:13<3:06:48,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▍                    | 15070/20117 [9:39:15<3:06:50,  2.22s/it]                                                                                                                                 {'loss': 0.1478, 'grad_norm': 0.626930296421051, 'learning_rate': 2.977651259528399e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 364.12, 'epoch': 1.5}
 75%|█████████████████████████████████████████████████████████████▍                    | 15070/20117 [9:39:15<3:06:50,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▍                    | 15071/20117 [9:39:17<3:07:08,  2.23s/it] 75%|█████████████████████████████████████████████████████████████▍                    | 15072/20117 [9:39:19<3:06:50,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▍                    | 15073/20117 [9:39:22<3:08:13,  2.24s/it] 75%|█████████████████████████████████████████████████████████████▍                    | 15074/20117 [9:39:24<3:07:55,  2.24s/it] 75%|█████████████████████████████████████████████████████████████▍                    | 15075/20117 [9:39:26<3:06:54,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▍                    | 15076/20117 [9:39:28<3:07:32,  2.23s/it] 75%|█████████████████████████████████████████████████████████████▍                    | 15077/20117 [9:39:31<3:08:10,  2.24s/it] 75%|█████████████████████████████████████████████████████████████▍                    | 15078/20117 [9:39:33<3:09:16,  2.25s/it] 75%|█████████████████████████████████████████████████████████████▍                    | 15079/20117 [9:39:35<3:08:23,  2.24s/it] 75%|█████████████████████████████████████████████████████████████▍                    | 15080/20117 [9:39:37<3:07:21,  2.23s/it]                                                                                                                                 {'loss': 0.158, 'grad_norm': 0.5578542351722717, 'learning_rate': 2.9664861991271343e-05, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 428.74, 'epoch': 1.5}
 75%|█████████████████████████████████████████████████████████████▍                    | 15080/20117 [9:39:37<3:07:21,  2.23s/it] 75%|█████████████████████████████████████████████████████████████▍                    | 15081/20117 [9:39:39<3:06:07,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▍                    | 15082/20117 [9:39:42<3:05:14,  2.21s/it] 75%|█████████████████████████████████████████████████████████████▍                    | 15083/20117 [9:39:44<3:07:17,  2.23s/it] 75%|█████████████████████████████████████████████████████████████▍                    | 15084/20117 [9:39:46<3:07:11,  2.23s/it] 75%|█████████████████████████████████████████████████████████████▍                    | 15085/20117 [9:39:49<3:13:40,  2.31s/it] 75%|█████████████████████████████████████████████████████████████▍                    | 15086/20117 [9:39:51<3:11:38,  2.29s/it] 75%|█████████████████████████████████████████████████████████████▍                    | 15087/20117 [9:39:53<3:09:59,  2.27s/it] 75%|█████████████████████████████████████████████████████████████▌                    | 15088/20117 [9:39:55<3:08:02,  2.24s/it] 75%|█████████████████████████████████████████████████████████████▌                    | 15089/20117 [9:39:57<3:07:46,  2.24s/it] 75%|█████████████████████████████████████████████████████████████▌                    | 15090/20117 [9:40:00<3:06:10,  2.22s/it]                                                                                                                                 {'loss': 0.1047, 'grad_norm': 0.3915964961051941, 'learning_rate': 2.9553384637569282e-05, 'memory/max_active (GiB)': 19.8, 'memory/max_allocated (GiB)': 19.8, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 324.43, 'epoch': 1.5}
 75%|█████████████████████████████████████████████████████████████▌                    | 15090/20117 [9:40:00<3:06:10,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▌                    | 15091/20117 [9:40:02<3:05:53,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▌                    | 15092/20117 [9:40:04<3:05:31,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▌                    | 15093/20117 [9:40:06<3:06:23,  2.23s/it] 75%|█████████████████████████████████████████████████████████████▌                    | 15094/20117 [9:40:09<3:08:29,  2.25s/it] 75%|█████████████████████████████████████████████████████████████▌                    | 15095/20117 [9:40:11<3:07:11,  2.24s/it] 75%|█████████████████████████████████████████████████████████████▌                    | 15096/20117 [9:40:13<3:06:49,  2.23s/it] 75%|█████████████████████████████████████████████████████████████▌                    | 15097/20117 [9:40:15<3:05:44,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▌                    | 15098/20117 [9:40:18<3:06:51,  2.23s/it] 75%|█████████████████████████████████████████████████████████████▌                    | 15099/20117 [9:40:20<3:06:23,  2.23s/it] 75%|█████████████████████████████████████████████████████████████▌                    | 15100/20117 [9:40:22<3:05:34,  2.22s/it]                                                                                                                                 {'loss': 0.1921, 'grad_norm': 0.6796404123306274, 'learning_rate': 2.944208080877008e-05, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 373.42, 'epoch': 1.5}
 75%|█████████████████████████████████████████████████████████████▌                    | 15100/20117 [9:40:22<3:05:34,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▌                    | 15101/20117 [9:40:24<3:05:33,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▌                    | 15102/20117 [9:40:26<3:04:59,  2.21s/it] 75%|█████████████████████████████████████████████████████████████▌                    | 15103/20117 [9:40:29<3:04:44,  2.21s/it] 75%|█████████████████████████████████████████████████████████████▌                    | 15104/20117 [9:40:31<3:04:54,  2.21s/it] 75%|█████████████████████████████████████████████████████████████▌                    | 15105/20117 [9:40:33<3:05:04,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▌                    | 15106/20117 [9:40:35<3:04:15,  2.21s/it] 75%|█████████████████████████████████████████████████████████████▌                    | 15107/20117 [9:40:37<3:03:42,  2.20s/it] 75%|█████████████████████████████████████████████████████████████▌                    | 15108/20117 [9:40:40<3:04:44,  2.21s/it] 75%|█████████████████████████████████████████████████████████████▌                    | 15109/20117 [9:40:42<3:04:05,  2.21s/it] 75%|█████████████████████████████████████████████████████████████▌                    | 15110/20117 [9:40:44<3:03:38,  2.20s/it]                                                                                                                                 {'loss': 0.1629, 'grad_norm': 0.3962237238883972, 'learning_rate': 2.933095077903861e-05, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 322.99, 'epoch': 1.5}
 75%|█████████████████████████████████████████████████████████████▌                    | 15110/20117 [9:40:44<3:03:38,  2.20s/it] 75%|█████████████████████████████████████████████████████████████▌                    | 15111/20117 [9:40:46<3:03:30,  2.20s/it] 75%|█████████████████████████████████████████████████████████████▌                    | 15112/20117 [9:40:48<3:03:09,  2.20s/it] 75%|█████████████████████████████████████████████████████████████▌                    | 15113/20117 [9:40:51<3:06:26,  2.24s/it] 75%|█████████████████████████████████████████████████████████████▌                    | 15114/20117 [9:40:53<3:05:15,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▌                    | 15115/20117 [9:40:55<3:05:49,  2.23s/it] 75%|█████████████████████████████████████████████████████████████▌                    | 15116/20117 [9:40:57<3:04:57,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▌                    | 15117/20117 [9:41:00<3:06:47,  2.24s/it] 75%|█████████████████████████████████████████████████████████████▌                    | 15118/20117 [9:41:02<3:08:31,  2.26s/it] 75%|█████████████████████████████████████████████████████████████▋                    | 15119/20117 [9:41:04<3:08:51,  2.27s/it] 75%|█████████████████████████████████████████████████████████████▋                    | 15120/20117 [9:41:06<3:07:26,  2.25s/it]                                                                                                                                 {'loss': 0.1593, 'grad_norm': 1.0471795797348022, 'learning_rate': 2.921999482211165e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 318.74, 'epoch': 1.5}
 75%|█████████████████████████████████████████████████████████████▋                    | 15120/20117 [9:41:06<3:07:26,  2.25s/it] 75%|█████████████████████████████████████████████████████████████▋                    | 15121/20117 [9:41:09<3:05:54,  2.23s/it] 75%|█████████████████████████████████████████████████████████████▋                    | 15122/20117 [9:41:11<3:06:12,  2.24s/it] 75%|█████████████████████████████████████████████████████████████▋                    | 15123/20117 [9:41:13<3:06:39,  2.24s/it] 75%|█████████████████████████████████████████████████████████████▋                    | 15124/20117 [9:41:15<3:06:23,  2.24s/it] 75%|█████████████████████████████████████████████████████████████▋                    | 15125/20117 [9:41:18<3:04:57,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▋                    | 15126/20117 [9:41:20<3:06:04,  2.24s/it] 75%|█████████████████████████████████████████████████████████████▋                    | 15127/20117 [9:41:22<3:04:40,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▋                    | 15128/20117 [9:41:24<3:06:56,  2.25s/it] 75%|█████████████████████████████████████████████████████████████▋                    | 15129/20117 [9:41:27<3:06:05,  2.24s/it] 75%|█████████████████████████████████████████████████████████████▋                    | 15130/20117 [9:41:29<3:04:21,  2.22s/it]                                                                                                                                 {'loss': 0.1913, 'grad_norm': 0.3305923640727997, 'learning_rate': 2.9109213211297103e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 371.82, 'epoch': 1.5}
 75%|█████████████████████████████████████████████████████████████▋                    | 15130/20117 [9:41:29<3:04:21,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▋                    | 15131/20117 [9:41:31<3:03:36,  2.21s/it] 75%|█████████████████████████████████████████████████████████████▋                    | 15132/20117 [9:41:33<3:04:33,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▋                    | 15133/20117 [9:41:35<3:03:14,  2.21s/it] 75%|█████████████████████████████████████████████████████████████▋                    | 15134/20117 [9:41:38<3:05:50,  2.24s/it] 75%|█████████████████████████████████████████████████████████████▋                    | 15135/20117 [9:41:40<3:05:36,  2.24s/it] 75%|█████████████████████████████████████████████████████████████▋                    | 15136/20117 [9:41:42<3:05:01,  2.23s/it] 75%|█████████████████████████████████████████████████████████████▋                    | 15137/20117 [9:41:44<3:05:28,  2.23s/it] 75%|█████████████████████████████████████████████████████████████▋                    | 15138/20117 [9:41:47<3:06:12,  2.24s/it] 75%|█████████████████████████████████████████████████████████████▋                    | 15139/20117 [9:41:49<3:12:38,  2.32s/it] 75%|█████████████████████████████████████████████████████████████▋                    | 15140/20117 [9:41:51<3:10:39,  2.30s/it]                                                                                                                                 {'loss': 0.1741, 'grad_norm': 0.5967475771903992, 'learning_rate': 2.8998606219473555e-05, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 376.88, 'epoch': 1.51}
 75%|█████████████████████████████████████████████████████████████▋                    | 15140/20117 [9:41:51<3:10:39,  2.30s/it] 75%|█████████████████████████████████████████████████████████████▋                    | 15141/20117 [9:41:54<3:09:08,  2.28s/it] 75%|█████████████████████████████████████████████████████████████▋                    | 15142/20117 [9:41:56<3:06:56,  2.25s/it] 75%|█████████████████████████████████████████████████████████████▋                    | 15143/20117 [9:41:58<3:05:01,  2.23s/it] 75%|█████████████████████████████████████████████████████████████▋                    | 15144/20117 [9:42:00<3:05:09,  2.23s/it] 75%|█████████████████████████████████████████████████████████████▋                    | 15145/20117 [9:42:02<3:05:39,  2.24s/it] 75%|█████████████████████████████████████████████████████████████▋                    | 15146/20117 [9:42:05<3:05:19,  2.24s/it] 75%|█████████████████████████████████████████████████████████████▋                    | 15147/20117 [9:42:07<3:04:14,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▋                    | 15148/20117 [9:42:09<3:04:01,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▋                    | 15149/20117 [9:42:11<3:04:00,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▊                    | 15150/20117 [9:42:14<3:05:38,  2.24s/it]                                                                                                                                 {'loss': 0.1523, 'grad_norm': 0.5528421998023987, 'learning_rate': 2.888817411908935e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 323.43, 'epoch': 1.51}
 75%|█████████████████████████████████████████████████████████████▊                    | 15150/20117 [9:42:14<3:05:38,  2.24s/it] 75%|█████████████████████████████████████████████████████████████▊                    | 15151/20117 [9:42:16<3:05:12,  2.24s/it] 75%|█████████████████████████████████████████████████████████████▊                    | 15152/20117 [9:42:18<3:03:41,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▊                    | 15153/20117 [9:42:20<3:02:47,  2.21s/it] 75%|█████████████████████████████████████████████████████████████▊                    | 15154/20117 [9:42:22<3:04:11,  2.23s/it] 75%|█████████████████████████████████████████████████████████████▊                    | 15155/20117 [9:42:25<3:03:28,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▊                    | 15156/20117 [9:42:27<3:02:32,  2.21s/it] 75%|█████████████████████████████████████████████████████████████▊                    | 15157/20117 [9:42:29<3:02:02,  2.20s/it] 75%|█████████████████████████████████████████████████████████████▊                    | 15158/20117 [9:42:31<3:01:44,  2.20s/it] 75%|█████████████████████████████████████████████████████████████▊                    | 15159/20117 [9:42:33<3:02:12,  2.20s/it] 75%|█████████████████████████████████████████████████████████████▊                    | 15160/20117 [9:42:36<3:02:09,  2.20s/it]                                                                                                                                 {'loss': 0.1434, 'grad_norm': 0.6192378997802734, 'learning_rate': 2.877791718216214e-05, 'memory/max_active (GiB)': 18.16, 'memory/max_allocated (GiB)': 18.16, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 300.91, 'epoch': 1.51}
 75%|█████████████████████████████████████████████████████████████▊                    | 15160/20117 [9:42:36<3:02:09,  2.20s/it] 75%|█████████████████████████████████████████████████████████████▊                    | 15161/20117 [9:42:38<3:01:38,  2.20s/it] 75%|█████████████████████████████████████████████████████████████▊                    | 15162/20117 [9:42:40<3:01:48,  2.20s/it] 75%|█████████████████████████████████████████████████████████████▊                    | 15163/20117 [9:42:42<3:02:00,  2.20s/it] 75%|█████████████████████████████████████████████████████████████▊                    | 15164/20117 [9:42:44<3:02:32,  2.21s/it] 75%|█████████████████████████████████████████████████████████████▊                    | 15165/20117 [9:42:47<3:04:32,  2.24s/it] 75%|█████████████████████████████████████████████████████████████▊                    | 15166/20117 [9:42:49<3:04:01,  2.23s/it] 75%|█████████████████████████████████████████████████████████████▊                    | 15167/20117 [9:42:51<3:03:12,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▊                    | 15168/20117 [9:42:53<3:03:17,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▊                    | 15169/20117 [9:42:56<3:02:08,  2.21s/it] 75%|█████████████████████████████████████████████████████████████▊                    | 15170/20117 [9:42:58<3:03:17,  2.22s/it]                                                                                                                                 {'loss': 0.1334, 'grad_norm': 0.2290807068347931, 'learning_rate': 2.866783568027802e-05, 'memory/max_active (GiB)': 20.45, 'memory/max_allocated (GiB)': 20.45, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 382.23, 'epoch': 1.51}
 75%|█████████████████████████████████████████████████████████████▊                    | 15170/20117 [9:42:58<3:03:17,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▊                    | 15171/20117 [9:43:00<3:02:25,  2.21s/it] 75%|█████████████████████████████████████████████████████████████▊                    | 15172/20117 [9:43:02<3:01:42,  2.20s/it] 75%|█████████████████████████████████████████████████████████████▊                    | 15173/20117 [9:43:04<3:02:36,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▊                    | 15174/20117 [9:43:07<3:03:36,  2.23s/it] 75%|█████████████████████████████████████████████████████████████▊                    | 15175/20117 [9:43:09<3:03:06,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▊                    | 15176/20117 [9:43:11<3:03:04,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▊                    | 15177/20117 [9:43:13<3:02:46,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▊                    | 15178/20117 [9:43:16<3:02:48,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▊                    | 15179/20117 [9:43:18<3:01:50,  2.21s/it] 75%|█████████████████████████████████████████████████████████████▉                    | 15180/20117 [9:43:20<3:01:37,  2.21s/it]                                                                                                                                 {'loss': 0.183, 'grad_norm': 0.748935341835022, 'learning_rate': 2.8557929884591038e-05, 'memory/max_active (GiB)': 18.84, 'memory/max_allocated (GiB)': 18.84, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 358.48, 'epoch': 1.51}
 75%|█████████████████████████████████████████████████████████████▉                    | 15180/20117 [9:43:20<3:01:37,  2.21s/it] 75%|█████████████████████████████████████████████████████████████▉                    | 15181/20117 [9:43:22<3:02:24,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▉                    | 15182/20117 [9:43:24<3:01:45,  2.21s/it] 75%|█████████████████████████████████████████████████████████████▉                    | 15183/20117 [9:43:27<3:02:41,  2.22s/it] 75%|█████████████████████████████████████████████████████████████▉                    | 15184/20117 [9:43:29<3:02:00,  2.21s/it] 75%|█████████████████████████████████████████████████████████████▉                    | 15185/20117 [9:43:31<3:01:08,  2.20s/it] 75%|█████████████████████████████████████████████████████████████▉                    | 15186/20117 [9:43:33<3:01:54,  2.21s/it] 75%|█████████████████████████████████████████████████████████████▉                    | 15187/20117 [9:43:36<3:03:55,  2.24s/it] 75%|█████████████████████████████████████████████████████████████▉                    | 15188/20117 [9:43:38<3:03:37,  2.24s/it] 76%|█████████████████████████████████████████████████████████████▉                    | 15189/20117 [9:43:40<3:02:28,  2.22s/it] 76%|█████████████████████████████████████████████████████████████▉                    | 15190/20117 [9:43:42<3:02:54,  2.23s/it]                                                                                                                                 {'loss': 0.1251, 'grad_norm': 0.6596253514289856, 'learning_rate': 2.844820006582235e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 329.83, 'epoch': 1.51}
 76%|█████████████████████████████████████████████████████████████▉                    | 15190/20117 [9:43:42<3:02:54,  2.23s/it] 76%|█████████████████████████████████████████████████████████████▉                    | 15191/20117 [9:43:44<3:01:56,  2.22s/it] 76%|█████████████████████████████████████████████████████████████▉                    | 15192/20117 [9:43:47<3:12:01,  2.34s/it] 76%|█████████████████████████████████████████████████████████████▉                    | 15193/20117 [9:43:49<3:10:53,  2.33s/it] 76%|█████████████████████████████████████████████████████████████▉                    | 15194/20117 [9:43:52<3:10:09,  2.32s/it] 76%|█████████████████████████████████████████████████████████████▉                    | 15195/20117 [9:43:54<3:07:14,  2.28s/it] 76%|█████████████████████████████████████████████████████████████▉                    | 15196/20117 [9:43:56<3:05:12,  2.26s/it] 76%|█████████████████████████████████████████████████████████████▉                    | 15197/20117 [9:43:58<3:05:03,  2.26s/it] 76%|█████████████████████████████████████████████████████████████▉                    | 15198/20117 [9:44:00<3:03:45,  2.24s/it] 76%|█████████████████████████████████████████████████████████████▉                    | 15199/20117 [9:44:03<3:03:06,  2.23s/it] 76%|█████████████████████████████████████████████████████████████▉                    | 15200/20117 [9:44:05<3:02:33,  2.23s/it]                                                                                                                                 {'loss': 0.1731, 'grad_norm': 0.5581555962562561, 'learning_rate': 2.8338646494259746e-05, 'memory/max_active (GiB)': 20.58, 'memory/max_allocated (GiB)': 20.58, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 382.43, 'epoch': 1.51}
 76%|█████████████████████████████████████████████████████████████▉                    | 15200/20117 [9:44:05<3:02:33,  2.23s/it] 76%|█████████████████████████████████████████████████████████████▉                    | 15201/20117 [9:44:07<3:00:55,  2.21s/it] 76%|█████████████████████████████████████████████████████████████▉                    | 15202/20117 [9:44:09<3:04:43,  2.26s/it] 76%|█████████████████████████████████████████████████████████████▉                    | 15203/20117 [9:44:12<3:05:44,  2.27s/it] 76%|█████████████████████████████████████████████████████████████▉                    | 15204/20117 [9:44:14<3:05:15,  2.26s/it] 76%|█████████████████████████████████████████████████████████████▉                    | 15205/20117 [9:44:16<3:05:03,  2.26s/it] 76%|█████████████████████████████████████████████████████████████▉                    | 15206/20117 [9:44:19<3:06:25,  2.28s/it] 76%|█████████████████████████████████████████████████████████████▉                    | 15207/20117 [9:44:21<3:04:30,  2.25s/it] 76%|█████████████████████████████████████████████████████████████▉                    | 15208/20117 [9:44:23<3:04:08,  2.25s/it] 76%|█████████████████████████████████████████████████████████████▉                    | 15209/20117 [9:44:25<3:02:47,  2.23s/it] 76%|█████████████████████████████████████████████████████████████▉                    | 15210/20117 [9:44:27<3:02:58,  2.24s/it]                                                                                                                                 {'loss': 0.1252, 'grad_norm': 0.525514543056488, 'learning_rate': 2.8229269439756768e-05, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 398.56, 'epoch': 1.51}
 76%|█████████████████████████████████████████████████████████████▉                    | 15210/20117 [9:44:27<3:02:58,  2.24s/it] 76%|██████████████████████████████████████████████████████████████                    | 15211/20117 [9:44:30<3:01:26,  2.22s/it] 76%|██████████████████████████████████████████████████████████████                    | 15212/20117 [9:44:32<3:01:05,  2.22s/it] 76%|██████████████████████████████████████████████████████████████                    | 15213/20117 [9:44:34<3:02:22,  2.23s/it] 76%|██████████████████████████████████████████████████████████████                    | 15214/20117 [9:44:36<3:01:28,  2.22s/it] 76%|██████████████████████████████████████████████████████████████                    | 15215/20117 [9:44:38<3:00:55,  2.21s/it] 76%|██████████████████████████████████████████████████████████████                    | 15216/20117 [9:44:41<3:03:56,  2.25s/it] 76%|██████████████████████████████████████████████████████████████                    | 15217/20117 [9:44:43<3:03:05,  2.24s/it] 76%|██████████████████████████████████████████████████████████████                    | 15218/20117 [9:44:45<3:01:47,  2.23s/it] 76%|██████████████████████████████████████████████████████████████                    | 15219/20117 [9:44:47<3:02:02,  2.23s/it] 76%|██████████████████████████████████████████████████████████████                    | 15220/20117 [9:44:50<3:01:31,  2.22s/it]                                                                                                                                 {'loss': 0.1561, 'grad_norm': 0.5555036664009094, 'learning_rate': 2.812006917173229e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 360.58, 'epoch': 1.51}
 76%|██████████████████████████████████████████████████████████████                    | 15220/20117 [9:44:50<3:01:31,  2.22s/it] 76%|██████████████████████████████████████████████████████████████                    | 15221/20117 [9:44:52<3:01:33,  2.22s/it] 76%|██████████████████████████████████████████████████████████████                    | 15222/20117 [9:44:54<3:01:17,  2.22s/it] 76%|██████████████████████████████████████████████████████████████                    | 15223/20117 [9:44:56<3:01:23,  2.22s/it] 76%|██████████████████████████████████████████████████████████████                    | 15224/20117 [9:44:59<3:00:37,  2.21s/it] 76%|██████████████████████████████████████████████████████████████                    | 15225/20117 [9:45:01<2:59:36,  2.20s/it] 76%|██████████████████████████████████████████████████████████████                    | 15226/20117 [9:45:03<2:59:21,  2.20s/it] 76%|██████████████████████████████████████████████████████████████                    | 15227/20117 [9:45:05<3:01:45,  2.23s/it] 76%|██████████████████████████████████████████████████████████████                    | 15228/20117 [9:45:08<3:04:00,  2.26s/it] 76%|██████████████████████████████████████████████████████████████                    | 15229/20117 [9:45:10<3:03:58,  2.26s/it] 76%|██████████████████████████████████████████████████████████████                    | 15230/20117 [9:45:12<3:03:42,  2.26s/it]                                                                                                                                 {'loss': 0.1428, 'grad_norm': 0.4960024058818817, 'learning_rate': 2.801104595916957e-05, 'memory/max_active (GiB)': 21.4, 'memory/max_allocated (GiB)': 21.4, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 378.86, 'epoch': 1.51}
 76%|██████████████████████████████████████████████████████████████                    | 15230/20117 [9:45:12<3:03:42,  2.26s/it] 76%|██████████████████████████████████████████████████████████████                    | 15231/20117 [9:45:14<3:01:48,  2.23s/it] 76%|██████████████████████████████████████████████████████████████                    | 15232/20117 [9:45:16<3:00:58,  2.22s/it] 76%|██████████████████████████████████████████████████████████████                    | 15233/20117 [9:45:19<3:02:28,  2.24s/it] 76%|██████████████████████████████████████████████████████████████                    | 15234/20117 [9:45:21<3:01:28,  2.23s/it] 76%|██████████████████████████████████████████████████████████████                    | 15235/20117 [9:45:23<3:01:42,  2.23s/it] 76%|██████████████████████████████████████████████████████████████                    | 15236/20117 [9:45:25<3:00:24,  2.22s/it] 76%|██████████████████████████████████████████████████████████████                    | 15237/20117 [9:45:28<3:00:04,  2.21s/it] 76%|██████████████████████████████████████████████████████████████                    | 15238/20117 [9:45:30<3:00:24,  2.22s/it] 76%|██████████████████████████████████████████████████████████████                    | 15239/20117 [9:45:32<3:00:15,  2.22s/it] 76%|██████████████████████████████████████████████████████████████                    | 15240/20117 [9:45:34<3:00:03,  2.22s/it]                                                                                                                                 {'loss': 0.2009, 'grad_norm': 0.6800406575202942, 'learning_rate': 2.7902200070615868e-05, 'memory/max_active (GiB)': 18.18, 'memory/max_allocated (GiB)': 18.18, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 347.33, 'epoch': 1.52}
 76%|██████████████████████████████████████████████████████████████                    | 15240/20117 [9:45:34<3:00:03,  2.22s/it] 76%|██████████████████████████████████████████████████████████████                    | 15241/20117 [9:45:36<3:00:56,  2.23s/it] 76%|██████████████████████████████████████████████████████████████▏                   | 15242/20117 [9:45:39<3:01:01,  2.23s/it] 76%|██████████████████████████████████████████████████████████████▏                   | 15243/20117 [9:45:41<3:02:49,  2.25s/it] 76%|██████████████████████████████████████████████████████████████▏                   | 15244/20117 [9:45:43<3:02:28,  2.25s/it] 76%|██████████████████████████████████████████████████████████████▏                   | 15245/20117 [9:45:45<3:00:51,  2.23s/it] 76%|██████████████████████████████████████████████████████████████▏                   | 15246/20117 [9:45:48<3:00:57,  2.23s/it] 76%|██████████████████████████████████████████████████████████████▏                   | 15247/20117 [9:45:50<3:06:36,  2.30s/it] 76%|██████████████████████████████████████████████████████████████▏                   | 15248/20117 [9:45:52<3:05:33,  2.29s/it] 76%|██████████████████████████████████████████████████████████████▏                   | 15249/20117 [9:45:55<3:03:07,  2.26s/it] 76%|██████████████████████████████████████████████████████████████▏                   | 15250/20117 [9:45:57<3:02:46,  2.25s/it]                                                                                                                                 {'loss': 0.1421, 'grad_norm': 0.5688208341598511, 'learning_rate': 2.7793531774181614e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 346.32, 'epoch': 1.52}
 76%|██████████████████████████████████████████████████████████████▏                   | 15250/20117 [9:45:57<3:02:46,  2.25s/it] 76%|██████████████████████████████████████████████████████████████▏                   | 15251/20117 [9:45:59<3:01:00,  2.23s/it] 76%|██████████████████████████████████████████████████████████████▏                   | 15252/20117 [9:46:01<3:00:55,  2.23s/it] 76%|██████████████████████████████████████████████████████████████▏                   | 15253/20117 [9:46:03<3:01:25,  2.24s/it] 76%|██████████████████████████████████████████████████████████████▏                   | 15254/20117 [9:46:06<3:00:21,  2.23s/it] 76%|██████████████████████████████████████████████████████████████▏                   | 15255/20117 [9:46:08<3:00:25,  2.23s/it] 76%|██████████████████████████████████████████████████████████████▏                   | 15256/20117 [9:46:10<3:01:46,  2.24s/it] 76%|██████████████████████████████████████████████████████████████▏                   | 15257/20117 [9:46:12<3:02:53,  2.26s/it] 76%|██████████████████████████████████████████████████████████████▏                   | 15258/20117 [9:46:15<3:02:00,  2.25s/it] 76%|██████████████████████████████████████████████████████████████▏                   | 15259/20117 [9:46:17<3:00:39,  2.23s/it] 76%|██████████████████████████████████████████████████████████████▏                   | 15260/20117 [9:46:19<2:59:21,  2.22s/it]                                                                                                                                 {'loss': 0.1442, 'grad_norm': 0.6293416619300842, 'learning_rate': 2.7685041337539786e-05, 'memory/max_active (GiB)': 19.8, 'memory/max_allocated (GiB)': 19.8, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 373.42, 'epoch': 1.52}
 76%|██████████████████████████████████████████████████████████████▏                   | 15260/20117 [9:46:19<2:59:21,  2.22s/it] 76%|██████████████████████████████████████████████████████████████▏                   | 15261/20117 [9:46:21<2:58:31,  2.21s/it] 76%|██████████████████████████████████████████████████████████████▏                   | 15262/20117 [9:46:23<2:58:17,  2.20s/it] 76%|██████████████████████████████████████████████████████████████▏                   | 15263/20117 [9:46:26<2:58:06,  2.20s/it] 76%|██████████████████████████████████████████████████████████████▏                   | 15264/20117 [9:46:28<2:58:26,  2.21s/it] 76%|██████████████████████████████████████████████████████████████▏                   | 15265/20117 [9:46:30<2:57:52,  2.20s/it] 76%|██████████████████████████████████████████████████████████████▏                   | 15266/20117 [9:46:32<3:00:03,  2.23s/it] 76%|██████████████████████████████████████████████████████████████▏                   | 15267/20117 [9:46:34<2:58:48,  2.21s/it] 76%|██████████████████████████████████████████████████████████████▏                   | 15268/20117 [9:46:37<3:00:04,  2.23s/it] 76%|██████████████████████████████████████████████████████████████▏                   | 15269/20117 [9:46:39<3:00:33,  2.23s/it] 76%|██████████████████████████████████████████████████████████████▏                   | 15270/20117 [9:46:41<2:59:16,  2.22s/it]                                                                                                                                 {'loss': 0.1386, 'grad_norm': 0.5825393795967102, 'learning_rate': 2.7576729027925286e-05, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 310.81, 'epoch': 1.52}
 76%|██████████████████████████████████████████████████████████████▏                   | 15270/20117 [9:46:41<2:59:16,  2.22s/it] 76%|██████████████████████████████████████████████████████████████▏                   | 15271/20117 [9:46:43<2:58:24,  2.21s/it] 76%|██████████████████████████████████████████████████████████████▎                   | 15272/20117 [9:46:46<2:58:13,  2.21s/it] 76%|██████████████████████████████████████████████████████████████▎                   | 15273/20117 [9:46:48<2:59:14,  2.22s/it] 76%|██████████████████████████████████████████████████████████████▎                   | 15274/20117 [9:46:50<2:59:16,  2.22s/it] 76%|██████████████████████████████████████████████████████████████▎                   | 15275/20117 [9:46:52<2:59:16,  2.22s/it] 76%|██████████████████████████████████████████████████████████████▎                   | 15276/20117 [9:46:54<2:58:48,  2.22s/it] 76%|██████████████████████████████████████████████████████████████▎                   | 15277/20117 [9:46:57<2:59:43,  2.23s/it] 76%|██████████████████████████████████████████████████████████████▎                   | 15278/20117 [9:46:59<2:59:32,  2.23s/it] 76%|██████████████████████████████████████████████████████████████▎                   | 15279/20117 [9:47:01<3:01:05,  2.25s/it] 76%|██████████████████████████████████████████████████████████████▎                   | 15280/20117 [9:47:04<3:01:52,  2.26s/it]                                                                                                                                 {'loss': 0.1603, 'grad_norm': 0.37790921330451965, 'learning_rate': 2.7468595112134165e-05, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 344.02, 'epoch': 1.52}
 76%|██████████████████████████████████████████████████████████████▎                   | 15280/20117 [9:47:04<3:01:52,  2.26s/it] 76%|██████████████████████████████████████████████████████████████▎                   | 15281/20117 [9:47:06<3:00:41,  2.24s/it] 76%|██████████████████████████████████████████████████████████████▎                   | 15282/20117 [9:47:08<3:02:16,  2.26s/it] 76%|██████████████████████████████████████████████████████████████▎                   | 15283/20117 [9:47:10<3:02:10,  2.26s/it] 76%|██████████████████████████████████████████████████████████████▎                   | 15284/20117 [9:47:13<3:01:28,  2.25s/it] 76%|██████████████████████████████████████████████████████████████▎                   | 15285/20117 [9:47:15<3:00:08,  2.24s/it] 76%|██████████████████████████████████████████████████████████████▎                   | 15286/20117 [9:47:17<3:00:32,  2.24s/it] 76%|██████████████████████████████████████████████████████████████▎                   | 15287/20117 [9:47:19<3:00:16,  2.24s/it] 76%|██████████████████████████████████████████████████████████████▎                   | 15288/20117 [9:47:21<2:59:49,  2.23s/it] 76%|██████████████████████████████████████████████████████████████▎                   | 15289/20117 [9:47:24<2:59:47,  2.23s/it] 76%|██████████████████████████████████████████████████████████████▎                   | 15290/20117 [9:47:26<3:00:04,  2.24s/it]                                                                                                                                 {'loss': 0.1681, 'grad_norm': 0.4327569603919983, 'learning_rate': 2.7360639856523172e-05, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 340.19, 'epoch': 1.52}
 76%|██████████████████████████████████████████████████████████████▎                   | 15290/20117 [9:47:26<3:00:04,  2.24s/it] 76%|██████████████████████████████████████████████████████████████▎                   | 15291/20117 [9:47:28<3:00:21,  2.24s/it] 76%|██████████████████████████████████████████████████████████████▎                   | 15292/20117 [9:47:30<3:01:59,  2.26s/it] 76%|██████████████████████████████████████████████████████████████▎                   | 15293/20117 [9:47:33<3:02:36,  2.27s/it] 76%|██████████████████████████████████████████████████████████████▎                   | 15294/20117 [9:47:35<3:03:05,  2.28s/it] 76%|██████████████████████████████████████████████████████████████▎                   | 15295/20117 [9:47:37<3:02:45,  2.27s/it] 76%|██████████████████████████████████████████████████████████████▎                   | 15296/20117 [9:47:40<3:01:46,  2.26s/it] 76%|██████████████████████████████████████████████████████████████▎                   | 15297/20117 [9:47:42<3:01:12,  2.26s/it] 76%|██████████████████████████████████████████████████████████████▎                   | 15298/20117 [9:47:44<3:07:31,  2.33s/it] 76%|██████████████████████████████████████████████████████████████▎                   | 15299/20117 [9:47:47<3:04:50,  2.30s/it] 76%|██████████████████████████████████████████████████████████████▎                   | 15300/20117 [9:47:49<3:03:17,  2.28s/it]                                                                                                                                 {'loss': 0.1851, 'grad_norm': 0.272372841835022, 'learning_rate': 2.7252863527008867e-05, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 380.93, 'epoch': 1.52}
 76%|██████████████████████████████████████████████████████████████▎                   | 15300/20117 [9:47:49<3:03:17,  2.28s/it] 76%|██████████████████████████████████████████████████████████████▎                   | 15301/20117 [9:47:51<3:03:07,  2.28s/it] 76%|██████████████████████████████████████████████████████████████▎                   | 15302/20117 [9:47:53<3:02:36,  2.28s/it] 76%|██████████████████████████████████████████████████████████████▍                   | 15303/20117 [9:47:56<3:03:03,  2.28s/it] 76%|██████████████████████████████████████████████████████████████▍                   | 15304/20117 [9:47:58<3:01:21,  2.26s/it] 76%|██████████████████████████████████████████████████████████████▍                   | 15305/20117 [9:48:00<3:03:00,  2.28s/it] 76%|██████████████████████████████████████████████████████████████▍                   | 15306/20117 [9:48:02<3:02:49,  2.28s/it] 76%|██████████████████████████████████████████████████████████████▍                   | 15307/20117 [9:48:05<3:01:28,  2.26s/it] 76%|██████████████████████████████████████████████████████████████▍                   | 15308/20117 [9:48:07<3:00:55,  2.26s/it] 76%|██████████████████████████████████████████████████████████████▍                   | 15309/20117 [9:48:09<3:01:09,  2.26s/it] 76%|██████████████████████████████████████████████████████████████▍                   | 15310/20117 [9:48:11<3:01:52,  2.27s/it]                                                                                                                                 {'loss': 0.1504, 'grad_norm': 0.40148428082466125, 'learning_rate': 2.7145266389067182e-05, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 331.6, 'epoch': 1.52}
 76%|██████████████████████████████████████████████████████████████▍                   | 15310/20117 [9:48:11<3:01:52,  2.27s/it] 76%|██████████████████████████████████████████████████████████████▍                   | 15311/20117 [9:48:14<3:01:14,  2.26s/it] 76%|██████████████████████████████████████████████████████████████▍                   | 15312/20117 [9:48:16<3:00:52,  2.26s/it] 76%|██████████████████████████████████████████████████████████████▍                   | 15313/20117 [9:48:18<3:01:11,  2.26s/it] 76%|██████████████████████████████████████████████████████████████▍                   | 15314/20117 [9:48:20<3:00:09,  2.25s/it] 76%|██████████████████████████████████████████████████████████████▍                   | 15315/20117 [9:48:23<3:00:46,  2.26s/it] 76%|██████████████████████████████████████████████████████████████▍                   | 15316/20117 [9:48:25<3:00:30,  2.26s/it] 76%|██████████████████████████████████████████████████████████████▍                   | 15317/20117 [9:48:27<3:00:29,  2.26s/it] 76%|██████████████████████████████████████████████████████████████▍                   | 15318/20117 [9:48:30<3:01:23,  2.27s/it] 76%|██████████████████████████████████████████████████████████████▍                   | 15319/20117 [9:48:32<3:00:17,  2.25s/it] 76%|██████████████████████████████████████████████████████████████▍                   | 15320/20117 [9:48:34<3:01:03,  2.26s/it]                                                                                                                                 {'loss': 0.1614, 'grad_norm': 0.6262397170066833, 'learning_rate': 2.703784870773255e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 431.35, 'epoch': 1.52}
 76%|██████████████████████████████████████████████████████████████▍                   | 15320/20117 [9:48:34<3:01:03,  2.26s/it] 76%|██████████████████████████████████████████████████████████████▍                   | 15321/20117 [9:48:36<3:00:46,  2.26s/it] 76%|██████████████████████████████████████████████████████████████▍                   | 15322/20117 [9:48:39<3:01:33,  2.27s/it] 76%|██████████████████████████████████████████████████████████████▍                   | 15323/20117 [9:48:41<3:02:06,  2.28s/it] 76%|██████████████████████████████████████████████████████████████▍                   | 15324/20117 [9:48:43<3:01:08,  2.27s/it] 76%|██████████████████████████████████████████████████████████████▍                   | 15325/20117 [9:48:45<2:59:31,  2.25s/it] 76%|██████████████████████████████████████████████████████████████▍                   | 15326/20117 [9:48:48<2:59:16,  2.25s/it] 76%|██████████████████████████████████████████████████████████████▍                   | 15327/20117 [9:48:50<2:59:04,  2.24s/it] 76%|██████████████████████████████████████████████████████████████▍                   | 15328/20117 [9:48:52<2:59:46,  2.25s/it] 76%|██████████████████████████████████████████████████████████████▍                   | 15329/20117 [9:48:54<3:00:02,  2.26s/it] 76%|██████████████████████████████████████████████████████████████▍                   | 15330/20117 [9:48:57<2:59:50,  2.25s/it]                                                                                                                                 {'loss': 0.1573, 'grad_norm': 0.3884369730949402, 'learning_rate': 2.6930610747597483e-05, 'memory/max_active (GiB)': 19.1, 'memory/max_allocated (GiB)': 19.1, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 410.18, 'epoch': 1.52}
 76%|██████████████████████████████████████████████████████████████▍                   | 15330/20117 [9:48:57<2:59:50,  2.25s/it] 76%|██████████████████████████████████████████████████████████████▍                   | 15331/20117 [9:48:59<2:59:12,  2.25s/it] 76%|██████████████████████████████████████████████████████████████▍                   | 15332/20117 [9:49:01<2:59:54,  2.26s/it] 76%|██████████████████████████████████████████████████████████████▍                   | 15333/20117 [9:49:03<2:58:56,  2.24s/it] 76%|██████████████████████████████████████████████████████████████▌                   | 15334/20117 [9:49:06<2:58:29,  2.24s/it] 76%|██████████████████████████████████████████████████████████████▌                   | 15335/20117 [9:49:08<2:58:04,  2.23s/it] 76%|██████████████████████████████████████████████████████████████▌                   | 15336/20117 [9:49:10<2:58:19,  2.24s/it] 76%|██████████████████████████████████████████████████████████████▌                   | 15337/20117 [9:49:12<2:59:31,  2.25s/it] 76%|██████████████████████████████████████████████████████████████▌                   | 15338/20117 [9:49:15<2:58:27,  2.24s/it] 76%|██████████████████████████████████████████████████████████████▌                   | 15339/20117 [9:49:17<2:58:13,  2.24s/it] 76%|██████████████████████████████████████████████████████████████▌                   | 15340/20117 [9:49:19<2:57:25,  2.23s/it]                                                                                                                                 {'loss': 0.1303, 'grad_norm': 0.285769522190094, 'learning_rate': 2.682355277281169e-05, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 317.47, 'epoch': 1.53}
 76%|██████████████████████████████████████████████████████████████▌                   | 15340/20117 [9:49:19<2:57:25,  2.23s/it] 76%|██████████████████████████████████████████████████████████████▌                   | 15341/20117 [9:49:21<2:57:32,  2.23s/it] 76%|██████████████████████████████████████████████████████████████▌                   | 15342/20117 [9:49:23<2:57:27,  2.23s/it] 76%|██████████████████████████████████████████████████████████████▌                   | 15343/20117 [9:49:26<2:58:11,  2.24s/it] 76%|██████████████████████████████████████████████████████████████▌                   | 15344/20117 [9:49:28<2:58:08,  2.24s/it] 76%|██████████████████████████████████████████████████████████████▌                   | 15345/20117 [9:49:30<2:59:18,  2.25s/it] 76%|██████████████████████████████████████████████████████████████▌                   | 15346/20117 [9:49:33<3:00:29,  2.27s/it] 76%|██████████████████████████████████████████████████████████████▌                   | 15347/20117 [9:49:35<3:02:37,  2.30s/it] 76%|██████████████████████████████████████████████████████████████▌                   | 15348/20117 [9:49:37<3:01:59,  2.29s/it] 76%|██████████████████████████████████████████████████████████████▌                   | 15349/20117 [9:49:39<2:59:56,  2.26s/it] 76%|██████████████████████████████████████████████████████████████▌                   | 15350/20117 [9:49:42<3:00:14,  2.27s/it]                                                                                                                                 {'loss': 0.1356, 'grad_norm': 0.8329140543937683, 'learning_rate': 2.671667504708163e-05, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 341.69, 'epoch': 1.53}
 76%|██████████████████████████████████████████████████████████████▌                   | 15350/20117 [9:49:42<3:00:14,  2.27s/it] 76%|██████████████████████████████████████████████████████████████▌                   | 15351/20117 [9:49:44<3:07:44,  2.36s/it] 76%|██████████████████████████████████████████████████████████████▌                   | 15352/20117 [9:49:47<3:05:43,  2.34s/it] 76%|██████████████████████████████████████████████████████████████▌                   | 15353/20117 [9:49:49<3:03:48,  2.32s/it] 76%|██████████████████████████████████████████████████████████████▌                   | 15354/20117 [9:49:51<3:02:43,  2.30s/it] 76%|██████████████████████████████████████████████████████████████▌                   | 15355/20117 [9:49:53<3:02:11,  2.30s/it] 76%|██████████████████████████████████████████████████████████████▌                   | 15356/20117 [9:49:56<3:01:24,  2.29s/it] 76%|██████████████████████████████████████████████████████████████▌                   | 15357/20117 [9:49:58<3:00:37,  2.28s/it] 76%|██████████████████████████████████████████████████████████████▌                   | 15358/20117 [9:50:00<2:59:59,  2.27s/it] 76%|██████████████████████████████████████████████████████████████▌                   | 15359/20117 [9:50:02<2:58:34,  2.25s/it] 76%|██████████████████████████████████████████████████████████████▌                   | 15360/20117 [9:50:05<2:58:35,  2.25s/it]                                                                                                                                 {'loss': 0.1665, 'grad_norm': 0.40556618571281433, 'learning_rate': 2.6609977833669686e-05, 'memory/max_active (GiB)': 20.63, 'memory/max_allocated (GiB)': 20.63, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 382.84, 'epoch': 1.53}
 76%|██████████████████████████████████████████████████████████████▌                   | 15360/20117 [9:50:05<2:58:35,  2.25s/it] 76%|██████████████████████████████████████████████████████████████▌                   | 15361/20117 [9:50:07<2:57:32,  2.24s/it] 76%|██████████████████████████████████████████████████████████████▌                   | 15362/20117 [9:50:09<2:58:07,  2.25s/it] 76%|██████████████████████████████████████████████████████████████▌                   | 15363/20117 [9:50:11<2:59:27,  2.27s/it] 76%|██████████████████████████████████████████████████████████████▋                   | 15364/20117 [9:50:14<2:59:14,  2.26s/it] 76%|██████████████████████████████████████████████████████████████▋                   | 15365/20117 [9:50:16<3:00:28,  2.28s/it] 76%|██████████████████████████████████████████████████████████████▋                   | 15366/20117 [9:50:18<2:59:15,  2.26s/it] 76%|██████████████████████████████████████████████████████████████▋                   | 15367/20117 [9:50:20<2:59:09,  2.26s/it] 76%|██████████████████████████████████████████████████████████████▋                   | 15368/20117 [9:50:23<2:59:09,  2.26s/it] 76%|██████████████████████████████████████████████████████████████▋                   | 15369/20117 [9:50:25<2:57:48,  2.25s/it] 76%|██████████████████████████████████████████████████████████████▋                   | 15370/20117 [9:50:27<2:57:26,  2.24s/it]                                                                                                                                 {'loss': 0.1838, 'grad_norm': 0.23340922594070435, 'learning_rate': 2.650346139539368e-05, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 369.68, 'epoch': 1.53}
 76%|██████████████████████████████████████████████████████████████▋                   | 15370/20117 [9:50:27<2:57:26,  2.24s/it] 76%|██████████████████████████████████████████████████████████████▋                   | 15371/20117 [9:50:29<2:56:58,  2.24s/it] 76%|██████████████████████████████████████████████████████████████▋                   | 15372/20117 [9:50:32<2:56:55,  2.24s/it] 76%|██████████████████████████████████████████████████████████████▋                   | 15373/20117 [9:50:34<2:57:07,  2.24s/it] 76%|██████████████████████████████████████████████████████████████▋                   | 15374/20117 [9:50:36<2:56:52,  2.24s/it] 76%|██████████████████████████████████████████████████████████████▋                   | 15375/20117 [9:50:38<2:57:14,  2.24s/it] 76%|██████████████████████████████████████████████████████████████▋                   | 15376/20117 [9:50:41<2:56:45,  2.24s/it] 76%|██████████████████████████████████████████████████████████████▋                   | 15377/20117 [9:50:43<2:57:16,  2.24s/it] 76%|██████████████████████████████████████████████████████████████▋                   | 15378/20117 [9:50:45<2:56:28,  2.23s/it] 76%|██████████████████████████████████████████████████████████████▋                   | 15379/20117 [9:50:47<2:57:13,  2.24s/it] 76%|██████████████████████████████████████████████████████████████▋                   | 15380/20117 [9:50:50<2:57:57,  2.25s/it]                                                                                                                                 {'loss': 0.1513, 'grad_norm': 0.4838123917579651, 'learning_rate': 2.6397125994626128e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 354.9, 'epoch': 1.53}
 76%|██████████████████████████████████████████████████████████████▋                   | 15380/20117 [9:50:50<2:57:57,  2.25s/it] 76%|██████████████████████████████████████████████████████████████▋                   | 15381/20117 [9:50:52<2:56:40,  2.24s/it] 76%|██████████████████████████████████████████████████████████████▋                   | 15382/20117 [9:50:54<2:56:22,  2.24s/it] 76%|██████████████████████████████████████████████████████████████▋                   | 15383/20117 [9:50:56<2:55:47,  2.23s/it] 76%|██████████████████████████████████████████████████████████████▋                   | 15384/20117 [9:50:58<2:56:36,  2.24s/it] 76%|██████████████████████████████████████████████████████████████▋                   | 15385/20117 [9:51:01<2:56:22,  2.24s/it] 76%|██████████████████████████████████████████████████████████████▋                   | 15386/20117 [9:51:03<2:56:07,  2.23s/it] 76%|██████████████████████████████████████████████████████████████▋                   | 15387/20117 [9:51:05<2:55:47,  2.23s/it] 76%|██████████████████████████████████████████████████████████████▋                   | 15388/20117 [9:51:07<2:57:29,  2.25s/it] 76%|██████████████████████████████████████████████████████████████▋                   | 15389/20117 [9:51:10<2:56:25,  2.24s/it] 77%|██████████████████████████████████████████████████████████████▋                   | 15390/20117 [9:51:12<2:56:57,  2.25s/it]                                                                                                                                 {'loss': 0.1673, 'grad_norm': 0.6558499932289124, 'learning_rate': 2.6290971893293547e-05, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 355.68, 'epoch': 1.53}
 77%|██████████████████████████████████████████████████████████████▋                   | 15390/20117 [9:51:12<2:56:57,  2.25s/it] 77%|██████████████████████████████████████████████████████████████▋                   | 15391/20117 [9:51:14<2:58:52,  2.27s/it] 77%|██████████████████████████████████████████████████████████████▋                   | 15392/20117 [9:51:16<2:58:05,  2.26s/it] 77%|██████████████████████████████████████████████████████████████▋                   | 15393/20117 [9:51:19<2:58:02,  2.26s/it] 77%|██████████████████████████████████████████████████████████████▋                   | 15394/20117 [9:51:21<2:57:55,  2.26s/it] 77%|██████████████████████████████████████████████████████████████▊                   | 15395/20117 [9:51:23<2:57:00,  2.25s/it] 77%|██████████████████████████████████████████████████████████████▊                   | 15396/20117 [9:51:25<2:56:24,  2.24s/it] 77%|██████████████████████████████████████████████████████████████▊                   | 15397/20117 [9:51:28<2:55:36,  2.23s/it] 77%|██████████████████████████████████████████████████████████████▊                   | 15398/20117 [9:51:30<2:55:03,  2.23s/it] 77%|██████████████████████████████████████████████████████████████▊                   | 15399/20117 [9:51:32<2:55:52,  2.24s/it] 77%|██████████████████████████████████████████████████████████████▊                   | 15400/20117 [9:51:34<2:56:26,  2.24s/it]                                                                                                                                 {'loss': 0.1269, 'grad_norm': 0.7235273718833923, 'learning_rate': 2.618499935287595e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 347.78, 'epoch': 1.53}
 77%|██████████████████████████████████████████████████████████████▊                   | 15400/20117 [9:51:34<2:56:26,  2.24s/it] 77%|██████████████████████████████████████████████████████████████▊                   | 15401/20117 [9:51:37<2:57:06,  2.25s/it] 77%|██████████████████████████████████████████████████████████████▊                   | 15402/20117 [9:51:39<2:57:30,  2.26s/it] 77%|██████████████████████████████████████████████████████████████▊                   | 15403/20117 [9:51:41<2:59:07,  2.28s/it] 77%|██████████████████████████████████████████████████████████████▊                   | 15404/20117 [9:51:44<2:58:28,  2.27s/it] 77%|██████████████████████████████████████████████████████████████▊                   | 15405/20117 [9:51:46<2:57:51,  2.26s/it] 77%|██████████████████████████████████████████████████████████████▊                   | 15406/20117 [9:51:48<3:04:10,  2.35s/it] 77%|██████████████████████████████████████████████████████████████▊                   | 15407/20117 [9:51:51<3:01:31,  2.31s/it] 77%|██████████████████████████████████████████████████████████████▊                   | 15408/20117 [9:51:53<2:59:19,  2.28s/it] 77%|██████████████████████████████████████████████████████████████▊                   | 15409/20117 [9:51:55<2:57:34,  2.26s/it] 77%|██████████████████████████████████████████████████████████████▊                   | 15410/20117 [9:51:57<2:57:02,  2.26s/it]                                                                                                                                 {'loss': 0.1649, 'grad_norm': 2.2355356216430664, 'learning_rate': 2.6079208634406106e-05, 'memory/max_active (GiB)': 20.53, 'memory/max_allocated (GiB)': 20.53, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 418.05, 'epoch': 1.53}
 77%|██████████████████████████████████████████████████████████████▊                   | 15410/20117 [9:51:57<2:57:02,  2.26s/it] 77%|██████████████████████████████████████████████████████████████▊                   | 15411/20117 [9:51:59<2:56:12,  2.25s/it] 77%|██████████████████████████████████████████████████████████████▊                   | 15412/20117 [9:52:02<2:56:34,  2.25s/it] 77%|██████████████████████████████████████████████████████████████▊                   | 15413/20117 [9:52:04<2:55:47,  2.24s/it] 77%|██████████████████████████████████████████████████████████████▊                   | 15414/20117 [9:52:06<2:56:25,  2.25s/it] 77%|██████████████████████████████████████████████████████████████▊                   | 15415/20117 [9:52:08<2:57:06,  2.26s/it] 77%|██████████████████████████████████████████████████████████████▊                   | 15416/20117 [9:52:11<2:56:47,  2.26s/it] 77%|██████████████████████████████████████████████████████████████▊                   | 15417/20117 [9:52:13<2:57:21,  2.26s/it] 77%|██████████████████████████████████████████████████████████████▊                   | 15418/20117 [9:52:15<2:55:57,  2.25s/it] 77%|██████████████████████████████████████████████████████████████▊                   | 15419/20117 [9:52:17<2:56:13,  2.25s/it] 77%|██████████████████████████████████████████████████████████████▊                   | 15420/20117 [9:52:20<2:57:40,  2.27s/it]                                                                                                                                 {'loss': 0.1413, 'grad_norm': 0.2927922010421753, 'learning_rate': 2.5973599998468935e-05, 'memory/max_active (GiB)': 21.53, 'memory/max_allocated (GiB)': 21.53, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 348.29, 'epoch': 1.53}
 77%|██████████████████████████████████████████████████████████████▊                   | 15420/20117 [9:52:20<2:57:40,  2.27s/it] 77%|██████████████████████████████████████████████████████████████▊                   | 15421/20117 [9:52:22<2:56:37,  2.26s/it] 77%|██████████████████████████████████████████████████████████████▊                   | 15422/20117 [9:52:24<2:55:25,  2.24s/it] 77%|██████████████████████████████████████████████████████████████▊                   | 15423/20117 [9:52:26<2:55:04,  2.24s/it] 77%|██████████████████████████████████████████████████████████████▊                   | 15424/20117 [9:52:29<2:55:35,  2.24s/it] 77%|██████████████████████████████████████████████████████████████▊                   | 15425/20117 [9:52:31<2:55:15,  2.24s/it] 77%|██████████████████████████████████████████████████████████████▉                   | 15426/20117 [9:52:33<2:54:40,  2.23s/it] 77%|██████████████████████████████████████████████████████████████▉                   | 15427/20117 [9:52:35<2:55:28,  2.24s/it] 77%|██████████████████████████████████████████████████████████████▉                   | 15428/20117 [9:52:38<2:56:36,  2.26s/it] 77%|██████████████████████████████████████████████████████████████▉                   | 15429/20117 [9:52:40<2:55:45,  2.25s/it] 77%|██████████████████████████████████████████████████████████████▉                   | 15430/20117 [9:52:42<2:54:52,  2.24s/it]                                                                                                                                 {'loss': 0.1356, 'grad_norm': 0.3300843834877014, 'learning_rate': 2.586817370520077e-05, 'memory/max_active (GiB)': 20.57, 'memory/max_allocated (GiB)': 20.57, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 323.42, 'epoch': 1.53}
 77%|██████████████████████████████████████████████████████████████▉                   | 15430/20117 [9:52:42<2:54:52,  2.24s/it] 77%|██████████████████████████████████████████████████████████████▉                   | 15431/20117 [9:52:44<2:54:44,  2.24s/it] 77%|██████████████████████████████████████████████████████████████▉                   | 15432/20117 [9:52:47<2:56:00,  2.25s/it] 77%|██████████████████████████████████████████████████████████████▉                   | 15433/20117 [9:52:49<2:54:51,  2.24s/it] 77%|██████████████████████████████████████████████████████████████▉                   | 15434/20117 [9:52:51<2:54:07,  2.23s/it] 77%|██████████████████████████████████████████████████████████████▉                   | 15435/20117 [9:52:53<2:54:14,  2.23s/it] 77%|██████████████████████████████████████████████████████████████▉                   | 15436/20117 [9:52:56<2:54:42,  2.24s/it] 77%|██████████████████████████████████████████████████████████████▉                   | 15437/20117 [9:52:58<2:54:05,  2.23s/it] 77%|██████████████████████████████████████████████████████████████▉                   | 15438/20117 [9:53:00<2:54:51,  2.24s/it] 77%|██████████████████████████████████████████████████████████████▉                   | 15439/20117 [9:53:02<2:54:17,  2.24s/it] 77%|██████████████████████████████████████████████████████████████▉                   | 15440/20117 [9:53:05<2:53:48,  2.23s/it]                                                                                                                                 {'loss': 0.1307, 'grad_norm': 0.5750816464424133, 'learning_rate': 2.5762930014288933e-05, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 347.56, 'epoch': 1.53}
 77%|██████████████████████████████████████████████████████████████▉                   | 15440/20117 [9:53:05<2:53:48,  2.23s/it] 77%|██████████████████████████████████████████████████████████████▉                   | 15441/20117 [9:53:07<2:53:31,  2.23s/it] 77%|██████████████████████████████████████████████████████████████▉                   | 15442/20117 [9:53:09<2:53:56,  2.23s/it] 77%|██████████████████████████████████████████████████████████████▉                   | 15443/20117 [9:53:11<2:56:00,  2.26s/it] 77%|██████████████████████████████████████████████████████████████▉                   | 15444/20117 [9:53:13<2:54:32,  2.24s/it] 77%|██████████████████████████████████████████████████████████████▉                   | 15445/20117 [9:53:16<2:56:04,  2.26s/it] 77%|██████████████████████████████████████████████████████████████▉                   | 15446/20117 [9:53:18<2:54:36,  2.24s/it] 77%|██████████████████████████████████████████████████████████████▉                   | 15447/20117 [9:53:20<2:54:17,  2.24s/it] 77%|██████████████████████████████████████████████████████████████▉                   | 15448/20117 [9:53:23<2:55:02,  2.25s/it] 77%|██████████████████████████████████████████████████████████████▉                   | 15449/20117 [9:53:25<2:54:20,  2.24s/it] 77%|██████████████████████████████████████████████████████████████▉                   | 15450/20117 [9:53:27<2:54:14,  2.24s/it]                                                                                                                                 {'loss': 0.1837, 'grad_norm': 0.7192912101745605, 'learning_rate': 2.5657869184970795e-05, 'memory/max_active (GiB)': 19.8, 'memory/max_allocated (GiB)': 19.8, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 347.13, 'epoch': 1.54}
 77%|██████████████████████████████████████████████████████████████▉                   | 15450/20117 [9:53:27<2:54:14,  2.24s/it] 77%|██████████████████████████████████████████████████████████████▉                   | 15451/20117 [9:53:29<2:53:59,  2.24s/it] 77%|██████████████████████████████████████████████████████████████▉                   | 15452/20117 [9:53:31<2:54:28,  2.24s/it] 77%|██████████████████████████████████████████████████████████████▉                   | 15453/20117 [9:53:34<2:55:13,  2.25s/it] 77%|██████████████████████████████████████████████████████████████▉                   | 15454/20117 [9:53:36<2:53:46,  2.24s/it] 77%|██████████████████████████████████████████████████████████████▉                   | 15455/20117 [9:53:38<2:53:03,  2.23s/it] 77%|███████████████████████████████████████████████████████████████                   | 15456/20117 [9:53:40<2:54:18,  2.24s/it] 77%|███████████████████████████████████████████████████████████████                   | 15457/20117 [9:53:43<2:53:23,  2.23s/it] 77%|███████████████████████████████████████████████████████████████                   | 15458/20117 [9:53:45<2:54:57,  2.25s/it] 77%|███████████████████████████████████████████████████████████████                   | 15459/20117 [9:53:47<3:02:06,  2.35s/it] 77%|███████████████████████████████████████████████████████████████                   | 15460/20117 [9:53:50<2:59:00,  2.31s/it]                                                                                                                                 {'loss': 0.1978, 'grad_norm': 0.7183093428611755, 'learning_rate': 2.555299147603345e-05, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 372.76, 'epoch': 1.54}
 77%|███████████████████████████████████████████████████████████████                   | 15460/20117 [9:53:50<2:59:00,  2.31s/it] 77%|███████████████████████████████████████████████████████████████                   | 15461/20117 [9:53:52<2:59:11,  2.31s/it] 77%|███████████████████████████████████████████████████████████████                   | 15462/20117 [9:53:54<2:57:18,  2.29s/it] 77%|███████████████████████████████████████████████████████████████                   | 15463/20117 [9:53:57<2:57:04,  2.28s/it] 77%|███████████████████████████████████████████████████████████████                   | 15464/20117 [9:53:59<2:55:10,  2.26s/it] 77%|███████████████████████████████████████████████████████████████                   | 15465/20117 [9:54:01<2:54:13,  2.25s/it] 77%|███████████████████████████████████████████████████████████████                   | 15466/20117 [9:54:03<2:54:18,  2.25s/it] 77%|███████████████████████████████████████████████████████████████                   | 15467/20117 [9:54:05<2:53:21,  2.24s/it] 77%|███████████████████████████████████████████████████████████████                   | 15468/20117 [9:54:08<2:52:52,  2.23s/it] 77%|███████████████████████████████████████████████████████████████                   | 15469/20117 [9:54:10<2:55:08,  2.26s/it] 77%|███████████████████████████████████████████████████████████████                   | 15470/20117 [9:54:12<2:55:22,  2.26s/it]                                                                                                                                 {'loss': 0.1732, 'grad_norm': 0.27620071172714233, 'learning_rate': 2.5448297145812805e-05, 'memory/max_active (GiB)': 20.45, 'memory/max_allocated (GiB)': 20.45, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 341.71, 'epoch': 1.54}
 77%|███████████████████████████████████████████████████████████████                   | 15470/20117 [9:54:12<2:55:22,  2.26s/it] 77%|███████████████████████████████████████████████████████████████                   | 15471/20117 [9:54:14<2:53:59,  2.25s/it] 77%|███████████████████████████████████████████████████████████████                   | 15472/20117 [9:54:17<2:53:45,  2.24s/it] 77%|███████████████████████████████████████████████████████████████                   | 15473/20117 [9:54:19<2:53:06,  2.24s/it] 77%|███████████████████████████████████████████████████████████████                   | 15474/20117 [9:54:21<2:54:17,  2.25s/it] 77%|███████████████████████████████████████████████████████████████                   | 15475/20117 [9:54:23<2:54:06,  2.25s/it] 77%|███████████████████████████████████████████████████████████████                   | 15476/20117 [9:54:26<2:53:33,  2.24s/it] 77%|███████████████████████████████████████████████████████████████                   | 15477/20117 [9:54:28<2:53:49,  2.25s/it] 77%|███████████████████████████████████████████████████████████████                   | 15478/20117 [9:54:30<2:54:43,  2.26s/it] 77%|███████████████████████████████████████████████████████████████                   | 15479/20117 [9:54:32<2:54:33,  2.26s/it] 77%|███████████████████████████████████████████████████████████████                   | 15480/20117 [9:54:35<2:54:37,  2.26s/it]                                                                                                                                 {'loss': 0.1229, 'grad_norm': 0.3960123062133789, 'learning_rate': 2.5343786452193185e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 351.22, 'epoch': 1.54}
 77%|███████████████████████████████████████████████████████████████                   | 15480/20117 [9:54:35<2:54:37,  2.26s/it] 77%|███████████████████████████████████████████████████████████████                   | 15481/20117 [9:54:37<2:53:51,  2.25s/it] 77%|███████████████████████████████████████████████████████████████                   | 15482/20117 [9:54:39<2:54:22,  2.26s/it] 77%|███████████████████████████████████████████████████████████████                   | 15483/20117 [9:54:41<2:53:48,  2.25s/it] 77%|███████████████████████████████████████████████████████████████                   | 15484/20117 [9:54:44<2:54:15,  2.26s/it] 77%|███████████████████████████████████████████████████████████████                   | 15485/20117 [9:54:46<2:54:53,  2.27s/it] 77%|███████████████████████████████████████████████████████████████                   | 15486/20117 [9:54:48<2:54:21,  2.26s/it] 77%|███████████████████████████████████████████████████████████████▏                  | 15487/20117 [9:54:51<2:54:33,  2.26s/it] 77%|███████████████████████████████████████████████████████████████▏                  | 15488/20117 [9:54:53<2:56:13,  2.28s/it] 77%|███████████████████████████████████████████████████████████████▏                  | 15489/20117 [9:54:55<2:54:50,  2.27s/it] 77%|███████████████████████████████████████████████████████████████▏                  | 15490/20117 [9:54:57<2:54:21,  2.26s/it]                                                                                                                                 {'loss': 0.1504, 'grad_norm': 0.4011591970920563, 'learning_rate': 2.5239459652606457e-05, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 421.47, 'epoch': 1.54}
 77%|███████████████████████████████████████████████████████████████▏                  | 15490/20117 [9:54:57<2:54:21,  2.26s/it] 77%|███████████████████████████████████████████████████████████████▏                  | 15491/20117 [9:55:00<2:53:27,  2.25s/it] 77%|███████████████████████████████████████████████████████████████▏                  | 15492/20117 [9:55:02<2:54:15,  2.26s/it] 77%|███████████████████████████████████████████████████████████████▏                  | 15493/20117 [9:55:04<2:54:34,  2.27s/it] 77%|███████████████████████████████████████████████████████████████▏                  | 15494/20117 [9:55:06<2:53:08,  2.25s/it] 77%|███████████████████████████████████████████████████████████████▏                  | 15495/20117 [9:55:09<2:53:23,  2.25s/it] 77%|███████████████████████████████████████████████████████████████▏                  | 15496/20117 [9:55:11<2:52:50,  2.24s/it] 77%|███████████████████████████████████████████████████████████████▏                  | 15497/20117 [9:55:13<2:54:56,  2.27s/it] 77%|███████████████████████████████████████████████████████████████▏                  | 15498/20117 [9:55:15<2:54:57,  2.27s/it] 77%|███████████████████████████████████████████████████████████████▏                  | 15499/20117 [9:55:18<2:53:33,  2.25s/it] 77%|███████████████████████████████████████████████████████████████▏                  | 15500/20117 [9:55:20<2:53:09,  2.25s/it]                                                                                                                                 {'loss': 0.116, 'grad_norm': 0.40115076303482056, 'learning_rate': 2.51353170040316e-05, 'memory/max_active (GiB)': 21.38, 'memory/max_allocated (GiB)': 21.38, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 326.93, 'epoch': 1.54}
 77%|███████████████████████████████████████████████████████████████▏                  | 15500/20117 [9:55:20<2:53:09,  2.25s/it] 77%|███████████████████████████████████████████████████████████████▏                  | 15501/20117 [9:55:22<2:53:08,  2.25s/it] 77%|███████████████████████████████████████████████████████████████▏                  | 15502/20117 [9:55:24<2:52:20,  2.24s/it] 77%|███████████████████████████████████████████████████████████████▏                  | 15503/20117 [9:55:27<2:52:52,  2.25s/it] 77%|███████████████████████████████████████████████████████████████▏                  | 15504/20117 [9:55:29<2:52:04,  2.24s/it] 77%|███████████████████████████████████████████████████████████████▏                  | 15505/20117 [9:55:31<2:52:16,  2.24s/it] 77%|███████████████████████████████████████████████████████████████▏                  | 15506/20117 [9:55:33<2:51:39,  2.23s/it] 77%|███████████████████████████████████████████████████████████████▏                  | 15507/20117 [9:55:36<2:52:10,  2.24s/it] 77%|███████████████████████████████████████████████████████████████▏                  | 15508/20117 [9:55:38<2:51:59,  2.24s/it] 77%|███████████████████████████████████████████████████████████████▏                  | 15509/20117 [9:55:40<2:52:11,  2.24s/it] 77%|███████████████████████████████████████████████████████████████▏                  | 15510/20117 [9:55:42<2:52:24,  2.25s/it]                                                                                                                                 {'loss': 0.1467, 'grad_norm': 0.487507164478302, 'learning_rate': 2.5031358762994005e-05, 'memory/max_active (GiB)': 18.85, 'memory/max_allocated (GiB)': 18.85, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 347.74, 'epoch': 1.54}
 77%|███████████████████████████████████████████████████████████████▏                  | 15510/20117 [9:55:42<2:52:24,  2.25s/it] 77%|███████████████████████████████████████████████████████████████▏                  | 15511/20117 [9:55:45<2:58:49,  2.33s/it] 77%|███████████████████████████████████████████████████████████████▏                  | 15512/20117 [9:55:47<2:56:50,  2.30s/it] 77%|███████████████████████████████████████████████████████████████▏                  | 15513/20117 [9:55:49<2:56:29,  2.30s/it] 77%|███████████████████████████████████████████████████████████████▏                  | 15514/20117 [9:55:52<2:54:39,  2.28s/it] 77%|███████████████████████████████████████████████████████████████▏                  | 15515/20117 [9:55:54<2:53:25,  2.26s/it] 77%|███████████████████████████████████████████████████████████████▏                  | 15516/20117 [9:55:56<2:54:15,  2.27s/it] 77%|███████████████████████████████████████████████████████████████▏                  | 15517/20117 [9:55:58<2:53:45,  2.27s/it] 77%|███████████████████████████████████████████████████████████████▎                  | 15518/20117 [9:56:01<2:52:59,  2.26s/it] 77%|███████████████████████████████████████████████████████████████▎                  | 15519/20117 [9:56:03<2:53:29,  2.26s/it] 77%|███████████████████████████████████████████████████████████████▎                  | 15520/20117 [9:56:05<2:53:28,  2.26s/it]                                                                                                                                 {'loss': 0.1632, 'grad_norm': 0.4600401222705841, 'learning_rate': 2.492758518556473e-05, 'memory/max_active (GiB)': 19.22, 'memory/max_allocated (GiB)': 19.22, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 389.11, 'epoch': 1.54}
 77%|███████████████████████████████████████████████████████████████▎                  | 15520/20117 [9:56:05<2:53:28,  2.26s/it] 77%|███████████████████████████████████████████████████████████████▎                  | 15521/20117 [9:56:07<2:52:40,  2.25s/it] 77%|███████████████████████████████████████████████████████████████▎                  | 15522/20117 [9:56:10<2:53:23,  2.26s/it] 77%|███████████████████████████████████████████████████████████████▎                  | 15523/20117 [9:56:12<2:52:38,  2.25s/it] 77%|███████████████████████████████████████████████████████████████▎                  | 15524/20117 [9:56:14<2:51:53,  2.25s/it] 77%|███████████████████████████████████████████████████████████████▎                  | 15525/20117 [9:56:16<2:54:12,  2.28s/it] 77%|███████████████████████████████████████████████████████████████▎                  | 15526/20117 [9:56:19<2:54:03,  2.27s/it] 77%|███████████████████████████████████████████████████████████████▎                  | 15527/20117 [9:56:21<2:54:00,  2.27s/it] 77%|███████████████████████████████████████████████████████████████▎                  | 15528/20117 [9:56:23<2:52:48,  2.26s/it] 77%|███████████████████████████████████████████████████████████████▎                  | 15529/20117 [9:56:25<2:52:28,  2.26s/it] 77%|███████████████████████████████████████████████████████████████▎                  | 15530/20117 [9:56:28<2:51:37,  2.24s/it]                                                                                                                                 {'loss': 0.1396, 'grad_norm': 0.5790596604347229, 'learning_rate': 2.482399652736006e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 329.02, 'epoch': 1.54}
 77%|███████████████████████████████████████████████████████████████▎                  | 15530/20117 [9:56:28<2:51:37,  2.24s/it] 77%|███████████████████████████████████████████████████████████████▎                  | 15531/20117 [9:56:30<2:51:08,  2.24s/it] 77%|███████████████████████████████████████████████████████████████▎                  | 15532/20117 [9:56:32<2:50:34,  2.23s/it] 77%|███████████████████████████████████████████████████████████████▎                  | 15533/20117 [9:56:34<2:50:04,  2.23s/it] 77%|███████████████████████████████████████████████████████████████▎                  | 15534/20117 [9:56:37<2:50:51,  2.24s/it] 77%|███████████████████████████████████████████████████████████████▎                  | 15535/20117 [9:56:39<2:52:27,  2.26s/it] 77%|███████████████████████████████████████████████████████████████▎                  | 15536/20117 [9:56:41<2:51:50,  2.25s/it] 77%|███████████████████████████████████████████████████████████████▎                  | 15537/20117 [9:56:43<2:51:28,  2.25s/it] 77%|███████████████████████████████████████████████████████████████▎                  | 15538/20117 [9:56:46<2:51:33,  2.25s/it] 77%|███████████████████████████████████████████████████████████████▎                  | 15539/20117 [9:56:48<2:51:48,  2.25s/it] 77%|███████████████████████████████████████████████████████████████▎                  | 15540/20117 [9:56:50<2:50:50,  2.24s/it]                                                                                                                                 {'loss': 0.201, 'grad_norm': 0.5716336369514465, 'learning_rate': 2.4720593043540752e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 375.4, 'epoch': 1.54}
 77%|███████████████████████████████████████████████████████████████▎                  | 15540/20117 [9:56:50<2:50:50,  2.24s/it] 77%|███████████████████████████████████████████████████████████████▎                  | 15541/20117 [9:56:52<2:50:48,  2.24s/it] 77%|███████████████████████████████████████████████████████████████▎                  | 15542/20117 [9:56:55<2:49:47,  2.23s/it] 77%|███████████████████████████████████████████████████████████████▎                  | 15543/20117 [9:56:57<2:50:34,  2.24s/it] 77%|███████████████████████████████████████████████████████████████▎                  | 15544/20117 [9:56:59<2:49:43,  2.23s/it] 77%|███████████████████████████████████████████████████████████████▎                  | 15545/20117 [9:57:01<2:49:36,  2.23s/it] 77%|███████████████████████████████████████████████████████████████▎                  | 15546/20117 [9:57:03<2:49:24,  2.22s/it] 77%|███████████████████████████████████████████████████████████████▎                  | 15547/20117 [9:57:06<2:48:45,  2.22s/it] 77%|███████████████████████████████████████████████████████████████▍                  | 15548/20117 [9:57:08<2:50:56,  2.24s/it] 77%|███████████████████████████████████████████████████████████████▍                  | 15549/20117 [9:57:10<2:50:38,  2.24s/it] 77%|███████████████████████████████████████████████████████████████▍                  | 15550/20117 [9:57:12<2:50:13,  2.24s/it]                                                                                                                                 {'loss': 0.1525, 'grad_norm': 0.5810480117797852, 'learning_rate': 2.461737498881148e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 331.4, 'epoch': 1.55}
 77%|███████████████████████████████████████████████████████████████▍                  | 15550/20117 [9:57:12<2:50:13,  2.24s/it] 77%|███████████████████████████████████████████████████████████████▍                  | 15551/20117 [9:57:15<2:50:31,  2.24s/it] 77%|███████████████████████████████████████████████████████████████▍                  | 15552/20117 [9:57:17<2:50:38,  2.24s/it] 77%|███████████████████████████████████████████████████████████████▍                  | 15553/20117 [9:57:19<2:50:14,  2.24s/it] 77%|███████████████████████████████████████████████████████████████▍                  | 15554/20117 [9:57:21<2:49:39,  2.23s/it] 77%|███████████████████████████████████████████████████████████████▍                  | 15555/20117 [9:57:24<2:50:08,  2.24s/it] 77%|███████████████████████████████████████████████████████████████▍                  | 15556/20117 [9:57:26<2:50:20,  2.24s/it] 77%|███████████████████████████████████████████████████████████████▍                  | 15557/20117 [9:57:28<2:49:10,  2.23s/it] 77%|███████████████████████████████████████████████████████████████▍                  | 15558/20117 [9:57:30<2:50:40,  2.25s/it] 77%|███████████████████████████████████████████████████████████████▍                  | 15559/20117 [9:57:33<2:50:41,  2.25s/it] 77%|███████████████████████████████████████████████████████████████▍                  | 15560/20117 [9:57:35<2:50:35,  2.25s/it]                                                                                                                                 {'loss': 0.1237, 'grad_norm': 0.3276031017303467, 'learning_rate': 2.451434261742005e-05, 'memory/max_active (GiB)': 19.8, 'memory/max_allocated (GiB)': 19.8, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 303.69, 'epoch': 1.55}
 77%|███████████████████████████████████████████████████████████████▍                  | 15560/20117 [9:57:35<2:50:35,  2.25s/it] 77%|███████████████████████████████████████████████████████████████▍                  | 15561/20117 [9:57:37<2:50:05,  2.24s/it] 77%|███████████████████████████████████████████████████████████████▍                  | 15562/20117 [9:57:39<2:50:01,  2.24s/it] 77%|███████████████████████████████████████████████████████████████▍                  | 15563/20117 [9:57:42<2:56:22,  2.32s/it] 77%|███████████████████████████████████████████████████████████████▍                  | 15564/20117 [9:57:44<2:54:26,  2.30s/it] 77%|███████████████████████████████████████████████████████████████▍                  | 15565/20117 [9:57:46<2:53:58,  2.29s/it] 77%|███████████████████████████████████████████████████████████████▍                  | 15566/20117 [9:57:49<2:53:56,  2.29s/it] 77%|███████████████████████████████████████████████████████████████▍                  | 15567/20117 [9:57:51<2:53:03,  2.28s/it] 77%|███████████████████████████████████████████████████████████████▍                  | 15568/20117 [9:57:53<2:52:55,  2.28s/it] 77%|███████████████████████████████████████████████████████████████▍                  | 15569/20117 [9:57:55<2:51:36,  2.26s/it] 77%|███████████████████████████████████████████████████████████████▍                  | 15570/20117 [9:57:58<2:50:18,  2.25s/it]                                                                                                                                 {'loss': 0.131, 'grad_norm': 0.40953657031059265, 'learning_rate': 2.4411496183157045e-05, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 341.6, 'epoch': 1.55}
 77%|███████████████████████████████████████████████████████████████▍                  | 15570/20117 [9:57:58<2:50:18,  2.25s/it] 77%|███████████████████████████████████████████████████████████████▍                  | 15571/20117 [9:58:00<2:49:10,  2.23s/it] 77%|███████████████████████████████████████████████████████████████▍                  | 15572/20117 [9:58:02<2:49:18,  2.23s/it] 77%|███████████████████████████████████████████████████████████████▍                  | 15573/20117 [9:58:04<2:48:59,  2.23s/it] 77%|███████████████████████████████████████████████████████████████▍                  | 15574/20117 [9:58:07<2:51:10,  2.26s/it] 77%|███████████████████████████████████████████████████████████████▍                  | 15575/20117 [9:58:09<2:49:43,  2.24s/it] 77%|███████████████████████████████████████████████████████████████▍                  | 15576/20117 [9:58:11<2:50:26,  2.25s/it] 77%|███████████████████████████████████████████████████████████████▍                  | 15577/20117 [9:58:13<2:49:27,  2.24s/it] 77%|███████████████████████████████████████████████████████████████▍                  | 15578/20117 [9:58:15<2:48:40,  2.23s/it] 77%|███████████████████████████████████████████████████████████████▌                  | 15579/20117 [9:58:18<2:48:29,  2.23s/it] 77%|███████████████████████████████████████████████████████████████▌                  | 15580/20117 [9:58:20<2:48:57,  2.23s/it]                                                                                                                                 {'loss': 0.1647, 'grad_norm': 0.34955236315727234, 'learning_rate': 2.4308835939354913e-05, 'memory/max_active (GiB)': 19.09, 'memory/max_allocated (GiB)': 19.09, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 373.13, 'epoch': 1.55}
 77%|███████████████████████████████████████████████████████████████▌                  | 15580/20117 [9:58:20<2:48:57,  2.23s/it] 77%|███████████████████████████████████████████████████████████████▌                  | 15581/20117 [9:58:22<2:48:48,  2.23s/it] 77%|███████████████████████████████████████████████████████████████▌                  | 15582/20117 [9:58:24<2:49:12,  2.24s/it] 77%|███████████████████████████████████████████████████████████████▌                  | 15583/20117 [9:58:27<2:48:58,  2.24s/it] 77%|███████████████████████████████████████████████████████████████▌                  | 15584/20117 [9:58:29<2:50:50,  2.26s/it] 77%|███████████████████████████████████████████████████████████████▌                  | 15585/20117 [9:58:31<2:50:45,  2.26s/it] 77%|███████████████████████████████████████████████████████████████▌                  | 15586/20117 [9:58:34<2:51:49,  2.28s/it] 77%|███████████████████████████████████████████████████████████████▌                  | 15587/20117 [9:58:36<2:52:22,  2.28s/it] 77%|███████████████████████████████████████████████████████████████▌                  | 15588/20117 [9:58:38<2:50:18,  2.26s/it] 77%|███████████████████████████████████████████████████████████████▌                  | 15589/20117 [9:58:40<2:50:47,  2.26s/it] 77%|███████████████████████████████████████████████████████████████▌                  | 15590/20117 [9:58:43<2:50:31,  2.26s/it]                                                                                                                                 {'loss': 0.1674, 'grad_norm': 0.5104487538337708, 'learning_rate': 2.4206362138887584e-05, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 413.19, 'epoch': 1.55}
 77%|███████████████████████████████████████████████████████████████▌                  | 15590/20117 [9:58:43<2:50:31,  2.26s/it] 78%|███████████████████████████████████████████████████████████████▌                  | 15591/20117 [9:58:45<2:49:12,  2.24s/it] 78%|███████████████████████████████████████████████████████████████▌                  | 15592/20117 [9:58:47<2:48:45,  2.24s/it] 78%|███████████████████████████████████████████████████████████████▌                  | 15593/20117 [9:58:49<2:52:01,  2.28s/it] 78%|███████████████████████████████████████████████████████████████▌                  | 15594/20117 [9:58:52<2:55:04,  2.32s/it] 78%|███████████████████████████████████████████████████████████████▌                  | 15595/20117 [9:58:54<2:56:13,  2.34s/it] 78%|███████████████████████████████████████████████████████████████▌                  | 15596/20117 [9:58:57<2:57:35,  2.36s/it] 78%|███████████████████████████████████████████████████████████████▌                  | 15597/20117 [9:58:59<2:58:41,  2.37s/it] 78%|███████████████████████████████████████████████████████████████▌                  | 15598/20117 [9:59:01<2:59:03,  2.38s/it] 78%|███████████████████████████████████████████████████████████████▌                  | 15599/20117 [9:59:04<2:59:46,  2.39s/it] 78%|███████████████████████████████████████████████████████████████▌                  | 15600/20117 [9:59:06<3:00:42,  2.40s/it]                                                                                                                                 {'loss': 0.1848, 'grad_norm': 0.3313903510570526, 'learning_rate': 2.4104075034169628e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 361.41, 'epoch': 1.55}
 78%|███████████████████████████████████████████████████████████████▌                  | 15600/20117 [9:59:06<3:00:42,  2.40s/it] 78%|███████████████████████████████████████████████████████████████▌                  | 15601/20117 [9:59:09<3:00:50,  2.40s/it] 78%|███████████████████████████████████████████████████████████████▌                  | 15602/20117 [9:59:11<3:00:16,  2.40s/it] 78%|███████████████████████████████████████████████████████████████▌                  | 15603/20117 [9:59:13<2:57:17,  2.36s/it] 78%|███████████████████████████████████████████████████████████████▌                  | 15604/20117 [9:59:16<2:54:37,  2.32s/it] 78%|███████████████████████████████████████████████████████████████▌                  | 15605/20117 [9:59:18<2:53:02,  2.30s/it] 78%|███████████████████████████████████████████████████████████████▌                  | 15606/20117 [9:59:20<2:52:46,  2.30s/it] 78%|███████████████████████████████████████████████████████████████▌                  | 15607/20117 [9:59:22<2:51:35,  2.28s/it] 78%|███████████████████████████████████████████████████████████████▌                  | 15608/20117 [9:59:24<2:49:34,  2.26s/it] 78%|███████████████████████████████████████████████████████████████▌                  | 15609/20117 [9:59:27<2:49:10,  2.25s/it] 78%|███████████████████████████████████████████████████████████████▋                  | 15610/20117 [9:59:29<2:48:51,  2.25s/it]                                                                                                                                 {'loss': 0.1431, 'grad_norm': 0.35265353322029114, 'learning_rate': 2.400197487715585e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 352.88, 'epoch': 1.55}
 78%|███████████████████████████████████████████████████████████████▋                  | 15610/20117 [9:59:29<2:48:51,  2.25s/it] 78%|███████████████████████████████████████████████████████████████▋                  | 15611/20117 [9:59:31<2:48:16,  2.24s/it] 78%|███████████████████████████████████████████████████████████████▋                  | 15612/20117 [9:59:33<2:47:20,  2.23s/it] 78%|███████████████████████████████████████████████████████████████▋                  | 15613/20117 [9:59:36<2:49:36,  2.26s/it] 78%|███████████████████████████████████████████████████████████████▋                  | 15614/20117 [9:59:38<2:48:06,  2.24s/it] 78%|███████████████████████████████████████████████████████████████▋                  | 15615/20117 [9:59:41<2:55:43,  2.34s/it] 78%|███████████████████████████████████████████████████████████████▋                  | 15616/20117 [9:59:43<2:53:20,  2.31s/it] 78%|███████████████████████████████████████████████████████████████▋                  | 15617/20117 [9:59:45<2:51:41,  2.29s/it] 78%|███████████████████████████████████████████████████████████████▋                  | 15618/20117 [9:59:47<2:49:18,  2.26s/it] 78%|███████████████████████████████████████████████████████████████▋                  | 15619/20117 [9:59:49<2:49:24,  2.26s/it] 78%|███████████████████████████████████████████████████████████████▋                  | 15620/20117 [9:59:52<2:47:55,  2.24s/it]                                                                                                                                 {'loss': 0.145, 'grad_norm': 0.5764623880386353, 'learning_rate': 2.390006191934048e-05, 'memory/max_active (GiB)': 20.63, 'memory/max_allocated (GiB)': 20.63, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 337.91, 'epoch': 1.55}
 78%|███████████████████████████████████████████████████████████████▋                  | 15620/20117 [9:59:52<2:47:55,  2.24s/it] 78%|███████████████████████████████████████████████████████████████▋                  | 15621/20117 [9:59:54<2:47:28,  2.23s/it] 78%|███████████████████████████████████████████████████████████████▋                  | 15622/20117 [9:59:56<2:49:08,  2.26s/it] 78%|███████████████████████████████████████████████████████████████▋                  | 15623/20117 [9:59:58<2:50:37,  2.28s/it] 78%|██████████████████████████████████████████████████████████████▉                  | 15624/20117 [10:00:01<2:49:57,  2.27s/it] 78%|██████████████████████████████████████████████████████████████▉                  | 15625/20117 [10:00:03<2:48:00,  2.24s/it] 78%|██████████████████████████████████████████████████████████████▉                  | 15626/20117 [10:00:05<2:47:43,  2.24s/it] 78%|██████████████████████████████████████████████████████████████▉                  | 15627/20117 [10:00:07<2:48:12,  2.25s/it] 78%|██████████████████████████████████████████████████████████████▉                  | 15628/20117 [10:00:10<2:48:51,  2.26s/it] 78%|██████████████████████████████████████████████████████████████▉                  | 15629/20117 [10:00:12<2:49:29,  2.27s/it] 78%|██████████████████████████████████████████████████████████████▉                  | 15630/20117 [10:00:14<2:49:31,  2.27s/it]                                                                                                                                 {'loss': 0.1489, 'grad_norm': 0.5069451928138733, 'learning_rate': 2.3798336411756682e-05, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 411.7, 'epoch': 1.55}
 78%|██████████████████████████████████████████████████████████████▉                  | 15630/20117 [10:00:14<2:49:31,  2.27s/it] 78%|██████████████████████████████████████████████████████████████▉                  | 15631/20117 [10:00:17<2:50:18,  2.28s/it] 78%|██████████████████████████████████████████████████████████████▉                  | 15632/20117 [10:00:19<2:50:49,  2.29s/it] 78%|██████████████████████████████████████████████████████████████▉                  | 15633/20117 [10:00:21<2:49:03,  2.26s/it] 78%|██████████████████████████████████████████████████████████████▉                  | 15634/20117 [10:00:23<2:47:58,  2.25s/it] 78%|██████████████████████████████████████████████████████████████▉                  | 15635/20117 [10:00:25<2:46:50,  2.23s/it] 78%|██████████████████████████████████████████████████████████████▉                  | 15636/20117 [10:00:28<2:46:13,  2.23s/it] 78%|██████████████████████████████████████████████████████████████▉                  | 15637/20117 [10:00:30<2:45:28,  2.22s/it] 78%|██████████████████████████████████████████████████████████████▉                  | 15638/20117 [10:00:32<2:45:36,  2.22s/it] 78%|██████████████████████████████████████████████████████████████▉                  | 15639/20117 [10:00:34<2:45:49,  2.22s/it] 78%|██████████████████████████████████████████████████████████████▉                  | 15640/20117 [10:00:37<2:45:11,  2.21s/it]                                                                                                                                 {'loss': 0.1571, 'grad_norm': 0.3448779881000519, 'learning_rate': 2.3696798604975933e-05, 'memory/max_active (GiB)': 19.8, 'memory/max_allocated (GiB)': 19.8, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 383.87, 'epoch': 1.55}
 78%|██████████████████████████████████████████████████████████████▉                  | 15640/20117 [10:00:37<2:45:11,  2.21s/it] 78%|██████████████████████████████████████████████████████████████▉                  | 15641/20117 [10:00:39<2:45:13,  2.21s/it] 78%|██████████████████████████████████████████████████████████████▉                  | 15642/20117 [10:00:41<2:44:50,  2.21s/it] 78%|██████████████████████████████████████████████████████████████▉                  | 15643/20117 [10:00:43<2:45:12,  2.22s/it] 78%|██████████████████████████████████████████████████████████████▉                  | 15644/20117 [10:00:45<2:44:12,  2.20s/it] 78%|██████████████████████████████████████████████████████████████▉                  | 15645/20117 [10:00:48<2:44:37,  2.21s/it] 78%|██████████████████████████████████████████████████████████████▉                  | 15646/20117 [10:00:50<2:44:47,  2.21s/it] 78%|███████████████████████████████████████████████████████████████                  | 15647/20117 [10:00:52<2:45:16,  2.22s/it] 78%|███████████████████████████████████████████████████████████████                  | 15648/20117 [10:00:54<2:47:42,  2.25s/it] 78%|███████████████████████████████████████████████████████████████                  | 15649/20117 [10:00:57<2:46:23,  2.23s/it] 78%|███████████████████████████████████████████████████████████████                  | 15650/20117 [10:00:59<2:45:53,  2.23s/it]                                                                                                                                 {'loss': 0.1472, 'grad_norm': 0.7202053070068359, 'learning_rate': 2.359544874910723e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 344.25, 'epoch': 1.56}
 78%|███████████████████████████████████████████████████████████████                  | 15650/20117 [10:00:59<2:45:53,  2.23s/it] 78%|███████████████████████████████████████████████████████████████                  | 15651/20117 [10:01:01<2:47:14,  2.25s/it] 78%|███████████████████████████████████████████████████████████████                  | 15652/20117 [10:01:03<2:46:25,  2.24s/it] 78%|███████████████████████████████████████████████████████████████                  | 15653/20117 [10:01:06<2:47:14,  2.25s/it] 78%|███████████████████████████████████████████████████████████████                  | 15654/20117 [10:01:08<2:47:06,  2.25s/it] 78%|███████████████████████████████████████████████████████████████                  | 15655/20117 [10:01:10<2:47:58,  2.26s/it] 78%|███████████████████████████████████████████████████████████████                  | 15656/20117 [10:01:12<2:46:49,  2.24s/it] 78%|███████████████████████████████████████████████████████████████                  | 15657/20117 [10:01:15<2:47:04,  2.25s/it] 78%|███████████████████████████████████████████████████████████████                  | 15658/20117 [10:01:17<2:45:39,  2.23s/it] 78%|███████████████████████████████████████████████████████████████                  | 15659/20117 [10:01:19<2:45:04,  2.22s/it] 78%|███████████████████████████████████████████████████████████████                  | 15660/20117 [10:01:21<2:46:37,  2.24s/it]                                                                                                                                 {'loss': 0.1428, 'grad_norm': 0.41816800832748413, 'learning_rate': 2.3494287093796763e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 343.33, 'epoch': 1.56}
 78%|███████████████████████████████████████████████████████████████                  | 15660/20117 [10:01:21<2:46:37,  2.24s/it] 78%|███████████████████████████████████████████████████████████████                  | 15661/20117 [10:01:23<2:46:23,  2.24s/it] 78%|███████████████████████████████████████████████████████████████                  | 15662/20117 [10:01:26<2:45:45,  2.23s/it] 78%|███████████████████████████████████████████████████████████████                  | 15663/20117 [10:01:28<2:45:51,  2.23s/it] 78%|███████████████████████████████████████████████████████████████                  | 15664/20117 [10:01:30<2:46:17,  2.24s/it] 78%|███████████████████████████████████████████████████████████████                  | 15665/20117 [10:01:32<2:47:05,  2.25s/it] 78%|███████████████████████████████████████████████████████████████                  | 15666/20117 [10:01:35<2:55:14,  2.36s/it] 78%|███████████████████████████████████████████████████████████████                  | 15667/20117 [10:01:37<2:53:04,  2.33s/it] 78%|███████████████████████████████████████████████████████████████                  | 15668/20117 [10:01:40<2:51:05,  2.31s/it] 78%|███████████████████████████████████████████████████████████████                  | 15669/20117 [10:01:42<2:48:33,  2.27s/it] 78%|███████████████████████████████████████████████████████████████                  | 15670/20117 [10:01:44<2:48:22,  2.27s/it]                                                                                                                                 {'loss': 0.1495, 'grad_norm': 0.47676244378089905, 'learning_rate': 2.339331388822701e-05, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 386.13, 'epoch': 1.56}
 78%|███████████████████████████████████████████████████████████████                  | 15670/20117 [10:01:44<2:48:22,  2.27s/it] 78%|███████████████████████████████████████████████████████████████                  | 15671/20117 [10:01:46<2:48:53,  2.28s/it] 78%|███████████████████████████████████████████████████████████████                  | 15672/20117 [10:01:49<2:47:39,  2.26s/it] 78%|███████████████████████████████████████████████████████████████                  | 15673/20117 [10:01:51<2:46:13,  2.24s/it] 78%|███████████████████████████████████████████████████████████████                  | 15674/20117 [10:01:53<2:45:58,  2.24s/it] 78%|███████████████████████████████████████████████████████████████                  | 15675/20117 [10:01:55<2:47:07,  2.26s/it] 78%|███████████████████████████████████████████████████████████████                  | 15676/20117 [10:01:58<2:46:38,  2.25s/it] 78%|███████████████████████████████████████████████████████████████                  | 15677/20117 [10:02:00<2:45:16,  2.23s/it] 78%|███████████████████████████████████████████████████████████████▏                 | 15678/20117 [10:02:02<2:43:46,  2.21s/it] 78%|███████████████████████████████████████████████████████████████▏                 | 15679/20117 [10:02:04<2:43:53,  2.22s/it] 78%|███████████████████████████████████████████████████████████████▏                 | 15680/20117 [10:02:06<2:43:55,  2.22s/it]                                                                                                                                 {'loss': 0.1541, 'grad_norm': 0.43887218832969666, 'learning_rate': 2.3292529381116336e-05, 'memory/max_active (GiB)': 20.57, 'memory/max_allocated (GiB)': 20.57, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 345.57, 'epoch': 1.56}
 78%|███████████████████████████████████████████████████████████████▏                 | 15680/20117 [10:02:06<2:43:55,  2.22s/it] 78%|███████████████████████████████████████████████████████████████▏                 | 15681/20117 [10:02:09<2:45:15,  2.24s/it] 78%|███████████████████████████████████████████████████████████████▏                 | 15682/20117 [10:02:11<2:45:21,  2.24s/it] 78%|███████████████████████████████████████████████████████████████▏                 | 15683/20117 [10:02:13<2:45:06,  2.23s/it] 78%|███████████████████████████████████████████████████████████████▏                 | 15684/20117 [10:02:15<2:45:44,  2.24s/it] 78%|███████████████████████████████████████████████████████████████▏                 | 15685/20117 [10:02:18<2:45:11,  2.24s/it] 78%|███████████████████████████████████████████████████████████████▏                 | 15686/20117 [10:02:20<2:45:00,  2.23s/it] 78%|███████████████████████████████████████████████████████████████▏                 | 15687/20117 [10:02:22<2:44:16,  2.23s/it] 78%|███████████████████████████████████████████████████████████████▏                 | 15688/20117 [10:02:24<2:44:11,  2.22s/it] 78%|███████████████████████████████████████████████████████████████▏                 | 15689/20117 [10:02:26<2:44:11,  2.22s/it] 78%|███████████████████████████████████████████████████████████████▏                 | 15690/20117 [10:02:29<2:44:19,  2.23s/it]                                                                                                                                 {'loss': 0.1124, 'grad_norm': 0.5106401443481445, 'learning_rate': 2.319193382071829e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 339.79, 'epoch': 1.56}
 78%|███████████████████████████████████████████████████████████████▏                 | 15690/20117 [10:02:29<2:44:19,  2.23s/it] 78%|███████████████████████████████████████████████████████████████▏                 | 15691/20117 [10:02:31<2:45:37,  2.25s/it] 78%|███████████████████████████████████████████████████████████████▏                 | 15692/20117 [10:02:33<2:46:48,  2.26s/it] 78%|███████████████████████████████████████████████████████████████▏                 | 15693/20117 [10:02:35<2:45:00,  2.24s/it] 78%|███████████████████████████████████████████████████████████████▏                 | 15694/20117 [10:02:38<2:44:34,  2.23s/it] 78%|███████████████████████████████████████████████████████████████▏                 | 15695/20117 [10:02:40<2:45:19,  2.24s/it] 78%|███████████████████████████████████████████████████████████████▏                 | 15696/20117 [10:02:42<2:45:37,  2.25s/it] 78%|███████████████████████████████████████████████████████████████▏                 | 15697/20117 [10:02:45<2:47:51,  2.28s/it] 78%|███████████████████████████████████████████████████████████████▏                 | 15698/20117 [10:02:47<2:47:04,  2.27s/it] 78%|███████████████████████████████████████████████████████████████▏                 | 15699/20117 [10:02:49<2:46:45,  2.26s/it] 78%|███████████████████████████████████████████████████████████████▏                 | 15700/20117 [10:02:51<2:45:23,  2.25s/it]                                                                                                                                 {'loss': 0.1248, 'grad_norm': 0.6262336373329163, 'learning_rate': 2.3091527454821027e-05, 'memory/max_active (GiB)': 20.58, 'memory/max_allocated (GiB)': 20.58, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 372.97, 'epoch': 1.56}
 78%|███████████████████████████████████████████████████████████████▏                 | 15700/20117 [10:02:51<2:45:23,  2.25s/it] 78%|███████████████████████████████████████████████████████████████▏                 | 15701/20117 [10:02:53<2:44:46,  2.24s/it] 78%|███████████████████████████████████████████████████████████████▏                 | 15702/20117 [10:02:56<2:44:34,  2.24s/it] 78%|███████████████████████████████████████████████████████████████▏                 | 15703/20117 [10:02:58<2:44:50,  2.24s/it] 78%|███████████████████████████████████████████████████████████████▏                 | 15704/20117 [10:03:00<2:43:52,  2.23s/it] 78%|███████████████████████████████████████████████████████████████▏                 | 15705/20117 [10:03:02<2:44:03,  2.23s/it] 78%|███████████████████████████████████████████████████████████████▏                 | 15706/20117 [10:03:05<2:44:48,  2.24s/it] 78%|███████████████████████████████████████████████████████████████▏                 | 15707/20117 [10:03:07<2:44:29,  2.24s/it] 78%|███████████████████████████████████████████████████████████████▏                 | 15708/20117 [10:03:09<2:43:42,  2.23s/it] 78%|███████████████████████████████████████████████████████████████▎                 | 15709/20117 [10:03:11<2:43:55,  2.23s/it] 78%|███████████████████████████████████████████████████████████████▎                 | 15710/20117 [10:03:14<2:43:34,  2.23s/it]                                                                                                                                 {'loss': 0.1214, 'grad_norm': 0.48663491010665894, 'learning_rate': 2.299131053074659e-05, 'memory/max_active (GiB)': 18.17, 'memory/max_allocated (GiB)': 18.17, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 292.22, 'epoch': 1.56}
 78%|███████████████████████████████████████████████████████████████▎                 | 15710/20117 [10:03:14<2:43:34,  2.23s/it] 78%|███████████████████████████████████████████████████████████████▎                 | 15711/20117 [10:03:16<2:45:20,  2.25s/it] 78%|███████████████████████████████████████████████████████████████▎                 | 15712/20117 [10:03:18<2:45:00,  2.25s/it] 78%|███████████████████████████████████████████████████████████████▎                 | 15713/20117 [10:03:20<2:44:54,  2.25s/it] 78%|███████████████████████████████████████████████████████████████▎                 | 15714/20117 [10:03:23<2:46:01,  2.26s/it] 78%|███████████████████████████████████████████████████████████████▎                 | 15715/20117 [10:03:25<2:46:18,  2.27s/it] 78%|███████████████████████████████████████████████████████████████▎                 | 15716/20117 [10:03:27<2:45:08,  2.25s/it] 78%|███████████████████████████████████████████████████████████████▎                 | 15717/20117 [10:03:30<2:51:47,  2.34s/it] 78%|███████████████████████████████████████████████████████████████▎                 | 15718/20117 [10:03:32<2:51:14,  2.34s/it] 78%|███████████████████████████████████████████████████████████████▎                 | 15719/20117 [10:03:34<2:49:13,  2.31s/it] 78%|███████████████████████████████████████████████████████████████▎                 | 15720/20117 [10:03:36<2:46:57,  2.28s/it]                                                                                                                                 {'loss': 0.1655, 'grad_norm': 0.7886651158332825, 'learning_rate': 2.2891283295350508e-05, 'memory/max_active (GiB)': 21.41, 'memory/max_allocated (GiB)': 21.41, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 390.33, 'epoch': 1.56}
 78%|███████████████████████████████████████████████████████████████▎                 | 15720/20117 [10:03:36<2:46:57,  2.28s/it] 78%|███████████████████████████████████████████████████████████████▎                 | 15721/20117 [10:03:39<2:45:18,  2.26s/it] 78%|███████████████████████████████████████████████████████████████▎                 | 15722/20117 [10:03:41<2:43:42,  2.23s/it] 78%|███████████████████████████████████████████████████████████████▎                 | 15723/20117 [10:03:43<2:42:52,  2.22s/it] 78%|███████████████████████████████████████████████████████████████▎                 | 15724/20117 [10:03:45<2:43:42,  2.24s/it] 78%|███████████████████████████████████████████████████████████████▎                 | 15725/20117 [10:03:48<2:43:02,  2.23s/it] 78%|███████████████████████████████████████████████████████████████▎                 | 15726/20117 [10:03:50<2:42:41,  2.22s/it] 78%|███████████████████████████████████████████████████████████████▎                 | 15727/20117 [10:03:52<2:43:10,  2.23s/it] 78%|███████████████████████████████████████████████████████████████▎                 | 15728/20117 [10:03:54<2:42:15,  2.22s/it] 78%|███████████████████████████████████████████████████████████████▎                 | 15729/20117 [10:03:56<2:42:57,  2.23s/it] 78%|███████████████████████████████████████████████████████████████▎                 | 15730/20117 [10:03:59<2:44:13,  2.25s/it]                                                                                                                                 {'loss': 0.1406, 'grad_norm': 0.2919699251651764, 'learning_rate': 2.2791445995020943e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 323.88, 'epoch': 1.56}
 78%|███████████████████████████████████████████████████████████████▎                 | 15730/20117 [10:03:59<2:44:13,  2.25s/it] 78%|███████████████████████████████████████████████████████████████▎                 | 15731/20117 [10:04:01<2:43:52,  2.24s/it] 78%|███████████████████████████████████████████████████████████████▎                 | 15732/20117 [10:04:03<2:43:10,  2.23s/it] 78%|███████████████████████████████████████████████████████████████▎                 | 15733/20117 [10:04:05<2:42:42,  2.23s/it] 78%|███████████████████████████████████████████████████████████████▎                 | 15734/20117 [10:04:08<2:41:53,  2.22s/it] 78%|███████████████████████████████████████████████████████████████▎                 | 15735/20117 [10:04:10<2:43:09,  2.23s/it] 78%|███████████████████████████████████████████████████████████████▎                 | 15736/20117 [10:04:12<2:42:50,  2.23s/it] 78%|███████████████████████████████████████████████████████████████▎                 | 15737/20117 [10:04:14<2:43:24,  2.24s/it] 78%|███████████████████████████████████████████████████████████████▎                 | 15738/20117 [10:04:17<2:42:39,  2.23s/it] 78%|███████████████████████████████████████████████████████████████▎                 | 15739/20117 [10:04:19<2:42:29,  2.23s/it] 78%|███████████████████████████████████████████████████████████████▍                 | 15740/20117 [10:04:21<2:43:53,  2.25s/it]                                                                                                                                 {'loss': 0.1673, 'grad_norm': 0.4909593164920807, 'learning_rate': 2.2691798875678304e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 320.3, 'epoch': 1.56}
 78%|███████████████████████████████████████████████████████████████▍                 | 15740/20117 [10:04:21<2:43:53,  2.25s/it] 78%|███████████████████████████████████████████████████████████████▍                 | 15741/20117 [10:04:23<2:43:06,  2.24s/it] 78%|███████████████████████████████████████████████████████████████▍                 | 15742/20117 [10:04:25<2:43:21,  2.24s/it] 78%|███████████████████████████████████████████████████████████████▍                 | 15743/20117 [10:04:28<2:42:23,  2.23s/it] 78%|███████████████████████████████████████████████████████████████▍                 | 15744/20117 [10:04:30<2:42:27,  2.23s/it] 78%|███████████████████████████████████████████████████████████████▍                 | 15745/20117 [10:04:32<2:43:41,  2.25s/it] 78%|███████████████████████████████████████████████████████████████▍                 | 15746/20117 [10:04:35<2:45:16,  2.27s/it] 78%|███████████████████████████████████████████████████████████████▍                 | 15747/20117 [10:04:37<2:45:25,  2.27s/it] 78%|███████████████████████████████████████████████████████████████▍                 | 15748/20117 [10:04:39<2:44:17,  2.26s/it] 78%|███████████████████████████████████████████████████████████████▍                 | 15749/20117 [10:04:41<2:43:51,  2.25s/it] 78%|███████████████████████████████████████████████████████████████▍                 | 15750/20117 [10:04:44<2:44:11,  2.26s/it]                                                                                                                                 {'loss': 0.1746, 'grad_norm': 0.37527504563331604, 'learning_rate': 2.2592342182774482e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 418.23, 'epoch': 1.57}
 78%|███████████████████████████████████████████████████████████████▍                 | 15750/20117 [10:04:44<2:44:11,  2.26s/it] 78%|███████████████████████████████████████████████████████████████▍                 | 15751/20117 [10:04:46<2:42:43,  2.24s/it] 78%|███████████████████████████████████████████████████████████████▍                 | 15752/20117 [10:04:48<2:41:42,  2.22s/it] 78%|███████████████████████████████████████████████████████████████▍                 | 15753/20117 [10:04:50<2:42:07,  2.23s/it] 78%|███████████████████████████████████████████████████████████████▍                 | 15754/20117 [10:04:52<2:41:58,  2.23s/it] 78%|███████████████████████████████████████████████████████████████▍                 | 15755/20117 [10:04:55<2:43:14,  2.25s/it] 78%|███████████████████████████████████████████████████████████████▍                 | 15756/20117 [10:04:57<2:43:16,  2.25s/it] 78%|███████████████████████████████████████████████████████████████▍                 | 15757/20117 [10:04:59<2:42:00,  2.23s/it] 78%|███████████████████████████████████████████████████████████████▍                 | 15758/20117 [10:05:01<2:42:36,  2.24s/it] 78%|███████████████████████████████████████████████████████████████▍                 | 15759/20117 [10:05:04<2:43:32,  2.25s/it] 78%|███████████████████████████████████████████████████████████████▍                 | 15760/20117 [10:05:06<2:43:31,  2.25s/it]                                                                                                                                 {'loss': 0.1401, 'grad_norm': 0.5083606243133545, 'learning_rate': 2.249307616129237e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 366.1, 'epoch': 1.57}
 78%|███████████████████████████████████████████████████████████████▍                 | 15760/20117 [10:05:06<2:43:31,  2.25s/it] 78%|███████████████████████████████████████████████████████████████▍                 | 15761/20117 [10:05:08<2:43:24,  2.25s/it] 78%|███████████████████████████████████████████████████████████████▍                 | 15762/20117 [10:05:10<2:44:22,  2.26s/it] 78%|███████████████████████████████████████████████████████████████▍                 | 15763/20117 [10:05:13<2:42:39,  2.24s/it] 78%|███████████████████████████████████████████████████████████████▍                 | 15764/20117 [10:05:15<2:42:32,  2.24s/it] 78%|███████████████████████████████████████████████████████████████▍                 | 15765/20117 [10:05:17<2:41:25,  2.23s/it] 78%|███████████████████████████████████████████████████████████████▍                 | 15766/20117 [10:05:19<2:40:45,  2.22s/it] 78%|███████████████████████████████████████████████████████████████▍                 | 15767/20117 [10:05:21<2:40:23,  2.21s/it] 78%|███████████████████████████████████████████████████████████████▍                 | 15768/20117 [10:05:24<2:40:55,  2.22s/it] 78%|███████████████████████████████████████████████████████████████▍                 | 15769/20117 [10:05:26<2:40:45,  2.22s/it] 78%|███████████████████████████████████████████████████████████████▍                 | 15770/20117 [10:05:28<2:40:48,  2.22s/it]                                                                                                                                 {'loss': 0.1795, 'grad_norm': 0.5692464113235474, 'learning_rate': 2.2394001055745107e-05, 'memory/max_active (GiB)': 19.81, 'memory/max_allocated (GiB)': 19.81, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 340.36, 'epoch': 1.57}
 78%|███████████████████████████████████████████████████████████████▍                 | 15770/20117 [10:05:28<2:40:48,  2.22s/it] 78%|███████████████████████████████████████████████████████████████▌                 | 15771/20117 [10:05:31<2:48:20,  2.32s/it] 78%|███████████████████████████████████████████████████████████████▌                 | 15772/20117 [10:05:33<2:46:19,  2.30s/it] 78%|███████████████████████████████████████████████████████████████▌                 | 15773/20117 [10:05:35<2:44:58,  2.28s/it] 78%|███████████████████████████████████████████████████████████████▌                 | 15774/20117 [10:05:37<2:43:58,  2.27s/it] 78%|███████████████████████████████████████████████████████████████▌                 | 15775/20117 [10:05:40<2:43:12,  2.26s/it] 78%|███████████████████████████████████████████████████████████████▌                 | 15776/20117 [10:05:42<2:43:10,  2.26s/it] 78%|███████████████████████████████████████████████████████████████▌                 | 15777/20117 [10:05:44<2:42:53,  2.25s/it] 78%|███████████████████████████████████████████████████████████████▌                 | 15778/20117 [10:05:46<2:41:54,  2.24s/it] 78%|███████████████████████████████████████████████████████████████▌                 | 15779/20117 [10:05:49<2:41:04,  2.23s/it] 78%|███████████████████████████████████████████████████████████████▌                 | 15780/20117 [10:05:51<2:40:31,  2.22s/it]                                                                                                                                 {'loss': 0.1981, 'grad_norm': 0.6630048751831055, 'learning_rate': 2.2295117110175645e-05, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 361.48, 'epoch': 1.57}
 78%|███████████████████████████████████████████████████████████████▌                 | 15780/20117 [10:05:51<2:40:31,  2.22s/it] 78%|███████████████████████████████████████████████████████████████▌                 | 15781/20117 [10:05:53<2:40:40,  2.22s/it] 78%|███████████████████████████████████████████████████████████████▌                 | 15782/20117 [10:05:55<2:41:37,  2.24s/it] 78%|███████████████████████████████████████████████████████████████▌                 | 15783/20117 [10:05:58<2:42:26,  2.25s/it] 78%|███████████████████████████████████████████████████████████████▌                 | 15784/20117 [10:06:00<2:42:56,  2.26s/it] 78%|███████████████████████████████████████████████████████████████▌                 | 15785/20117 [10:06:02<2:42:18,  2.25s/it] 78%|███████████████████████████████████████████████████████████████▌                 | 15786/20117 [10:06:04<2:41:20,  2.24s/it] 78%|███████████████████████████████████████████████████████████████▌                 | 15787/20117 [10:06:06<2:41:44,  2.24s/it] 78%|███████████████████████████████████████████████████████████████▌                 | 15788/20117 [10:06:09<2:41:31,  2.24s/it] 78%|███████████████████████████████████████████████████████████████▌                 | 15789/20117 [10:06:11<2:41:26,  2.24s/it] 78%|███████████████████████████████████████████████████████████████▌                 | 15790/20117 [10:06:13<2:40:16,  2.22s/it]                                                                                                                                 {'loss': 0.1255, 'grad_norm': 0.6706286072731018, 'learning_rate': 2.2196424568156073e-05, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 367.95, 'epoch': 1.57}
 78%|███████████████████████████████████████████████████████████████▌                 | 15790/20117 [10:06:13<2:40:16,  2.22s/it] 78%|███████████████████████████████████████████████████████████████▌                 | 15791/20117 [10:06:15<2:40:54,  2.23s/it] 79%|███████████████████████████████████████████████████████████████▌                 | 15792/20117 [10:06:18<2:40:38,  2.23s/it] 79%|███████████████████████████████████████████████████████████████▌                 | 15793/20117 [10:06:20<2:41:09,  2.24s/it] 79%|███████████████████████████████████████████████████████████████▌                 | 15794/20117 [10:06:22<2:41:25,  2.24s/it] 79%|███████████████████████████████████████████████████████████████▌                 | 15795/20117 [10:06:24<2:41:42,  2.24s/it] 79%|███████████████████████████████████████████████████████████████▌                 | 15796/20117 [10:06:27<2:41:18,  2.24s/it] 79%|███████████████████████████████████████████████████████████████▌                 | 15797/20117 [10:06:29<2:41:47,  2.25s/it] 79%|███████████████████████████████████████████████████████████████▌                 | 15798/20117 [10:06:31<2:41:47,  2.25s/it] 79%|███████████████████████████████████████████████████████████████▌                 | 15799/20117 [10:06:33<2:41:37,  2.25s/it] 79%|███████████████████████████████████████████████████████████████▌                 | 15800/20117 [10:06:36<2:40:37,  2.23s/it]                                                                                                                                 {'loss': 0.167, 'grad_norm': 0.6745642423629761, 'learning_rate': 2.2097923672786913e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 420.69, 'epoch': 1.57}
 79%|███████████████████████████████████████████████████████████████▌                 | 15800/20117 [10:06:36<2:40:37,  2.23s/it] 79%|███████████████████████████████████████████████████████████████▌                 | 15801/20117 [10:06:38<2:39:45,  2.22s/it] 79%|███████████████████████████████████████████████████████████████▋                 | 15802/20117 [10:06:40<2:38:37,  2.21s/it] 79%|███████████████████████████████████████████████████████████████▋                 | 15803/20117 [10:06:42<2:38:43,  2.21s/it] 79%|███████████████████████████████████████████████████████████████▋                 | 15804/20117 [10:06:44<2:40:32,  2.23s/it] 79%|███████████████████████████████████████████████████████████████▋                 | 15805/20117 [10:06:47<2:40:38,  2.24s/it] 79%|███████████████████████████████████████████████████████████████▋                 | 15806/20117 [10:06:49<2:41:51,  2.25s/it] 79%|███████████████████████████████████████████████████████████████▋                 | 15807/20117 [10:06:51<2:41:58,  2.25s/it] 79%|███████████████████████████████████████████████████████████████▋                 | 15808/20117 [10:06:53<2:42:20,  2.26s/it] 79%|███████████████████████████████████████████████████████████████▋                 | 15809/20117 [10:06:56<2:41:34,  2.25s/it] 79%|███████████████████████████████████████████████████████████████▋                 | 15810/20117 [10:06:58<2:41:34,  2.25s/it]                                                                                                                                 {'loss': 0.1483, 'grad_norm': 0.4416200518608093, 'learning_rate': 2.1999614666696733e-05, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 343.55, 'epoch': 1.57}
 79%|███████████████████████████████████████████████████████████████▋                 | 15810/20117 [10:06:58<2:41:34,  2.25s/it] 79%|███████████████████████████████████████████████████████████████▋                 | 15811/20117 [10:07:00<2:40:50,  2.24s/it] 79%|███████████████████████████████████████████████████████████████▋                 | 15812/20117 [10:07:03<2:42:40,  2.27s/it] 79%|███████████████████████████████████████████████████████████████▋                 | 15813/20117 [10:07:05<2:42:26,  2.26s/it] 79%|███████████████████████████████████████████████████████████████▋                 | 15814/20117 [10:07:07<2:43:31,  2.28s/it] 79%|███████████████████████████████████████████████████████████████▋                 | 15815/20117 [10:07:09<2:42:46,  2.27s/it] 79%|███████████████████████████████████████████████████████████████▋                 | 15816/20117 [10:07:12<2:42:30,  2.27s/it] 79%|███████████████████████████████████████████████████████████████▋                 | 15817/20117 [10:07:14<2:41:40,  2.26s/it] 79%|███████████████████████████████████████████████████████████████▋                 | 15818/20117 [10:07:16<2:43:54,  2.29s/it] 79%|███████████████████████████████████████████████████████████████▋                 | 15819/20117 [10:07:19<2:45:43,  2.31s/it] 79%|███████████████████████████████████████████████████████████████▋                 | 15820/20117 [10:07:21<2:46:14,  2.32s/it]                                                                                                                                 {'loss': 0.1796, 'grad_norm': 0.6229557394981384, 'learning_rate': 2.1901497792041392e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 333.79, 'epoch': 1.57}
 79%|███████████████████████████████████████████████████████████████▋                 | 15820/20117 [10:07:21<2:46:14,  2.32s/it] 79%|███████████████████████████████████████████████████████████████▋                 | 15821/20117 [10:07:23<2:47:03,  2.33s/it] 79%|███████████████████████████████████████████████████████████████▋                 | 15822/20117 [10:07:26<2:47:45,  2.34s/it] 79%|███████████████████████████████████████████████████████████████▋                 | 15823/20117 [10:07:28<2:47:58,  2.35s/it] 79%|███████████████████████████████████████████████████████████████▋                 | 15824/20117 [10:07:30<2:48:32,  2.36s/it] 79%|███████████████████████████████████████████████████████████████▋                 | 15825/20117 [10:07:33<2:56:00,  2.46s/it] 79%|███████████████████████████████████████████████████████████████▋                 | 15826/20117 [10:07:36<2:55:19,  2.45s/it] 79%|███████████████████████████████████████████████████████████████▋                 | 15827/20117 [10:07:38<2:52:51,  2.42s/it] 79%|███████████████████████████████████████████████████████████████▋                 | 15828/20117 [10:07:40<2:51:28,  2.40s/it] 79%|███████████████████████████████████████████████████████████████▋                 | 15829/20117 [10:07:43<2:51:11,  2.40s/it] 79%|███████████████████████████████████████████████████████████████▋                 | 15830/20117 [10:07:45<2:50:16,  2.38s/it]                                                                                                                                 {'loss': 0.2508, 'grad_norm': 0.6999335289001465, 'learning_rate': 2.1803573290503497e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 421.46, 'epoch': 1.57}
 79%|███████████████████████████████████████████████████████████████▋                 | 15830/20117 [10:07:45<2:50:16,  2.38s/it] 79%|███████████████████████████████████████████████████████████████▋                 | 15831/20117 [10:07:47<2:50:59,  2.39s/it] 79%|███████████████████████████████████████████████████████████████▋                 | 15832/20117 [10:07:50<2:49:57,  2.38s/it] 79%|███████████████████████████████████████████████████████████████▊                 | 15833/20117 [10:07:52<2:48:15,  2.36s/it] 79%|███████████████████████████████████████████████████████████████▊                 | 15834/20117 [10:07:54<2:44:50,  2.31s/it] 79%|███████████████████████████████████████████████████████████████▊                 | 15835/20117 [10:07:56<2:42:43,  2.28s/it] 79%|███████████████████████████████████████████████████████████████▊                 | 15836/20117 [10:07:59<2:41:04,  2.26s/it] 79%|███████████████████████████████████████████████████████████████▊                 | 15837/20117 [10:08:01<2:40:37,  2.25s/it] 79%|███████████████████████████████████████████████████████████████▊                 | 15838/20117 [10:08:03<2:39:20,  2.23s/it] 79%|███████████████████████████████████████████████████████████████▊                 | 15839/20117 [10:08:05<2:39:08,  2.23s/it] 79%|███████████████████████████████████████████████████████████████▊                 | 15840/20117 [10:08:08<2:41:51,  2.27s/it]                                                                                                                                 {'loss': 0.1617, 'grad_norm': 0.4594232141971588, 'learning_rate': 2.170584140329177e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 342.47, 'epoch': 1.57}
 79%|███████████████████████████████████████████████████████████████▊                 | 15840/20117 [10:08:08<2:41:51,  2.27s/it] 79%|███████████████████████████████████████████████████████████████▊                 | 15841/20117 [10:08:10<2:43:14,  2.29s/it] 79%|███████████████████████████████████████████████████████████████▊                 | 15842/20117 [10:08:12<2:44:43,  2.31s/it] 79%|███████████████████████████████████████████████████████████████▊                 | 15843/20117 [10:08:15<2:46:31,  2.34s/it] 79%|███████████████████████████████████████████████████████████████▊                 | 15844/20117 [10:08:17<2:46:35,  2.34s/it] 79%|███████████████████████████████████████████████████████████████▊                 | 15845/20117 [10:08:19<2:46:56,  2.34s/it] 79%|███████████████████████████████████████████████████████████████▊                 | 15846/20117 [10:08:22<2:47:12,  2.35s/it] 79%|███████████████████████████████████████████████████████████████▊                 | 15847/20117 [10:08:24<2:44:22,  2.31s/it] 79%|███████████████████████████████████████████████████████████████▊                 | 15848/20117 [10:08:26<2:42:23,  2.28s/it] 79%|███████████████████████████████████████████████████████████████▊                 | 15849/20117 [10:08:28<2:40:20,  2.25s/it] 79%|███████████████████████████████████████████████████████████████▊                 | 15850/20117 [10:08:31<2:42:53,  2.29s/it]                                                                                                                                 {'loss': 0.142, 'grad_norm': 0.18643365800380707, 'learning_rate': 2.1608302371140533e-05, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 302.45, 'epoch': 1.58}
 79%|███████████████████████████████████████████████████████████████▊                 | 15850/20117 [10:08:31<2:42:53,  2.29s/it] 79%|███████████████████████████████████████████████████████████████▊                 | 15851/20117 [10:08:33<2:42:09,  2.28s/it] 79%|███████████████████████████████████████████████████████████████▊                 | 15852/20117 [10:08:35<2:41:58,  2.28s/it] 79%|███████████████████████████████████████████████████████████████▊                 | 15853/20117 [10:08:38<2:40:42,  2.26s/it] 79%|███████████████████████████████████████████████████████████████▊                 | 15854/20117 [10:08:40<2:40:11,  2.25s/it] 79%|███████████████████████████████████████████████████████████████▊                 | 15855/20117 [10:08:42<2:40:36,  2.26s/it] 79%|███████████████████████████████████████████████████████████████▊                 | 15856/20117 [10:08:44<2:39:50,  2.25s/it] 79%|███████████████████████████████████████████████████████████████▊                 | 15857/20117 [10:08:47<2:39:00,  2.24s/it] 79%|███████████████████████████████████████████████████████████████▊                 | 15858/20117 [10:08:49<2:38:31,  2.23s/it] 79%|███████████████████████████████████████████████████████████████▊                 | 15859/20117 [10:08:51<2:37:27,  2.22s/it] 79%|███████████████████████████████████████████████████████████████▊                 | 15860/20117 [10:08:53<2:37:12,  2.22s/it]                                                                                                                                 {'loss': 0.1691, 'grad_norm': 0.42105787992477417, 'learning_rate': 2.1510956434308992e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 375.79, 'epoch': 1.58}
 79%|███████████████████████████████████████████████████████████████▊                 | 15860/20117 [10:08:53<2:37:12,  2.22s/it] 79%|███████████████████████████████████████████████████████████████▊                 | 15861/20117 [10:08:55<2:37:49,  2.22s/it] 79%|███████████████████████████████████████████████████████████████▊                 | 15862/20117 [10:08:58<2:38:21,  2.23s/it] 79%|███████████████████████████████████████████████████████████████▊                 | 15863/20117 [10:09:00<2:37:21,  2.22s/it] 79%|███████████████████████████████████████████████████████████████▉                 | 15864/20117 [10:09:02<2:37:15,  2.22s/it] 79%|███████████████████████████████████████████████████████████████▉                 | 15865/20117 [10:09:04<2:38:07,  2.23s/it] 79%|███████████████████████████████████████████████████████████████▉                 | 15866/20117 [10:09:07<2:38:08,  2.23s/it] 79%|███████████████████████████████████████████████████████████████▉                 | 15867/20117 [10:09:09<2:39:11,  2.25s/it] 79%|███████████████████████████████████████████████████████████████▉                 | 15868/20117 [10:09:11<2:38:30,  2.24s/it] 79%|███████████████████████████████████████████████████████████████▉                 | 15869/20117 [10:09:13<2:38:07,  2.23s/it] 79%|███████████████████████████████████████████████████████████████▉                 | 15870/20117 [10:09:15<2:37:47,  2.23s/it]                                                                                                                                 {'loss': 0.1398, 'grad_norm': 0.5371260046958923, 'learning_rate': 2.1413803832580813e-05, 'memory/max_active (GiB)': 19.77, 'memory/max_allocated (GiB)': 19.77, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 314.27, 'epoch': 1.58}
 79%|███████████████████████████████████████████████████████████████▉                 | 15870/20117 [10:09:15<2:37:47,  2.23s/it] 79%|███████████████████████████████████████████████████████████████▉                 | 15871/20117 [10:09:18<2:37:20,  2.22s/it] 79%|███████████████████████████████████████████████████████████████▉                 | 15872/20117 [10:09:20<2:37:38,  2.23s/it] 79%|███████████████████████████████████████████████████████████████▉                 | 15873/20117 [10:09:22<2:37:09,  2.22s/it] 79%|███████████████████████████████████████████████████████████████▉                 | 15874/20117 [10:09:24<2:38:45,  2.24s/it] 79%|███████████████████████████████████████████████████████████████▉                 | 15875/20117 [10:09:27<2:37:30,  2.23s/it] 79%|███████████████████████████████████████████████████████████████▉                 | 15876/20117 [10:09:29<2:38:45,  2.25s/it] 79%|███████████████████████████████████████████████████████████████▉                 | 15877/20117 [10:09:31<2:38:46,  2.25s/it] 79%|███████████████████████████████████████████████████████████████▉                 | 15878/20117 [10:09:34<2:44:47,  2.33s/it] 79%|███████████████████████████████████████████████████████████████▉                 | 15879/20117 [10:09:36<2:43:15,  2.31s/it] 79%|███████████████████████████████████████████████████████████████▉                 | 15880/20117 [10:09:38<2:41:49,  2.29s/it]                                                                                                                                 {'loss': 0.2099, 'grad_norm': 0.5238702297210693, 'learning_rate': 2.1316844805263346e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 358.19, 'epoch': 1.58}
 79%|███████████████████████████████████████████████████████████████▉                 | 15880/20117 [10:09:38<2:41:49,  2.29s/it] 79%|███████████████████████████████████████████████████████████████▉                 | 15881/20117 [10:09:40<2:40:43,  2.28s/it] 79%|███████████████████████████████████████████████████████████████▉                 | 15882/20117 [10:09:43<2:39:19,  2.26s/it] 79%|███████████████████████████████████████████████████████████████▉                 | 15883/20117 [10:09:45<2:39:18,  2.26s/it] 79%|███████████████████████████████████████████████████████████████▉                 | 15884/20117 [10:09:47<2:39:05,  2.26s/it] 79%|███████████████████████████████████████████████████████████████▉                 | 15885/20117 [10:09:49<2:37:56,  2.24s/it] 79%|███████████████████████████████████████████████████████████████▉                 | 15886/20117 [10:09:52<2:38:17,  2.24s/it] 79%|███████████████████████████████████████████████████████████████▉                 | 15887/20117 [10:09:54<2:36:53,  2.23s/it] 79%|███████████████████████████████████████████████████████████████▉                 | 15888/20117 [10:09:56<2:37:14,  2.23s/it] 79%|███████████████████████████████████████████████████████████████▉                 | 15889/20117 [10:09:58<2:36:04,  2.21s/it] 79%|███████████████████████████████████████████████████████████████▉                 | 15890/20117 [10:10:00<2:35:55,  2.21s/it]                                                                                                                                 {'loss': 0.1528, 'grad_norm': 0.2773045599460602, 'learning_rate': 2.1220079591187214e-05, 'memory/max_active (GiB)': 20.57, 'memory/max_allocated (GiB)': 20.57, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 332.88, 'epoch': 1.58}
 79%|███████████████████████████████████████████████████████████████▉                 | 15890/20117 [10:10:00<2:35:55,  2.21s/it] 79%|███████████████████████████████████████████████████████████████▉                 | 15891/20117 [10:10:03<2:35:45,  2.21s/it] 79%|███████████████████████████████████████████████████████████████▉                 | 15892/20117 [10:10:05<2:35:46,  2.21s/it] 79%|███████████████████████████████████████████████████████████████▉                 | 15893/20117 [10:10:07<2:35:50,  2.21s/it] 79%|███████████████████████████████████████████████████████████████▉                 | 15894/20117 [10:10:09<2:35:19,  2.21s/it] 79%|████████████████████████████████████████████████████████████████                 | 15895/20117 [10:10:11<2:35:07,  2.20s/it] 79%|████████████████████████████████████████████████████████████████                 | 15896/20117 [10:10:14<2:34:38,  2.20s/it] 79%|████████████████████████████████████████████████████████████████                 | 15897/20117 [10:10:16<2:35:46,  2.21s/it] 79%|████████████████████████████████████████████████████████████████                 | 15898/20117 [10:10:18<2:36:13,  2.22s/it] 79%|████████████████████████████████████████████████████████████████                 | 15899/20117 [10:10:20<2:36:36,  2.23s/it] 79%|████████████████████████████████████████████████████████████████                 | 15900/20117 [10:10:23<2:35:57,  2.22s/it]                                                                                                                                 {'loss': 0.1474, 'grad_norm': 0.6425759196281433, 'learning_rate': 2.112350842870553e-05, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 303.59, 'epoch': 1.58}
 79%|████████████████████████████████████████████████████████████████                 | 15900/20117 [10:10:23<2:35:57,  2.22s/it] 79%|████████████████████████████████████████████████████████████████                 | 15901/20117 [10:10:25<2:36:05,  2.22s/it] 79%|████████████████████████████████████████████████████████████████                 | 15902/20117 [10:10:27<2:37:04,  2.24s/it] 79%|████████████████████████████████████████████████████████████████                 | 15903/20117 [10:10:29<2:36:06,  2.22s/it] 79%|████████████████████████████████████████████████████████████████                 | 15904/20117 [10:10:31<2:35:32,  2.22s/it] 79%|████████████████████████████████████████████████████████████████                 | 15905/20117 [10:10:34<2:35:14,  2.21s/it] 79%|████████████████████████████████████████████████████████████████                 | 15906/20117 [10:10:36<2:35:57,  2.22s/it] 79%|████████████████████████████████████████████████████████████████                 | 15907/20117 [10:10:38<2:35:56,  2.22s/it] 79%|████████████████████████████████████████████████████████████████                 | 15908/20117 [10:10:40<2:35:57,  2.22s/it] 79%|████████████████████████████████████████████████████████████████                 | 15909/20117 [10:10:43<2:35:25,  2.22s/it] 79%|████████████████████████████████████████████████████████████████                 | 15910/20117 [10:10:45<2:35:10,  2.21s/it]                                                                                                                                 {'loss': 0.1381, 'grad_norm': 0.3497686982154846, 'learning_rate': 2.1027131555693524e-05, 'memory/max_active (GiB)': 20.64, 'memory/max_allocated (GiB)': 20.64, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 300.82, 'epoch': 1.58}
 79%|████████████████████████████████████████████████████████████████                 | 15910/20117 [10:10:45<2:35:10,  2.21s/it] 79%|████████████████████████████████████████████████████████████████                 | 15911/20117 [10:10:47<2:34:52,  2.21s/it] 79%|████████████████████████████████████████████████████████████████                 | 15912/20117 [10:10:49<2:35:31,  2.22s/it] 79%|████████████████████████████████████████████████████████████████                 | 15913/20117 [10:10:51<2:35:32,  2.22s/it] 79%|████████████████████████████████████████████████████████████████                 | 15914/20117 [10:10:54<2:36:55,  2.24s/it] 79%|████████████████████████████████████████████████████████████████                 | 15915/20117 [10:10:56<2:35:57,  2.23s/it] 79%|████████████████████████████████████████████████████████████████                 | 15916/20117 [10:10:58<2:37:35,  2.25s/it] 79%|████████████████████████████████████████████████████████████████                 | 15917/20117 [10:11:00<2:36:32,  2.24s/it] 79%|████████████████████████████████████████████████████████████████                 | 15918/20117 [10:11:03<2:35:19,  2.22s/it] 79%|████████████████████████████████████████████████████████████████                 | 15919/20117 [10:11:05<2:35:48,  2.23s/it] 79%|████████████████████████████████████████████████████████████████                 | 15920/20117 [10:11:07<2:35:42,  2.23s/it]                                                                                                                                 {'loss': 0.1004, 'grad_norm': 0.37605515122413635, 'learning_rate': 2.0930949209547813e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 326.71, 'epoch': 1.58}
 79%|████████████████████████████████████████████████████████████████                 | 15920/20117 [10:11:07<2:35:42,  2.23s/it] 79%|████████████████████████████████████████████████████████████████                 | 15921/20117 [10:11:09<2:36:39,  2.24s/it] 79%|████████████████████████████████████████████████████████████████                 | 15922/20117 [10:11:12<2:35:56,  2.23s/it] 79%|████████████████████████████████████████████████████████████████                 | 15923/20117 [10:11:14<2:35:54,  2.23s/it] 79%|████████████████████████████████████████████████████████████████                 | 15924/20117 [10:11:16<2:35:38,  2.23s/it] 79%|████████████████████████████████████████████████████████████████                 | 15925/20117 [10:11:18<2:35:37,  2.23s/it] 79%|████████████████████████████████████████████████████████████████▏                | 15926/20117 [10:11:20<2:35:44,  2.23s/it] 79%|████████████████████████████████████████████████████████████████▏                | 15927/20117 [10:11:23<2:35:48,  2.23s/it] 79%|████████████████████████████████████████████████████████████████▏                | 15928/20117 [10:11:25<2:36:12,  2.24s/it] 79%|████████████████████████████████████████████████████████████████▏                | 15929/20117 [10:11:27<2:37:27,  2.26s/it] 79%|████████████████████████████████████████████████████████████████▏                | 15930/20117 [10:11:29<2:36:25,  2.24s/it]                                                                                                                                 {'loss': 0.1309, 'grad_norm': 0.6107087731361389, 'learning_rate': 2.08349616271858e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 325.87, 'epoch': 1.58}
 79%|████████████████████████████████████████████████████████████████▏                | 15930/20117 [10:11:29<2:36:25,  2.24s/it] 79%|████████████████████████████████████████████████████████████████▏                | 15931/20117 [10:11:32<2:35:26,  2.23s/it] 79%|████████████████████████████████████████████████████████████████▏                | 15932/20117 [10:11:34<2:36:48,  2.25s/it] 79%|████████████████████████████████████████████████████████████████▏                | 15933/20117 [10:11:36<2:42:11,  2.33s/it] 79%|████████████████████████████████████████████████████████████████▏                | 15934/20117 [10:11:39<2:41:28,  2.32s/it] 79%|████████████████████████████████████████████████████████████████▏                | 15935/20117 [10:11:41<2:40:29,  2.30s/it] 79%|████████████████████████████████████████████████████████████████▏                | 15936/20117 [10:11:43<2:38:21,  2.27s/it] 79%|████████████████████████████████████████████████████████████████▏                | 15937/20117 [10:11:45<2:36:43,  2.25s/it] 79%|████████████████████████████████████████████████████████████████▏                | 15938/20117 [10:11:48<2:36:35,  2.25s/it] 79%|████████████████████████████████████████████████████████████████▏                | 15939/20117 [10:11:50<2:35:32,  2.23s/it] 79%|████████████████████████████████████████████████████████████████▏                | 15940/20117 [10:11:52<2:34:17,  2.22s/it]                                                                                                                                 {'loss': 0.1575, 'grad_norm': 0.4413928687572479, 'learning_rate': 2.0739169045045237e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 370.56, 'epoch': 1.58}
 79%|████████████████████████████████████████████████████████████████▏                | 15940/20117 [10:11:52<2:34:17,  2.22s/it] 79%|████████████████████████████████████████████████████████████████▏                | 15941/20117 [10:11:54<2:34:25,  2.22s/it] 79%|████████████████████████████████████████████████████████████████▏                | 15942/20117 [10:11:57<2:36:09,  2.24s/it] 79%|████████████████████████████████████████████████████████████████▏                | 15943/20117 [10:11:59<2:35:10,  2.23s/it] 79%|████████████████████████████████████████████████████████████████▏                | 15944/20117 [10:12:01<2:34:46,  2.23s/it] 79%|████████████████████████████████████████████████████████████████▏                | 15945/20117 [10:12:03<2:34:14,  2.22s/it] 79%|████████████████████████████████████████████████████████████████▏                | 15946/20117 [10:12:05<2:34:47,  2.23s/it] 79%|████████████████████████████████████████████████████████████████▏                | 15947/20117 [10:12:08<2:33:52,  2.21s/it] 79%|████████████████████████████████████████████████████████████████▏                | 15948/20117 [10:12:10<2:34:01,  2.22s/it] 79%|████████████████████████████████████████████████████████████████▏                | 15949/20117 [10:12:12<2:35:51,  2.24s/it] 79%|████████████████████████████████████████████████████████████████▏                | 15950/20117 [10:12:14<2:34:47,  2.23s/it]                                                                                                                                 {'loss': 0.1437, 'grad_norm': 0.3646581470966339, 'learning_rate': 2.064357169908345e-05, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 362.0, 'epoch': 1.59}
 79%|████████████████████████████████████████████████████████████████▏                | 15950/20117 [10:12:14<2:34:47,  2.23s/it] 79%|████████████████████████████████████████████████████████████████▏                | 15951/20117 [10:12:17<2:34:11,  2.22s/it] 79%|████████████████████████████████████████████████████████████████▏                | 15952/20117 [10:12:19<2:33:47,  2.22s/it] 79%|████████████████████████████████████████████████████████████████▏                | 15953/20117 [10:12:21<2:34:16,  2.22s/it] 79%|████████████████████████████████████████████████████████████████▏                | 15954/20117 [10:12:23<2:35:28,  2.24s/it] 79%|████████████████████████████████████████████████████████████████▏                | 15955/20117 [10:12:25<2:34:22,  2.23s/it] 79%|████████████████████████████████████████████████████████████████▏                | 15956/20117 [10:12:28<2:37:17,  2.27s/it] 79%|████████████████████████████████████████████████████████████████▏                | 15957/20117 [10:12:30<2:35:28,  2.24s/it] 79%|████████████████████████████████████████████████████████████████▎                | 15958/20117 [10:12:32<2:35:37,  2.25s/it] 79%|████████████████████████████████████████████████████████████████▎                | 15959/20117 [10:12:34<2:34:06,  2.22s/it] 79%|████████████████████████████████████████████████████████████████▎                | 15960/20117 [10:12:37<2:33:46,  2.22s/it]                                                                                                                                 {'loss': 0.1873, 'grad_norm': 0.5044755935668945, 'learning_rate': 2.054816982477693e-05, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 335.28, 'epoch': 1.59}
 79%|████████████████████████████████████████████████████████████████▎                | 15960/20117 [10:12:37<2:33:46,  2.22s/it] 79%|████████████████████████████████████████████████████████████████▎                | 15961/20117 [10:12:39<2:34:14,  2.23s/it] 79%|████████████████████████████████████████████████████████████████▎                | 15962/20117 [10:12:41<2:36:21,  2.26s/it] 79%|████████████████████████████████████████████████████████████████▎                | 15963/20117 [10:12:43<2:36:41,  2.26s/it] 79%|████████████████████████████████████████████████████████████████▎                | 15964/20117 [10:12:46<2:35:06,  2.24s/it] 79%|████████████████████████████████████████████████████████████████▎                | 15965/20117 [10:12:48<2:34:41,  2.24s/it] 79%|████████████████████████████████████████████████████████████████▎                | 15966/20117 [10:12:50<2:33:48,  2.22s/it] 79%|████████████████████████████████████████████████████████████████▎                | 15967/20117 [10:12:52<2:34:19,  2.23s/it] 79%|████████████████████████████████████████████████████████████████▎                | 15968/20117 [10:12:54<2:33:11,  2.22s/it] 79%|████████████████████████████████████████████████████████████████▎                | 15969/20117 [10:12:57<2:32:37,  2.21s/it] 79%|████████████████████████████████████████████████████████████████▎                | 15970/20117 [10:12:59<2:34:24,  2.23s/it]                                                                                                                                 {'loss': 0.1646, 'grad_norm': 0.5462661981582642, 'learning_rate': 2.045296365712066e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 380.25, 'epoch': 1.59}
 79%|████████████████████████████████████████████████████████████████▎                | 15970/20117 [10:12:59<2:34:24,  2.23s/it] 79%|████████████████████████████████████████████████████████████████▎                | 15971/20117 [10:13:01<2:33:23,  2.22s/it] 79%|████████████████████████████████████████████████████████████████▎                | 15972/20117 [10:13:03<2:33:31,  2.22s/it] 79%|████████████████████████████████████████████████████████████████▎                | 15973/20117 [10:13:06<2:33:16,  2.22s/it] 79%|████████████████████████████████████████████████████████████████▎                | 15974/20117 [10:13:08<2:32:47,  2.21s/it] 79%|████████████████████████████████████████████████████████████████▎                | 15975/20117 [10:13:10<2:32:37,  2.21s/it] 79%|████████████████████████████████████████████████████████████████▎                | 15976/20117 [10:13:12<2:33:14,  2.22s/it] 79%|████████████████████████████████████████████████████████████████▎                | 15977/20117 [10:13:15<2:33:49,  2.23s/it] 79%|████████████████████████████████████████████████████████████████▎                | 15978/20117 [10:13:17<2:34:15,  2.24s/it] 79%|████████████████████████████████████████████████████████████████▎                | 15979/20117 [10:13:19<2:33:22,  2.22s/it] 79%|████████████████████████████████████████████████████████████████▎                | 15980/20117 [10:13:21<2:33:49,  2.23s/it]                                                                                                                                 {'loss': 0.1461, 'grad_norm': 0.3154492676258087, 'learning_rate': 2.0357953430627575e-05, 'memory/max_active (GiB)': 18.82, 'memory/max_allocated (GiB)': 18.82, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 362.04, 'epoch': 1.59}
 79%|████████████████████████████████████████████████████████████████▎                | 15980/20117 [10:13:21<2:33:49,  2.23s/it] 79%|████████████████████████████████████████████████████████████████▎                | 15981/20117 [10:13:23<2:34:42,  2.24s/it] 79%|████████████████████████████████████████████████████████████████▎                | 15982/20117 [10:13:26<2:33:50,  2.23s/it] 79%|████████████████████████████████████████████████████████████████▎                | 15983/20117 [10:13:28<2:34:22,  2.24s/it] 79%|████████████████████████████████████████████████████████████████▎                | 15984/20117 [10:13:30<2:33:25,  2.23s/it] 79%|████████████████████████████████████████████████████████████████▎                | 15985/20117 [10:13:32<2:32:36,  2.22s/it] 79%|████████████████████████████████████████████████████████████████▎                | 15986/20117 [10:13:35<2:32:08,  2.21s/it] 79%|████████████████████████████████████████████████████████████████▎                | 15987/20117 [10:13:37<2:38:43,  2.31s/it] 79%|████████████████████████████████████████████████████████████████▎                | 15988/20117 [10:13:39<2:39:09,  2.31s/it] 79%|████████████████████████████████████████████████████████████████▍                | 15989/20117 [10:13:42<2:36:52,  2.28s/it] 79%|████████████████████████████████████████████████████████████████▍                | 15990/20117 [10:13:44<2:36:27,  2.27s/it]                                                                                                                                 {'loss': 0.1701, 'grad_norm': 0.4407619833946228, 'learning_rate': 2.02631393793279e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 371.26, 'epoch': 1.59}
 79%|████████████████████████████████████████████████████████████████▍                | 15990/20117 [10:13:44<2:36:27,  2.27s/it] 79%|████████████████████████████████████████████████████████████████▍                | 15991/20117 [10:13:46<2:34:47,  2.25s/it] 79%|████████████████████████████████████████████████████████████████▍                | 15992/20117 [10:13:48<2:35:21,  2.26s/it] 79%|████████████████████████████████████████████████████████████████▍                | 15993/20117 [10:13:51<2:34:21,  2.25s/it] 80%|████████████████████████████████████████████████████████████████▍                | 15994/20117 [10:13:53<2:33:10,  2.23s/it] 80%|████████████████████████████████████████████████████████████████▍                | 15995/20117 [10:13:55<2:33:57,  2.24s/it] 80%|████████████████████████████████████████████████████████████████▍                | 15996/20117 [10:13:57<2:33:38,  2.24s/it] 80%|████████████████████████████████████████████████████████████████▍                | 15997/20117 [10:13:59<2:33:50,  2.24s/it] 80%|████████████████████████████████████████████████████████████████▍                | 15998/20117 [10:14:02<2:34:20,  2.25s/it] 80%|████████████████████████████████████████████████████████████████▍                | 15999/20117 [10:14:04<2:33:57,  2.24s/it] 80%|████████████████████████████████████████████████████████████████▍                | 16000/20117 [10:14:06<2:33:42,  2.24s/it]                                                                                                                                 {'loss': 0.1009, 'grad_norm': 0.3745664358139038, 'learning_rate': 2.0168521736768732e-05, 'memory/max_active (GiB)': 20.62, 'memory/max_allocated (GiB)': 20.62, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 348.69, 'epoch': 1.59}
 80%|████████████████████████████████████████████████████████████████▍                | 16000/20117 [10:14:06<2:33:42,  2.24s/it] 80%|████████████████████████████████████████████████████████████████▍                | 16001/20117 [10:14:08<2:33:19,  2.24s/it] 80%|████████████████████████████████████████████████████████████████▍                | 16002/20117 [10:14:11<2:35:51,  2.27s/it] 80%|████████████████████████████████████████████████████████████████▍                | 16003/20117 [10:14:13<2:34:27,  2.25s/it] 80%|████████████████████████████████████████████████████████████████▍                | 16004/20117 [10:14:15<2:33:20,  2.24s/it] 80%|████████████████████████████████████████████████████████████████▍                | 16005/20117 [10:14:17<2:33:32,  2.24s/it] 80%|████████████████████████████████████████████████████████████████▍                | 16006/20117 [10:14:20<2:33:57,  2.25s/it] 80%|████████████████████████████████████████████████████████████████▍                | 16007/20117 [10:14:22<2:34:35,  2.26s/it] 80%|████████████████████████████████████████████████████████████████▍                | 16008/20117 [10:14:24<2:33:33,  2.24s/it] 80%|████████████████████████████████████████████████████████████████▍                | 16009/20117 [10:14:26<2:32:48,  2.23s/it] 80%|████████████████████████████████████████████████████████████████▍                | 16010/20117 [10:14:29<2:31:48,  2.22s/it]                                                                                                                                 {'loss': 0.189, 'grad_norm': 0.7978280186653137, 'learning_rate': 2.007410073601326e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 396.26, 'epoch': 1.59}
 80%|████████████████████████████████████████████████████████████████▍                | 16010/20117 [10:14:29<2:31:48,  2.22s/it] 80%|████████████████████████████████████████████████████████████████▍                | 16011/20117 [10:14:31<2:31:15,  2.21s/it] 80%|████████████████████████████████████████████████████████████████▍                | 16012/20117 [10:14:33<2:31:46,  2.22s/it] 80%|████████████████████████████████████████████████████████████████▍                | 16013/20117 [10:14:35<2:32:14,  2.23s/it] 80%|████████████████████████████████████████████████████████████████▍                | 16014/20117 [10:14:38<2:33:21,  2.24s/it] 80%|████████████████████████████████████████████████████████████████▍                | 16015/20117 [10:14:40<2:33:01,  2.24s/it] 80%|████████████████████████████████████████████████████████████████▍                | 16016/20117 [10:14:42<2:32:39,  2.23s/it] 80%|████████████████████████████████████████████████████████████████▍                | 16017/20117 [10:14:44<2:32:08,  2.23s/it] 80%|████████████████████████████████████████████████████████████████▍                | 16018/20117 [10:14:46<2:31:42,  2.22s/it] 80%|████████████████████████████████████████████████████████████████▍                | 16019/20117 [10:14:49<2:31:11,  2.21s/it] 80%|████████████████████████████████████████████████████████████████▌                | 16020/20117 [10:14:51<2:31:05,  2.21s/it]                                                                                                                                 {'loss': 0.1536, 'grad_norm': 0.5025820136070251, 'learning_rate': 1.9979876609640437e-05, 'memory/max_active (GiB)': 20.57, 'memory/max_allocated (GiB)': 20.57, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 389.34, 'epoch': 1.59}
 80%|████████████████████████████████████████████████████████████████▌                | 16020/20117 [10:14:51<2:31:05,  2.21s/it] 80%|████████████████████████████████████████████████████████████████▌                | 16021/20117 [10:14:53<2:30:37,  2.21s/it] 80%|████████████████████████████████████████████████████████████████▌                | 16022/20117 [10:14:55<2:30:25,  2.20s/it] 80%|████████████████████████████████████████████████████████████████▌                | 16023/20117 [10:14:57<2:30:28,  2.21s/it] 80%|████████████████████████████████████████████████████████████████▌                | 16024/20117 [10:15:00<2:30:50,  2.21s/it] 80%|████████████████████████████████████████████████████████████████▌                | 16025/20117 [10:15:02<2:30:43,  2.21s/it] 80%|████████████████████████████████████████████████████████████████▌                | 16026/20117 [10:15:04<2:30:03,  2.20s/it] 80%|████████████████████████████████████████████████████████████████▌                | 16027/20117 [10:15:06<2:30:59,  2.22s/it] 80%|████████████████████████████████████████████████████████████████▌                | 16028/20117 [10:15:08<2:30:43,  2.21s/it] 80%|████████████████████████████████████████████████████████████████▌                | 16029/20117 [10:15:11<2:31:03,  2.22s/it] 80%|████████████████████████████████████████████████████████████████▌                | 16030/20117 [10:15:13<2:29:56,  2.20s/it]                                                                                                                                 {'loss': 0.0939, 'grad_norm': 0.37046942114830017, 'learning_rate': 1.988584958974412e-05, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 336.01, 'epoch': 1.59}
 80%|████████████████████████████████████████████████████████████████▌                | 16030/20117 [10:15:13<2:29:56,  2.20s/it] 80%|████████████████████████████████████████████████████████████████▌                | 16031/20117 [10:15:15<2:29:07,  2.19s/it] 80%|████████████████████████████████████████████████████████████████▌                | 16032/20117 [10:15:17<2:29:59,  2.20s/it] 80%|████████████████████████████████████████████████████████████████▌                | 16033/20117 [10:15:19<2:30:12,  2.21s/it] 80%|████████████████████████████████████████████████████████████████▌                | 16034/20117 [10:15:22<2:30:29,  2.21s/it] 80%|████████████████████████████████████████████████████████████████▌                | 16035/20117 [10:15:24<2:30:44,  2.22s/it] 80%|████████████████████████████████████████████████████████████████▌                | 16036/20117 [10:15:26<2:30:53,  2.22s/it] 80%|████████████████████████████████████████████████████████████████▌                | 16037/20117 [10:15:28<2:29:52,  2.20s/it] 80%|████████████████████████████████████████████████████████████████▌                | 16038/20117 [10:15:31<2:30:47,  2.22s/it] 80%|████████████████████████████████████████████████████████████████▌                | 16039/20117 [10:15:33<2:36:24,  2.30s/it] 80%|████████████████████████████████████████████████████████████████▌                | 16040/20117 [10:15:35<2:35:35,  2.29s/it]                                                                                                                                 {'loss': 0.131, 'grad_norm': 0.40366023778915405, 'learning_rate': 1.979201990793279e-05, 'memory/max_active (GiB)': 20.57, 'memory/max_allocated (GiB)': 20.57, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 366.64, 'epoch': 1.59}
 80%|████████████████████████████████████████████████████████████████▌                | 16040/20117 [10:15:35<2:35:35,  2.29s/it] 80%|████████████████████████████████████████████████████████████████▌                | 16041/20117 [10:15:38<2:33:42,  2.26s/it] 80%|████████████████████████████████████████████████████████████████▌                | 16042/20117 [10:15:40<2:32:14,  2.24s/it] 80%|████████████████████████████████████████████████████████████████▌                | 16043/20117 [10:15:42<2:30:27,  2.22s/it] 80%|████████████████████████████████████████████████████████████████▌                | 16044/20117 [10:15:44<2:31:44,  2.24s/it] 80%|████████████████████████████████████████████████████████████████▌                | 16045/20117 [10:15:46<2:31:14,  2.23s/it] 80%|████████████████████████████████████████████████████████████████▌                | 16046/20117 [10:15:49<2:30:06,  2.21s/it] 80%|████████████████████████████████████████████████████████████████▌                | 16047/20117 [10:15:51<2:29:55,  2.21s/it] 80%|████████████████████████████████████████████████████████████████▌                | 16048/20117 [10:15:53<2:30:07,  2.21s/it] 80%|████████████████████████████████████████████████████████████████▌                | 16049/20117 [10:15:55<2:29:27,  2.20s/it] 80%|████████████████████████████████████████████████████████████████▌                | 16050/20117 [10:15:57<2:29:17,  2.20s/it]                                                                                                                                 {'loss': 0.1501, 'grad_norm': 0.2725692689418793, 'learning_rate': 1.9698387795328788e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 410.98, 'epoch': 1.6}
 80%|████████████████████████████████████████████████████████████████▌                | 16050/20117 [10:15:57<2:29:17,  2.20s/it] 80%|████████████████████████████████████████████████████████████████▋                | 16051/20117 [10:16:00<2:30:53,  2.23s/it] 80%|████████████████████████████████████████████████████████████████▋                | 16052/20117 [10:16:02<2:31:01,  2.23s/it] 80%|████████████████████████████████████████████████████████████████▋                | 16053/20117 [10:16:04<2:30:45,  2.23s/it] 80%|████████████████████████████████████████████████████████████████▋                | 16054/20117 [10:16:06<2:29:16,  2.20s/it] 80%|████████████████████████████████████████████████████████████████▋                | 16055/20117 [10:16:09<2:30:45,  2.23s/it] 80%|████████████████████████████████████████████████████████████████▋                | 16056/20117 [10:16:11<2:29:30,  2.21s/it] 80%|████████████████████████████████████████████████████████████████▋                | 16057/20117 [10:16:13<2:29:53,  2.22s/it] 80%|████████████████████████████████████████████████████████████████▋                | 16058/20117 [10:16:15<2:29:43,  2.21s/it] 80%|████████████████████████████████████████████████████████████████▋                | 16059/20117 [10:16:17<2:29:49,  2.22s/it] 80%|████████████████████████████████████████████████████████████████▋                | 16060/20117 [10:16:20<2:29:07,  2.21s/it]                                                                                                                                 {'loss': 0.1891, 'grad_norm': 0.5428498983383179, 'learning_rate': 1.9604953482567756e-05, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 418.32, 'epoch': 1.6}
 80%|████████████████████████████████████████████████████████████████▋                | 16060/20117 [10:16:20<2:29:07,  2.21s/it] 80%|████████████████████████████████████████████████████████████████▋                | 16061/20117 [10:16:22<2:29:56,  2.22s/it] 80%|████████████████████████████████████████████████████████████████▋                | 16062/20117 [10:16:24<2:29:15,  2.21s/it] 80%|████████████████████████████████████████████████████████████████▋                | 16063/20117 [10:16:26<2:28:27,  2.20s/it] 80%|████████████████████████████████████████████████████████████████▋                | 16064/20117 [10:16:28<2:28:58,  2.21s/it] 80%|████████████████████████████████████████████████████████████████▋                | 16065/20117 [10:16:31<2:28:16,  2.20s/it] 80%|████████████████████████████████████████████████████████████████▋                | 16066/20117 [10:16:33<2:28:31,  2.20s/it] 80%|████████████████████████████████████████████████████████████████▋                | 16067/20117 [10:16:35<2:28:36,  2.20s/it] 80%|████████████████████████████████████████████████████████████████▋                | 16068/20117 [10:16:37<2:27:50,  2.19s/it] 80%|████████████████████████████████████████████████████████████████▋                | 16069/20117 [10:16:39<2:28:40,  2.20s/it] 80%|████████████████████████████████████████████████████████████████▋                | 16070/20117 [10:16:42<2:29:14,  2.21s/it]                                                                                                                                 {'loss': 0.1726, 'grad_norm': 0.5400422215461731, 'learning_rate': 1.9511717199798208e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 399.7, 'epoch': 1.6}
 80%|████████████████████████████████████████████████████████████████▋                | 16070/20117 [10:16:42<2:29:14,  2.21s/it] 80%|████████████████████████████████████████████████████████████████▋                | 16071/20117 [10:16:44<2:29:16,  2.21s/it] 80%|████████████████████████████████████████████████████████████████▋                | 16072/20117 [10:16:46<2:30:42,  2.24s/it] 80%|████████████████████████████████████████████████████████████████▋                | 16073/20117 [10:16:48<2:30:35,  2.23s/it] 80%|████████████████████████████████████████████████████████████████▋                | 16074/20117 [10:16:51<2:29:42,  2.22s/it] 80%|████████████████████████████████████████████████████████████████▋                | 16075/20117 [10:16:53<2:29:54,  2.23s/it] 80%|████████████████████████████████████████████████████████████████▋                | 16076/20117 [10:16:55<2:30:34,  2.24s/it] 80%|████████████████████████████████████████████████████████████████▋                | 16077/20117 [10:16:57<2:30:58,  2.24s/it] 80%|████████████████████████████████████████████████████████████████▋                | 16078/20117 [10:17:00<2:33:11,  2.28s/it] 80%|████████████████████████████████████████████████████████████████▋                | 16079/20117 [10:17:02<2:33:29,  2.28s/it] 80%|████████████████████████████████████████████████████████████████▋                | 16080/20117 [10:17:04<2:34:26,  2.30s/it]                                                                                                                                 {'loss': 0.1893, 'grad_norm': 0.5957368612289429, 'learning_rate': 1.9418679176680743e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 388.42, 'epoch': 1.6}
 80%|████████████████████████████████████████████████████████████████▋                | 16080/20117 [10:17:04<2:34:26,  2.30s/it] 80%|████████████████████████████████████████████████████████████████▋                | 16081/20117 [10:17:07<2:33:48,  2.29s/it] 80%|████████████████████████████████████████████████████████████████▊                | 16082/20117 [10:17:09<2:36:52,  2.33s/it] 80%|████████████████████████████████████████████████████████████████▊                | 16083/20117 [10:17:11<2:35:30,  2.31s/it] 80%|████████████████████████████████████████████████████████████████▊                | 16084/20117 [10:17:14<2:35:40,  2.32s/it] 80%|████████████████████████████████████████████████████████████████▊                | 16085/20117 [10:17:16<2:35:59,  2.32s/it] 80%|████████████████████████████████████████████████████████████████▊                | 16086/20117 [10:17:18<2:33:38,  2.29s/it] 80%|████████████████████████████████████████████████████████████████▊                | 16087/20117 [10:17:20<2:31:56,  2.26s/it] 80%|████████████████████████████████████████████████████████████████▊                | 16088/20117 [10:17:22<2:30:15,  2.24s/it] 80%|████████████████████████████████████████████████████████████████▊                | 16089/20117 [10:17:25<2:29:53,  2.23s/it] 80%|████████████████████████████████████████████████████████████████▊                | 16090/20117 [10:17:27<2:29:15,  2.22s/it]                                                                                                                                 {'loss': 0.1314, 'grad_norm': 0.8663964867591858, 'learning_rate': 1.9325839642387755e-05, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 297.89, 'epoch': 1.6}
 80%|████████████████████████████████████████████████████████████████▊                | 16090/20117 [10:17:27<2:29:15,  2.22s/it] 80%|████████████████████████████████████████████████████████████████▊                | 16091/20117 [10:17:29<2:28:23,  2.21s/it] 80%|████████████████████████████████████████████████████████████████▊                | 16092/20117 [10:17:32<2:34:57,  2.31s/it] 80%|████████████████████████████████████████████████████████████████▊                | 16093/20117 [10:17:34<2:33:15,  2.29s/it] 80%|████████████████████████████████████████████████████████████████▊                | 16094/20117 [10:17:36<2:30:54,  2.25s/it] 80%|████████████████████████████████████████████████████████████████▊                | 16095/20117 [10:17:38<2:29:56,  2.24s/it] 80%|████████████████████████████████████████████████████████████████▊                | 16096/20117 [10:17:40<2:29:29,  2.23s/it] 80%|████████████████████████████████████████████████████████████████▊                | 16097/20117 [10:17:43<2:29:58,  2.24s/it] 80%|████████████████████████████████████████████████████████████████▊                | 16098/20117 [10:17:45<2:29:32,  2.23s/it] 80%|████████████████████████████████████████████████████████████████▊                | 16099/20117 [10:17:47<2:29:48,  2.24s/it] 80%|████████████████████████████████████████████████████████████████▊                | 16100/20117 [10:17:49<2:28:46,  2.22s/it]                                                                                                                                 {'loss': 0.1427, 'grad_norm': 0.46175047755241394, 'learning_rate': 1.9233198825602572e-05, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 322.74, 'epoch': 1.6}
 80%|████████████████████████████████████████████████████████████████▊                | 16100/20117 [10:17:49<2:28:46,  2.22s/it] 80%|████████████████████████████████████████████████████████████████▊                | 16101/20117 [10:17:52<2:28:55,  2.22s/it] 80%|████████████████████████████████████████████████████████████████▊                | 16102/20117 [10:17:54<2:31:49,  2.27s/it] 80%|████████████████████████████████████████████████████████████████▊                | 16103/20117 [10:17:56<2:30:48,  2.25s/it] 80%|████████████████████████████████████████████████████████████████▊                | 16104/20117 [10:17:58<2:29:10,  2.23s/it] 80%|████████████████████████████████████████████████████████████████▊                | 16105/20117 [10:18:01<2:28:41,  2.22s/it] 80%|████████████████████████████████████████████████████████████████▊                | 16106/20117 [10:18:03<2:29:09,  2.23s/it] 80%|████████████████████████████████████████████████████████████████▊                | 16107/20117 [10:18:05<2:29:43,  2.24s/it] 80%|████████████████████████████████████████████████████████████████▊                | 16108/20117 [10:18:07<2:28:53,  2.23s/it] 80%|████████████████████████████████████████████████████████████████▊                | 16109/20117 [10:18:10<2:29:45,  2.24s/it] 80%|████████████████████████████████████████████████████████████████▊                | 16110/20117 [10:18:12<2:29:30,  2.24s/it]                                                                                                                                 {'loss': 0.1824, 'grad_norm': 0.5290225148200989, 'learning_rate': 1.9140756954519136e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 423.78, 'epoch': 1.6}
 80%|████████████████████████████████████████████████████████████████▊                | 16110/20117 [10:18:12<2:29:30,  2.24s/it] 80%|████████████████████████████████████████████████████████████████▊                | 16111/20117 [10:18:14<2:29:01,  2.23s/it] 80%|████████████████████████████████████████████████████████████████▊                | 16112/20117 [10:18:16<2:29:18,  2.24s/it] 80%|████████████████████████████████████████████████████████████████▉                | 16113/20117 [10:18:18<2:29:32,  2.24s/it] 80%|████████████████████████████████████████████████████████████████▉                | 16114/20117 [10:18:21<2:30:53,  2.26s/it] 80%|████████████████████████████████████████████████████████████████▉                | 16115/20117 [10:18:23<2:29:44,  2.24s/it] 80%|████████████████████████████████████████████████████████████████▉                | 16116/20117 [10:18:25<2:29:06,  2.24s/it] 80%|████████████████████████████████████████████████████████████████▉                | 16117/20117 [10:18:27<2:28:52,  2.23s/it] 80%|████████████████████████████████████████████████████████████████▉                | 16118/20117 [10:18:30<2:28:15,  2.22s/it] 80%|████████████████████████████████████████████████████████████████▉                | 16119/20117 [10:18:32<2:28:59,  2.24s/it] 80%|████████████████████████████████████████████████████████████████▉                | 16120/20117 [10:18:34<2:27:55,  2.22s/it]                                                                                                                                 {'loss': 0.1428, 'grad_norm': 0.28334707021713257, 'learning_rate': 1.904851425684131e-05, 'memory/max_active (GiB)': 20.58, 'memory/max_allocated (GiB)': 20.58, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 358.47, 'epoch': 1.6}
 80%|████████████████████████████████████████████████████████████████▉                | 16120/20117 [10:18:34<2:27:55,  2.22s/it] 80%|████████████████████████████████████████████████████████████████▉                | 16121/20117 [10:18:36<2:28:41,  2.23s/it] 80%|████████████████████████████████████████████████████████████████▉                | 16122/20117 [10:18:39<2:29:22,  2.24s/it] 80%|████████████████████████████████████████████████████████████████▉                | 16123/20117 [10:18:41<2:28:27,  2.23s/it] 80%|████████████████████████████████████████████████████████████████▉                | 16124/20117 [10:18:43<2:29:31,  2.25s/it] 80%|████████████████████████████████████████████████████████████████▉                | 16125/20117 [10:18:45<2:29:42,  2.25s/it] 80%|████████████████████████████████████████████████████████████████▉                | 16126/20117 [10:18:48<2:28:17,  2.23s/it] 80%|████████████████████████████████████████████████████████████████▉                | 16127/20117 [10:18:50<2:28:17,  2.23s/it] 80%|████████████████████████████████████████████████████████████████▉                | 16128/20117 [10:18:52<2:29:03,  2.24s/it] 80%|████████████████████████████████████████████████████████████████▉                | 16129/20117 [10:18:54<2:29:09,  2.24s/it] 80%|████████████████████████████████████████████████████████████████▉                | 16130/20117 [10:18:56<2:27:51,  2.23s/it]                                                                                                                                 {'loss': 0.1508, 'grad_norm': 0.4152858555316925, 'learning_rate': 1.895647095978238e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 368.35, 'epoch': 1.6}
 80%|████████████████████████████████████████████████████████████████▉                | 16130/20117 [10:18:56<2:27:51,  2.23s/it] 80%|████████████████████████████████████████████████████████████████▉                | 16131/20117 [10:18:59<2:27:34,  2.22s/it] 80%|████████████████████████████████████████████████████████████████▉                | 16132/20117 [10:19:01<2:28:39,  2.24s/it] 80%|████████████████████████████████████████████████████████████████▉                | 16133/20117 [10:19:03<2:27:42,  2.22s/it] 80%|████████████████████████████████████████████████████████████████▉                | 16134/20117 [10:19:05<2:27:43,  2.23s/it] 80%|████████████████████████████████████████████████████████████████▉                | 16135/20117 [10:19:08<2:27:35,  2.22s/it] 80%|████████████████████████████████████████████████████████████████▉                | 16136/20117 [10:19:10<2:28:28,  2.24s/it] 80%|████████████████████████████████████████████████████████████████▉                | 16137/20117 [10:19:12<2:28:10,  2.23s/it] 80%|████████████████████████████████████████████████████████████████▉                | 16138/20117 [10:19:14<2:28:08,  2.23s/it] 80%|████████████████████████████████████████████████████████████████▉                | 16139/20117 [10:19:17<2:27:23,  2.22s/it] 80%|████████████████████████████████████████████████████████████████▉                | 16140/20117 [10:19:19<2:27:14,  2.22s/it]                                                                                                                                 {'loss': 0.1618, 'grad_norm': 0.5704591274261475, 'learning_rate': 1.8864627290064396e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 371.9, 'epoch': 1.6}
 80%|████████████████████████████████████████████████████████████████▉                | 16140/20117 [10:19:19<2:27:14,  2.22s/it] 80%|████████████████████████████████████████████████████████████████▉                | 16141/20117 [10:19:21<2:27:58,  2.23s/it] 80%|████████████████████████████████████████████████████████████████▉                | 16142/20117 [10:19:23<2:27:02,  2.22s/it] 80%|████████████████████████████████████████████████████████████████▉                | 16143/20117 [10:19:25<2:26:21,  2.21s/it] 80%|█████████████████████████████████████████████████████████████████                | 16144/20117 [10:19:28<2:26:49,  2.22s/it] 80%|█████████████████████████████████████████████████████████████████                | 16145/20117 [10:19:30<2:26:27,  2.21s/it] 80%|█████████████████████████████████████████████████████████████████                | 16146/20117 [10:19:32<2:33:50,  2.32s/it] 80%|█████████████████████████████████████████████████████████████████                | 16147/20117 [10:19:35<2:30:42,  2.28s/it] 80%|█████████████████████████████████████████████████████████████████                | 16148/20117 [10:19:37<2:29:17,  2.26s/it] 80%|█████████████████████████████████████████████████████████████████                | 16149/20117 [10:19:39<2:29:07,  2.25s/it] 80%|█████████████████████████████████████████████████████████████████                | 16150/20117 [10:19:41<2:28:19,  2.24s/it]                                                                                                                                 {'loss': 0.1667, 'grad_norm': 0.5318484902381897, 'learning_rate': 1.877298347391777e-05, 'memory/max_active (GiB)': 20.58, 'memory/max_allocated (GiB)': 20.58, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 357.18, 'epoch': 1.61}
 80%|█████████████████████████████████████████████████████████████████                | 16150/20117 [10:19:41<2:28:19,  2.24s/it] 80%|█████████████████████████████████████████████████████████████████                | 16151/20117 [10:19:43<2:27:31,  2.23s/it] 80%|█████████████████████████████████████████████████████████████████                | 16152/20117 [10:19:46<2:28:11,  2.24s/it] 80%|█████████████████████████████████████████████████████████████████                | 16153/20117 [10:19:48<2:29:16,  2.26s/it] 80%|█████████████████████████████████████████████████████████████████                | 16154/20117 [10:19:50<2:29:37,  2.27s/it] 80%|█████████████████████████████████████████████████████████████████                | 16155/20117 [10:19:53<2:30:19,  2.28s/it] 80%|█████████████████████████████████████████████████████████████████                | 16156/20117 [10:19:55<2:30:20,  2.28s/it] 80%|█████████████████████████████████████████████████████████████████                | 16157/20117 [10:19:57<2:30:25,  2.28s/it] 80%|█████████████████████████████████████████████████████████████████                | 16158/20117 [10:19:59<2:30:14,  2.28s/it] 80%|█████████████████████████████████████████████████████████████████                | 16159/20117 [10:20:02<2:29:55,  2.27s/it] 80%|█████████████████████████████████████████████████████████████████                | 16160/20117 [10:20:04<2:30:04,  2.28s/it]                                                                                                                                 {'loss': 0.152, 'grad_norm': 0.456482470035553, 'learning_rate': 1.8681539737080543e-05, 'memory/max_active (GiB)': 19.22, 'memory/max_allocated (GiB)': 19.22, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 340.75, 'epoch': 1.61}
 80%|█████████████████████████████████████████████████████████████████                | 16160/20117 [10:20:04<2:30:04,  2.28s/it] 80%|█████████████████████████████████████████████████████████████████                | 16161/20117 [10:20:06<2:30:25,  2.28s/it] 80%|█████████████████████████████████████████████████████████████████                | 16162/20117 [10:20:09<2:31:23,  2.30s/it] 80%|█████████████████████████████████████████████████████████████████                | 16163/20117 [10:20:11<2:32:50,  2.32s/it] 80%|█████████████████████████████████████████████████████████████████                | 16164/20117 [10:20:13<2:32:09,  2.31s/it] 80%|█████████████████████████████████████████████████████████████████                | 16165/20117 [10:20:16<2:31:35,  2.30s/it] 80%|█████████████████████████████████████████████████████████████████                | 16166/20117 [10:20:18<2:29:21,  2.27s/it] 80%|█████████████████████████████████████████████████████████████████                | 16167/20117 [10:20:20<2:28:13,  2.25s/it] 80%|█████████████████████████████████████████████████████████████████                | 16168/20117 [10:20:22<2:27:41,  2.24s/it] 80%|█████████████████████████████████████████████████████████████████                | 16169/20117 [10:20:24<2:26:44,  2.23s/it] 80%|█████████████████████████████████████████████████████████████████                | 16170/20117 [10:20:27<2:26:14,  2.22s/it]                                                                                                                                 {'loss': 0.1347, 'grad_norm': 0.6000080108642578, 'learning_rate': 1.8590296304797996e-05, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 329.23, 'epoch': 1.61}
 80%|█████████████████████████████████████████████████████████████████                | 16170/20117 [10:20:27<2:26:14,  2.22s/it] 80%|█████████████████████████████████████████████████████████████████                | 16171/20117 [10:20:29<2:26:57,  2.23s/it] 80%|█████████████████████████████████████████████████████████████████                | 16172/20117 [10:20:31<2:27:28,  2.24s/it] 80%|█████████████████████████████████████████████████████████████████                | 16173/20117 [10:20:33<2:26:11,  2.22s/it] 80%|█████████████████████████████████████████████████████████████████                | 16174/20117 [10:20:36<2:26:05,  2.22s/it] 80%|█████████████████████████████████████████████████████████████████▏               | 16175/20117 [10:20:38<2:26:08,  2.22s/it] 80%|█████████████████████████████████████████████████████████████████▏               | 16176/20117 [10:20:40<2:25:50,  2.22s/it] 80%|█████████████████████████████████████████████████████████████████▏               | 16177/20117 [10:20:42<2:26:00,  2.22s/it] 80%|█████████████████████████████████████████████████████████████████▏               | 16178/20117 [10:20:44<2:25:24,  2.21s/it] 80%|█████████████████████████████████████████████████████████████████▏               | 16179/20117 [10:20:47<2:25:35,  2.22s/it] 80%|█████████████████████████████████████████████████████████████████▏               | 16180/20117 [10:20:49<2:25:12,  2.21s/it]                                                                                                                                 {'loss': 0.1503, 'grad_norm': 0.48490649461746216, 'learning_rate': 1.8499253401822004e-05, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 304.69, 'epoch': 1.61}
 80%|█████████████████████████████████████████████████████████████████▏               | 16180/20117 [10:20:49<2:25:12,  2.21s/it] 80%|█████████████████████████████████████████████████████████████████▏               | 16181/20117 [10:20:51<2:25:07,  2.21s/it] 80%|█████████████████████████████████████████████████████████████████▏               | 16182/20117 [10:20:53<2:25:07,  2.21s/it] 80%|█████████████████████████████████████████████████████████████████▏               | 16183/20117 [10:20:55<2:24:46,  2.21s/it] 80%|█████████████████████████████████████████████████████████████████▏               | 16184/20117 [10:20:58<2:24:37,  2.21s/it] 80%|█████████████████████████████████████████████████████████████████▏               | 16185/20117 [10:21:00<2:25:13,  2.22s/it] 80%|█████████████████████████████████████████████████████████████████▏               | 16186/20117 [10:21:02<2:24:59,  2.21s/it] 80%|█████████████████████████████████████████████████████████████████▏               | 16187/20117 [10:21:04<2:26:18,  2.23s/it] 80%|█████████████████████████████████████████████████████████████████▏               | 16188/20117 [10:21:07<2:26:49,  2.24s/it] 80%|█████████████████████████████████████████████████████████████████▏               | 16189/20117 [10:21:09<2:27:08,  2.25s/it] 80%|█████████████████████████████████████████████████████████████████▏               | 16190/20117 [10:21:11<2:27:57,  2.26s/it]                                                                                                                                 {'loss': 0.18, 'grad_norm': 0.5004350543022156, 'learning_rate': 1.840841125241044e-05, 'memory/max_active (GiB)': 19.82, 'memory/max_allocated (GiB)': 19.82, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 352.28, 'epoch': 1.61}
 80%|█████████████████████████████████████████████████████████████████▏               | 16190/20117 [10:21:11<2:27:57,  2.26s/it] 80%|█████████████████████████████████████████████████████████████████▏               | 16191/20117 [10:21:13<2:27:56,  2.26s/it] 80%|█████████████████████████████████████████████████████████████████▏               | 16192/20117 [10:21:16<2:27:12,  2.25s/it] 80%|█████████████████████████████████████████████████████████████████▏               | 16193/20117 [10:21:18<2:27:04,  2.25s/it] 80%|█████████████████████████████████████████████████████████████████▏               | 16194/20117 [10:21:20<2:26:51,  2.25s/it] 81%|█████████████████████████████████████████████████████████████████▏               | 16195/20117 [10:21:22<2:27:05,  2.25s/it] 81%|█████████████████████████████████████████████████████████████████▏               | 16196/20117 [10:21:25<2:25:50,  2.23s/it] 81%|█████████████████████████████████████████████████████████████████▏               | 16197/20117 [10:21:27<2:26:12,  2.24s/it] 81%|█████████████████████████████████████████████████████████████████▏               | 16198/20117 [10:21:29<2:32:49,  2.34s/it] 81%|█████████████████████████████████████████████████████████████████▏               | 16199/20117 [10:21:32<2:29:57,  2.30s/it] 81%|█████████████████████████████████████████████████████████████████▏               | 16200/20117 [10:21:34<2:28:31,  2.27s/it]                                                                                                                                 {'loss': 0.171, 'grad_norm': 0.5725821256637573, 'learning_rate': 1.8317770080326757e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 427.8, 'epoch': 1.61}
 81%|█████████████████████████████████████████████████████████████████▏               | 16200/20117 [10:21:34<2:28:31,  2.27s/it] 81%|█████████████████████████████████████████████████████████████████▏               | 16201/20117 [10:21:36<2:26:38,  2.25s/it] 81%|█████████████████████████████████████████████████████████████████▏               | 16202/20117 [10:21:38<2:26:26,  2.24s/it] 81%|█████████████████████████████████████████████████████████████████▏               | 16203/20117 [10:21:40<2:25:29,  2.23s/it] 81%|█████████████████████████████████████████████████████████████████▏               | 16204/20117 [10:21:43<2:26:05,  2.24s/it] 81%|█████████████████████████████████████████████████████████████████▏               | 16205/20117 [10:21:45<2:25:32,  2.23s/it] 81%|█████████████████████████████████████████████████████████████████▎               | 16206/20117 [10:21:47<2:24:25,  2.22s/it] 81%|█████████████████████████████████████████████████████████████████▎               | 16207/20117 [10:21:49<2:24:05,  2.21s/it] 81%|█████████████████████████████████████████████████████████████████▎               | 16208/20117 [10:21:52<2:24:17,  2.21s/it] 81%|█████████████████████████████████████████████████████████████████▎               | 16209/20117 [10:21:54<2:23:27,  2.20s/it] 81%|█████████████████████████████████████████████████████████████████▎               | 16210/20117 [10:21:56<2:22:53,  2.19s/it]                                                                                                                                 {'loss': 0.1318, 'grad_norm': 0.47481808066368103, 'learning_rate': 1.822733010883928e-05, 'memory/max_active (GiB)': 20.65, 'memory/max_allocated (GiB)': 20.65, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 371.38, 'epoch': 1.61}
 81%|█████████████████████████████████████████████████████████████████▎               | 16210/20117 [10:21:56<2:22:53,  2.19s/it] 81%|█████████████████████████████████████████████████████████████████▎               | 16211/20117 [10:21:58<2:23:41,  2.21s/it] 81%|█████████████████████████████████████████████████████████████████▎               | 16212/20117 [10:22:00<2:24:11,  2.22s/it] 81%|█████████████████████████████████████████████████████████████████▎               | 16213/20117 [10:22:03<2:23:08,  2.20s/it] 81%|█████████████████████████████████████████████████████████████████▎               | 16214/20117 [10:22:05<2:24:33,  2.22s/it] 81%|█████████████████████████████████████████████████████████████████▎               | 16215/20117 [10:22:07<2:25:41,  2.24s/it] 81%|█████████████████████████████████████████████████████████████████▎               | 16216/20117 [10:22:09<2:26:16,  2.25s/it] 81%|█████████████████████████████████████████████████████████████████▎               | 16217/20117 [10:22:12<2:26:01,  2.25s/it] 81%|█████████████████████████████████████████████████████████████████▎               | 16218/20117 [10:22:14<2:25:06,  2.23s/it] 81%|█████████████████████████████████████████████████████████████████▎               | 16219/20117 [10:22:16<2:24:11,  2.22s/it] 81%|█████████████████████████████████████████████████████████████████▎               | 16220/20117 [10:22:18<2:24:03,  2.22s/it]                                                                                                                                 {'loss': 0.1429, 'grad_norm': 0.9544722437858582, 'learning_rate': 1.813709156072081e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 353.59, 'epoch': 1.61}
 81%|█████████████████████████████████████████████████████████████████▎               | 16220/20117 [10:22:18<2:24:03,  2.22s/it] 81%|█████████████████████████████████████████████████████████████████▎               | 16221/20117 [10:22:20<2:24:01,  2.22s/it] 81%|█████████████████████████████████████████████████████████████████▎               | 16222/20117 [10:22:23<2:23:09,  2.21s/it] 81%|█████████████████████████████████████████████████████████████████▎               | 16223/20117 [10:22:25<2:23:29,  2.21s/it] 81%|█████████████████████████████████████████████████████████████████▎               | 16224/20117 [10:22:27<2:27:24,  2.27s/it] 81%|█████████████████████████████████████████████████████████████████▎               | 16225/20117 [10:22:29<2:27:15,  2.27s/it] 81%|█████████████████████████████████████████████████████████████████▎               | 16226/20117 [10:22:32<2:24:40,  2.23s/it] 81%|█████████████████████████████████████████████████████████████████▎               | 16227/20117 [10:22:34<2:24:04,  2.22s/it] 81%|█████████████████████████████████████████████████████████████████▎               | 16228/20117 [10:22:36<2:24:25,  2.23s/it] 81%|█████████████████████████████████████████████████████████████████▎               | 16229/20117 [10:22:38<2:23:13,  2.21s/it] 81%|█████████████████████████████████████████████████████████████████▎               | 16230/20117 [10:22:40<2:23:30,  2.22s/it]                                                                                                                                 {'loss': 0.1672, 'grad_norm': 0.5491587519645691, 'learning_rate': 1.804705465824793e-05, 'memory/max_active (GiB)': 21.41, 'memory/max_allocated (GiB)': 21.41, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 359.87, 'epoch': 1.61}
 81%|█████████████████████████████████████████████████████████████████▎               | 16230/20117 [10:22:40<2:23:30,  2.22s/it] 81%|█████████████████████████████████████████████████████████████████▎               | 16231/20117 [10:22:43<2:22:45,  2.20s/it] 81%|█████████████████████████████████████████████████████████████████▎               | 16232/20117 [10:22:45<2:22:39,  2.20s/it] 81%|█████████████████████████████████████████████████████████████████▎               | 16233/20117 [10:22:47<2:22:51,  2.21s/it] 81%|█████████████████████████████████████████████████████████████████▎               | 16234/20117 [10:22:49<2:21:31,  2.19s/it] 81%|█████████████████████████████████████████████████████████████████▎               | 16235/20117 [10:22:51<2:21:07,  2.18s/it] 81%|█████████████████████████████████████████████████████████████████▎               | 16236/20117 [10:22:54<2:21:23,  2.19s/it] 81%|█████████████████████████████████████████████████████████████████▍               | 16237/20117 [10:22:56<2:22:37,  2.21s/it] 81%|█████████████████████████████████████████████████████████████████▍               | 16238/20117 [10:22:58<2:22:21,  2.20s/it] 81%|█████████████████████████████████████████████████████████████████▍               | 16239/20117 [10:23:00<2:22:07,  2.20s/it] 81%|█████████████████████████████████████████████████████████████████▍               | 16240/20117 [10:23:02<2:22:12,  2.20s/it]                                                                                                                                 {'loss': 0.1793, 'grad_norm': 0.6248442530632019, 'learning_rate': 1.795721962320057e-05, 'memory/max_active (GiB)': 19.22, 'memory/max_allocated (GiB)': 19.22, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 367.54, 'epoch': 1.61}
 81%|█████████████████████████████████████████████████████████████████▍               | 16240/20117 [10:23:02<2:22:12,  2.20s/it] 81%|█████████████████████████████████████████████████████████████████▍               | 16241/20117 [10:23:05<2:22:28,  2.21s/it] 81%|█████████████████████████████████████████████████████████████████▍               | 16242/20117 [10:23:07<2:22:05,  2.20s/it] 81%|█████████████████████████████████████████████████████████████████▍               | 16243/20117 [10:23:09<2:21:18,  2.19s/it] 81%|█████████████████████████████████████████████████████████████████▍               | 16244/20117 [10:23:11<2:22:40,  2.21s/it] 81%|█████████████████████████████████████████████████████████████████▍               | 16245/20117 [10:23:13<2:22:07,  2.20s/it] 81%|█████████████████████████████████████████████████████████████████▍               | 16246/20117 [10:23:16<2:22:32,  2.21s/it] 81%|█████████████████████████████████████████████████████████████████▍               | 16247/20117 [10:23:18<2:22:32,  2.21s/it] 81%|█████████████████████████████████████████████████████████████████▍               | 16248/20117 [10:23:20<2:22:45,  2.21s/it] 81%|█████████████████████████████████████████████████████████████████▍               | 16249/20117 [10:23:22<2:22:02,  2.20s/it] 81%|█████████████████████████████████████████████████████████████████▍               | 16250/20117 [10:23:25<2:27:30,  2.29s/it]                                                                                                                                 {'loss': 0.1438, 'grad_norm': 0.28726670145988464, 'learning_rate': 1.7867586676861416e-05, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 278.32, 'epoch': 1.62}
 81%|█████████████████████████████████████████████████████████████████▍               | 16250/20117 [10:23:25<2:27:30,  2.29s/it] 81%|█████████████████████████████████████████████████████████████████▍               | 16251/20117 [10:23:27<2:25:48,  2.26s/it] 81%|█████████████████████████████████████████████████████████████████▍               | 16252/20117 [10:23:29<2:25:51,  2.26s/it] 81%|█████████████████████████████████████████████████████████████████▍               | 16253/20117 [10:23:31<2:24:16,  2.24s/it] 81%|█████████████████████████████████████████████████████████████████▍               | 16254/20117 [10:23:34<2:23:31,  2.23s/it] 81%|█████████████████████████████████████████████████████████████████▍               | 16255/20117 [10:23:36<2:23:25,  2.23s/it] 81%|█████████████████████████████████████████████████████████████████▍               | 16256/20117 [10:23:38<2:22:20,  2.21s/it] 81%|█████████████████████████████████████████████████████████████████▍               | 16257/20117 [10:23:40<2:21:28,  2.20s/it] 81%|█████████████████████████████████████████████████████████████████▍               | 16258/20117 [10:23:42<2:22:10,  2.21s/it] 81%|█████████████████████████████████████████████████████████████████▍               | 16259/20117 [10:23:45<2:21:37,  2.20s/it] 81%|█████████████████████████████████████████████████████████████████▍               | 16260/20117 [10:23:47<2:21:13,  2.20s/it]                                                                                                                                 {'loss': 0.182, 'grad_norm': 0.36886531114578247, 'learning_rate': 1.7778156040015393e-05, 'memory/max_active (GiB)': 19.8, 'memory/max_allocated (GiB)': 19.8, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 396.55, 'epoch': 1.62}
 81%|█████████████████████████████████████████████████████████████████▍               | 16260/20117 [10:23:47<2:21:13,  2.20s/it] 81%|█████████████████████████████████████████████████████████████████▍               | 16261/20117 [10:23:49<2:22:33,  2.22s/it] 81%|█████████████████████████████████████████████████████████████████▍               | 16262/20117 [10:23:51<2:22:39,  2.22s/it] 81%|█████████████████████████████████████████████████████████████████▍               | 16263/20117 [10:23:53<2:21:58,  2.21s/it] 81%|█████████████████████████████████████████████████████████████████▍               | 16264/20117 [10:23:56<2:22:54,  2.23s/it] 81%|█████████████████████████████████████████████████████████████████▍               | 16265/20117 [10:23:58<2:22:04,  2.21s/it] 81%|█████████████████████████████████████████████████████████████████▍               | 16266/20117 [10:24:00<2:21:53,  2.21s/it] 81%|█████████████████████████████████████████████████████████████████▍               | 16267/20117 [10:24:02<2:20:52,  2.20s/it] 81%|█████████████████████████████████████████████████████████████████▌               | 16268/20117 [10:24:04<2:20:33,  2.19s/it] 81%|█████████████████████████████████████████████████████████████████▌               | 16269/20117 [10:24:07<2:20:37,  2.19s/it] 81%|█████████████████████████████████████████████████████████████████▌               | 16270/20117 [10:24:09<2:21:10,  2.20s/it]                                                                                                                                 {'loss': 0.1251, 'grad_norm': 0.3643040955066681, 'learning_rate': 1.7688927932948983e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 315.86, 'epoch': 1.62}
 81%|█████████████████████████████████████████████████████████████████▌               | 16270/20117 [10:24:09<2:21:10,  2.20s/it] 81%|█████████████████████████████████████████████████████████████████▌               | 16271/20117 [10:24:11<2:21:57,  2.21s/it] 81%|█████████████████████████████████████████████████████████████████▌               | 16272/20117 [10:24:13<2:20:53,  2.20s/it] 81%|█████████████████████████████████████████████████████████████████▌               | 16273/20117 [10:24:15<2:20:26,  2.19s/it] 81%|█████████████████████████████████████████████████████████████████▌               | 16274/20117 [10:24:18<2:19:55,  2.18s/it] 81%|█████████████████████████████████████████████████████████████████▌               | 16275/20117 [10:24:20<2:20:38,  2.20s/it] 81%|█████████████████████████████████████████████████████████████████▌               | 16276/20117 [10:24:22<2:20:48,  2.20s/it] 81%|█████████████████████████████████████████████████████████████████▌               | 16277/20117 [10:24:24<2:22:20,  2.22s/it] 81%|█████████████████████████████████████████████████████████████████▌               | 16278/20117 [10:24:27<2:23:24,  2.24s/it] 81%|█████████████████████████████████████████████████████████████████▌               | 16279/20117 [10:24:29<2:22:36,  2.23s/it] 81%|█████████████████████████████████████████████████████████████████▌               | 16280/20117 [10:24:31<2:22:14,  2.22s/it]                                                                                                                                 {'loss': 0.1725, 'grad_norm': 0.4020111560821533, 'learning_rate': 1.7599902575449955e-05, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 434.3, 'epoch': 1.62}
 81%|█████████████████████████████████████████████████████████████████▌               | 16280/20117 [10:24:31<2:22:14,  2.22s/it] 81%|█████████████████████████████████████████████████████████████████▌               | 16281/20117 [10:24:33<2:22:04,  2.22s/it] 81%|█████████████████████████████████████████████████████████████████▌               | 16282/20117 [10:24:35<2:21:42,  2.22s/it] 81%|█████████████████████████████████████████████████████████████████▌               | 16283/20117 [10:24:38<2:21:49,  2.22s/it] 81%|█████████████████████████████████████████████████████████████████▌               | 16284/20117 [10:24:40<2:21:55,  2.22s/it] 81%|█████████████████████████████████████████████████████████████████▌               | 16285/20117 [10:24:42<2:20:51,  2.21s/it] 81%|█████████████████████████████████████████████████████████████████▌               | 16286/20117 [10:24:44<2:20:12,  2.20s/it] 81%|█████████████████████████████████████████████████████████████████▌               | 16287/20117 [10:24:46<2:19:34,  2.19s/it] 81%|█████████████████████████████████████████████████████████████████▌               | 16288/20117 [10:24:49<2:20:39,  2.20s/it] 81%|█████████████████████████████████████████████████████████████████▌               | 16289/20117 [10:24:51<2:19:41,  2.19s/it] 81%|█████████████████████████████████████████████████████████████████▌               | 16290/20117 [10:24:53<2:19:50,  2.19s/it]                                                                                                                                 {'loss': 0.1305, 'grad_norm': 0.5112940073013306, 'learning_rate': 1.7511080186806518e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 260.92, 'epoch': 1.62}
 81%|█████████████████████████████████████████████████████████████████▌               | 16290/20117 [10:24:53<2:19:50,  2.19s/it] 81%|█████████████████████████████████████████████████████████████████▌               | 16291/20117 [10:24:55<2:20:18,  2.20s/it] 81%|█████████████████████████████████████████████████████████████████▌               | 16292/20117 [10:24:57<2:19:40,  2.19s/it] 81%|█████████████████████████████████████████████████████████████████▌               | 16293/20117 [10:25:00<2:19:40,  2.19s/it] 81%|█████████████████████████████████████████████████████████████████▌               | 16294/20117 [10:25:02<2:19:44,  2.19s/it] 81%|█████████████████████████████████████████████████████████████████▌               | 16295/20117 [10:25:04<2:19:27,  2.19s/it] 81%|█████████████████████████████████████████████████████████████████▌               | 16296/20117 [10:25:06<2:19:41,  2.19s/it] 81%|█████████████████████████████████████████████████████████████████▌               | 16297/20117 [10:25:08<2:20:03,  2.20s/it] 81%|█████████████████████████████████████████████████████████████████▌               | 16298/20117 [10:25:11<2:20:11,  2.20s/it] 81%|█████████████████████████████████████████████████████████████████▋               | 16299/20117 [10:25:13<2:19:25,  2.19s/it] 81%|█████████████████████████████████████████████████████████████████▋               | 16300/20117 [10:25:15<2:18:55,  2.18s/it]                                                                                                                                 {'loss': 0.1869, 'grad_norm': 0.7122372984886169, 'learning_rate': 1.742246098580701e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 320.49, 'epoch': 1.62}
 81%|█████████████████████████████████████████████████████████████████▋               | 16300/20117 [10:25:15<2:18:55,  2.18s/it] 81%|█████████████████████████████████████████████████████████████████▋               | 16301/20117 [10:25:17<2:18:31,  2.18s/it] 81%|█████████████████████████████████████████████████████████████████▋               | 16302/20117 [10:25:20<2:23:59,  2.26s/it] 81%|█████████████████████████████████████████████████████████████████▋               | 16303/20117 [10:25:22<2:23:10,  2.25s/it] 81%|█████████████████████████████████████████████████████████████████▋               | 16304/20117 [10:25:24<2:22:38,  2.24s/it] 81%|█████████████████████████████████████████████████████████████████▋               | 16305/20117 [10:25:26<2:21:54,  2.23s/it] 81%|█████████████████████████████████████████████████████████████████▋               | 16306/20117 [10:25:28<2:21:02,  2.22s/it] 81%|█████████████████████████████████████████████████████████████████▋               | 16307/20117 [10:25:31<2:19:55,  2.20s/it] 81%|█████████████████████████████████████████████████████████████████▋               | 16308/20117 [10:25:33<2:19:57,  2.20s/it] 81%|█████████████████████████████████████████████████████████████████▋               | 16309/20117 [10:25:35<2:19:51,  2.20s/it] 81%|█████████████████████████████████████████████████████████████████▋               | 16310/20117 [10:25:37<2:19:54,  2.20s/it]                                                                                                                                 {'loss': 0.2079, 'grad_norm': 0.6411992311477661, 'learning_rate': 1.7334045190739277e-05, 'memory/max_active (GiB)': 19.2, 'memory/max_allocated (GiB)': 19.2, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 406.2, 'epoch': 1.62}
 81%|█████████████████████████████████████████████████████████████████▋               | 16310/20117 [10:25:37<2:19:54,  2.20s/it] 81%|█████████████████████████████████████████████████████████████████▋               | 16311/20117 [10:25:39<2:20:12,  2.21s/it] 81%|█████████████████████████████████████████████████████████████████▋               | 16312/20117 [10:25:42<2:19:18,  2.20s/it] 81%|█████████████████████████████████████████████████████████████████▋               | 16313/20117 [10:25:44<2:18:35,  2.19s/it] 81%|█████████████████████████████████████████████████████████████████▋               | 16314/20117 [10:25:46<2:18:25,  2.18s/it] 81%|█████████████████████████████████████████████████████████████████▋               | 16315/20117 [10:25:48<2:18:56,  2.19s/it] 81%|█████████████████████████████████████████████████████████████████▋               | 16316/20117 [10:25:50<2:18:37,  2.19s/it] 81%|█████████████████████████████████████████████████████████████████▋               | 16317/20117 [10:25:52<2:18:06,  2.18s/it] 81%|█████████████████████████████████████████████████████████████████▋               | 16318/20117 [10:25:55<2:17:46,  2.18s/it] 81%|█████████████████████████████████████████████████████████████████▋               | 16319/20117 [10:25:57<2:18:21,  2.19s/it] 81%|█████████████████████████████████████████████████████████████████▋               | 16320/20117 [10:25:59<2:18:32,  2.19s/it]                                                                                                                                 {'loss': 0.1607, 'grad_norm': 0.673321008682251, 'learning_rate': 1.7245833019390055e-05, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 325.22, 'epoch': 1.62}
 81%|█████████████████████████████████████████████████████████████████▋               | 16320/20117 [10:25:59<2:18:32,  2.19s/it] 81%|█████████████████████████████████████████████████████████████████▋               | 16321/20117 [10:26:01<2:18:12,  2.18s/it] 81%|█████████████████████████████████████████████████████████████████▋               | 16322/20117 [10:26:03<2:17:55,  2.18s/it] 81%|█████████████████████████████████████████████████████████████████▋               | 16323/20117 [10:26:06<2:18:31,  2.19s/it] 81%|█████████████████████████████████████████████████████████████████▋               | 16324/20117 [10:26:08<2:19:03,  2.20s/it] 81%|█████████████████████████████████████████████████████████████████▋               | 16325/20117 [10:26:10<2:19:00,  2.20s/it] 81%|█████████████████████████████████████████████████████████████████▋               | 16326/20117 [10:26:12<2:18:59,  2.20s/it] 81%|█████████████████████████████████████████████████████████████████▋               | 16327/20117 [10:26:14<2:18:48,  2.20s/it] 81%|█████████████████████████████████████████████████████████████████▋               | 16328/20117 [10:26:17<2:19:16,  2.21s/it] 81%|█████████████████████████████████████████████████████████████████▋               | 16329/20117 [10:26:19<2:18:22,  2.19s/it] 81%|█████████████████████████████████████████████████████████████████▊               | 16330/20117 [10:26:21<2:17:56,  2.19s/it]                                                                                                                                 {'loss': 0.1823, 'grad_norm': 0.6042284369468689, 'learning_rate': 1.7157824689044632e-05, 'memory/max_active (GiB)': 20.57, 'memory/max_allocated (GiB)': 20.57, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 340.57, 'epoch': 1.62}
 81%|█████████████████████████████████████████████████████████████████▊               | 16330/20117 [10:26:21<2:17:56,  2.19s/it] 81%|█████████████████████████████████████████████████████████████████▊               | 16331/20117 [10:26:23<2:19:17,  2.21s/it] 81%|█████████████████████████████████████████████████████████████████▊               | 16332/20117 [10:26:25<2:18:35,  2.20s/it] 81%|█████████████████████████████████████████████████████████████████▊               | 16333/20117 [10:26:28<2:18:12,  2.19s/it] 81%|█████████████████████████████████████████████████████████████████▊               | 16334/20117 [10:26:30<2:17:54,  2.19s/it] 81%|█████████████████████████████████████████████████████████████████▊               | 16335/20117 [10:26:32<2:18:57,  2.20s/it] 81%|█████████████████████████████████████████████████████████████████▊               | 16336/20117 [10:26:34<2:19:06,  2.21s/it] 81%|█████████████████████████████████████████████████████████████████▊               | 16337/20117 [10:26:36<2:20:25,  2.23s/it] 81%|█████████████████████████████████████████████████████████████████▊               | 16338/20117 [10:26:39<2:19:57,  2.22s/it] 81%|█████████████████████████████████████████████████████████████████▊               | 16339/20117 [10:26:41<2:19:26,  2.21s/it] 81%|█████████████████████████████████████████████████████████████████▊               | 16340/20117 [10:26:43<2:18:46,  2.20s/it]                                                                                                                                 {'loss': 0.125, 'grad_norm': 0.3080300986766815, 'learning_rate': 1.7070020416486065e-05, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 342.14, 'epoch': 1.62}
 81%|█████████████████████████████████████████████████████████████████▊               | 16340/20117 [10:26:43<2:18:46,  2.20s/it] 81%|█████████████████████████████████████████████████████████████████▊               | 16341/20117 [10:26:45<2:19:09,  2.21s/it] 81%|█████████████████████████████████████████████████████████████████▊               | 16342/20117 [10:26:48<2:19:18,  2.21s/it] 81%|█████████████████████████████████████████████████████████████████▊               | 16343/20117 [10:26:50<2:19:14,  2.21s/it] 81%|█████████████████████████████████████████████████████████████████▊               | 16344/20117 [10:26:52<2:19:14,  2.21s/it] 81%|█████████████████████████████████████████████████████████████████▊               | 16345/20117 [10:26:54<2:19:29,  2.22s/it] 81%|█████████████████████████████████████████████████████████████████▊               | 16346/20117 [10:26:56<2:19:12,  2.21s/it] 81%|█████████████████████████████████████████████████████████████████▊               | 16347/20117 [10:26:59<2:18:29,  2.20s/it] 81%|█████████████████████████████████████████████████████████████████▊               | 16348/20117 [10:27:01<2:19:16,  2.22s/it] 81%|█████████████████████████████████████████████████████████████████▊               | 16349/20117 [10:27:03<2:19:30,  2.22s/it] 81%|█████████████████████████████████████████████████████████████████▊               | 16350/20117 [10:27:05<2:19:57,  2.23s/it]                                                                                                                                 {'loss': 0.1487, 'grad_norm': 0.33366823196411133, 'learning_rate': 1.6982420417994893e-05, 'memory/max_active (GiB)': 18.83, 'memory/max_allocated (GiB)': 18.83, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 348.66, 'epoch': 1.63}
 81%|█████████████████████████████████████████████████████████████████▊               | 16350/20117 [10:27:05<2:19:57,  2.23s/it] 81%|█████████████████████████████████████████████████████████████████▊               | 16351/20117 [10:27:07<2:19:07,  2.22s/it] 81%|█████████████████████████████████████████████████████████████████▊               | 16352/20117 [10:27:10<2:19:11,  2.22s/it] 81%|█████████████████████████████████████████████████████████████████▊               | 16353/20117 [10:27:12<2:18:34,  2.21s/it] 81%|█████████████████████████████████████████████████████████████████▊               | 16354/20117 [10:27:14<2:18:27,  2.21s/it] 81%|█████████████████████████████████████████████████████████████████▊               | 16355/20117 [10:27:17<2:24:51,  2.31s/it] 81%|█████████████████████████████████████████████████████████████████▊               | 16356/20117 [10:27:19<2:23:15,  2.29s/it] 81%|█████████████████████████████████████████████████████████████████▊               | 16357/20117 [10:27:21<2:21:48,  2.26s/it] 81%|█████████████████████████████████████████████████████████████████▊               | 16358/20117 [10:27:23<2:22:00,  2.27s/it] 81%|█████████████████████████████████████████████████████████████████▊               | 16359/20117 [10:27:26<2:20:52,  2.25s/it] 81%|█████████████████████████████████████████████████████████████████▊               | 16360/20117 [10:27:28<2:19:49,  2.23s/it]                                                                                                                                 {'loss': 0.1528, 'grad_norm': 0.5744956731796265, 'learning_rate': 1.6895024909348367e-05, 'memory/max_active (GiB)': 18.19, 'memory/max_allocated (GiB)': 18.19, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 371.23, 'epoch': 1.63}
 81%|█████████████████████████████████████████████████████████████████▊               | 16360/20117 [10:27:28<2:19:49,  2.23s/it] 81%|█████████████████████████████████████████████████████████████████▉               | 16361/20117 [10:27:30<2:18:53,  2.22s/it] 81%|█████████████████████████████████████████████████████████████████▉               | 16362/20117 [10:27:32<2:18:30,  2.21s/it] 81%|█████████████████████████████████████████████████████████████████▉               | 16363/20117 [10:27:34<2:20:44,  2.25s/it] 81%|█████████████████████████████████████████████████████████████████▉               | 16364/20117 [10:27:37<2:20:20,  2.24s/it] 81%|█████████████████████████████████████████████████████████████████▉               | 16365/20117 [10:27:39<2:19:57,  2.24s/it] 81%|█████████████████████████████████████████████████████████████████▉               | 16366/20117 [10:27:41<2:19:49,  2.24s/it] 81%|█████████████████████████████████████████████████████████████████▉               | 16367/20117 [10:27:43<2:19:37,  2.23s/it] 81%|█████████████████████████████████████████████████████████████████▉               | 16368/20117 [10:27:46<2:19:38,  2.23s/it] 81%|█████████████████████████████████████████████████████████████████▉               | 16369/20117 [10:27:48<2:19:01,  2.23s/it] 81%|█████████████████████████████████████████████████████████████████▉               | 16370/20117 [10:27:50<2:18:20,  2.22s/it]                                                                                                                                 {'loss': 0.1521, 'grad_norm': 0.578666090965271, 'learning_rate': 1.6807834105820163e-05, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 356.8, 'epoch': 1.63}
 81%|█████████████████████████████████████████████████████████████████▉               | 16370/20117 [10:27:50<2:18:20,  2.22s/it] 81%|█████████████████████████████████████████████████████████████████▉               | 16371/20117 [10:27:52<2:19:03,  2.23s/it] 81%|█████████████████████████████████████████████████████████████████▉               | 16372/20117 [10:27:54<2:18:38,  2.22s/it] 81%|█████████████████████████████████████████████████████████████████▉               | 16373/20117 [10:27:57<2:17:59,  2.21s/it] 81%|█████████████████████████████████████████████████████████████████▉               | 16374/20117 [10:27:59<2:18:23,  2.22s/it] 81%|█████████████████████████████████████████████████████████████████▉               | 16375/20117 [10:28:01<2:17:40,  2.21s/it] 81%|█████████████████████████████████████████████████████████████████▉               | 16376/20117 [10:28:03<2:17:25,  2.20s/it] 81%|█████████████████████████████████████████████████████████████████▉               | 16377/20117 [10:28:06<2:18:04,  2.22s/it] 81%|█████████████████████████████████████████████████████████████████▉               | 16378/20117 [10:28:08<2:18:59,  2.23s/it] 81%|█████████████████████████████████████████████████████████████████▉               | 16379/20117 [10:28:10<2:19:04,  2.23s/it] 81%|█████████████████████████████████████████████████████████████████▉               | 16380/20117 [10:28:12<2:18:11,  2.22s/it]                                                                                                                                 {'loss': 0.1405, 'grad_norm': 0.5622230768203735, 'learning_rate': 1.6720848222179587e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 351.53, 'epoch': 1.63}
 81%|█████████████████████████████████████████████████████████████████▉               | 16380/20117 [10:28:12<2:18:11,  2.22s/it] 81%|█████████████████████████████████████████████████████████████████▉               | 16381/20117 [10:28:14<2:18:28,  2.22s/it] 81%|█████████████████████████████████████████████████████████████████▉               | 16382/20117 [10:28:17<2:17:24,  2.21s/it] 81%|█████████████████████████████████████████████████████████████████▉               | 16383/20117 [10:28:19<2:18:46,  2.23s/it] 81%|█████████████████████████████████████████████████████████████████▉               | 16384/20117 [10:28:21<2:18:17,  2.22s/it] 81%|█████████████████████████████████████████████████████████████████▉               | 16385/20117 [10:28:23<2:17:24,  2.21s/it] 81%|█████████████████████████████████████████████████████████████████▉               | 16386/20117 [10:28:25<2:17:02,  2.20s/it] 81%|█████████████████████████████████████████████████████████████████▉               | 16387/20117 [10:28:28<2:17:19,  2.21s/it] 81%|█████████████████████████████████████████████████████████████████▉               | 16388/20117 [10:28:30<2:17:56,  2.22s/it] 81%|█████████████████████████████████████████████████████████████████▉               | 16389/20117 [10:28:32<2:17:32,  2.21s/it] 81%|█████████████████████████████████████████████████████████████████▉               | 16390/20117 [10:28:34<2:17:34,  2.21s/it]                                                                                                                                 {'loss': 0.1543, 'grad_norm': 0.562519371509552, 'learning_rate': 1.6634067472691283e-05, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 360.87, 'epoch': 1.63}
 81%|█████████████████████████████████████████████████████████████████▉               | 16390/20117 [10:28:34<2:17:34,  2.21s/it] 81%|█████████████████████████████████████████████████████████████████▉               | 16391/20117 [10:28:37<2:17:23,  2.21s/it] 81%|██████████████████████████████████████████████████████████████████               | 16392/20117 [10:28:39<2:18:12,  2.23s/it] 81%|██████████████████████████████████████████████████████████████████               | 16393/20117 [10:28:41<2:17:58,  2.22s/it] 81%|██████████████████████████████████████████████████████████████████               | 16394/20117 [10:28:43<2:19:08,  2.24s/it] 81%|██████████████████████████████████████████████████████████████████               | 16395/20117 [10:28:46<2:18:45,  2.24s/it] 82%|██████████████████████████████████████████████████████████████████               | 16396/20117 [10:28:48<2:17:53,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████               | 16397/20117 [10:28:50<2:17:48,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████               | 16398/20117 [10:28:52<2:17:25,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████               | 16399/20117 [10:28:54<2:18:05,  2.23s/it] 82%|██████████████████████████████████████████████████████████████████               | 16400/20117 [10:28:57<2:18:36,  2.24s/it]                                                                                                                                 {'loss': 0.1881, 'grad_norm': 0.5723958611488342, 'learning_rate': 1.65474920711146e-05, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 366.21, 'epoch': 1.63}
 82%|██████████████████████████████████████████████████████████████████               | 16400/20117 [10:28:57<2:18:36,  2.24s/it] 82%|██████████████████████████████████████████████████████████████████               | 16401/20117 [10:28:59<2:17:33,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████               | 16402/20117 [10:29:01<2:17:16,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████               | 16403/20117 [10:29:03<2:17:48,  2.23s/it] 82%|██████████████████████████████████████████████████████████████████               | 16404/20117 [10:29:06<2:18:22,  2.24s/it] 82%|██████████████████████████████████████████████████████████████████               | 16405/20117 [10:29:08<2:18:41,  2.24s/it] 82%|██████████████████████████████████████████████████████████████████               | 16406/20117 [10:29:10<2:17:41,  2.23s/it] 82%|██████████████████████████████████████████████████████████████████               | 16407/20117 [10:29:12<2:17:11,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████               | 16408/20117 [10:29:15<2:19:04,  2.25s/it] 82%|██████████████████████████████████████████████████████████████████               | 16409/20117 [10:29:17<2:23:32,  2.32s/it] 82%|██████████████████████████████████████████████████████████████████               | 16410/20117 [10:29:19<2:21:16,  2.29s/it]                                                                                                                                 {'loss': 0.148, 'grad_norm': 0.5237053632736206, 'learning_rate': 1.646112223070305e-05, 'memory/max_active (GiB)': 20.63, 'memory/max_allocated (GiB)': 20.63, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 354.74, 'epoch': 1.63}
 82%|██████████████████████████████████████████████████████████████████               | 16410/20117 [10:29:19<2:21:16,  2.29s/it] 82%|██████████████████████████████████████████████████████████████████               | 16411/20117 [10:29:22<2:21:11,  2.29s/it] 82%|██████████████████████████████████████████████████████████████████               | 16412/20117 [10:29:24<2:19:51,  2.27s/it] 82%|██████████████████████████████████████████████████████████████████               | 16413/20117 [10:29:26<2:18:09,  2.24s/it] 82%|██████████████████████████████████████████████████████████████████               | 16414/20117 [10:29:28<2:18:56,  2.25s/it] 82%|██████████████████████████████████████████████████████████████████               | 16415/20117 [10:29:30<2:18:11,  2.24s/it] 82%|██████████████████████████████████████████████████████████████████               | 16416/20117 [10:29:33<2:19:27,  2.26s/it] 82%|██████████████████████████████████████████████████████████████████               | 16417/20117 [10:29:35<2:18:30,  2.25s/it] 82%|██████████████████████████████████████████████████████████████████               | 16418/20117 [10:29:37<2:18:07,  2.24s/it] 82%|██████████████████████████████████████████████████████████████████               | 16419/20117 [10:29:39<2:18:34,  2.25s/it] 82%|██████████████████████████████████████████████████████████████████               | 16420/20117 [10:29:42<2:17:12,  2.23s/it]                                                                                                                                 {'loss': 0.1729, 'grad_norm': 0.576963484287262, 'learning_rate': 1.6374958164203768e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 374.69, 'epoch': 1.63}
 82%|██████████████████████████████████████████████████████████████████               | 16420/20117 [10:29:42<2:17:12,  2.23s/it] 82%|██████████████████████████████████████████████████████████████████               | 16421/20117 [10:29:44<2:16:26,  2.21s/it] 82%|██████████████████████████████████████████████████████████████████               | 16422/20117 [10:29:46<2:16:02,  2.21s/it] 82%|██████████████████████████████████████████████████████████████████▏              | 16423/20117 [10:29:48<2:16:16,  2.21s/it] 82%|██████████████████████████████████████████████████████████████████▏              | 16424/20117 [10:29:50<2:16:49,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████▏              | 16425/20117 [10:29:53<2:16:31,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████▏              | 16426/20117 [10:29:55<2:15:38,  2.20s/it] 82%|██████████████████████████████████████████████████████████████████▏              | 16427/20117 [10:29:57<2:15:11,  2.20s/it] 82%|██████████████████████████████████████████████████████████████████▏              | 16428/20117 [10:29:59<2:15:07,  2.20s/it] 82%|██████████████████████████████████████████████████████████████████▏              | 16429/20117 [10:30:01<2:15:36,  2.21s/it] 82%|██████████████████████████████████████████████████████████████████▏              | 16430/20117 [10:30:04<2:15:38,  2.21s/it]                                                                                                                                 {'loss': 0.1487, 'grad_norm': 0.5174708366394043, 'learning_rate': 1.6289000083857088e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 414.81, 'epoch': 1.63}
 82%|██████████████████████████████████████████████████████████████████▏              | 16430/20117 [10:30:04<2:15:38,  2.21s/it] 82%|██████████████████████████████████████████████████████████████████▏              | 16431/20117 [10:30:06<2:16:01,  2.21s/it] 82%|██████████████████████████████████████████████████████████████████▏              | 16432/20117 [10:30:08<2:16:12,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████▏              | 16433/20117 [10:30:10<2:16:18,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████▏              | 16434/20117 [10:30:13<2:17:58,  2.25s/it] 82%|██████████████████████████████████████████████████████████████████▏              | 16435/20117 [10:30:15<2:17:10,  2.24s/it] 82%|██████████████████████████████████████████████████████████████████▏              | 16436/20117 [10:30:17<2:17:02,  2.23s/it] 82%|██████████████████████████████████████████████████████████████████▏              | 16437/20117 [10:30:19<2:15:48,  2.21s/it] 82%|██████████████████████████████████████████████████████████████████▏              | 16438/20117 [10:30:22<2:16:44,  2.23s/it] 82%|██████████████████████████████████████████████████████████████████▏              | 16439/20117 [10:30:24<2:16:22,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████▏              | 16440/20117 [10:30:26<2:17:25,  2.24s/it]                                                                                                                                 {'loss': 0.1505, 'grad_norm': 0.44571495056152344, 'learning_rate': 1.620324820139595e-05, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 359.78, 'epoch': 1.63}
 82%|██████████████████████████████████████████████████████████████████▏              | 16440/20117 [10:30:26<2:17:25,  2.24s/it] 82%|██████████████████████████████████████████████████████████████████▏              | 16441/20117 [10:30:28<2:16:21,  2.23s/it] 82%|██████████████████████████████████████████████████████████████████▏              | 16442/20117 [10:30:30<2:15:50,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████▏              | 16443/20117 [10:30:33<2:15:05,  2.21s/it] 82%|██████████████████████████████████████████████████████████████████▏              | 16444/20117 [10:30:35<2:15:24,  2.21s/it] 82%|██████████████████████████████████████████████████████████████████▏              | 16445/20117 [10:30:37<2:14:59,  2.21s/it] 82%|██████████████████████████████████████████████████████████████████▏              | 16446/20117 [10:30:39<2:14:46,  2.20s/it] 82%|██████████████████████████████████████████████████████████████████▏              | 16447/20117 [10:30:41<2:14:21,  2.20s/it] 82%|██████████████████████████████████████████████████████████████████▏              | 16448/20117 [10:30:44<2:16:00,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████▏              | 16449/20117 [10:30:46<2:15:16,  2.21s/it] 82%|██████████████████████████████████████████████████████████████████▏              | 16450/20117 [10:30:48<2:15:20,  2.21s/it]                                                                                                                                 {'loss': 0.1283, 'grad_norm': 0.5897732377052307, 'learning_rate': 1.61177027280453e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 328.51, 'epoch': 1.64}
 82%|██████████████████████████████████████████████████████████████████▏              | 16450/20117 [10:30:48<2:15:20,  2.21s/it] 82%|██████████████████████████████████████████████████████████████████▏              | 16451/20117 [10:30:50<2:15:19,  2.21s/it] 82%|██████████████████████████████████████████████████████████████████▏              | 16452/20117 [10:30:52<2:14:46,  2.21s/it] 82%|██████████████████████████████████████████████████████████████████▏              | 16453/20117 [10:30:55<2:15:21,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████▎              | 16454/20117 [10:30:57<2:14:56,  2.21s/it] 82%|██████████████████████████████████████████████████████████████████▎              | 16455/20117 [10:30:59<2:16:45,  2.24s/it] 82%|██████████████████████████████████████████████████████████████████▎              | 16456/20117 [10:31:01<2:15:45,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████▎              | 16457/20117 [10:31:04<2:16:05,  2.23s/it] 82%|██████████████████████████████████████████████████████████████████▎              | 16458/20117 [10:31:06<2:16:09,  2.23s/it] 82%|██████████████████████████████████████████████████████████████████▎              | 16459/20117 [10:31:08<2:16:33,  2.24s/it] 82%|██████████████████████████████████████████████████████████████████▎              | 16460/20117 [10:31:10<2:16:26,  2.24s/it]                                                                                                                                 {'loss': 0.2051, 'grad_norm': 0.5686812400817871, 'learning_rate': 1.6032363874521804e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 373.45, 'epoch': 1.64}
 82%|██████████████████████████████████████████████████████████████████▎              | 16460/20117 [10:31:10<2:16:26,  2.24s/it] 82%|██████████████████████████████████████████████████████████████████▎              | 16461/20117 [10:31:13<2:17:04,  2.25s/it] 82%|██████████████████████████████████████████████████████████████████▎              | 16462/20117 [10:31:15<2:16:01,  2.23s/it] 82%|██████████████████████████████████████████████████████████████████▎              | 16463/20117 [10:31:17<2:15:20,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████▎              | 16464/20117 [10:31:20<2:21:58,  2.33s/it] 82%|██████████████████████████████████████████████████████████████████▎              | 16465/20117 [10:31:22<2:21:06,  2.32s/it] 82%|██████████████████████████████████████████████████████████████████▎              | 16466/20117 [10:31:24<2:18:45,  2.28s/it] 82%|██████████████████████████████████████████████████████████████████▎              | 16467/20117 [10:31:26<2:17:03,  2.25s/it] 82%|██████████████████████████████████████████████████████████████████▎              | 16468/20117 [10:31:28<2:15:40,  2.23s/it] 82%|██████████████████████████████████████████████████████████████████▎              | 16469/20117 [10:31:31<2:15:19,  2.23s/it] 82%|██████████████████████████████████████████████████████████████████▎              | 16470/20117 [10:31:33<2:15:24,  2.23s/it]                                                                                                                                 {'loss': 0.1437, 'grad_norm': 0.6155579686164856, 'learning_rate': 1.5947231851033016e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 329.49, 'epoch': 1.64}
 82%|██████████████████████████████████████████████████████████████████▎              | 16470/20117 [10:31:33<2:15:24,  2.23s/it] 82%|██████████████████████████████████████████████████████████████████▎              | 16471/20117 [10:31:35<2:16:33,  2.25s/it] 82%|██████████████████████████████████████████████████████████████████▎              | 16472/20117 [10:31:37<2:15:48,  2.24s/it] 82%|██████████████████████████████████████████████████████████████████▎              | 16473/20117 [10:31:40<2:16:05,  2.24s/it] 82%|██████████████████████████████████████████████████████████████████▎              | 16474/20117 [10:31:42<2:15:04,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████▎              | 16475/20117 [10:31:44<2:14:45,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████▎              | 16476/20117 [10:31:46<2:15:21,  2.23s/it] 82%|██████████████████████████████████████████████████████████████████▎              | 16477/20117 [10:31:49<2:15:39,  2.24s/it] 82%|██████████████████████████████████████████████████████████████████▎              | 16478/20117 [10:31:51<2:14:54,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████▎              | 16479/20117 [10:31:53<2:14:47,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████▎              | 16480/20117 [10:31:55<2:15:05,  2.23s/it]                                                                                                                                 {'loss': 0.1803, 'grad_norm': 0.4496712386608124, 'learning_rate': 1.5862306867277155e-05, 'memory/max_active (GiB)': 20.77, 'memory/max_allocated (GiB)': 20.77, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 373.25, 'epoch': 1.64}
 82%|██████████████████████████████████████████████████████████████████▎              | 16480/20117 [10:31:55<2:15:05,  2.23s/it] 82%|██████████████████████████████████████████████████████████████████▎              | 16481/20117 [10:31:57<2:14:28,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████▎              | 16482/20117 [10:32:00<2:15:04,  2.23s/it] 82%|██████████████████████████████████████████████████████████████████▎              | 16483/20117 [10:32:02<2:15:45,  2.24s/it] 82%|██████████████████████████████████████████████████████████████████▎              | 16484/20117 [10:32:04<2:15:35,  2.24s/it] 82%|██████████████████████████████████████████████████████████████████▍              | 16485/20117 [10:32:06<2:14:45,  2.23s/it] 82%|██████████████████████████████████████████████████████████████████▍              | 16486/20117 [10:32:09<2:14:19,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████▍              | 16487/20117 [10:32:11<2:13:41,  2.21s/it] 82%|██████████████████████████████████████████████████████████████████▍              | 16488/20117 [10:32:13<2:14:14,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████▍              | 16489/20117 [10:32:15<2:13:47,  2.21s/it] 82%|██████████████████████████████████████████████████████████████████▍              | 16490/20117 [10:32:17<2:13:25,  2.21s/it]                                                                                                                                 {'loss': 0.1214, 'grad_norm': 0.696696937084198, 'learning_rate': 1.5777589132442373e-05, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 342.62, 'epoch': 1.64}
 82%|██████████████████████████████████████████████████████████████████▍              | 16490/20117 [10:32:17<2:13:25,  2.21s/it] 82%|██████████████████████████████████████████████████████████████████▍              | 16491/20117 [10:32:20<2:13:19,  2.21s/it] 82%|██████████████████████████████████████████████████████████████████▍              | 16492/20117 [10:32:22<2:14:17,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████▍              | 16493/20117 [10:32:24<2:14:07,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████▍              | 16494/20117 [10:32:26<2:14:12,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████▍              | 16495/20117 [10:32:29<2:14:21,  2.23s/it] 82%|██████████████████████████████████████████████████████████████████▍              | 16496/20117 [10:32:31<2:15:43,  2.25s/it] 82%|██████████████████████████████████████████████████████████████████▍              | 16497/20117 [10:32:33<2:15:50,  2.25s/it] 82%|██████████████████████████████████████████████████████████████████▍              | 16498/20117 [10:32:35<2:15:00,  2.24s/it] 82%|██████████████████████████████████████████████████████████████████▍              | 16499/20117 [10:32:38<2:14:29,  2.23s/it] 82%|██████████████████████████████████████████████████████████████████▍              | 16500/20117 [10:32:40<2:15:02,  2.24s/it]                                                                                                                                 {'loss': 0.1722, 'grad_norm': 0.5139226317405701, 'learning_rate': 1.569307885520639e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 391.42, 'epoch': 1.64}
 82%|██████████████████████████████████████████████████████████████████▍              | 16500/20117 [10:32:40<2:15:02,  2.24s/it] 82%|██████████████████████████████████████████████████████████████████▍              | 16501/20117 [10:32:42<2:14:36,  2.23s/it] 82%|██████████████████████████████████████████████████████████████████▍              | 16502/20117 [10:32:44<2:13:39,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████▍              | 16503/20117 [10:32:46<2:14:38,  2.24s/it] 82%|██████████████████████████████████████████████████████████████████▍              | 16504/20117 [10:32:49<2:14:05,  2.23s/it] 82%|██████████████████████████████████████████████████████████████████▍              | 16505/20117 [10:32:51<2:14:58,  2.24s/it] 82%|██████████████████████████████████████████████████████████████████▍              | 16506/20117 [10:32:53<2:14:48,  2.24s/it] 82%|██████████████████████████████████████████████████████████████████▍              | 16507/20117 [10:32:55<2:14:39,  2.24s/it] 82%|██████████████████████████████████████████████████████████████████▍              | 16508/20117 [10:32:58<2:14:18,  2.23s/it] 82%|██████████████████████████████████████████████████████████████████▍              | 16509/20117 [10:33:00<2:15:17,  2.25s/it] 82%|██████████████████████████████████████████████████████████████████▍              | 16510/20117 [10:33:02<2:15:31,  2.25s/it]                                                                                                                                 {'loss': 0.1549, 'grad_norm': 0.5510913133621216, 'learning_rate': 1.5608776243735834e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 353.41, 'epoch': 1.64}
 82%|██████████████████████████████████████████████████████████████████▍              | 16510/20117 [10:33:02<2:15:31,  2.25s/it] 82%|██████████████████████████████████████████████████████████████████▍              | 16511/20117 [10:33:04<2:14:44,  2.24s/it] 82%|██████████████████████████████████████████████████████████████████▍              | 16512/20117 [10:33:07<2:14:56,  2.25s/it] 82%|██████████████████████████████████████████████████████████████████▍              | 16513/20117 [10:33:09<2:15:19,  2.25s/it] 82%|██████████████████████████████████████████████████████████████████▍              | 16514/20117 [10:33:11<2:15:49,  2.26s/it] 82%|██████████████████████████████████████████████████████████████████▍              | 16515/20117 [10:33:14<2:22:54,  2.38s/it] 82%|██████████████████████████████████████████████████████████████████▌              | 16516/20117 [10:33:16<2:20:01,  2.33s/it] 82%|██████████████████████████████████████████████████████████████████▌              | 16517/20117 [10:33:18<2:17:21,  2.29s/it] 82%|██████████████████████████████████████████████████████████████████▌              | 16518/20117 [10:33:20<2:15:23,  2.26s/it] 82%|██████████████████████████████████████████████████████████████████▌              | 16519/20117 [10:33:23<2:14:17,  2.24s/it] 82%|██████████████████████████████████████████████████████████████████▌              | 16520/20117 [10:33:25<2:14:02,  2.24s/it]                                                                                                                                 {'loss': 0.1431, 'grad_norm': 0.4640497863292694, 'learning_rate': 1.5524681505685888e-05, 'memory/max_active (GiB)': 20.77, 'memory/max_allocated (GiB)': 20.77, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 387.36, 'epoch': 1.64}
 82%|██████████████████████████████████████████████████████████████████▌              | 16520/20117 [10:33:25<2:14:02,  2.24s/it] 82%|██████████████████████████████████████████████████████████████████▌              | 16521/20117 [10:33:27<2:13:20,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████▌              | 16522/20117 [10:33:29<2:12:34,  2.21s/it] 82%|██████████████████████████████████████████████████████████████████▌              | 16523/20117 [10:33:31<2:12:13,  2.21s/it] 82%|██████████████████████████████████████████████████████████████████▌              | 16524/20117 [10:33:34<2:13:12,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████▌              | 16525/20117 [10:33:36<2:13:22,  2.23s/it] 82%|██████████████████████████████████████████████████████████████████▌              | 16526/20117 [10:33:38<2:13:09,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████▌              | 16527/20117 [10:33:40<2:12:33,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████▌              | 16528/20117 [10:33:43<2:12:07,  2.21s/it] 82%|██████████████████████████████████████████████████████████████████▌              | 16529/20117 [10:33:45<2:11:31,  2.20s/it] 82%|██████████████████████████████████████████████████████████████████▌              | 16530/20117 [10:33:47<2:11:57,  2.21s/it]                                                                                                                                 {'loss': 0.1749, 'grad_norm': 0.59361332654953, 'learning_rate': 1.5440794848199657e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 407.49, 'epoch': 1.64}
 82%|██████████████████████████████████████████████████████████████████▌              | 16530/20117 [10:33:47<2:11:57,  2.21s/it] 82%|██████████████████████████████████████████████████████████████████▌              | 16531/20117 [10:33:49<2:12:08,  2.21s/it] 82%|██████████████████████████████████████████████████████████████████▌              | 16532/20117 [10:33:51<2:12:04,  2.21s/it] 82%|██████████████████████████████████████████████████████████████████▌              | 16533/20117 [10:33:54<2:12:31,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████▌              | 16534/20117 [10:33:56<2:13:34,  2.24s/it] 82%|██████████████████████████████████████████████████████████████████▌              | 16535/20117 [10:33:58<2:12:53,  2.23s/it] 82%|██████████████████████████████████████████████████████████████████▌              | 16536/20117 [10:34:00<2:13:22,  2.23s/it] 82%|██████████████████████████████████████████████████████████████████▌              | 16537/20117 [10:34:03<2:13:08,  2.23s/it] 82%|██████████████████████████████████████████████████████████████████▌              | 16538/20117 [10:34:05<2:13:51,  2.24s/it] 82%|██████████████████████████████████████████████████████████████████▌              | 16539/20117 [10:34:07<2:13:38,  2.24s/it] 82%|██████████████████████████████████████████████████████████████████▌              | 16540/20117 [10:34:09<2:12:51,  2.23s/it]                                                                                                                                 {'loss': 0.1278, 'grad_norm': 0.552768349647522, 'learning_rate': 1.5357116477907728e-05, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 381.79, 'epoch': 1.64}
 82%|██████████████████████████████████████████████████████████████████▌              | 16540/20117 [10:34:09<2:12:51,  2.23s/it] 82%|██████████████████████████████████████████████████████████████████▌              | 16541/20117 [10:34:12<2:12:11,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████▌              | 16542/20117 [10:34:14<2:12:16,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████▌              | 16543/20117 [10:34:16<2:12:44,  2.23s/it] 82%|██████████████████████████████████████████████████████████████████▌              | 16544/20117 [10:34:18<2:11:45,  2.21s/it] 82%|██████████████████████████████████████████████████████████████████▌              | 16545/20117 [10:34:20<2:12:02,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████▌              | 16546/20117 [10:34:23<2:12:16,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████▋              | 16547/20117 [10:34:25<2:14:09,  2.25s/it] 82%|██████████████████████████████████████████████████████████████████▋              | 16548/20117 [10:34:27<2:12:42,  2.23s/it] 82%|██████████████████████████████████████████████████████████████████▋              | 16549/20117 [10:34:29<2:12:22,  2.23s/it] 82%|██████████████████████████████████████████████████████████████████▋              | 16550/20117 [10:34:32<2:11:49,  2.22s/it]                                                                                                                                 {'loss': 0.163, 'grad_norm': 0.7618371248245239, 'learning_rate': 1.5273646600927583e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 364.2, 'epoch': 1.65}
 82%|██████████████████████████████████████████████████████████████████▋              | 16550/20117 [10:34:32<2:11:49,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████▋              | 16551/20117 [10:34:34<2:11:08,  2.21s/it] 82%|██████████████████████████████████████████████████████████████████▋              | 16552/20117 [10:34:36<2:12:19,  2.23s/it] 82%|██████████████████████████████████████████████████████████████████▋              | 16553/20117 [10:34:38<2:11:42,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████▋              | 16554/20117 [10:34:40<2:11:05,  2.21s/it] 82%|██████████████████████████████████████████████████████████████████▋              | 16555/20117 [10:34:43<2:11:48,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████▋              | 16556/20117 [10:34:45<2:12:02,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████▋              | 16557/20117 [10:34:47<2:12:59,  2.24s/it] 82%|██████████████████████████████████████████████████████████████████▋              | 16558/20117 [10:34:49<2:12:07,  2.23s/it] 82%|██████████████████████████████████████████████████████████████████▋              | 16559/20117 [10:34:52<2:11:09,  2.21s/it] 82%|██████████████████████████████████████████████████████████████████▋              | 16560/20117 [10:34:54<2:11:28,  2.22s/it]                                                                                                                                 {'loss': 0.1467, 'grad_norm': 0.5775349140167236, 'learning_rate': 1.5190385422863174e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 367.94, 'epoch': 1.65}
 82%|██████████████████████████████████████████████████████████████████▋              | 16560/20117 [10:34:54<2:11:28,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████▋              | 16561/20117 [10:34:56<2:10:48,  2.21s/it] 82%|██████████████████████████████████████████████████████████████████▋              | 16562/20117 [10:34:58<2:10:57,  2.21s/it] 82%|██████████████████████████████████████████████████████████████████▋              | 16563/20117 [10:35:00<2:10:32,  2.20s/it] 82%|██████████████████████████████████████████████████████████████████▋              | 16564/20117 [10:35:03<2:10:18,  2.20s/it] 82%|██████████████████████████████████████████████████████████████████▋              | 16565/20117 [10:35:05<2:11:36,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████▋              | 16566/20117 [10:35:07<2:12:28,  2.24s/it] 82%|██████████████████████████████████████████████████████████████████▋              | 16567/20117 [10:35:09<2:11:51,  2.23s/it] 82%|██████████████████████████████████████████████████████████████████▋              | 16568/20117 [10:35:11<2:11:38,  2.23s/it] 82%|██████████████████████████████████████████████████████████████████▋              | 16569/20117 [10:35:14<2:11:06,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████▋              | 16570/20117 [10:35:16<2:16:06,  2.30s/it]                                                                                                                                 {'loss': 0.1599, 'grad_norm': 0.5621904134750366, 'learning_rate': 1.5107333148804414e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 324.44, 'epoch': 1.65}
 82%|██████████████████████████████████████████████████████████████████▋              | 16570/20117 [10:35:16<2:16:06,  2.30s/it] 82%|██████████████████████████████████████████████████████████████████▋              | 16571/20117 [10:35:18<2:15:16,  2.29s/it] 82%|██████████████████████████████████████████████████████████████████▋              | 16572/20117 [10:35:21<2:13:37,  2.26s/it] 82%|██████████████████████████████████████████████████████████████████▋              | 16573/20117 [10:35:23<2:13:26,  2.26s/it] 82%|██████████████████████████████████████████████████████████████████▋              | 16574/20117 [10:35:25<2:12:50,  2.25s/it] 82%|██████████████████████████████████████████████████████████████████▋              | 16575/20117 [10:35:27<2:12:28,  2.24s/it] 82%|██████████████████████████████████████████████████████████████████▋              | 16576/20117 [10:35:30<2:12:03,  2.24s/it] 82%|██████████████████████████████████████████████████████████████████▋              | 16577/20117 [10:35:32<2:11:50,  2.23s/it] 82%|██████████████████████████████████████████████████████████████████▊              | 16578/20117 [10:35:34<2:11:38,  2.23s/it] 82%|██████████████████████████████████████████████████████████████████▊              | 16579/20117 [10:35:36<2:11:30,  2.23s/it] 82%|██████████████████████████████████████████████████████████████████▊              | 16580/20117 [10:35:39<2:11:41,  2.23s/it]                                                                                                                                 {'loss': 0.2052, 'grad_norm': 0.565984845161438, 'learning_rate': 1.5024489983326562e-05, 'memory/max_active (GiB)': 19.67, 'memory/max_allocated (GiB)': 19.67, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 412.88, 'epoch': 1.65}
 82%|██████████████████████████████████████████████████████████████████▊              | 16580/20117 [10:35:39<2:11:41,  2.23s/it] 82%|██████████████████████████████████████████████████████████████████▊              | 16581/20117 [10:35:41<2:12:54,  2.26s/it] 82%|██████████████████████████████████████████████████████████████████▊              | 16582/20117 [10:35:43<2:12:17,  2.25s/it] 82%|██████████████████████████████████████████████████████████████████▊              | 16583/20117 [10:35:45<2:11:29,  2.23s/it] 82%|██████████████████████████████████████████████████████████████████▊              | 16584/20117 [10:35:47<2:11:55,  2.24s/it] 82%|██████████████████████████████████████████████████████████████████▊              | 16585/20117 [10:35:50<2:10:58,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████▊              | 16586/20117 [10:35:52<2:10:14,  2.21s/it] 82%|██████████████████████████████████████████████████████████████████▊              | 16587/20117 [10:35:54<2:10:05,  2.21s/it] 82%|██████████████████████████████████████████████████████████████████▊              | 16588/20117 [10:35:56<2:09:53,  2.21s/it] 82%|██████████████████████████████████████████████████████████████████▊              | 16589/20117 [10:35:59<2:10:08,  2.21s/it] 82%|██████████████████████████████████████████████████████████████████▊              | 16590/20117 [10:36:01<2:10:49,  2.23s/it]                                                                                                                                 {'loss': 0.1494, 'grad_norm': 0.456301748752594, 'learning_rate': 1.4941856130489884e-05, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 374.55, 'epoch': 1.65}
 82%|██████████████████████████████████████████████████████████████████▊              | 16590/20117 [10:36:01<2:10:49,  2.23s/it] 82%|██████████████████████████████████████████████████████████████████▊              | 16591/20117 [10:36:03<2:11:00,  2.23s/it] 82%|██████████████████████████████████████████████████████████████████▊              | 16592/20117 [10:36:05<2:10:13,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████▊              | 16593/20117 [10:36:07<2:10:34,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████▊              | 16594/20117 [10:36:10<2:10:17,  2.22s/it] 82%|██████████████████████████████████████████████████████████████████▊              | 16595/20117 [10:36:12<2:11:15,  2.24s/it] 82%|██████████████████████████████████████████████████████████████████▊              | 16596/20117 [10:36:14<2:10:36,  2.23s/it] 83%|██████████████████████████████████████████████████████████████████▊              | 16597/20117 [10:36:16<2:10:42,  2.23s/it] 83%|██████████████████████████████████████████████████████████████████▊              | 16598/20117 [10:36:19<2:12:01,  2.25s/it] 83%|██████████████████████████████████████████████████████████████████▊              | 16599/20117 [10:36:21<2:13:38,  2.28s/it] 83%|██████████████████████████████████████████████████████████████████▊              | 16600/20117 [10:36:23<2:12:00,  2.25s/it]                                                                                                                                 {'loss': 0.131, 'grad_norm': 0.414070725440979, 'learning_rate': 1.4859431793838995e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 396.68, 'epoch': 1.65}
 83%|██████████████████████████████████████████████████████████████████▊              | 16600/20117 [10:36:23<2:12:00,  2.25s/it] 83%|██████████████████████████████████████████████████████████████████▊              | 16601/20117 [10:36:25<2:10:51,  2.23s/it] 83%|██████████████████████████████████████████████████████████████████▊              | 16602/20117 [10:36:28<2:13:17,  2.28s/it] 83%|██████████████████████████████████████████████████████████████████▊              | 16603/20117 [10:36:30<2:11:56,  2.25s/it] 83%|██████████████████████████████████████████████████████████████████▊              | 16604/20117 [10:36:32<2:10:40,  2.23s/it] 83%|██████████████████████████████████████████████████████████████████▊              | 16605/20117 [10:36:34<2:12:00,  2.26s/it] 83%|██████████████████████████████████████████████████████████████████▊              | 16606/20117 [10:36:37<2:11:44,  2.25s/it] 83%|██████████████████████████████████████████████████████████████████▊              | 16607/20117 [10:36:39<2:11:10,  2.24s/it] 83%|██████████████████████████████████████████████████████████████████▊              | 16608/20117 [10:36:41<2:09:54,  2.22s/it] 83%|██████████████████████████████████████████████████████████████████▉              | 16609/20117 [10:36:43<2:09:42,  2.22s/it] 83%|██████████████████████████████████████████████████████████████████▉              | 16610/20117 [10:36:45<2:08:59,  2.21s/it]                                                                                                                                 {'loss': 0.1532, 'grad_norm': 0.6677629947662354, 'learning_rate': 1.477721717640248e-05, 'memory/max_active (GiB)': 21.4, 'memory/max_allocated (GiB)': 21.4, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 419.18, 'epoch': 1.65}
 83%|██████████████████████████████████████████████████████████████████▉              | 16610/20117 [10:36:45<2:08:59,  2.21s/it] 83%|██████████████████████████████████████████████████████████████████▉              | 16611/20117 [10:36:48<2:10:15,  2.23s/it] 83%|██████████████████████████████████████████████████████████████████▉              | 16612/20117 [10:36:50<2:09:38,  2.22s/it] 83%|██████████████████████████████████████████████████████████████████▉              | 16613/20117 [10:36:52<2:08:55,  2.21s/it] 83%|██████████████████████████████████████████████████████████████████▉              | 16614/20117 [10:36:54<2:09:30,  2.22s/it] 83%|██████████████████████████████████████████████████████████████████▉              | 16615/20117 [10:36:57<2:09:29,  2.22s/it] 83%|██████████████████████████████████████████████████████████████████▉              | 16616/20117 [10:36:59<2:08:59,  2.21s/it] 83%|██████████████████████████████████████████████████████████████████▉              | 16617/20117 [10:37:01<2:09:12,  2.21s/it] 83%|██████████████████████████████████████████████████████████████████▉              | 16618/20117 [10:37:03<2:10:11,  2.23s/it] 83%|██████████████████████████████████████████████████████████████████▉              | 16619/20117 [10:37:06<2:10:20,  2.24s/it] 83%|██████████████████████████████████████████████████████████████████▉              | 16620/20117 [10:37:08<2:10:57,  2.25s/it]                                                                                                                                 {'loss': 0.1519, 'grad_norm': 0.22176572680473328, 'learning_rate': 1.4695212480692277e-05, 'memory/max_active (GiB)': 20.77, 'memory/max_allocated (GiB)': 20.77, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 401.78, 'epoch': 1.65}
 83%|██████████████████████████████████████████████████████████████████▉              | 16620/20117 [10:37:08<2:10:57,  2.25s/it] 83%|██████████████████████████████████████████████████████████████████▉              | 16621/20117 [10:37:10<2:11:40,  2.26s/it] 83%|██████████████████████████████████████████████████████████████████▉              | 16622/20117 [10:37:12<2:10:40,  2.24s/it] 83%|██████████████████████████████████████████████████████████████████▉              | 16623/20117 [10:37:15<2:15:25,  2.33s/it] 83%|██████████████████████████████████████████████████████████████████▉              | 16624/20117 [10:37:17<2:13:03,  2.29s/it] 83%|██████████████████████████████████████████████████████████████████▉              | 16625/20117 [10:37:19<2:11:37,  2.26s/it] 83%|██████████████████████████████████████████████████████████████████▉              | 16626/20117 [10:37:21<2:10:07,  2.24s/it] 83%|██████████████████████████████████████████████████████████████████▉              | 16627/20117 [10:37:24<2:09:38,  2.23s/it] 83%|██████████████████████████████████████████████████████████████████▉              | 16628/20117 [10:37:26<2:08:50,  2.22s/it] 83%|██████████████████████████████████████████████████████████████████▉              | 16629/20117 [10:37:28<2:09:06,  2.22s/it] 83%|██████████████████████████████████████████████████████████████████▉              | 16630/20117 [10:37:30<2:08:38,  2.21s/it]                                                                                                                                 {'loss': 0.1728, 'grad_norm': 0.6775956749916077, 'learning_rate': 1.4613417908703342e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 349.56, 'epoch': 1.65}
 83%|██████████████████████████████████████████████████████████████████▉              | 16630/20117 [10:37:30<2:08:38,  2.21s/it] 83%|██████████████████████████████████████████████████████████████████▉              | 16631/20117 [10:37:32<2:08:32,  2.21s/it] 83%|██████████████████████████████████████████████████████████████████▉              | 16632/20117 [10:37:35<2:09:04,  2.22s/it] 83%|██████████████████████████████████████████████████████████████████▉              | 16633/20117 [10:37:37<2:08:44,  2.22s/it] 83%|██████████████████████████████████████████████████████████████████▉              | 16634/20117 [10:37:39<2:09:20,  2.23s/it] 83%|██████████████████████████████████████████████████████████████████▉              | 16635/20117 [10:37:41<2:09:13,  2.23s/it] 83%|██████████████████████████████████████████████████████████████████▉              | 16636/20117 [10:37:44<2:08:03,  2.21s/it] 83%|██████████████████████████████████████████████████████████████████▉              | 16637/20117 [10:37:46<2:09:38,  2.24s/it] 83%|██████████████████████████████████████████████████████████████████▉              | 16638/20117 [10:37:48<2:09:18,  2.23s/it] 83%|██████████████████████████████████████████████████████████████████▉              | 16639/20117 [10:37:50<2:09:23,  2.23s/it] 83%|███████████████████████████████████████████████████████████████████              | 16640/20117 [10:37:52<2:09:02,  2.23s/it]                                                                                                                                 {'loss': 0.1076, 'grad_norm': 0.4293968081474304, 'learning_rate': 1.4531833661912942e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 314.88, 'epoch': 1.65}
 83%|███████████████████████████████████████████████████████████████████              | 16640/20117 [10:37:52<2:09:02,  2.23s/it] 83%|███████████████████████████████████████████████████████████████████              | 16641/20117 [10:37:55<2:08:13,  2.21s/it] 83%|███████████████████████████████████████████████████████████████████              | 16642/20117 [10:37:57<2:08:03,  2.21s/it] 83%|███████████████████████████████████████████████████████████████████              | 16643/20117 [10:37:59<2:08:15,  2.22s/it] 83%|███████████████████████████████████████████████████████████████████              | 16644/20117 [10:38:01<2:08:08,  2.21s/it] 83%|███████████████████████████████████████████████████████████████████              | 16645/20117 [10:38:04<2:08:02,  2.21s/it] 83%|███████████████████████████████████████████████████████████████████              | 16646/20117 [10:38:06<2:08:30,  2.22s/it] 83%|███████████████████████████████████████████████████████████████████              | 16647/20117 [10:38:08<2:08:52,  2.23s/it] 83%|███████████████████████████████████████████████████████████████████              | 16648/20117 [10:38:10<2:08:25,  2.22s/it] 83%|███████████████████████████████████████████████████████████████████              | 16649/20117 [10:38:12<2:09:11,  2.24s/it] 83%|███████████████████████████████████████████████████████████████████              | 16650/20117 [10:38:15<2:09:24,  2.24s/it]                                                                                                                                 {'loss': 0.1682, 'grad_norm': 0.5610436797142029, 'learning_rate': 1.445045994128037e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 363.34, 'epoch': 1.66}
 83%|███████████████████████████████████████████████████████████████████              | 16650/20117 [10:38:15<2:09:24,  2.24s/it] 83%|███████████████████████████████████████████████████████████████████              | 16651/20117 [10:38:17<2:09:13,  2.24s/it] 83%|███████████████████████████████████████████████████████████████████              | 16652/20117 [10:38:19<2:10:33,  2.26s/it] 83%|███████████████████████████████████████████████████████████████████              | 16653/20117 [10:38:21<2:09:51,  2.25s/it] 83%|███████████████████████████████████████████████████████████████████              | 16654/20117 [10:38:24<2:08:34,  2.23s/it] 83%|███████████████████████████████████████████████████████████████████              | 16655/20117 [10:38:26<2:08:04,  2.22s/it] 83%|███████████████████████████████████████████████████████████████████              | 16656/20117 [10:38:28<2:07:29,  2.21s/it] 83%|███████████████████████████████████████████████████████████████████              | 16657/20117 [10:38:30<2:07:28,  2.21s/it] 83%|███████████████████████████████████████████████████████████████████              | 16658/20117 [10:38:32<2:07:24,  2.21s/it] 83%|███████████████████████████████████████████████████████████████████              | 16659/20117 [10:38:35<2:07:08,  2.21s/it] 83%|███████████████████████████████████████████████████████████████████              | 16660/20117 [10:38:37<2:08:20,  2.23s/it]                                                                                                                                 {'loss': 0.1587, 'grad_norm': 0.616041362285614, 'learning_rate': 1.4369296947246236e-05, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 362.03, 'epoch': 1.66}
 83%|███████████████████████████████████████████████████████████████████              | 16660/20117 [10:38:37<2:08:20,  2.23s/it] 83%|███████████████████████████████████████████████████████████████████              | 16661/20117 [10:38:39<2:08:48,  2.24s/it] 83%|███████████████████████████████████████████████████████████████████              | 16662/20117 [10:38:41<2:08:05,  2.22s/it] 83%|███████████████████████████████████████████████████████████████████              | 16663/20117 [10:38:44<2:07:24,  2.21s/it] 83%|███████████████████████████████████████████████████████████████████              | 16664/20117 [10:38:46<2:07:06,  2.21s/it] 83%|███████████████████████████████████████████████████████████████████              | 16665/20117 [10:38:48<2:07:31,  2.22s/it] 83%|███████████████████████████████████████████████████████████████████              | 16666/20117 [10:38:50<2:08:24,  2.23s/it] 83%|███████████████████████████████████████████████████████████████████              | 16667/20117 [10:38:52<2:07:55,  2.22s/it] 83%|███████████████████████████████████████████████████████████████████              | 16668/20117 [10:38:55<2:08:31,  2.24s/it] 83%|███████████████████████████████████████████████████████████████████              | 16669/20117 [10:38:57<2:07:56,  2.23s/it] 83%|███████████████████████████████████████████████████████████████████              | 16670/20117 [10:38:59<2:07:59,  2.23s/it]                                                                                                                                 {'loss': 0.1889, 'grad_norm': 0.3607831299304962, 'learning_rate': 1.4288344879732185e-05, 'memory/max_active (GiB)': 18.86, 'memory/max_allocated (GiB)': 18.86, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 371.37, 'epoch': 1.66}
 83%|███████████████████████████████████████████████████████████████████              | 16670/20117 [10:38:59<2:07:59,  2.23s/it] 83%|███████████████████████████████████████████████████████████████████              | 16671/20117 [10:39:01<2:07:55,  2.23s/it] 83%|███████████████████████████████████████████████████████████████████▏             | 16672/20117 [10:39:04<2:08:16,  2.23s/it] 83%|███████████████████████████████████████████████████████████████████▏             | 16673/20117 [10:39:06<2:07:56,  2.23s/it] 83%|███████████████████████████████████████████████████████████████████▏             | 16674/20117 [10:39:08<2:07:55,  2.23s/it] 83%|███████████████████████████████████████████████████████████████████▏             | 16675/20117 [10:39:10<2:07:06,  2.22s/it] 83%|███████████████████████████████████████████████████████████████████▏             | 16676/20117 [10:39:13<2:07:37,  2.23s/it] 83%|███████████████████████████████████████████████████████████████████▏             | 16677/20117 [10:39:15<2:07:04,  2.22s/it] 83%|███████████████████████████████████████████████████████████████████▏             | 16678/20117 [10:39:17<2:12:15,  2.31s/it] 83%|███████████████████████████████████████████████████████████████████▏             | 16679/20117 [10:39:20<2:11:22,  2.29s/it] 83%|███████████████████████████████████████████████████████████████████▏             | 16680/20117 [10:39:22<2:09:58,  2.27s/it]                                                                                                                                 {'loss': 0.1525, 'grad_norm': 0.2895027697086334, 'learning_rate': 1.420760393814028e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 375.97, 'epoch': 1.66}
 83%|███████████████████████████████████████████████████████████████████▏             | 16680/20117 [10:39:22<2:09:58,  2.27s/it] 83%|███████████████████████████████████████████████████████████████████▏             | 16681/20117 [10:39:24<2:08:27,  2.24s/it] 83%|███████████████████████████████████████████████████████████████████▏             | 16682/20117 [10:39:26<2:07:23,  2.23s/it] 83%|███████████████████████████████████████████████████████████████████▏             | 16683/20117 [10:39:28<2:07:57,  2.24s/it] 83%|███████████████████████████████████████████████████████████████████▏             | 16684/20117 [10:39:31<2:08:17,  2.24s/it] 83%|███████████████████████████████████████████████████████████████████▏             | 16685/20117 [10:39:33<2:07:11,  2.22s/it] 83%|███████████████████████████████████████████████████████████████████▏             | 16686/20117 [10:39:35<2:07:37,  2.23s/it] 83%|███████████████████████████████████████████████████████████████████▏             | 16687/20117 [10:39:37<2:07:49,  2.24s/it] 83%|███████████████████████████████████████████████████████████████████▏             | 16688/20117 [10:39:39<2:07:09,  2.22s/it] 83%|███████████████████████████████████████████████████████████████████▏             | 16689/20117 [10:39:42<2:06:11,  2.21s/it] 83%|███████████████████████████████████████████████████████████████████▏             | 16690/20117 [10:39:44<2:07:05,  2.23s/it]                                                                                                                                 {'loss': 0.1653, 'grad_norm': 0.35571709275245667, 'learning_rate': 1.4127074321352517e-05, 'memory/max_active (GiB)': 21.4, 'memory/max_allocated (GiB)': 21.4, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 359.31, 'epoch': 1.66}
 83%|███████████████████████████████████████████████████████████████████▏             | 16690/20117 [10:39:44<2:07:05,  2.23s/it] 83%|███████████████████████████████████████████████████████████████████▏             | 16691/20117 [10:39:46<2:06:21,  2.21s/it] 83%|███████████████████████████████████████████████████████████████████▏             | 16692/20117 [10:39:48<2:05:53,  2.21s/it] 83%|███████████████████████████████████████████████████████████████████▏             | 16693/20117 [10:39:51<2:06:35,  2.22s/it] 83%|███████████████████████████████████████████████████████████████████▏             | 16694/20117 [10:39:53<2:06:24,  2.22s/it] 83%|███████████████████████████████████████████████████████████████████▏             | 16695/20117 [10:39:55<2:06:40,  2.22s/it] 83%|███████████████████████████████████████████████████████████████████▏             | 16696/20117 [10:39:57<2:06:01,  2.21s/it] 83%|███████████████████████████████████████████████████████████████████▏             | 16697/20117 [10:39:59<2:06:43,  2.22s/it] 83%|███████████████████████████████████████████████████████████████████▏             | 16698/20117 [10:40:02<2:06:18,  2.22s/it] 83%|███████████████████████████████████████████████████████████████████▏             | 16699/20117 [10:40:04<2:07:00,  2.23s/it] 83%|███████████████████████████████████████████████████████████████████▏             | 16700/20117 [10:40:06<2:07:13,  2.23s/it]                                                                                                                                 {'loss': 0.1795, 'grad_norm': 0.6197568774223328, 'learning_rate': 1.404675622773034e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 320.06, 'epoch': 1.66}
 83%|███████████████████████████████████████████████████████████████████▏             | 16700/20117 [10:40:06<2:07:13,  2.23s/it] 83%|███████████████████████████████████████████████████████████████████▏             | 16701/20117 [10:40:08<2:07:38,  2.24s/it] 83%|███████████████████████████████████████████████████████████████████▏             | 16702/20117 [10:40:11<2:08:03,  2.25s/it] 83%|███████████████████████████████████████████████████████████████████▎             | 16703/20117 [10:40:13<2:07:36,  2.24s/it] 83%|███████████████████████████████████████████████████████████████████▎             | 16704/20117 [10:40:15<2:06:44,  2.23s/it] 83%|███████████████████████████████████████████████████████████████████▎             | 16705/20117 [10:40:17<2:06:27,  2.22s/it] 83%|███████████████████████████████████████████████████████████████████▎             | 16706/20117 [10:40:20<2:06:06,  2.22s/it] 83%|███████████████████████████████████████████████████████████████████▎             | 16707/20117 [10:40:22<2:05:43,  2.21s/it] 83%|███████████████████████████████████████████████████████████████████▎             | 16708/20117 [10:40:24<2:05:49,  2.21s/it] 83%|███████████████████████████████████████████████████████████████████▎             | 16709/20117 [10:40:26<2:05:14,  2.20s/it] 83%|███████████████████████████████████████████████████████████████████▎             | 16710/20117 [10:40:28<2:04:35,  2.19s/it]                                                                                                                                 {'loss': 0.1592, 'grad_norm': 0.5286340713500977, 'learning_rate': 1.3966649855114211e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 333.26, 'epoch': 1.66}
 83%|███████████████████████████████████████████████████████████████████▎             | 16710/20117 [10:40:28<2:04:35,  2.19s/it] 83%|███████████████████████████████████████████████████████████████████▎             | 16711/20117 [10:40:31<2:05:56,  2.22s/it] 83%|███████████████████████████████████████████████████████████████████▎             | 16712/20117 [10:40:33<2:06:34,  2.23s/it] 83%|███████████████████████████████████████████████████████████████████▎             | 16713/20117 [10:40:35<2:06:40,  2.23s/it] 83%|███████████████████████████████████████████████████████████████████▎             | 16714/20117 [10:40:37<2:06:16,  2.23s/it] 83%|███████████████████████████████████████████████████████████████████▎             | 16715/20117 [10:40:40<2:06:30,  2.23s/it] 83%|███████████████████████████████████████████████████████████████████▎             | 16716/20117 [10:40:42<2:06:24,  2.23s/it] 83%|███████████████████████████████████████████████████████████████████▎             | 16717/20117 [10:40:44<2:06:45,  2.24s/it] 83%|███████████████████████████████████████████████████████████████████▎             | 16718/20117 [10:40:46<2:05:48,  2.22s/it] 83%|███████████████████████████████████████████████████████████████████▎             | 16719/20117 [10:40:48<2:06:05,  2.23s/it] 83%|███████████████████████████████████████████████████████████████████▎             | 16720/20117 [10:40:51<2:06:17,  2.23s/it]                                                                                                                                 {'loss': 0.1282, 'grad_norm': 0.4165874719619751, 'learning_rate': 1.3886755400823071e-05, 'memory/max_active (GiB)': 18.84, 'memory/max_allocated (GiB)': 18.84, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 383.56, 'epoch': 1.66}
 83%|███████████████████████████████████████████████████████████████████▎             | 16720/20117 [10:40:51<2:06:17,  2.23s/it] 83%|███████████████████████████████████████████████████████████████████▎             | 16721/20117 [10:40:53<2:06:30,  2.24s/it] 83%|███████████████████████████████████████████████████████████████████▎             | 16722/20117 [10:40:55<2:05:30,  2.22s/it] 83%|███████████████████████████████████████████████████████████████████▎             | 16723/20117 [10:40:57<2:04:58,  2.21s/it] 83%|███████████████████████████████████████████████████████████████████▎             | 16724/20117 [10:40:59<2:04:48,  2.21s/it] 83%|███████████████████████████████████████████████████████████████████▎             | 16725/20117 [10:41:02<2:05:04,  2.21s/it] 83%|███████████████████████████████████████████████████████████████████▎             | 16726/20117 [10:41:04<2:07:28,  2.26s/it] 83%|███████████████████████████████████████████████████████████████████▎             | 16727/20117 [10:41:06<2:06:45,  2.24s/it] 83%|███████████████████████████████████████████████████████████████████▎             | 16728/20117 [10:41:09<2:06:56,  2.25s/it] 83%|███████████████████████████████████████████████████████████████████▎             | 16729/20117 [10:41:11<2:12:52,  2.35s/it] 83%|███████████████████████████████████████████████████████████████████▎             | 16730/20117 [10:41:13<2:10:49,  2.32s/it]                                                                                                                                 {'loss': 0.2177, 'grad_norm': 0.4585205316543579, 'learning_rate': 1.3807073061653809e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 423.57, 'epoch': 1.66}
 83%|███████████████████████████████████████████████████████████████████▎             | 16730/20117 [10:41:13<2:10:49,  2.32s/it] 83%|███████████████████████████████████████████████████████████████████▎             | 16731/20117 [10:41:16<2:08:45,  2.28s/it] 83%|███████████████████████████████████████████████████████████████████▎             | 16732/20117 [10:41:18<2:07:23,  2.26s/it] 83%|███████████████████████████████████████████████████████████████████▎             | 16733/20117 [10:41:20<2:06:25,  2.24s/it] 83%|███████████████████████████████████████████████████████████████████▍             | 16734/20117 [10:41:22<2:07:27,  2.26s/it] 83%|███████████████████████████████████████████████████████████████████▍             | 16735/20117 [10:41:24<2:06:23,  2.24s/it] 83%|███████████████████████████████████████████████████████████████████▍             | 16736/20117 [10:41:27<2:07:30,  2.26s/it] 83%|███████████████████████████████████████████████████████████████████▍             | 16737/20117 [10:41:29<2:06:29,  2.25s/it] 83%|███████████████████████████████████████████████████████████████████▍             | 16738/20117 [10:41:31<2:05:23,  2.23s/it] 83%|███████████████████████████████████████████████████████████████████▍             | 16739/20117 [10:41:33<2:04:56,  2.22s/it] 83%|███████████████████████████████████████████████████████████████████▍             | 16740/20117 [10:41:36<2:05:10,  2.22s/it]                                                                                                                                 {'loss': 0.104, 'grad_norm': 0.4411196708679199, 'learning_rate': 1.372760303388091e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 305.5, 'epoch': 1.66}
 83%|███████████████████████████████████████████████████████████████████▍             | 16740/20117 [10:41:36<2:05:10,  2.22s/it] 83%|███████████████████████████████████████████████████████████████████▍             | 16741/20117 [10:41:38<2:04:36,  2.21s/it] 83%|███████████████████████████████████████████████████████████████████▍             | 16742/20117 [10:41:40<2:04:46,  2.22s/it] 83%|███████████████████████████████████████████████████████████████████▍             | 16743/20117 [10:41:42<2:05:38,  2.23s/it] 83%|███████████████████████████████████████████████████████████████████▍             | 16744/20117 [10:41:44<2:04:42,  2.22s/it] 83%|███████████████████████████████████████████████████████████████████▍             | 16745/20117 [10:41:47<2:04:11,  2.21s/it] 83%|███████████████████████████████████████████████████████████████████▍             | 16746/20117 [10:41:49<2:05:12,  2.23s/it] 83%|███████████████████████████████████████████████████████████████████▍             | 16747/20117 [10:41:51<2:05:11,  2.23s/it] 83%|███████████████████████████████████████████████████████████████████▍             | 16748/20117 [10:41:53<2:04:40,  2.22s/it] 83%|███████████████████████████████████████████████████████████████████▍             | 16749/20117 [10:41:56<2:03:55,  2.21s/it] 83%|███████████████████████████████████████████████████████████████████▍             | 16750/20117 [10:41:58<2:03:45,  2.21s/it]                                                                                                                                 {'loss': 0.1809, 'grad_norm': 0.5241126418113708, 'learning_rate': 1.36483455132558e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 356.92, 'epoch': 1.67}
 83%|███████████████████████████████████████████████████████████████████▍             | 16750/20117 [10:41:58<2:03:45,  2.21s/it] 83%|███████████████████████████████████████████████████████████████████▍             | 16751/20117 [10:42:00<2:04:51,  2.23s/it] 83%|███████████████████████████████████████████████████████████████████▍             | 16752/20117 [10:42:02<2:04:40,  2.22s/it] 83%|███████████████████████████████████████████████████████████████████▍             | 16753/20117 [10:42:04<2:04:10,  2.21s/it] 83%|███████████████████████████████████████████████████████████████████▍             | 16754/20117 [10:42:07<2:04:42,  2.22s/it] 83%|███████████████████████████████████████████████████████████████████▍             | 16755/20117 [10:42:09<2:04:50,  2.23s/it] 83%|███████████████████████████████████████████████████████████████████▍             | 16756/20117 [10:42:11<2:04:55,  2.23s/it] 83%|███████████████████████████████████████████████████████████████████▍             | 16757/20117 [10:42:13<2:05:58,  2.25s/it] 83%|███████████████████████████████████████████████████████████████████▍             | 16758/20117 [10:42:16<2:06:49,  2.27s/it] 83%|███████████████████████████████████████████████████████████████████▍             | 16759/20117 [10:42:18<2:06:15,  2.26s/it] 83%|███████████████████████████████████████████████████████████████████▍             | 16760/20117 [10:42:20<2:06:30,  2.26s/it]                                                                                                                                 {'loss': 0.1311, 'grad_norm': 0.6349103450775146, 'learning_rate': 1.3569300695006548e-05, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 388.79, 'epoch': 1.67}
 83%|███████████████████████████████████████████████████████████████████▍             | 16760/20117 [10:42:20<2:06:30,  2.26s/it] 83%|███████████████████████████████████████████████████████████████████▍             | 16761/20117 [10:42:23<2:06:37,  2.26s/it] 83%|███████████████████████████████████████████████████████████████████▍             | 16762/20117 [10:42:25<2:05:25,  2.24s/it] 83%|███████████████████████████████████████████████████████████████████▍             | 16763/20117 [10:42:27<2:04:15,  2.22s/it] 83%|███████████████████████████████████████████████████████████████████▍             | 16764/20117 [10:42:29<2:03:53,  2.22s/it] 83%|███████████████████████████████████████████████████████████████████▌             | 16765/20117 [10:42:31<2:05:11,  2.24s/it] 83%|███████████████████████████████████████████████████████████████████▌             | 16766/20117 [10:42:34<2:05:04,  2.24s/it] 83%|███████████████████████████████████████████████████████████████████▌             | 16767/20117 [10:42:36<2:04:37,  2.23s/it] 83%|███████████████████████████████████████████████████████████████████▌             | 16768/20117 [10:42:38<2:03:38,  2.21s/it] 83%|███████████████████████████████████████████████████████████████████▌             | 16769/20117 [10:42:40<2:03:23,  2.21s/it] 83%|███████████████████████████████████████████████████████████████████▌             | 16770/20117 [10:42:42<2:03:34,  2.22s/it]                                                                                                                                 {'loss': 0.1204, 'grad_norm': 0.46360382437705994, 'learning_rate': 1.3490468773837217e-05, 'memory/max_active (GiB)': 20.64, 'memory/max_allocated (GiB)': 20.64, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 317.52, 'epoch': 1.67}
 83%|███████████████████████████████████████████████████████████████████▌             | 16770/20117 [10:42:42<2:03:34,  2.22s/it] 83%|███████████████████████████████████████████████████████████████████▌             | 16771/20117 [10:42:45<2:03:04,  2.21s/it] 83%|███████████████████████████████████████████████████████████████████▌             | 16772/20117 [10:42:47<2:04:07,  2.23s/it] 83%|███████████████████████████████████████████████████████████████████▌             | 16773/20117 [10:42:49<2:04:05,  2.23s/it] 83%|███████████████████████████████████████████████████████████████████▌             | 16774/20117 [10:42:51<2:03:07,  2.21s/it] 83%|███████████████████████████████████████████████████████████████████▌             | 16775/20117 [10:42:54<2:03:12,  2.21s/it] 83%|███████████████████████████████████████████████████████████████████▌             | 16776/20117 [10:42:56<2:03:07,  2.21s/it] 83%|███████████████████████████████████████████████████████████████████▌             | 16777/20117 [10:42:58<2:04:14,  2.23s/it] 83%|███████████████████████████████████████████████████████████████████▌             | 16778/20117 [10:43:00<2:03:51,  2.23s/it] 83%|███████████████████████████████████████████████████████████████████▌             | 16779/20117 [10:43:02<2:02:57,  2.21s/it] 83%|███████████████████████████████████████████████████████████████████▌             | 16780/20117 [10:43:05<2:03:46,  2.23s/it]                                                                                                                                 {'loss': 0.1616, 'grad_norm': 0.49904516339302063, 'learning_rate': 1.3411849943927513e-05, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 355.09, 'epoch': 1.67}
 83%|███████████████████████████████████████████████████████████████████▌             | 16780/20117 [10:43:05<2:03:46,  2.23s/it] 83%|███████████████████████████████████████████████████████████████████▌             | 16781/20117 [10:43:07<2:04:39,  2.24s/it] 83%|███████████████████████████████████████████████████████████████████▌             | 16782/20117 [10:43:09<2:09:23,  2.33s/it] 83%|███████████████████████████████████████████████████████████████████▌             | 16783/20117 [10:43:12<2:08:13,  2.31s/it] 83%|███████████████████████████████████████████████████████████████████▌             | 16784/20117 [10:43:14<2:06:02,  2.27s/it] 83%|███████████████████████████████████████████████████████████████████▌             | 16785/20117 [10:43:16<2:05:24,  2.26s/it] 83%|███████████████████████████████████████████████████████████████████▌             | 16786/20117 [10:43:18<2:05:19,  2.26s/it] 83%|███████████████████████████████████████████████████████████████████▌             | 16787/20117 [10:43:21<2:04:24,  2.24s/it] 83%|███████████████████████████████████████████████████████████████████▌             | 16788/20117 [10:43:23<2:03:52,  2.23s/it] 83%|███████████████████████████████████████████████████████████████████▌             | 16789/20117 [10:43:25<2:04:37,  2.25s/it] 83%|███████████████████████████████████████████████████████████████████▌             | 16790/20117 [10:43:27<2:05:55,  2.27s/it]                                                                                                                                 {'loss': 0.1091, 'grad_norm': 0.19544100761413574, 'learning_rate': 1.3333444398932205e-05, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 289.19, 'epoch': 1.67}
 83%|███████████████████████████████████████████████████████████████████▌             | 16790/20117 [10:43:27<2:05:55,  2.27s/it] 83%|███████████████████████████████████████████████████████████████████▌             | 16791/20117 [10:43:30<2:05:29,  2.26s/it] 83%|███████████████████████████████████████████████████████████████████▌             | 16792/20117 [10:43:32<2:04:13,  2.24s/it] 83%|███████████████████████████████████████████████████████████████████▌             | 16793/20117 [10:43:34<2:03:27,  2.23s/it] 83%|███████████████████████████████████████████████████████████████████▌             | 16794/20117 [10:43:36<2:03:20,  2.23s/it] 83%|███████████████████████████████████████████████████████████████████▌             | 16795/20117 [10:43:39<2:03:52,  2.24s/it] 83%|███████████████████████████████████████████████████████████████████▋             | 16796/20117 [10:43:41<2:02:58,  2.22s/it] 83%|███████████████████████████████████████████████████████████████████▋             | 16797/20117 [10:43:43<2:03:27,  2.23s/it] 84%|███████████████████████████████████████████████████████████████████▋             | 16798/20117 [10:43:45<2:02:50,  2.22s/it] 84%|███████████████████████████████████████████████████████████████████▋             | 16799/20117 [10:43:47<2:03:13,  2.23s/it] 84%|███████████████████████████████████████████████████████████████████▋             | 16800/20117 [10:43:50<2:04:01,  2.24s/it]                                                                                                                                 {'loss': 0.1004, 'grad_norm': 0.31488218903541565, 'learning_rate': 1.325525233198076e-05, 'memory/max_active (GiB)': 19.8, 'memory/max_allocated (GiB)': 19.8, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 316.73, 'epoch': 1.67}
 84%|███████████████████████████████████████████████████████████████████▋             | 16800/20117 [10:43:50<2:04:01,  2.24s/it] 84%|███████████████████████████████████████████████████████████████████▋             | 16801/20117 [10:43:52<2:03:21,  2.23s/it] 84%|███████████████████████████████████████████████████████████████████▋             | 16802/20117 [10:43:54<2:02:34,  2.22s/it] 84%|███████████████████████████████████████████████████████████████████▋             | 16803/20117 [10:43:56<2:02:10,  2.21s/it] 84%|███████████████████████████████████████████████████████████████████▋             | 16804/20117 [10:43:58<2:01:35,  2.20s/it] 84%|███████████████████████████████████████████████████████████████████▋             | 16805/20117 [10:44:01<2:01:59,  2.21s/it] 84%|███████████████████████████████████████████████████████████████████▋             | 16806/20117 [10:44:03<2:01:20,  2.20s/it] 84%|███████████████████████████████████████████████████████████████████▋             | 16807/20117 [10:44:05<2:00:57,  2.19s/it] 84%|███████████████████████████████████████████████████████████████████▋             | 16808/20117 [10:44:07<2:00:43,  2.19s/it] 84%|███████████████████████████████████████████████████████████████████▋             | 16809/20117 [10:44:10<2:02:42,  2.23s/it] 84%|███████████████████████████████████████████████████████████████████▋             | 16810/20117 [10:44:12<2:01:44,  2.21s/it]                                                                                                                                 {'loss': 0.1112, 'grad_norm': 0.5296607613563538, 'learning_rate': 1.3177273935676715e-05, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 343.62, 'epoch': 1.67}
 84%|███████████████████████████████████████████████████████████████████▋             | 16810/20117 [10:44:12<2:01:44,  2.21s/it] 84%|███████████████████████████████████████████████████████████████████▋             | 16811/20117 [10:44:14<2:01:45,  2.21s/it] 84%|███████████████████████████████████████████████████████████████████▋             | 16812/20117 [10:44:16<2:01:39,  2.21s/it] 84%|███████████████████████████████████████████████████████████████████▋             | 16813/20117 [10:44:18<2:02:30,  2.22s/it] 84%|███████████████████████████████████████████████████████████████████▋             | 16814/20117 [10:44:21<2:01:52,  2.21s/it] 84%|███████████████████████████████████████████████████████████████████▋             | 16815/20117 [10:44:23<2:01:58,  2.22s/it] 84%|███████████████████████████████████████████████████████████████████▋             | 16816/20117 [10:44:25<2:01:46,  2.21s/it] 84%|███████████████████████████████████████████████████████████████████▋             | 16817/20117 [10:44:27<2:01:40,  2.21s/it] 84%|███████████████████████████████████████████████████████████████████▋             | 16818/20117 [10:44:29<2:01:00,  2.20s/it] 84%|███████████████████████████████████████████████████████████████████▋             | 16819/20117 [10:44:32<2:01:02,  2.20s/it] 84%|███████████████████████████████████████████████████████████████████▋             | 16820/20117 [10:44:34<2:01:11,  2.21s/it]                                                                                                                                 {'loss': 0.1181, 'grad_norm': 0.4741019308567047, 'learning_rate': 1.3099509402097377e-05, 'memory/max_active (GiB)': 19.11, 'memory/max_allocated (GiB)': 19.11, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 337.04, 'epoch': 1.67}
 84%|███████████████████████████████████████████████████████████████████▋             | 16820/20117 [10:44:34<2:01:11,  2.21s/it] 84%|███████████████████████████████████████████████████████████████████▋             | 16821/20117 [10:44:36<2:01:05,  2.20s/it] 84%|███████████████████████████████████████████████████████████████████▋             | 16822/20117 [10:44:38<2:02:00,  2.22s/it] 84%|███████████████████████████████████████████████████████████████████▋             | 16823/20117 [10:44:40<2:01:06,  2.21s/it] 84%|███████████████████████████████████████████████████████████████████▋             | 16824/20117 [10:44:43<2:00:31,  2.20s/it] 84%|███████████████████████████████████████████████████████████████████▋             | 16825/20117 [10:44:45<2:00:50,  2.20s/it] 84%|███████████████████████████████████████████████████████████████████▋             | 16826/20117 [10:44:47<2:00:58,  2.21s/it] 84%|███████████████████████████████████████████████████████████████████▊             | 16827/20117 [10:44:49<2:01:18,  2.21s/it] 84%|███████████████████████████████████████████████████████████████████▊             | 16828/20117 [10:44:52<2:01:48,  2.22s/it] 84%|███████████████████████████████████████████████████████████████████▊             | 16829/20117 [10:44:54<2:01:55,  2.22s/it] 84%|███████████████████████████████████████████████████████████████████▊             | 16830/20117 [10:44:56<2:01:45,  2.22s/it]                                                                                                                                 {'loss': 0.1247, 'grad_norm': 0.545759916305542, 'learning_rate': 1.3021958922793209e-05, 'memory/max_active (GiB)': 19.8, 'memory/max_allocated (GiB)': 19.8, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 347.83, 'epoch': 1.67}
 84%|███████████████████████████████████████████████████████████████████▊             | 16830/20117 [10:44:56<2:01:45,  2.22s/it] 84%|███████████████████████████████████████████████████████████████████▊             | 16831/20117 [10:44:58<2:01:03,  2.21s/it] 84%|███████████████████████████████████████████████████████████████████▊             | 16832/20117 [10:45:00<2:00:39,  2.20s/it] 84%|███████████████████████████████████████████████████████████████████▊             | 16833/20117 [10:45:03<2:00:12,  2.20s/it] 84%|███████████████████████████████████████████████████████████████████▊             | 16834/20117 [10:45:05<2:00:00,  2.19s/it] 84%|███████████████████████████████████████████████████████████████████▊             | 16835/20117 [10:45:07<2:00:16,  2.20s/it] 84%|███████████████████████████████████████████████████████████████████▊             | 16836/20117 [10:45:09<2:05:00,  2.29s/it] 84%|███████████████████████████████████████████████████████████████████▊             | 16837/20117 [10:45:12<2:04:06,  2.27s/it] 84%|███████████████████████████████████████████████████████████████████▊             | 16838/20117 [10:45:14<2:03:34,  2.26s/it] 84%|███████████████████████████████████████████████████████████████████▊             | 16839/20117 [10:45:16<2:01:56,  2.23s/it] 84%|███████████████████████████████████████████████████████████████████▊             | 16840/20117 [10:45:18<2:01:07,  2.22s/it]                                                                                                                                 {'loss': 0.1867, 'grad_norm': 0.5977072715759277, 'learning_rate': 1.2944622688787445e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 371.62, 'epoch': 1.67}
 84%|███████████████████████████████████████████████████████████████████▊             | 16840/20117 [10:45:18<2:01:07,  2.22s/it] 84%|███████████████████████████████████████████████████████████████████▊             | 16841/20117 [10:45:20<2:00:19,  2.20s/it] 84%|███████████████████████████████████████████████████████████████████▊             | 16842/20117 [10:45:23<2:01:09,  2.22s/it] 84%|███████████████████████████████████████████████████████████████████▊             | 16843/20117 [10:45:25<2:00:36,  2.21s/it] 84%|███████████████████████████████████████████████████████████████████▊             | 16844/20117 [10:45:27<1:59:59,  2.20s/it] 84%|███████████████████████████████████████████████████████████████████▊             | 16845/20117 [10:45:29<1:59:49,  2.20s/it] 84%|███████████████████████████████████████████████████████████████████▊             | 16846/20117 [10:45:31<1:59:53,  2.20s/it] 84%|███████████████████████████████████████████████████████████████████▊             | 16847/20117 [10:45:34<1:59:40,  2.20s/it] 84%|███████████████████████████████████████████████████████████████████▊             | 16848/20117 [10:45:36<1:59:31,  2.19s/it] 84%|███████████████████████████████████████████████████████████████████▊             | 16849/20117 [10:45:38<1:59:40,  2.20s/it] 84%|███████████████████████████████████████████████████████████████████▊             | 16850/20117 [10:45:40<1:59:12,  2.19s/it]                                                                                                                                 {'loss': 0.2069, 'grad_norm': 0.6563873887062073, 'learning_rate': 1.2867500890575601e-05, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 339.57, 'epoch': 1.68}
 84%|███████████████████████████████████████████████████████████████████▊             | 16850/20117 [10:45:40<1:59:12,  2.19s/it] 84%|███████████████████████████████████████████████████████████████████▊             | 16851/20117 [10:45:42<1:59:36,  2.20s/it] 84%|███████████████████████████████████████████████████████████████████▊             | 16852/20117 [10:45:45<2:00:29,  2.21s/it] 84%|███████████████████████████████████████████████████████████████████▊             | 16853/20117 [10:45:47<2:01:46,  2.24s/it] 84%|███████████████████████████████████████████████████████████████████▊             | 16854/20117 [10:45:49<2:01:00,  2.23s/it] 84%|███████████████████████████████████████████████████████████████████▊             | 16855/20117 [10:45:51<2:00:38,  2.22s/it] 84%|███████████████████████████████████████████████████████████████████▊             | 16856/20117 [10:45:53<1:59:45,  2.20s/it] 84%|███████████████████████████████████████████████████████████████████▊             | 16857/20117 [10:45:56<2:00:14,  2.21s/it] 84%|███████████████████████████████████████████████████████████████████▉             | 16858/20117 [10:45:58<1:59:25,  2.20s/it] 84%|███████████████████████████████████████████████████████████████████▉             | 16859/20117 [10:46:00<2:00:04,  2.21s/it] 84%|███████████████████████████████████████████████████████████████████▉             | 16860/20117 [10:46:02<1:59:08,  2.19s/it]                                                                                                                                 {'loss': 0.1413, 'grad_norm': 0.5459251999855042, 'learning_rate': 1.279059371812491e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 355.16, 'epoch': 1.68}
 84%|███████████████████████████████████████████████████████████████████▉             | 16860/20117 [10:46:02<1:59:08,  2.19s/it] 84%|███████████████████████████████████████████████████████████████████▉             | 16861/20117 [10:46:05<2:00:06,  2.21s/it] 84%|███████████████████████████████████████████████████████████████████▉             | 16862/20117 [10:46:07<2:00:30,  2.22s/it] 84%|███████████████████████████████████████████████████████████████████▉             | 16863/20117 [10:46:09<2:02:21,  2.26s/it] 84%|███████████████████████████████████████████████████████████████████▉             | 16864/20117 [10:46:11<2:01:43,  2.25s/it] 84%|███████████████████████████████████████████████████████████████████▉             | 16865/20117 [10:46:14<2:00:42,  2.23s/it] 84%|███████████████████████████████████████████████████████████████████▉             | 16866/20117 [10:46:16<1:59:52,  2.21s/it] 84%|███████████████████████████████████████████████████████████████████▉             | 16867/20117 [10:46:18<1:59:16,  2.20s/it] 84%|███████████████████████████████████████████████████████████████████▉             | 16868/20117 [10:46:20<1:59:20,  2.20s/it] 84%|███████████████████████████████████████████████████████████████████▉             | 16869/20117 [10:46:22<2:00:05,  2.22s/it] 84%|███████████████████████████████████████████████████████████████████▉             | 16870/20117 [10:46:25<1:59:40,  2.21s/it]                                                                                                                                 {'loss': 0.1136, 'grad_norm': 0.5088791847229004, 'learning_rate': 1.2713901360874037e-05, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 327.52, 'epoch': 1.68}
 84%|███████████████████████████████████████████████████████████████████▉             | 16870/20117 [10:46:25<1:59:40,  2.21s/it] 84%|███████████████████████████████████████████████████████████████████▉             | 16871/20117 [10:46:27<2:00:08,  2.22s/it] 84%|███████████████████████████████████████████████████████████████████▉             | 16872/20117 [10:46:29<1:59:45,  2.21s/it] 84%|███████████████████████████████████████████████████████████████████▉             | 16873/20117 [10:46:31<1:59:25,  2.21s/it] 84%|███████████████████████████████████████████████████████████████████▉             | 16874/20117 [10:46:33<1:59:01,  2.20s/it] 84%|███████████████████████████████████████████████████████████████████▉             | 16875/20117 [10:46:36<1:58:57,  2.20s/it] 84%|███████████████████████████████████████████████████████████████████▉             | 16876/20117 [10:46:38<1:59:06,  2.20s/it] 84%|███████████████████████████████████████████████████████████████████▉             | 16877/20117 [10:46:40<1:59:13,  2.21s/it] 84%|███████████████████████████████████████████████████████████████████▉             | 16878/20117 [10:46:42<1:58:47,  2.20s/it] 84%|███████████████████████████████████████████████████████████████████▉             | 16879/20117 [10:46:44<1:58:46,  2.20s/it] 84%|███████████████████████████████████████████████████████████████████▉             | 16880/20117 [10:46:47<1:58:42,  2.20s/it]                                                                                                                                 {'loss': 0.1708, 'grad_norm': 0.6554404497146606, 'learning_rate': 1.2637424007732434e-05, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 335.39, 'epoch': 1.68}
 84%|███████████████████████████████████████████████████████████████████▉             | 16880/20117 [10:46:47<1:58:42,  2.20s/it] 84%|███████████████████████████████████████████████████████████████████▉             | 16881/20117 [10:46:49<2:00:28,  2.23s/it] 84%|███████████████████████████████████████████████████████████████████▉             | 16882/20117 [10:46:51<2:00:29,  2.23s/it] 84%|███████████████████████████████████████████████████████████████████▉             | 16883/20117 [10:46:53<1:59:56,  2.23s/it] 84%|███████████████████████████████████████████████████████████████████▉             | 16884/20117 [10:46:55<1:58:49,  2.21s/it] 84%|███████████████████████████████████████████████████████████████████▉             | 16885/20117 [10:46:58<1:59:45,  2.22s/it] 84%|███████████████████████████████████████████████████████████████████▉             | 16886/20117 [10:47:00<2:00:50,  2.24s/it] 84%|███████████████████████████████████████████████████████████████████▉             | 16887/20117 [10:47:03<2:06:42,  2.35s/it] 84%|███████████████████████████████████████████████████████████████████▉             | 16888/20117 [10:47:05<2:05:03,  2.32s/it] 84%|████████████████████████████████████████████████████████████████████             | 16889/20117 [10:47:07<2:02:03,  2.27s/it] 84%|████████████████████████████████████████████████████████████████████             | 16890/20117 [10:47:09<2:02:16,  2.27s/it]                                                                                                                                 {'loss': 0.1766, 'grad_norm': 0.49274370074272156, 'learning_rate': 1.2561161847080028e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 382.39, 'epoch': 1.68}
 84%|████████████████████████████████████████████████████████████████████             | 16890/20117 [10:47:09<2:02:16,  2.27s/it] 84%|████████████████████████████████████████████████████████████████████             | 16891/20117 [10:47:12<2:02:00,  2.27s/it] 84%|████████████████████████████████████████████████████████████████████             | 16892/20117 [10:47:14<2:00:17,  2.24s/it] 84%|████████████████████████████████████████████████████████████████████             | 16893/20117 [10:47:16<1:59:24,  2.22s/it] 84%|████████████████████████████████████████████████████████████████████             | 16894/20117 [10:47:18<1:59:31,  2.23s/it] 84%|████████████████████████████████████████████████████████████████████             | 16895/20117 [10:47:20<1:58:56,  2.21s/it] 84%|████████████████████████████████████████████████████████████████████             | 16896/20117 [10:47:23<1:59:07,  2.22s/it] 84%|████████████████████████████████████████████████████████████████████             | 16897/20117 [10:47:25<1:59:32,  2.23s/it] 84%|████████████████████████████████████████████████████████████████████             | 16898/20117 [10:47:27<1:59:14,  2.22s/it] 84%|████████████████████████████████████████████████████████████████████             | 16899/20117 [10:47:29<1:58:27,  2.21s/it] 84%|████████████████████████████████████████████████████████████████████             | 16900/20117 [10:47:31<1:58:23,  2.21s/it]                                                                                                                                 {'loss': 0.1887, 'grad_norm': 0.6049704551696777, 'learning_rate': 1.2485115066766584e-05, 'memory/max_active (GiB)': 19.8, 'memory/max_allocated (GiB)': 19.8, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 395.86, 'epoch': 1.68}
 84%|████████████████████████████████████████████████████████████████████             | 16900/20117 [10:47:31<1:58:23,  2.21s/it] 84%|████████████████████████████████████████████████████████████████████             | 16901/20117 [10:47:34<1:58:39,  2.21s/it] 84%|████████████████████████████████████████████████████████████████████             | 16902/20117 [10:47:36<1:59:17,  2.23s/it] 84%|████████████████████████████████████████████████████████████████████             | 16903/20117 [10:47:38<2:00:36,  2.25s/it] 84%|████████████████████████████████████████████████████████████████████             | 16904/20117 [10:47:40<2:00:37,  2.25s/it] 84%|████████████████████████████████████████████████████████████████████             | 16905/20117 [10:47:43<2:00:45,  2.26s/it] 84%|████████████████████████████████████████████████████████████████████             | 16906/20117 [10:47:45<2:01:12,  2.26s/it] 84%|████████████████████████████████████████████████████████████████████             | 16907/20117 [10:47:47<2:00:27,  2.25s/it] 84%|████████████████████████████████████████████████████████████████████             | 16908/20117 [10:47:49<2:00:12,  2.25s/it] 84%|████████████████████████████████████████████████████████████████████             | 16909/20117 [10:47:52<1:59:55,  2.24s/it] 84%|████████████████████████████████████████████████████████████████████             | 16910/20117 [10:47:54<2:00:46,  2.26s/it]                                                                                                                                 {'loss': 0.1627, 'grad_norm': 0.45951762795448303, 'learning_rate': 1.2409283854111442e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 374.46, 'epoch': 1.68}
 84%|████████████████████████████████████████████████████████████████████             | 16910/20117 [10:47:54<2:00:46,  2.26s/it] 84%|████████████████████████████████████████████████████████████████████             | 16911/20117 [10:47:56<1:59:54,  2.24s/it] 84%|████████████████████████████████████████████████████████████████████             | 16912/20117 [10:47:58<2:00:04,  2.25s/it] 84%|████████████████████████████████████████████████████████████████████             | 16913/20117 [10:48:01<2:00:02,  2.25s/it] 84%|████████████████████████████████████████████████████████████████████             | 16914/20117 [10:48:03<1:59:16,  2.23s/it] 84%|████████████████████████████████████████████████████████████████████             | 16915/20117 [10:48:05<1:58:53,  2.23s/it] 84%|████████████████████████████████████████████████████████████████████             | 16916/20117 [10:48:07<1:59:49,  2.25s/it] 84%|████████████████████████████████████████████████████████████████████             | 16917/20117 [10:48:10<1:59:16,  2.24s/it] 84%|████████████████████████████████████████████████████████████████████             | 16918/20117 [10:48:12<1:58:53,  2.23s/it] 84%|████████████████████████████████████████████████████████████████████             | 16919/20117 [10:48:14<1:58:18,  2.22s/it] 84%|████████████████████████████████████████████████████████████████████▏            | 16920/20117 [10:48:16<1:59:03,  2.23s/it]                                                                                                                                 {'loss': 0.1635, 'grad_norm': 0.31577378511428833, 'learning_rate': 1.2333668395902875e-05, 'memory/max_active (GiB)': 20.64, 'memory/max_allocated (GiB)': 20.64, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 302.79, 'epoch': 1.68}
 84%|████████████████████████████████████████████████████████████████████▏            | 16920/20117 [10:48:16<1:59:03,  2.23s/it] 84%|████████████████████████████████████████████████████████████████████▏            | 16921/20117 [10:48:19<2:00:49,  2.27s/it] 84%|████████████████████████████████████████████████████████████████████▏            | 16922/20117 [10:48:21<2:00:04,  2.26s/it] 84%|████████████████████████████████████████████████████████████████████▏            | 16923/20117 [10:48:23<1:59:37,  2.25s/it] 84%|████████████████████████████████████████████████████████████████████▏            | 16924/20117 [10:48:25<1:59:12,  2.24s/it] 84%|████████████████████████████████████████████████████████████████████▏            | 16925/20117 [10:48:28<1:58:45,  2.23s/it] 84%|████████████████████████████████████████████████████████████████████▏            | 16926/20117 [10:48:30<1:59:25,  2.25s/it] 84%|████████████████████████████████████████████████████████████████████▏            | 16927/20117 [10:48:32<1:58:47,  2.23s/it] 84%|████████████████████████████████████████████████████████████████████▏            | 16928/20117 [10:48:34<1:58:24,  2.23s/it] 84%|████████████████████████████████████████████████████████████████████▏            | 16929/20117 [10:48:37<1:58:43,  2.23s/it] 84%|████████████████████████████████████████████████████████████████████▏            | 16930/20117 [10:48:39<1:59:27,  2.25s/it]                                                                                                                                 {'loss': 0.0991, 'grad_norm': 0.3830484449863434, 'learning_rate': 1.225826887839776e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 384.62, 'epoch': 1.68}
 84%|████████████████████████████████████████████████████████████████████▏            | 16930/20117 [10:48:39<1:59:27,  2.25s/it] 84%|████████████████████████████████████████████████████████████████████▏            | 16931/20117 [10:48:41<1:59:46,  2.26s/it] 84%|████████████████████████████████████████████████████████████████████▏            | 16932/20117 [10:48:43<1:59:41,  2.25s/it] 84%|████████████████████████████████████████████████████████████████████▏            | 16933/20117 [10:48:46<1:58:50,  2.24s/it] 84%|████████████████████████████████████████████████████████████████████▏            | 16934/20117 [10:48:48<1:58:10,  2.23s/it] 84%|████████████████████████████████████████████████████████████████████▏            | 16935/20117 [10:48:50<1:57:43,  2.22s/it] 84%|████████████████████████████████████████████████████████████████████▏            | 16936/20117 [10:48:52<1:56:51,  2.20s/it] 84%|████████████████████████████████████████████████████████████████████▏            | 16937/20117 [10:48:54<1:57:38,  2.22s/it] 84%|████████████████████████████████████████████████████████████████████▏            | 16938/20117 [10:48:57<1:57:10,  2.21s/it] 84%|████████████████████████████████████████████████████████████████████▏            | 16939/20117 [10:48:59<1:59:04,  2.25s/it] 84%|████████████████████████████████████████████████████████████████████▏            | 16940/20117 [10:49:01<1:58:34,  2.24s/it]                                                                                                                                 {'loss': 0.1612, 'grad_norm': 0.4875403642654419, 'learning_rate': 1.2183085487321022e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 393.48, 'epoch': 1.68}
 84%|████████████████████████████████████████████████████████████████████▏            | 16940/20117 [10:49:01<1:58:34,  2.24s/it] 84%|████████████████████████████████████████████████████████████████████▏            | 16941/20117 [10:49:04<2:03:08,  2.33s/it] 84%|████████████████████████████████████████████████████████████████████▏            | 16942/20117 [10:49:06<2:00:39,  2.28s/it] 84%|████████████████████████████████████████████████████████████████████▏            | 16943/20117 [10:49:08<1:59:14,  2.25s/it] 84%|████████████████████████████████████████████████████████████████████▏            | 16944/20117 [10:49:10<1:58:22,  2.24s/it] 84%|████████████████████████████████████████████████████████████████████▏            | 16945/20117 [10:49:12<1:58:32,  2.24s/it] 84%|████████████████████████████████████████████████████████████████████▏            | 16946/20117 [10:49:15<1:58:29,  2.24s/it] 84%|████████████████████████████████████████████████████████████████████▏            | 16947/20117 [10:49:17<1:57:25,  2.22s/it] 84%|████████████████████████████████████████████████████████████████████▏            | 16948/20117 [10:49:19<1:56:55,  2.21s/it] 84%|████████████████████████████████████████████████████████████████████▏            | 16949/20117 [10:49:21<1:57:16,  2.22s/it] 84%|████████████████████████████████████████████████████████████████████▏            | 16950/20117 [10:49:23<1:56:58,  2.22s/it]                                                                                                                                 {'loss': 0.1823, 'grad_norm': 0.4757268726825714, 'learning_rate': 1.2108118407865254e-05, 'memory/max_active (GiB)': 20.61, 'memory/max_allocated (GiB)': 20.61, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 357.56, 'epoch': 1.69}
 84%|████████████████████████████████████████████████████████████████████▏            | 16950/20117 [10:49:23<1:56:58,  2.22s/it] 84%|████████████████████████████████████████████████████████████████████▎            | 16951/20117 [10:49:26<1:57:39,  2.23s/it] 84%|████████████████████████████████████████████████████████████████████▎            | 16952/20117 [10:49:28<1:57:15,  2.22s/it] 84%|████████████████████████████████████████████████████████████████████▎            | 16953/20117 [10:49:30<1:57:33,  2.23s/it] 84%|████████████████████████████████████████████████████████████████████▎            | 16954/20117 [10:49:33<1:59:00,  2.26s/it] 84%|████████████████████████████████████████████████████████████████████▎            | 16955/20117 [10:49:35<1:58:19,  2.25s/it] 84%|████████████████████████████████████████████████████████████████████▎            | 16956/20117 [10:49:37<1:57:30,  2.23s/it] 84%|████████████████████████████████████████████████████████████████████▎            | 16957/20117 [10:49:39<1:57:19,  2.23s/it] 84%|████████████████████████████████████████████████████████████████████▎            | 16958/20117 [10:49:41<1:56:19,  2.21s/it] 84%|████████████████████████████████████████████████████████████████████▎            | 16959/20117 [10:49:44<1:57:46,  2.24s/it] 84%|████████████████████████████████████████████████████████████████████▎            | 16960/20117 [10:49:46<1:56:31,  2.21s/it]                                                                                                                                 {'loss': 0.1261, 'grad_norm': 0.3833984136581421, 'learning_rate': 1.2033367824690223e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 431.85, 'epoch': 1.69}
 84%|████████████████████████████████████████████████████████████████████▎            | 16960/20117 [10:49:46<1:56:31,  2.21s/it] 84%|████████████████████████████████████████████████████████████████████▎            | 16961/20117 [10:49:48<1:56:19,  2.21s/it] 84%|████████████████████████████████████████████████████████████████████▎            | 16962/20117 [10:49:50<1:56:20,  2.21s/it] 84%|████████████████████████████████████████████████████████████████████▎            | 16963/20117 [10:49:52<1:57:16,  2.23s/it] 84%|████████████████████████████████████████████████████████████████████▎            | 16964/20117 [10:49:55<1:57:11,  2.23s/it] 84%|████████████████████████████████████████████████████████████████████▎            | 16965/20117 [10:49:57<1:57:10,  2.23s/it] 84%|████████████████████████████████████████████████████████████████████▎            | 16966/20117 [10:49:59<1:56:51,  2.23s/it] 84%|████████████████████████████████████████████████████████████████████▎            | 16967/20117 [10:50:01<1:56:02,  2.21s/it] 84%|████████████████████████████████████████████████████████████████████▎            | 16968/20117 [10:50:04<1:55:44,  2.21s/it] 84%|████████████████████████████████████████████████████████████████████▎            | 16969/20117 [10:50:06<1:55:01,  2.19s/it] 84%|████████████████████████████████████████████████████████████████████▎            | 16970/20117 [10:50:08<1:55:19,  2.20s/it]                                                                                                                                 {'loss': 0.1545, 'grad_norm': 0.35014206171035767, 'learning_rate': 1.1958833921922418e-05, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 309.84, 'epoch': 1.69}
 84%|████████████████████████████████████████████████████████████████████▎            | 16970/20117 [10:50:08<1:55:19,  2.20s/it] 84%|████████████████████████████████████████████████████████████████████▎            | 16971/20117 [10:50:10<1:54:59,  2.19s/it] 84%|████████████████████████████████████████████████████████████████████▎            | 16972/20117 [10:50:12<1:56:23,  2.22s/it] 84%|████████████████████████████████████████████████████████████████████▎            | 16973/20117 [10:50:15<1:56:20,  2.22s/it] 84%|████████████████████████████████████████████████████████████████████▎            | 16974/20117 [10:50:17<1:55:50,  2.21s/it] 84%|████████████████████████████████████████████████████████████████████▎            | 16975/20117 [10:50:19<1:56:52,  2.23s/it] 84%|████████████████████████████████████████████████████████████████████▎            | 16976/20117 [10:50:21<1:56:38,  2.23s/it] 84%|████████████████████████████████████████████████████████████████████▎            | 16977/20117 [10:50:23<1:55:33,  2.21s/it] 84%|████████████████████████████████████████████████████████████████████▎            | 16978/20117 [10:50:26<1:56:06,  2.22s/it] 84%|████████████████████████████████████████████████████████████████████▎            | 16979/20117 [10:50:28<1:55:18,  2.20s/it] 84%|████████████████████████████████████████████████████████████████████▎            | 16980/20117 [10:50:30<1:55:26,  2.21s/it]                                                                                                                                 {'loss': 0.1116, 'grad_norm': 0.5870986580848694, 'learning_rate': 1.1884516883154606e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 330.54, 'epoch': 1.69}
 84%|████████████████████████████████████████████████████████████████████▎            | 16980/20117 [10:50:30<1:55:26,  2.21s/it] 84%|████████████████████████████████████████████████████████████████████▎            | 16981/20117 [10:50:32<1:56:25,  2.23s/it] 84%|████████████████████████████████████████████████████████████████████▍            | 16982/20117 [10:50:35<1:55:29,  2.21s/it] 84%|████████████████████████████████████████████████████████████████████▍            | 16983/20117 [10:50:37<1:54:54,  2.20s/it] 84%|████████████████████████████████████████████████████████████████████▍            | 16984/20117 [10:50:39<1:55:07,  2.20s/it] 84%|████████████████████████████████████████████████████████████████████▍            | 16985/20117 [10:50:41<1:55:29,  2.21s/it] 84%|████████████████████████████████████████████████████████████████████▍            | 16986/20117 [10:50:43<1:55:49,  2.22s/it] 84%|████████████████████████████████████████████████████████████████████▍            | 16987/20117 [10:50:46<1:55:58,  2.22s/it] 84%|████████████████████████████████████████████████████████████████████▍            | 16988/20117 [10:50:48<1:55:40,  2.22s/it] 84%|████████████████████████████████████████████████████████████████████▍            | 16989/20117 [10:50:50<1:56:09,  2.23s/it] 84%|████████████████████████████████████████████████████████████████████▍            | 16990/20117 [10:50:52<1:55:48,  2.22s/it]                                                                                                                                 {'loss': 0.1723, 'grad_norm': 0.3304491639137268, 'learning_rate': 1.1810416891445319e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 352.93, 'epoch': 1.69}
 84%|████████████████████████████████████████████████████████████████████▍            | 16990/20117 [10:50:52<1:55:48,  2.22s/it] 84%|████████████████████████████████████████████████████████████████████▍            | 16991/20117 [10:50:55<1:56:06,  2.23s/it] 84%|████████████████████████████████████████████████████████████████████▍            | 16992/20117 [10:50:57<1:55:04,  2.21s/it] 84%|████████████████████████████████████████████████████████████████████▍            | 16993/20117 [10:50:59<1:55:15,  2.21s/it] 84%|████████████████████████████████████████████████████████████████████▍            | 16994/20117 [10:51:01<1:55:13,  2.21s/it] 84%|████████████████████████████████████████████████████████████████████▍            | 16995/20117 [10:51:04<1:59:37,  2.30s/it] 84%|████████████████████████████████████████████████████████████████████▍            | 16996/20117 [10:51:06<1:58:47,  2.28s/it] 84%|████████████████████████████████████████████████████████████████████▍            | 16997/20117 [10:51:08<1:57:13,  2.25s/it] 84%|████████████████████████████████████████████████████████████████████▍            | 16998/20117 [10:51:10<1:56:20,  2.24s/it] 85%|████████████████████████████████████████████████████████████████████▍            | 16999/20117 [10:51:12<1:55:49,  2.23s/it] 85%|████████████████████████████████████████████████████████████████████▍            | 17000/20117 [10:51:15<1:54:58,  2.21s/it]                                                                                                                                 {'loss': 0.1596, 'grad_norm': 0.295502632856369, 'learning_rate': 1.1736534129318532e-05, 'memory/max_active (GiB)': 19.22, 'memory/max_allocated (GiB)': 19.22, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 365.23, 'epoch': 1.69}
 85%|████████████████████████████████████████████████████████████████████▍            | 17000/20117 [10:51:15<1:54:58,  2.21s/it] 85%|████████████████████████████████████████████████████████████████████▍            | 17001/20117 [10:51:17<1:54:24,  2.20s/it] 85%|████████████████████████████████████████████████████████████████████▍            | 17002/20117 [10:51:19<1:53:48,  2.19s/it] 85%|████████████████████████████████████████████████████████████████████▍            | 17003/20117 [10:51:21<1:54:08,  2.20s/it] 85%|████████████████████████████████████████████████████████████████████▍            | 17004/20117 [10:51:23<1:54:03,  2.20s/it] 85%|████████████████████████████████████████████████████████████████████▍            | 17005/20117 [10:51:26<1:54:30,  2.21s/it] 85%|████████████████████████████████████████████████████████████████████▍            | 17006/20117 [10:51:28<1:54:00,  2.20s/it] 85%|████████████████████████████████████████████████████████████████████▍            | 17007/20117 [10:51:30<1:54:57,  2.22s/it] 85%|████████████████████████████████████████████████████████████████████▍            | 17008/20117 [10:51:32<1:54:51,  2.22s/it] 85%|████████████████████████████████████████████████████████████████████▍            | 17009/20117 [10:51:34<1:54:43,  2.21s/it] 85%|████████████████████████████████████████████████████████████████████▍            | 17010/20117 [10:51:37<1:55:09,  2.22s/it]                                                                                                                                 {'loss': 0.1732, 'grad_norm': 0.48762744665145874, 'learning_rate': 1.1662868778763092e-05, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 390.51, 'epoch': 1.69}
 85%|████████████████████████████████████████████████████████████████████▍            | 17010/20117 [10:51:37<1:55:09,  2.22s/it] 85%|████████████████████████████████████████████████████████████████████▍            | 17011/20117 [10:51:39<1:54:48,  2.22s/it] 85%|████████████████████████████████████████████████████████████████████▍            | 17012/20117 [10:51:41<1:54:40,  2.22s/it] 85%|████████████████████████████████████████████████████████████████████▌            | 17013/20117 [10:51:43<1:54:59,  2.22s/it] 85%|████████████████████████████████████████████████████████████████████▌            | 17014/20117 [10:51:46<1:54:11,  2.21s/it] 85%|████████████████████████████████████████████████████████████████████▌            | 17015/20117 [10:51:48<1:54:27,  2.21s/it] 85%|████████████████████████████████████████████████████████████████████▌            | 17016/20117 [10:51:50<1:54:12,  2.21s/it] 85%|████████████████████████████████████████████████████████████████████▌            | 17017/20117 [10:51:52<1:55:11,  2.23s/it] 85%|████████████████████████████████████████████████████████████████████▌            | 17018/20117 [10:51:54<1:55:12,  2.23s/it] 85%|████████████████████████████████████████████████████████████████████▌            | 17019/20117 [10:51:57<1:54:25,  2.22s/it] 85%|████████████████████████████████████████████████████████████████████▌            | 17020/20117 [10:51:59<1:55:57,  2.25s/it]                                                                                                                                 {'loss': 0.1351, 'grad_norm': 0.4020582437515259, 'learning_rate': 1.1589421021232338e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 345.17, 'epoch': 1.69}
 85%|████████████████████████████████████████████████████████████████████▌            | 17020/20117 [10:51:59<1:55:57,  2.25s/it] 85%|████████████████████████████████████████████████████████████████████▌            | 17021/20117 [10:52:01<1:55:26,  2.24s/it] 85%|████████████████████████████████████████████████████████████████████▌            | 17022/20117 [10:52:03<1:55:15,  2.23s/it] 85%|████████████████████████████████████████████████████████████████████▌            | 17023/20117 [10:52:06<1:55:03,  2.23s/it] 85%|████████████████████████████████████████████████████████████████████▌            | 17024/20117 [10:52:08<1:54:38,  2.22s/it] 85%|████████████████████████████████████████████████████████████████████▌            | 17025/20117 [10:52:10<1:54:07,  2.21s/it] 85%|████████████████████████████████████████████████████████████████████▌            | 17026/20117 [10:52:12<1:53:55,  2.21s/it] 85%|████████████████████████████████████████████████████████████████████▌            | 17027/20117 [10:52:15<1:54:25,  2.22s/it] 85%|████████████████████████████████████████████████████████████████████▌            | 17028/20117 [10:52:17<1:53:50,  2.21s/it] 85%|████████████████████████████████████████████████████████████████████▌            | 17029/20117 [10:52:19<1:53:42,  2.21s/it] 85%|████████████████████████████████████████████████████████████████████▌            | 17030/20117 [10:52:21<1:53:13,  2.20s/it]                                                                                                                                 {'loss': 0.1425, 'grad_norm': 0.5929551124572754, 'learning_rate': 1.1516191037643598e-05, 'memory/max_active (GiB)': 20.63, 'memory/max_allocated (GiB)': 20.63, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 344.59, 'epoch': 1.69}
 85%|████████████████████████████████████████████████████████████████████▌            | 17030/20117 [10:52:21<1:53:13,  2.20s/it] 85%|████████████████████████████████████████████████████████████████████▌            | 17031/20117 [10:52:23<1:52:45,  2.19s/it] 85%|████████████████████████████████████████████████████████████████████▌            | 17032/20117 [10:52:25<1:53:10,  2.20s/it] 85%|████████████████████████████████████████████████████████████████████▌            | 17033/20117 [10:52:28<1:52:46,  2.19s/it] 85%|████████████████████████████████████████████████████████████████████▌            | 17034/20117 [10:52:30<1:53:59,  2.22s/it] 85%|████████████████████████████████████████████████████████████████████▌            | 17035/20117 [10:52:32<1:53:34,  2.21s/it] 85%|████████████████████████████████████████████████████████████████████▌            | 17036/20117 [10:52:34<1:52:55,  2.20s/it] 85%|████████████████████████████████████████████████████████████████████▌            | 17037/20117 [10:52:37<1:53:36,  2.21s/it] 85%|████████████████████████████████████████████████████████████████████▌            | 17038/20117 [10:52:39<1:52:43,  2.20s/it] 85%|████████████████████████████████████████████████████████████████████▌            | 17039/20117 [10:52:41<1:52:56,  2.20s/it] 85%|████████████████████████████████████████████████████████████████████▌            | 17040/20117 [10:52:43<1:52:42,  2.20s/it]                                                                                                                                 {'loss': 0.1139, 'grad_norm': 0.47912150621414185, 'learning_rate': 1.1443179008377825e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 351.04, 'epoch': 1.69}
 85%|████████████████████████████████████████████████████████████████████▌            | 17040/20117 [10:52:43<1:52:42,  2.20s/it] 85%|████████████████████████████████████████████████████████████████████▌            | 17041/20117 [10:52:45<1:52:03,  2.19s/it] 85%|████████████████████████████████████████████████████████████████████▌            | 17042/20117 [10:52:47<1:51:45,  2.18s/it] 85%|████████████████████████████████████████████████████████████████████▌            | 17043/20117 [10:52:50<1:52:04,  2.19s/it] 85%|████████████████████████████████████████████████████████████████████▋            | 17044/20117 [10:52:52<1:52:14,  2.19s/it] 85%|████████████████████████████████████████████████████████████████████▋            | 17045/20117 [10:52:54<1:52:31,  2.20s/it] 85%|████████████████████████████████████████████████████████████████████▋            | 17046/20117 [10:52:57<1:56:38,  2.28s/it] 85%|████████████████████████████████████████████████████████████████████▋            | 17047/20117 [10:52:59<1:57:14,  2.29s/it] 85%|████████████████████████████████████████████████████████████████████▋            | 17048/20117 [10:53:01<1:54:51,  2.25s/it] 85%|████████████████████████████████████████████████████████████████████▋            | 17049/20117 [10:53:03<1:54:17,  2.24s/it] 85%|████████████████████████████████████████████████████████████████████▋            | 17050/20117 [10:53:05<1:54:48,  2.25s/it]                                                                                                                                 {'loss': 0.1163, 'grad_norm': 0.3503008782863617, 'learning_rate': 1.1370385113279047e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 329.13, 'epoch': 1.7}
 85%|████████████████████████████████████████████████████████████████████▋            | 17050/20117 [10:53:05<1:54:48,  2.25s/it] 85%|████████████████████████████████████████████████████████████████████▋            | 17051/20117 [10:53:08<1:53:40,  2.22s/it] 85%|████████████████████████████████████████████████████████████████████▋            | 17052/20117 [10:53:10<1:53:40,  2.23s/it] 85%|████████████████████████████████████████████████████████████████████▋            | 17053/20117 [10:53:12<1:53:14,  2.22s/it] 85%|████████████████████████████████████████████████████████████████████▋            | 17054/20117 [10:53:14<1:52:55,  2.21s/it] 85%|████████████████████████████████████████████████████████████████████▋            | 17055/20117 [10:53:17<1:54:10,  2.24s/it] 85%|████████████████████████████████████████████████████████████████████▋            | 17056/20117 [10:53:19<1:54:23,  2.24s/it] 85%|████████████████████████████████████████████████████████████████████▋            | 17057/20117 [10:53:21<1:53:19,  2.22s/it] 85%|████████████████████████████████████████████████████████████████████▋            | 17058/20117 [10:53:23<1:52:59,  2.22s/it] 85%|████████████████████████████████████████████████████████████████████▋            | 17059/20117 [10:53:25<1:52:54,  2.22s/it] 85%|████████████████████████████████████████████████████████████████████▋            | 17060/20117 [10:53:28<1:53:51,  2.23s/it]                                                                                                                                 {'loss': 0.1619, 'grad_norm': 0.3980488181114197, 'learning_rate': 1.1297809531654046e-05, 'memory/max_active (GiB)': 20.78, 'memory/max_allocated (GiB)': 20.78, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 381.97, 'epoch': 1.7}
 85%|████████████████████████████████████████████████████████████████████▋            | 17060/20117 [10:53:28<1:53:51,  2.23s/it] 85%|████████████████████████████████████████████████████████████████████▋            | 17061/20117 [10:53:30<1:53:37,  2.23s/it] 85%|████████████████████████████████████████████████████████████████████▋            | 17062/20117 [10:53:32<1:54:06,  2.24s/it] 85%|████████████████████████████████████████████████████████████████████▋            | 17063/20117 [10:53:34<1:53:49,  2.24s/it] 85%|████████████████████████████████████████████████████████████████████▋            | 17064/20117 [10:53:37<1:53:38,  2.23s/it] 85%|████████████████████████████████████████████████████████████████████▋            | 17065/20117 [10:53:39<1:53:21,  2.23s/it] 85%|████████████████████████████████████████████████████████████████████▋            | 17066/20117 [10:53:41<1:53:12,  2.23s/it] 85%|████████████████████████████████████████████████████████████████████▋            | 17067/20117 [10:53:43<1:52:29,  2.21s/it] 85%|████████████████████████████████████████████████████████████████████▋            | 17068/20117 [10:53:45<1:52:54,  2.22s/it] 85%|████████████████████████████████████████████████████████████████████▋            | 17069/20117 [10:53:48<1:52:55,  2.22s/it] 85%|████████████████████████████████████████████████████████████████████▋            | 17070/20117 [10:53:50<1:53:13,  2.23s/it]                                                                                                                                 {'loss': 0.1308, 'grad_norm': 0.5639301538467407, 'learning_rate': 1.1225452442271789e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 352.08, 'epoch': 1.7}
 85%|████████████████████████████████████████████████████████████████████▋            | 17070/20117 [10:53:50<1:53:13,  2.23s/it] 85%|████████████████████████████████████████████████████████████████████▋            | 17071/20117 [10:53:52<1:52:48,  2.22s/it] 85%|████████████████████████████████████████████████████████████████████▋            | 17072/20117 [10:53:54<1:53:32,  2.24s/it] 85%|████████████████████████████████████████████████████████████████████▋            | 17073/20117 [10:53:57<1:53:48,  2.24s/it] 85%|████████████████████████████████████████████████████████████████████▋            | 17074/20117 [10:53:59<1:53:30,  2.24s/it] 85%|████████████████████████████████████████████████████████████████████▊            | 17075/20117 [10:54:01<1:53:38,  2.24s/it] 85%|████████████████████████████████████████████████████████████████████▊            | 17076/20117 [10:54:03<1:54:24,  2.26s/it] 85%|████████████████████████████████████████████████████████████████████▊            | 17077/20117 [10:54:06<1:54:07,  2.25s/it] 85%|████████████████████████████████████████████████████████████████████▊            | 17078/20117 [10:54:08<1:53:50,  2.25s/it] 85%|████████████████████████████████████████████████████████████████████▊            | 17079/20117 [10:54:10<1:53:00,  2.23s/it] 85%|████████████████████████████████████████████████████████████████████▊            | 17080/20117 [10:54:12<1:53:08,  2.24s/it]                                                                                                                                 {'loss': 0.1434, 'grad_norm': 0.20409265160560608, 'learning_rate': 1.1153314023363126e-05, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 318.47, 'epoch': 1.7}
 85%|████████████████████████████████████████████████████████████████████▊            | 17080/20117 [10:54:12<1:53:08,  2.24s/it] 85%|████████████████████████████████████████████████████████████████████▊            | 17081/20117 [10:54:15<1:52:25,  2.22s/it] 85%|████████████████████████████████████████████████████████████████████▊            | 17082/20117 [10:54:17<1:52:29,  2.22s/it] 85%|████████████████████████████████████████████████████████████████████▊            | 17083/20117 [10:54:19<1:52:01,  2.22s/it] 85%|████████████████████████████████████████████████████████████████████▊            | 17084/20117 [10:54:21<1:51:34,  2.21s/it] 85%|████████████████████████████████████████████████████████████████████▊            | 17085/20117 [10:54:23<1:51:49,  2.21s/it] 85%|████████████████████████████████████████████████████████████████████▊            | 17086/20117 [10:54:26<1:51:11,  2.20s/it] 85%|████████████████████████████████████████████████████████████████████▊            | 17087/20117 [10:54:28<1:51:11,  2.20s/it] 85%|████████████████████████████████████████████████████████████████████▊            | 17088/20117 [10:54:30<1:52:09,  2.22s/it] 85%|████████████████████████████████████████████████████████████████████▊            | 17089/20117 [10:54:32<1:52:36,  2.23s/it] 85%|████████████████████████████████████████████████████████████████████▊            | 17090/20117 [10:54:35<1:52:27,  2.23s/it]                                                                                                                                 {'loss': 0.1646, 'grad_norm': 0.3582451343536377, 'learning_rate': 1.1081394452620164e-05, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 296.97, 'epoch': 1.7}
 85%|████████████████████████████████████████████████████████████████████▊            | 17090/20117 [10:54:35<1:52:27,  2.23s/it] 85%|████████████████████████████████████████████████████████████████████▊            | 17091/20117 [10:54:37<1:52:41,  2.23s/it] 85%|████████████████████████████████████████████████████████████████████▊            | 17092/20117 [10:54:39<1:52:17,  2.23s/it] 85%|████████████████████████████████████████████████████████████████████▊            | 17093/20117 [10:54:41<1:52:18,  2.23s/it] 85%|████████████████████████████████████████████████████████████████████▊            | 17094/20117 [10:54:43<1:52:24,  2.23s/it] 85%|████████████████████████████████████████████████████████████████████▊            | 17095/20117 [10:54:46<1:51:46,  2.22s/it] 85%|████████████████████████████████████████████████████████████████████▊            | 17096/20117 [10:54:48<1:51:07,  2.21s/it] 85%|████████████████████████████████████████████████████████████████████▊            | 17097/20117 [10:54:50<1:50:37,  2.20s/it] 85%|████████████████████████████████████████████████████████████████████▊            | 17098/20117 [10:54:52<1:50:04,  2.19s/it] 85%|████████████████████████████████████████████████████████████████████▊            | 17099/20117 [10:54:55<1:54:55,  2.28s/it] 85%|████████████████████████████████████████████████████████████████████▊            | 17100/20117 [10:54:57<1:53:26,  2.26s/it]                                                                                                                                 {'loss': 0.1628, 'grad_norm': 0.396576851606369, 'learning_rate': 1.100969390719605e-05, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 324.04, 'epoch': 1.7}
 85%|████████████████████████████████████████████████████████████████████▊            | 17100/20117 [10:54:57<1:53:26,  2.26s/it] 85%|████████████████████████████████████████████████████████████████████▊            | 17101/20117 [10:54:59<1:52:39,  2.24s/it] 85%|████████████████████████████████████████████████████████████████████▊            | 17102/20117 [10:55:01<1:53:11,  2.25s/it] 85%|████████████████████████████████████████████████████████████████████▊            | 17103/20117 [10:55:04<1:52:47,  2.25s/it] 85%|████████████████████████████████████████████████████████████████████▊            | 17104/20117 [10:55:06<1:52:31,  2.24s/it] 85%|████████████████████████████████████████████████████████████████████▊            | 17105/20117 [10:55:08<1:51:51,  2.23s/it] 85%|████████████████████████████████████████████████████████████████████▉            | 17106/20117 [10:55:10<1:52:00,  2.23s/it] 85%|████████████████████████████████████████████████████████████████████▉            | 17107/20117 [10:55:12<1:52:06,  2.23s/it] 85%|████████████████████████████████████████████████████████████████████▉            | 17108/20117 [10:55:15<1:51:32,  2.22s/it] 85%|████████████████████████████████████████████████████████████████████▉            | 17109/20117 [10:55:17<1:51:52,  2.23s/it] 85%|████████████████████████████████████████████████████████████████████▉            | 17110/20117 [10:55:19<1:51:45,  2.23s/it]                                                                                                                                 {'loss': 0.1304, 'grad_norm': 0.53066086769104, 'learning_rate': 1.0938212563704364e-05, 'memory/max_active (GiB)': 19.8, 'memory/max_allocated (GiB)': 19.8, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 414.9, 'epoch': 1.7}
 85%|████████████████████████████████████████████████████████████████████▉            | 17110/20117 [10:55:19<1:51:45,  2.23s/it] 85%|████████████████████████████████████████████████████████████████████▉            | 17111/20117 [10:55:21<1:51:36,  2.23s/it] 85%|████████████████████████████████████████████████████████████████████▉            | 17112/20117 [10:55:24<1:51:49,  2.23s/it] 85%|████████████████████████████████████████████████████████████████████▉            | 17113/20117 [10:55:26<1:51:05,  2.22s/it] 85%|████████████████████████████████████████████████████████████████████▉            | 17114/20117 [10:55:28<1:50:41,  2.21s/it] 85%|████████████████████████████████████████████████████████████████████▉            | 17115/20117 [10:55:30<1:51:13,  2.22s/it] 85%|████████████████████████████████████████████████████████████████████▉            | 17116/20117 [10:55:32<1:51:34,  2.23s/it] 85%|████████████████████████████████████████████████████████████████████▉            | 17117/20117 [10:55:35<1:51:17,  2.23s/it] 85%|████████████████████████████████████████████████████████████████████▉            | 17118/20117 [10:55:37<1:50:41,  2.21s/it] 85%|████████████████████████████████████████████████████████████████████▉            | 17119/20117 [10:55:39<1:50:19,  2.21s/it] 85%|████████████████████████████████████████████████████████████████████▉            | 17120/20117 [10:55:41<1:49:39,  2.20s/it]                                                                                                                                 {'loss': 0.1336, 'grad_norm': 0.4172995388507843, 'learning_rate': 1.0866950598218772e-05, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 319.1, 'epoch': 1.7}
 85%|████████████████████████████████████████████████████████████████████▉            | 17120/20117 [10:55:41<1:49:39,  2.20s/it] 85%|████████████████████████████████████████████████████████████████████▉            | 17121/20117 [10:55:43<1:49:40,  2.20s/it] 85%|████████████████████████████████████████████████████████████████████▉            | 17122/20117 [10:55:46<1:49:40,  2.20s/it] 85%|████████████████████████████████████████████████████████████████████▉            | 17123/20117 [10:55:48<1:49:36,  2.20s/it] 85%|████████████████████████████████████████████████████████████████████▉            | 17124/20117 [10:55:50<1:49:17,  2.19s/it] 85%|████████████████████████████████████████████████████████████████████▉            | 17125/20117 [10:55:52<1:49:05,  2.19s/it] 85%|████████████████████████████████████████████████████████████████████▉            | 17126/20117 [10:55:54<1:49:05,  2.19s/it] 85%|████████████████████████████████████████████████████████████████████▉            | 17127/20117 [10:55:57<1:49:48,  2.20s/it] 85%|████████████████████████████████████████████████████████████████████▉            | 17128/20117 [10:55:59<1:49:47,  2.20s/it] 85%|████████████████████████████████████████████████████████████████████▉            | 17129/20117 [10:56:01<1:49:18,  2.20s/it] 85%|████████████████████████████████████████████████████████████████████▉            | 17130/20117 [10:56:03<1:48:39,  2.18s/it]                                                                                                                                 {'loss': 0.1354, 'grad_norm': 0.8753495216369629, 'learning_rate': 1.0795908186272585e-05, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 303.94, 'epoch': 1.7}
 85%|████████████████████████████████████████████████████████████████████▉            | 17130/20117 [10:56:03<1:48:39,  2.18s/it] 85%|████████████████████████████████████████████████████████████████████▉            | 17131/20117 [10:56:05<1:49:29,  2.20s/it] 85%|████████████████████████████████████████████████████████████████████▉            | 17132/20117 [10:56:08<1:50:27,  2.22s/it] 85%|████████████████████████████████████████████████████████████████████▉            | 17133/20117 [10:56:10<1:50:40,  2.23s/it] 85%|████████████████████████████████████████████████████████████████████▉            | 17134/20117 [10:56:12<1:50:51,  2.23s/it] 85%|████████████████████████████████████████████████████████████████████▉            | 17135/20117 [10:56:14<1:50:11,  2.22s/it] 85%|████████████████████████████████████████████████████████████████████▉            | 17136/20117 [10:56:17<1:49:35,  2.21s/it] 85%|█████████████████████████████████████████████████████████████████████            | 17137/20117 [10:56:19<1:49:12,  2.20s/it] 85%|█████████████████████████████████████████████████████████████████████            | 17138/20117 [10:56:21<1:48:58,  2.19s/it] 85%|█████████████████████████████████████████████████████████████████████            | 17139/20117 [10:56:23<1:49:27,  2.21s/it] 85%|█████████████████████████████████████████████████████████████████████            | 17140/20117 [10:56:25<1:48:48,  2.19s/it]                                                                                                                                 {'loss': 0.1675, 'grad_norm': 0.5303057432174683, 'learning_rate': 1.0725085502858223e-05, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 418.9, 'epoch': 1.7}
 85%|█████████████████████████████████████████████████████████████████████            | 17140/20117 [10:56:25<1:48:48,  2.19s/it] 85%|█████████████████████████████████████████████████████████████████████            | 17141/20117 [10:56:28<1:49:11,  2.20s/it] 85%|█████████████████████████████████████████████████████████████████████            | 17142/20117 [10:56:30<1:49:14,  2.20s/it] 85%|█████████████████████████████████████████████████████████████████████            | 17143/20117 [10:56:32<1:49:14,  2.20s/it] 85%|█████████████████████████████████████████████████████████████████████            | 17144/20117 [10:56:34<1:48:59,  2.20s/it] 85%|█████████████████████████████████████████████████████████████████████            | 17145/20117 [10:56:36<1:48:29,  2.19s/it] 85%|█████████████████████████████████████████████████████████████████████            | 17146/20117 [10:56:39<1:49:15,  2.21s/it] 85%|█████████████████████████████████████████████████████████████████████            | 17147/20117 [10:56:41<1:49:00,  2.20s/it] 85%|█████████████████████████████████████████████████████████████████████            | 17148/20117 [10:56:43<1:48:51,  2.20s/it] 85%|█████████████████████████████████████████████████████████████████████            | 17149/20117 [10:56:45<1:48:24,  2.19s/it] 85%|█████████████████████████████████████████████████████████████████████            | 17150/20117 [10:56:47<1:48:16,  2.19s/it]                                                                                                                                 {'loss': 0.1446, 'grad_norm': 0.3423836827278137, 'learning_rate': 1.0654482722426984e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 376.63, 'epoch': 1.7}
 85%|█████████████████████████████████████████████████████████████████████            | 17150/20117 [10:56:47<1:48:16,  2.19s/it] 85%|█████████████████████████████████████████████████████████████████████            | 17151/20117 [10:56:49<1:48:23,  2.19s/it] 85%|█████████████████████████████████████████████████████████████████████            | 17152/20117 [10:56:52<1:49:05,  2.21s/it] 85%|█████████████████████████████████████████████████████████████████████            | 17153/20117 [10:56:54<1:53:04,  2.29s/it] 85%|█████████████████████████████████████████████████████████████████████            | 17154/20117 [10:56:56<1:51:37,  2.26s/it] 85%|█████████████████████████████████████████████████████████████████████            | 17155/20117 [10:56:59<1:50:49,  2.24s/it] 85%|█████████████████████████████████████████████████████████████████████            | 17156/20117 [10:57:01<1:50:37,  2.24s/it] 85%|█████████████████████████████████████████████████████████████████████            | 17157/20117 [10:57:03<1:50:22,  2.24s/it] 85%|█████████████████████████████████████████████████████████████████████            | 17158/20117 [10:57:05<1:49:40,  2.22s/it] 85%|█████████████████████████████████████████████████████████████████████            | 17159/20117 [10:57:07<1:49:28,  2.22s/it] 85%|█████████████████████████████████████████████████████████████████████            | 17160/20117 [10:57:10<1:49:31,  2.22s/it]                                                                                                                                 {'loss': 0.173, 'grad_norm': 0.46940287947654724, 'learning_rate': 1.0584100018888376e-05, 'memory/max_active (GiB)': 18.18, 'memory/max_allocated (GiB)': 18.18, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 398.16, 'epoch': 1.71}
 85%|█████████████████████████████████████████████████████████████████████            | 17160/20117 [10:57:10<1:49:31,  2.22s/it] 85%|█████████████████████████████████████████████████████████████████████            | 17161/20117 [10:57:12<1:50:00,  2.23s/it] 85%|█████████████████████████████████████████████████████████████████████            | 17162/20117 [10:57:14<1:49:51,  2.23s/it] 85%|█████████████████████████████████████████████████████████████████████            | 17163/20117 [10:57:16<1:49:24,  2.22s/it] 85%|█████████████████████████████████████████████████████████████████████            | 17164/20117 [10:57:19<1:49:27,  2.22s/it] 85%|█████████████████████████████████████████████████████████████████████            | 17165/20117 [10:57:21<1:49:43,  2.23s/it] 85%|█████████████████████████████████████████████████████████████████████            | 17166/20117 [10:57:23<1:48:52,  2.21s/it] 85%|█████████████████████████████████████████████████████████████████████            | 17167/20117 [10:57:25<1:49:07,  2.22s/it] 85%|█████████████████████████████████████████████████████████████████████▏           | 17168/20117 [10:57:27<1:49:05,  2.22s/it] 85%|█████████████████████████████████████████████████████████████████████▏           | 17169/20117 [10:57:30<1:48:38,  2.21s/it] 85%|█████████████████████████████████████████████████████████████████████▏           | 17170/20117 [10:57:32<1:48:19,  2.21s/it]                                                                                                                                 {'loss': 0.1594, 'grad_norm': 0.5457295775413513, 'learning_rate': 1.0513937565609922e-05, 'memory/max_active (GiB)': 18.84, 'memory/max_allocated (GiB)': 18.84, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 334.42, 'epoch': 1.71}
 85%|█████████████████████████████████████████████████████████████████████▏           | 17170/20117 [10:57:32<1:48:19,  2.21s/it] 85%|█████████████████████████████████████████████████████████████████████▏           | 17171/20117 [10:57:34<1:47:52,  2.20s/it] 85%|█████████████████████████████████████████████████████████████████████▏           | 17172/20117 [10:57:36<1:49:13,  2.23s/it] 85%|█████████████████████████████████████████████████████████████████████▏           | 17173/20117 [10:57:39<1:49:49,  2.24s/it] 85%|█████████████████████████████████████████████████████████████████████▏           | 17174/20117 [10:57:41<1:50:50,  2.26s/it] 85%|█████████████████████████████████████████████████████████████████████▏           | 17175/20117 [10:57:43<1:49:53,  2.24s/it] 85%|█████████████████████████████████████████████████████████████████████▏           | 17176/20117 [10:57:45<1:49:13,  2.23s/it] 85%|█████████████████████████████████████████████████████████████████████▏           | 17177/20117 [10:57:48<1:49:09,  2.23s/it] 85%|█████████████████████████████████████████████████████████████████████▏           | 17178/20117 [10:57:50<1:48:30,  2.22s/it] 85%|█████████████████████████████████████████████████████████████████████▏           | 17179/20117 [10:57:52<1:48:44,  2.22s/it] 85%|█████████████████████████████████████████████████████████████████████▏           | 17180/20117 [10:57:54<1:48:47,  2.22s/it]                                                                                                                                 {'loss': 0.0897, 'grad_norm': 0.5194307565689087, 'learning_rate': 1.044399553541653e-05, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 313.63, 'epoch': 1.71}
 85%|█████████████████████████████████████████████████████████████████████▏           | 17180/20117 [10:57:54<1:48:47,  2.22s/it] 85%|█████████████████████████████████████████████████████████████████████▏           | 17181/20117 [10:57:56<1:49:01,  2.23s/it] 85%|█████████████████████████████████████████████████████████████████████▏           | 17182/20117 [10:57:59<1:48:09,  2.21s/it] 85%|█████████████████████████████████████████████████████████████████████▏           | 17183/20117 [10:58:01<1:48:59,  2.23s/it] 85%|█████████████████████████████████████████████████████████████████████▏           | 17184/20117 [10:58:03<1:48:33,  2.22s/it] 85%|█████████████████████████████████████████████████████████████████████▏           | 17185/20117 [10:58:05<1:48:01,  2.21s/it] 85%|█████████████████████████████████████████████████████████████████████▏           | 17186/20117 [10:58:07<1:47:38,  2.20s/it] 85%|█████████████████████████████████████████████████████████████████████▏           | 17187/20117 [10:58:10<1:47:38,  2.20s/it] 85%|█████████████████████████████████████████████████████████████████████▏           | 17188/20117 [10:58:12<1:48:10,  2.22s/it] 85%|█████████████████████████████████████████████████████████████████████▏           | 17189/20117 [10:58:14<1:48:23,  2.22s/it] 85%|█████████████████████████████████████████████████████████████████████▏           | 17190/20117 [10:58:16<1:48:09,  2.22s/it]                                                                                                                                 {'loss': 0.157, 'grad_norm': 0.6305141448974609, 'learning_rate': 1.0374274100590254e-05, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 356.77, 'epoch': 1.71}
 85%|█████████████████████████████████████████████████████████████████████▏           | 17190/20117 [10:58:16<1:48:09,  2.22s/it] 85%|█████████████████████████████████████████████████████████████████████▏           | 17191/20117 [10:58:19<1:48:36,  2.23s/it] 85%|█████████████████████████████████████████████████████████████████████▏           | 17192/20117 [10:58:21<1:48:54,  2.23s/it] 85%|█████████████████████████████████████████████████████████████████████▏           | 17193/20117 [10:58:23<1:49:03,  2.24s/it] 85%|█████████████████████████████████████████████████████████████████████▏           | 17194/20117 [10:58:25<1:49:11,  2.24s/it] 85%|█████████████████████████████████████████████████████████████████████▏           | 17195/20117 [10:58:28<1:48:40,  2.23s/it] 85%|█████████████████████████████████████████████████████████████████████▏           | 17196/20117 [10:58:30<1:47:45,  2.21s/it] 85%|█████████████████████████████████████████████████████████████████████▏           | 17197/20117 [10:58:32<1:47:12,  2.20s/it] 85%|█████████████████████████████████████████████████████████████████████▏           | 17198/20117 [10:58:34<1:46:50,  2.20s/it] 85%|█████████████████████████████████████████████████████████████████████▎           | 17199/20117 [10:58:36<1:47:00,  2.20s/it] 85%|█████████████████████████████████████████████████████████████████████▎           | 17200/20117 [10:58:38<1:46:52,  2.20s/it]                                                                                                                                 {'loss': 0.2061, 'grad_norm': 0.7084689140319824, 'learning_rate': 1.0304773432869675e-05, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 396.04, 'epoch': 1.71}
 85%|█████████████████████████████████████████████████████████████████████▎           | 17200/20117 [10:58:38<1:46:52,  2.20s/it] 86%|█████████████████████████████████████████████████████████████████████▎           | 17201/20117 [10:58:41<1:46:50,  2.20s/it] 86%|█████████████████████████████████████████████████████████████████████▎           | 17202/20117 [10:58:43<1:46:37,  2.19s/it] 86%|█████████████████████████████████████████████████████████████████████▎           | 17203/20117 [10:58:45<1:46:22,  2.19s/it] 86%|█████████████████████████████████████████████████████████████████████▎           | 17204/20117 [10:58:47<1:46:09,  2.19s/it] 86%|█████████████████████████████████████████████████████████████████████▎           | 17205/20117 [10:58:50<1:50:17,  2.27s/it] 86%|█████████████████████████████████████████████████████████████████████▎           | 17206/20117 [10:58:52<1:49:11,  2.25s/it] 86%|█████████████████████████████████████████████████████████████████████▎           | 17207/20117 [10:58:54<1:48:29,  2.24s/it] 86%|█████████████████████████████████████████████████████████████████████▎           | 17208/20117 [10:58:56<1:47:23,  2.21s/it] 86%|█████████████████████████████████████████████████████████████████████▎           | 17209/20117 [10:58:58<1:47:07,  2.21s/it] 86%|█████████████████████████████████████████████████████████████████████▎           | 17210/20117 [10:59:01<1:46:22,  2.20s/it]                                                                                                                                 {'loss': 0.1498, 'grad_norm': 0.5986453294754028, 'learning_rate': 1.0235493703449673e-05, 'memory/max_active (GiB)': 17.01, 'memory/max_allocated (GiB)': 17.01, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 333.56, 'epoch': 1.71}
 86%|█████████████████████████████████████████████████████████████████████▎           | 17210/20117 [10:59:01<1:46:22,  2.20s/it] 86%|█████████████████████████████████████████████████████████████████████▎           | 17211/20117 [10:59:03<1:45:49,  2.18s/it] 86%|█████████████████████████████████████████████████████████████████████▎           | 17212/20117 [10:59:05<1:47:21,  2.22s/it] 86%|█████████████████████████████████████████████████████████████████████▎           | 17213/20117 [10:59:07<1:46:34,  2.20s/it] 86%|█████████████████████████████████████████████████████████████████████▎           | 17214/20117 [10:59:10<1:49:17,  2.26s/it] 86%|█████████████████████████████████████████████████████████████████████▎           | 17215/20117 [10:59:12<1:48:33,  2.24s/it] 86%|█████████████████████████████████████████████████████████████████████▎           | 17216/20117 [10:59:14<1:47:37,  2.23s/it] 86%|█████████████████████████████████████████████████████████████████████▎           | 17217/20117 [10:59:16<1:47:24,  2.22s/it] 86%|█████████████████████████████████████████████████████████████████████▎           | 17218/20117 [10:59:18<1:46:56,  2.21s/it] 86%|█████████████████████████████████████████████████████████████████████▎           | 17219/20117 [10:59:21<1:46:13,  2.20s/it] 86%|█████████████████████████████████████████████████████████████████████▎           | 17220/20117 [10:59:23<1:46:12,  2.20s/it]                                                                                                                                 {'loss': 0.1473, 'grad_norm': 0.4625334143638611, 'learning_rate': 1.0166435082980818e-05, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 344.72, 'epoch': 1.71}
 86%|█████████████████████████████████████████████████████████████████████▎           | 17220/20117 [10:59:23<1:46:12,  2.20s/it] 86%|█████████████████████████████████████████████████████████████████████▎           | 17221/20117 [10:59:25<1:46:42,  2.21s/it] 86%|█████████████████████████████████████████████████████████████████████▎           | 17222/20117 [10:59:27<1:46:53,  2.22s/it] 86%|█████████████████████████████████████████████████████████████████████▎           | 17223/20117 [10:59:29<1:47:11,  2.22s/it] 86%|█████████████████████████████████████████████████████████████████████▎           | 17224/20117 [10:59:32<1:46:53,  2.22s/it] 86%|█████████████████████████████████████████████████████████████████████▎           | 17225/20117 [10:59:34<1:47:25,  2.23s/it] 86%|█████████████████████████████████████████████████████████████████████▎           | 17226/20117 [10:59:36<1:47:00,  2.22s/it] 86%|█████████████████████████████████████████████████████████████████████▎           | 17227/20117 [10:59:38<1:46:48,  2.22s/it] 86%|█████████████████████████████████████████████████████████████████████▎           | 17228/20117 [10:59:41<1:47:30,  2.23s/it] 86%|█████████████████████████████████████████████████████████████████████▎           | 17229/20117 [10:59:43<1:47:08,  2.23s/it] 86%|█████████████████████████████████████████████████████████████████████▍           | 17230/20117 [10:59:45<1:46:24,  2.21s/it]                                                                                                                                 {'loss': 0.1247, 'grad_norm': 0.3249736726284027, 'learning_rate': 1.0097597741569109e-05, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 387.91, 'epoch': 1.71}
 86%|█████████████████████████████████████████████████████████████████████▍           | 17230/20117 [10:59:45<1:46:24,  2.21s/it] 86%|█████████████████████████████████████████████████████████████████████▍           | 17231/20117 [10:59:47<1:46:35,  2.22s/it] 86%|█████████████████████████████████████████████████████████████████████▍           | 17232/20117 [10:59:49<1:46:27,  2.21s/it] 86%|█████████████████████████████████████████████████████████████████████▍           | 17233/20117 [10:59:52<1:46:43,  2.22s/it] 86%|█████████████████████████████████████████████████████████████████████▍           | 17234/20117 [10:59:54<1:46:49,  2.22s/it] 86%|█████████████████████████████████████████████████████████████████████▍           | 17235/20117 [10:59:56<1:46:10,  2.21s/it] 86%|█████████████████████████████████████████████████████████████████████▍           | 17236/20117 [10:59:58<1:46:42,  2.22s/it] 86%|█████████████████████████████████████████████████████████████████████▍           | 17237/20117 [11:00:01<1:45:50,  2.21s/it] 86%|█████████████████████████████████████████████████████████████████████▍           | 17238/20117 [11:00:03<1:47:05,  2.23s/it] 86%|█████████████████████████████████████████████████████████████████████▍           | 17239/20117 [11:00:05<1:47:20,  2.24s/it] 86%|█████████████████████████████████████████████████████████████████████▍           | 17240/20117 [11:00:07<1:46:20,  2.22s/it]                                                                                                                                 {'loss': 0.1227, 'grad_norm': 0.395663321018219, 'learning_rate': 1.0028981848775499e-05, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 400.44, 'epoch': 1.71}
 86%|█████████████████████████████████████████████████████████████████████▍           | 17240/20117 [11:00:07<1:46:20,  2.22s/it] 86%|█████████████████████████████████████████████████████████████████████▍           | 17241/20117 [11:00:09<1:46:51,  2.23s/it] 86%|█████████████████████████████████████████████████████████████████████▍           | 17242/20117 [11:00:12<1:46:46,  2.23s/it] 86%|█████████████████████████████████████████████████████████████████████▍           | 17243/20117 [11:00:14<1:46:52,  2.23s/it] 86%|█████████████████████████████████████████████████████████████████████▍           | 17244/20117 [11:00:16<1:46:23,  2.22s/it] 86%|█████████████████████████████████████████████████████████████████████▍           | 17245/20117 [11:00:18<1:47:28,  2.25s/it] 86%|█████████████████████████████████████████████████████████████████████▍           | 17246/20117 [11:00:21<1:46:21,  2.22s/it] 86%|█████████████████████████████████████████████████████████████████████▍           | 17247/20117 [11:00:23<1:46:32,  2.23s/it] 86%|█████████████████████████████████████████████████████████████████████▍           | 17248/20117 [11:00:25<1:47:07,  2.24s/it] 86%|█████████████████████████████████████████████████████████████████████▍           | 17249/20117 [11:00:27<1:47:29,  2.25s/it] 86%|█████████████████████████████████████████████████████████████████████▍           | 17250/20117 [11:00:30<1:47:05,  2.24s/it]                                                                                                                                 {'loss': 0.1161, 'grad_norm': 0.3278411030769348, 'learning_rate': 9.960587573615376e-06, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 419.85, 'epoch': 1.71}
 86%|█████████████████████████████████████████████████████████████████████▍           | 17250/20117 [11:00:30<1:47:05,  2.24s/it] 86%|█████████████████████████████████████████████████████████████████████▍           | 17251/20117 [11:00:32<1:46:56,  2.24s/it] 86%|█████████████████████████████████████████████████████████████████████▍           | 17252/20117 [11:00:34<1:46:33,  2.23s/it] 86%|█████████████████████████████████████████████████████████████████████▍           | 17253/20117 [11:00:36<1:45:53,  2.22s/it] 86%|█████████████████████████████████████████████████████████████████████▍           | 17254/20117 [11:00:38<1:45:22,  2.21s/it] 86%|█████████████████████████████████████████████████████████████████████▍           | 17255/20117 [11:00:41<1:45:11,  2.21s/it] 86%|█████████████████████████████████████████████████████████████████████▍           | 17256/20117 [11:00:43<1:45:02,  2.20s/it] 86%|█████████████████████████████████████████████████████████████████████▍           | 17257/20117 [11:00:45<1:45:33,  2.21s/it] 86%|█████████████████████████████████████████████████████████████████████▍           | 17258/20117 [11:00:48<1:49:54,  2.31s/it] 86%|█████████████████████████████████████████████████████████████████████▍           | 17259/20117 [11:00:50<1:48:40,  2.28s/it] 86%|█████████████████████████████████████████████████████████████████████▍           | 17260/20117 [11:00:52<1:48:18,  2.27s/it]                                                                                                                                 {'loss': 0.1541, 'grad_norm': 0.27274656295776367, 'learning_rate': 9.892415084558315e-06, 'memory/max_active (GiB)': 19.8, 'memory/max_allocated (GiB)': 19.8, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 357.93, 'epoch': 1.72}
 86%|█████████████████████████████████████████████████████████████████████▍           | 17260/20117 [11:00:52<1:48:18,  2.27s/it] 86%|█████████████████████████████████████████████████████████████████████▌           | 17261/20117 [11:00:54<1:47:06,  2.25s/it] 86%|█████████████████████████████████████████████████████████████████████▌           | 17262/20117 [11:00:57<1:46:51,  2.25s/it] 86%|█████████████████████████████████████████████████████████████████████▌           | 17263/20117 [11:00:59<1:45:41,  2.22s/it] 86%|█████████████████████████████████████████████████████████████████████▌           | 17264/20117 [11:01:01<1:46:01,  2.23s/it] 86%|█████████████████████████████████████████████████████████████████████▌           | 17265/20117 [11:01:03<1:45:59,  2.23s/it] 86%|█████████████████████████████████████████████████████████████████████▌           | 17266/20117 [11:01:05<1:45:42,  2.22s/it] 86%|█████████████████████████████████████████████████████████████████████▌           | 17267/20117 [11:01:08<1:45:26,  2.22s/it] 86%|█████████████████████████████████████████████████████████████████████▌           | 17268/20117 [11:01:10<1:45:35,  2.22s/it] 86%|█████████████████████████████████████████████████████████████████████▌           | 17269/20117 [11:01:12<1:45:01,  2.21s/it] 86%|█████████████████████████████████████████████████████████████████████▌           | 17270/20117 [11:01:14<1:44:23,  2.20s/it]                                                                                                                                 {'loss': 0.1644, 'grad_norm': 0.6081755757331848, 'learning_rate': 9.82446454952759e-06, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 381.9, 'epoch': 1.72}
 86%|█████████████████████████████████████████████████████████████████████▌           | 17270/20117 [11:01:14<1:44:23,  2.20s/it] 86%|█████████████████████████████████████████████████████████████████████▌           | 17271/20117 [11:01:16<1:44:56,  2.21s/it] 86%|█████████████████████████████████████████████████████████████████████▌           | 17272/20117 [11:01:19<1:44:59,  2.21s/it] 86%|█████████████████████████████████████████████████████████████████████▌           | 17273/20117 [11:01:21<1:44:33,  2.21s/it] 86%|█████████████████████████████████████████████████████████████████████▌           | 17274/20117 [11:01:23<1:44:43,  2.21s/it] 86%|█████████████████████████████████████████████████████████████████████▌           | 17275/20117 [11:01:25<1:44:59,  2.22s/it] 86%|█████████████████████████████████████████████████████████████████████▌           | 17276/20117 [11:01:27<1:44:38,  2.21s/it] 86%|█████████████████████████████████████████████████████████████████████▌           | 17277/20117 [11:01:30<1:44:29,  2.21s/it] 86%|█████████████████████████████████████████████████████████████████████▌           | 17278/20117 [11:01:32<1:44:23,  2.21s/it] 86%|█████████████████████████████████████████████████████████████████████▌           | 17279/20117 [11:01:34<1:43:56,  2.20s/it] 86%|█████████████████████████████████████████████████████████████████████▌           | 17280/20117 [11:01:36<1:44:14,  2.20s/it]                                                                                                                                 {'loss': 0.1686, 'grad_norm': 0.5921940207481384, 'learning_rate': 9.756736135899724e-06, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 422.69, 'epoch': 1.72}
 86%|█████████████████████████████████████████████████████████████████████▌           | 17280/20117 [11:01:36<1:44:14,  2.20s/it] 86%|█████████████████████████████████████████████████████████████████████▌           | 17281/20117 [11:01:39<1:45:01,  2.22s/it] 86%|█████████████████████████████████████████████████████████████████████▌           | 17282/20117 [11:01:41<1:45:05,  2.22s/it] 86%|█████████████████████████████████████████████████████████████████████▌           | 17283/20117 [11:01:43<1:44:33,  2.21s/it] 86%|█████████████████████████████████████████████████████████████████████▌           | 17284/20117 [11:01:45<1:44:16,  2.21s/it] 86%|█████████████████████████████████████████████████████████████████████▌           | 17285/20117 [11:01:47<1:44:17,  2.21s/it] 86%|█████████████████████████████████████████████████████████████████████▌           | 17286/20117 [11:01:50<1:44:46,  2.22s/it] 86%|█████████████████████████████████████████████████████████████████████▌           | 17287/20117 [11:01:52<1:46:01,  2.25s/it] 86%|█████████████████████████████████████████████████████████████████████▌           | 17288/20117 [11:01:54<1:45:30,  2.24s/it] 86%|█████████████████████████████████████████████████████████████████████▌           | 17289/20117 [11:01:56<1:45:10,  2.23s/it] 86%|█████████████████████████████████████████████████████████████████████▌           | 17290/20117 [11:01:59<1:45:15,  2.23s/it]                                                                                                                                 {'loss': 0.1546, 'grad_norm': 0.43674856424331665, 'learning_rate': 9.68923001050408e-06, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 370.0, 'epoch': 1.72}
 86%|█████████████████████████████████████████████████████████████████████▌           | 17290/20117 [11:01:59<1:45:15,  2.23s/it] 86%|█████████████████████████████████████████████████████████████████████▌           | 17291/20117 [11:02:01<1:45:44,  2.24s/it] 86%|█████████████████████████████████████████████████████████████████████▋           | 17292/20117 [11:02:03<1:45:57,  2.25s/it] 86%|█████████████████████████████████████████████████████████████████████▋           | 17293/20117 [11:02:05<1:45:52,  2.25s/it] 86%|█████████████████████████████████████████████████████████████████████▋           | 17294/20117 [11:02:08<1:47:31,  2.29s/it] 86%|█████████████████████████████████████████████████████████████████████▋           | 17295/20117 [11:02:10<1:46:37,  2.27s/it] 86%|█████████████████████████████████████████████████████████████████████▋           | 17296/20117 [11:02:12<1:45:45,  2.25s/it] 86%|█████████████████████████████████████████████████████████████████████▋           | 17297/20117 [11:02:14<1:46:38,  2.27s/it] 86%|█████████████████████████████████████████████████████████████████████▋           | 17298/20117 [11:02:17<1:45:37,  2.25s/it] 86%|█████████████████████████████████████████████████████████████████████▋           | 17299/20117 [11:02:19<1:45:05,  2.24s/it] 86%|█████████████████████████████████████████████████████████████████████▋           | 17300/20117 [11:02:21<1:44:35,  2.23s/it]                                                                                                                                 {'loss': 0.1627, 'grad_norm': 0.39371129870414734, 'learning_rate': 9.621946339622567e-06, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 404.49, 'epoch': 1.72}
 86%|█████████████████████████████████████████████████████████████████████▋           | 17300/20117 [11:02:21<1:44:35,  2.23s/it] 86%|█████████████████████████████████████████████████████████████████████▋           | 17301/20117 [11:02:23<1:44:37,  2.23s/it] 86%|█████████████████████████████████████████████████████████████████████▋           | 17302/20117 [11:02:26<1:44:16,  2.22s/it] 86%|█████████████████████████████████████████████████████████████████████▋           | 17303/20117 [11:02:28<1:44:16,  2.22s/it] 86%|█████████████████████████████████████████████████████████████████████▋           | 17304/20117 [11:02:30<1:44:56,  2.24s/it] 86%|█████████████████████████████████████████████████████████████████████▋           | 17305/20117 [11:02:32<1:45:27,  2.25s/it] 86%|█████████████████████████████████████████████████████████████████████▋           | 17306/20117 [11:02:35<1:44:49,  2.24s/it] 86%|█████████████████████████████████████████████████████████████████████▋           | 17307/20117 [11:02:37<1:44:40,  2.24s/it] 86%|█████████████████████████████████████████████████████████████████████▋           | 17308/20117 [11:02:39<1:45:40,  2.26s/it] 86%|█████████████████████████████████████████████████████████████████████▋           | 17309/20117 [11:02:41<1:46:27,  2.27s/it] 86%|█████████████████████████████████████████████████████████████████████▋           | 17310/20117 [11:02:44<1:46:03,  2.27s/it]                                                                                                                                 {'loss': 0.1846, 'grad_norm': 0.6137861609458923, 'learning_rate': 9.554885288989035e-06, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 400.34, 'epoch': 1.72}
 86%|█████████████████████████████████████████████████████████████████████▋           | 17310/20117 [11:02:44<1:46:03,  2.27s/it] 86%|█████████████████████████████████████████████████████████████████████▋           | 17311/20117 [11:02:46<1:49:16,  2.34s/it] 86%|█████████████████████████████████████████████████████████████████████▋           | 17312/20117 [11:02:48<1:47:29,  2.30s/it] 86%|█████████████████████████████████████████████████████████████████████▋           | 17313/20117 [11:02:51<1:46:22,  2.28s/it] 86%|█████████████████████████████████████████████████████████████████████▋           | 17314/20117 [11:02:53<1:46:01,  2.27s/it] 86%|█████████████████████████████████████████████████████████████████████▋           | 17315/20117 [11:02:55<1:45:02,  2.25s/it] 86%|█████████████████████████████████████████████████████████████████████▋           | 17316/20117 [11:02:57<1:44:29,  2.24s/it] 86%|█████████████████████████████████████████████████████████████████████▋           | 17317/20117 [11:02:59<1:44:25,  2.24s/it] 86%|█████████████████████████████████████████████████████████████████████▋           | 17318/20117 [11:03:02<1:44:19,  2.24s/it] 86%|█████████████████████████████████████████████████████████████████████▋           | 17319/20117 [11:03:04<1:44:57,  2.25s/it] 86%|█████████████████████████████████████████████████████████████████████▋           | 17320/20117 [11:03:06<1:45:13,  2.26s/it]                                                                                                                                 {'loss': 0.1593, 'grad_norm': 0.43737614154815674, 'learning_rate': 9.488047023789059e-06, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 365.64, 'epoch': 1.72}
 86%|█████████████████████████████████████████████████████████████████████▋           | 17320/20117 [11:03:06<1:45:13,  2.26s/it] 86%|█████████████████████████████████████████████████████████████████████▋           | 17321/20117 [11:03:08<1:44:58,  2.25s/it] 86%|█████████████████████████████████████████████████████████████████████▋           | 17322/20117 [11:03:11<1:44:38,  2.25s/it] 86%|█████████████████████████████████████████████████████████████████████▊           | 17323/20117 [11:03:13<1:44:31,  2.24s/it] 86%|█████████████████████████████████████████████████████████████████████▊           | 17324/20117 [11:03:15<1:44:17,  2.24s/it] 86%|█████████████████████████████████████████████████████████████████████▊           | 17325/20117 [11:03:17<1:44:02,  2.24s/it] 86%|█████████████████████████████████████████████████████████████████████▊           | 17326/20117 [11:03:20<1:44:18,  2.24s/it] 86%|█████████████████████████████████████████████████████████████████████▊           | 17327/20117 [11:03:22<1:43:44,  2.23s/it] 86%|█████████████████████████████████████████████████████████████████████▊           | 17328/20117 [11:03:24<1:44:04,  2.24s/it] 86%|█████████████████████████████████████████████████████████████████████▊           | 17329/20117 [11:03:26<1:44:41,  2.25s/it] 86%|█████████████████████████████████████████████████████████████████████▊           | 17330/20117 [11:03:29<1:45:40,  2.28s/it]                                                                                                                                 {'loss': 0.1476, 'grad_norm': 0.20106801390647888, 'learning_rate': 9.42143170865939e-06, 'memory/max_active (GiB)': 19.8, 'memory/max_allocated (GiB)': 19.8, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 366.76, 'epoch': 1.72}
 86%|█████████████████████████████████████████████████████████████████████▊           | 17330/20117 [11:03:29<1:45:40,  2.28s/it] 86%|█████████████████████████████████████████████████████████████████████▊           | 17331/20117 [11:03:31<1:44:41,  2.25s/it] 86%|█████████████████████████████████████████████████████████████████████▊           | 17332/20117 [11:03:33<1:44:51,  2.26s/it] 86%|█████████████████████████████████████████████████████████████████████▊           | 17333/20117 [11:03:35<1:44:33,  2.25s/it] 86%|█████████████████████████████████████████████████████████████████████▊           | 17334/20117 [11:03:38<1:44:23,  2.25s/it] 86%|█████████████████████████████████████████████████████████████████████▊           | 17335/20117 [11:03:40<1:43:53,  2.24s/it] 86%|█████████████████████████████████████████████████████████████████████▊           | 17336/20117 [11:03:42<1:44:02,  2.24s/it] 86%|█████████████████████████████████████████████████████████████████████▊           | 17337/20117 [11:03:45<1:46:40,  2.30s/it] 86%|█████████████████████████████████████████████████████████████████████▊           | 17338/20117 [11:03:47<1:45:06,  2.27s/it] 86%|█████████████████████████████████████████████████████████████████████▊           | 17339/20117 [11:03:49<1:45:28,  2.28s/it] 86%|█████████████████████████████████████████████████████████████████████▊           | 17340/20117 [11:03:51<1:44:23,  2.26s/it]                                                                                                                                 {'loss': 0.191, 'grad_norm': 0.652117133140564, 'learning_rate': 9.355039507687657e-06, 'memory/max_active (GiB)': 21.37, 'memory/max_allocated (GiB)': 21.37, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 433.9, 'epoch': 1.72}
 86%|█████████████████████████████████████████████████████████████████████▊           | 17340/20117 [11:03:51<1:44:23,  2.26s/it] 86%|█████████████████████████████████████████████████████████████████████▊           | 17341/20117 [11:03:54<1:43:49,  2.24s/it] 86%|█████████████████████████████████████████████████████████████████████▊           | 17342/20117 [11:03:56<1:43:46,  2.24s/it] 86%|█████████████████████████████████████████████████████████████████████▊           | 17343/20117 [11:03:58<1:44:06,  2.25s/it] 86%|█████████████████████████████████████████████████████████████████████▊           | 17344/20117 [11:04:00<1:42:58,  2.23s/it] 86%|█████████████████████████████████████████████████████████████████████▊           | 17345/20117 [11:04:02<1:42:56,  2.23s/it] 86%|█████████████████████████████████████████████████████████████████████▊           | 17346/20117 [11:04:05<1:42:29,  2.22s/it] 86%|█████████████████████████████████████████████████████████████████████▊           | 17347/20117 [11:04:07<1:42:06,  2.21s/it] 86%|█████████████████████████████████████████████████████████████████████▊           | 17348/20117 [11:04:09<1:43:06,  2.23s/it] 86%|█████████████████████████████████████████████████████████████████████▊           | 17349/20117 [11:04:11<1:43:40,  2.25s/it] 86%|█████████████████████████████████████████████████████████████████████▊           | 17350/20117 [11:04:14<1:44:06,  2.26s/it]                                                                                                                                 {'loss': 0.1297, 'grad_norm': 0.39825040102005005, 'learning_rate': 9.288870584411835e-06, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 313.48, 'epoch': 1.72}
 86%|█████████████████████████████████████████████████████████████████████▊           | 17350/20117 [11:04:14<1:44:06,  2.26s/it] 86%|█████████████████████████████████████████████████████████████████████▊           | 17351/20117 [11:04:16<1:45:01,  2.28s/it] 86%|█████████████████████████████████████████████████████████████████████▊           | 17352/20117 [11:04:18<1:44:25,  2.27s/it] 86%|█████████████████████████████████████████████████████████████████████▊           | 17353/20117 [11:04:21<1:44:53,  2.28s/it] 86%|█████████████████████████████████████████████████████████████████████▊           | 17354/20117 [11:04:23<1:44:46,  2.28s/it] 86%|█████████████████████████████████████████████████████████████████████▉           | 17355/20117 [11:04:25<1:44:17,  2.27s/it] 86%|█████████████████████████████████████████████████████████████████████▉           | 17356/20117 [11:04:27<1:43:45,  2.25s/it] 86%|█████████████████████████████████████████████████████████████████████▉           | 17357/20117 [11:04:30<1:43:17,  2.25s/it] 86%|█████████████████████████████████████████████████████████████████████▉           | 17358/20117 [11:04:32<1:43:02,  2.24s/it] 86%|█████████████████████████████████████████████████████████████████████▉           | 17359/20117 [11:04:34<1:42:46,  2.24s/it] 86%|█████████████████████████████████████████████████████████████████████▉           | 17360/20117 [11:04:36<1:43:47,  2.26s/it]                                                                                                                                 {'loss': 0.1617, 'grad_norm': 0.20952925086021423, 'learning_rate': 9.222925101820012e-06, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 382.53, 'epoch': 1.73}
 86%|█████████████████████████████████████████████████████████████████████▉           | 17360/20117 [11:04:36<1:43:47,  2.26s/it] 86%|█████████████████████████████████████████████████████████████████████▉           | 17361/20117 [11:04:39<1:43:26,  2.25s/it] 86%|█████████████████████████████████████████████████████████████████████▉           | 17362/20117 [11:04:41<1:43:26,  2.25s/it] 86%|█████████████████████████████████████████████████████████████████████▉           | 17363/20117 [11:04:43<1:47:15,  2.34s/it] 86%|█████████████████████████████████████████████████████████████████████▉           | 17364/20117 [11:04:46<1:45:35,  2.30s/it] 86%|█████████████████████████████████████████████████████████████████████▉           | 17365/20117 [11:04:48<1:44:26,  2.28s/it] 86%|█████████████████████████████████████████████████████████████████████▉           | 17366/20117 [11:04:50<1:44:16,  2.27s/it] 86%|█████████████████████████████████████████████████████████████████████▉           | 17367/20117 [11:04:52<1:43:26,  2.26s/it] 86%|█████████████████████████████████████████████████████████████████████▉           | 17368/20117 [11:04:54<1:43:33,  2.26s/it] 86%|█████████████████████████████████████████████████████████████████████▉           | 17369/20117 [11:04:57<1:42:50,  2.25s/it] 86%|█████████████████████████████████████████████████████████████████████▉           | 17370/20117 [11:04:59<1:42:11,  2.23s/it]                                                                                                                                 {'loss': 0.2117, 'grad_norm': 0.6603105068206787, 'learning_rate': 9.157203222349853e-06, 'memory/max_active (GiB)': 19.8, 'memory/max_allocated (GiB)': 19.8, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 424.27, 'epoch': 1.73}
 86%|█████████████████████████████████████████████████████████████████████▉           | 17370/20117 [11:04:59<1:42:11,  2.23s/it] 86%|█████████████████████████████████████████████████████████████████████▉           | 17371/20117 [11:05:01<1:42:35,  2.24s/it] 86%|█████████████████████████████████████████████████████████████████████▉           | 17372/20117 [11:05:03<1:42:47,  2.25s/it] 86%|█████████████████████████████████████████████████████████████████████▉           | 17373/20117 [11:05:06<1:42:18,  2.24s/it] 86%|█████████████████████████████████████████████████████████████████████▉           | 17374/20117 [11:05:08<1:42:23,  2.24s/it] 86%|█████████████████████████████████████████████████████████████████████▉           | 17375/20117 [11:05:10<1:43:15,  2.26s/it] 86%|█████████████████████████████████████████████████████████████████████▉           | 17376/20117 [11:05:12<1:43:20,  2.26s/it] 86%|█████████████████████████████████████████████████████████████████████▉           | 17377/20117 [11:05:15<1:43:59,  2.28s/it] 86%|█████████████████████████████████████████████████████████████████████▉           | 17378/20117 [11:05:17<1:43:19,  2.26s/it] 86%|█████████████████████████████████████████████████████████████████████▉           | 17379/20117 [11:05:19<1:42:19,  2.24s/it] 86%|█████████████████████████████████████████████████████████████████████▉           | 17380/20117 [11:05:21<1:41:42,  2.23s/it]                                                                                                                                 {'loss': 0.181, 'grad_norm': 0.5097489356994629, 'learning_rate': 9.091705107888204e-06, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 487.99, 'epoch': 1.73}
 86% | 17390/20117 [11:05:44<1:41:03,  2.22s/it] {'loss': 0.1491, 'grad_norm': 0.674410879611969, 'learning_rate': 9.026430919770767e-06, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 354.95, 'epoch': 1.73}
 86% | 17400/20117 [11:06:06<1:40:30,  2.22s/it] {'loss': 0.1672, 'grad_norm': 0.370148241519928, 'learning_rate': 8.961380818781695e-06, 'memory/max_active (GiB)': 21.54, 'memory/max_allocated (GiB)': 21.54, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 397.8, 'epoch': 1.73}
 87% | 17410/20117 [11:06:29<1:40:43,  2.23s/it] {'loss': 0.1515, 'grad_norm': 0.5135669112205505, 'learning_rate': 8.896554965153126e-06, 'memory/max_active (GiB)': 21.4, 'memory/max_allocated (GiB)': 21.4, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 327.7, 'epoch': 1.73}
 87% | 17420/20117 [11:06:51<1:40:53,  2.24s/it] {'loss': 0.0882, 'grad_norm': 0.5589896440505981, 'learning_rate': 8.831953518564816e-06, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 296.23, 'epoch': 1.73}
 87% | 17430/20117 [11:07:14<1:40:43,  2.25s/it] {'loss': 0.1411, 'grad_norm': 0.6967418789863586, 'learning_rate': 8.767576638143804e-06, 'memory/max_active (GiB)': 21.38, 'memory/max_allocated (GiB)': 21.38, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 365.21, 'epoch': 1.73}
 87% | 17440/20117 [11:07:36<1:39:34,  2.23s/it] {'loss': 0.1521, 'grad_norm': 0.7863910794258118, 'learning_rate': 8.70342448246394e-06, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 379.01, 'epoch': 1.73}
 87% | 17450/20117 [11:07:58<1:39:18,  2.23s/it] {'loss': 0.1717, 'grad_norm': 0.49855732917785645, 'learning_rate': 8.639497209545556e-06, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 351.33, 'epoch': 1.73}
 87% | 17460/20117 [11:08:21<1:40:16,  2.26s/it] {'loss': 0.1185, 'grad_norm': 0.5448564291000366, 'learning_rate': 8.57579497685501e-06, 'memory/max_active (GiB)': 20.57, 'memory/max_allocated (GiB)': 20.57, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 338.51, 'epoch': 1.74}
 87% | 17470/20117 [11:08:43<1:40:57,  2.29s/it] {'loss': 0.1824, 'grad_norm': 0.9186548590660095, 'learning_rate': 8.512317941304404e-06, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 397.66, 'epoch': 1.74}
 87% | 17480/20117 [11:09:05<1:37:33,  2.22s/it] {'loss': 0.1354, 'grad_norm': 0.6779747605323792, 'learning_rate': 8.44906625925106e-06, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 348.68, 'epoch': 1.74}
 87% | 17490/20117 [11:09:28<1:37:08,  2.22s/it] {'loss': 0.1223, 'grad_norm': 0.6516660451889038, 'learning_rate': 8.386040086497238e-06, 'memory/max_active (GiB)': 20.57, 'memory/max_allocated (GiB)': 20.57, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 335.11, 'epoch': 1.74}
 87% | 17500/20117 [11:09:50<1:37:31,  2.24s/it] {'loss': 0.1306, 'grad_norm': 0.4572422504425049, 'learning_rate': 8.323239578289754e-06, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 387.29, 'epoch': 1.74}
 87% | 17510/20117 [11:10:12<1:38:02,  2.26s/it] {'loss': 0.1567, 'grad_norm': 0.5288609862327576, 'learning_rate': 8.260664889319502e-06, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 339.2, 'epoch': 1.74}
 87% | 17520/20117 [11:10:35<1:35:49,  2.21s/it] {'loss': 0.1687, 'grad_norm': 0.6885465383529663, 'learning_rate': 8.198316173721199e-06, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 350.42, 'epoch': 1.74}
 87% | 17530/20117 [11:10:57<1:36:26,  2.24s/it] {'loss': 0.1494, 'grad_norm': 0.4652736485004425, 'learning_rate': 8.136193585072871e-06, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 329.48, 'epoch': 1.74}
 87% | 17540/20117 [11:11:19<1:35:59,  2.24s/it] {'loss': 0.1341, 'grad_norm': 0.2998766303062439, 'learning_rate': 8.074297276395592e-06, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 343.21, 'epoch': 1.74}
 87% | 17550/20117 [11:11:42<1:35:00,  2.22s/it] {'loss': 0.1889, 'grad_norm': 0.5339378118515015, 'learning_rate': 8.012627400153073e-06, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 366.33, 'epoch': 1.74}
 87% | 17560/20117 [11:12:04<1:35:40,  2.25s/it] {'loss': 0.1306, 'grad_norm': 0.2314610630273819, 'learning_rate': 7.951184108251242e-06, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 265.9, 'epoch': 1.75}
 87% | 17570/20117 [11:12:26<1:35:22,  2.25s/it] {'loss': 0.1782, 'grad_norm': 0.33561253547668457, 'learning_rate': 7.889967552037913e-06, 'memory/max_active (GiB)': 19.69, 'memory/max_allocated (GiB)': 19.69, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 361.12, 'epoch': 1.75}
 87% | 17580/20117 [11:12:49<1:34:46,  2.24s/it] {'loss': 0.1693, 'grad_norm': 0.4189736843109131, 'learning_rate': 7.828977882302413e-06, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 394.43, 'epoch': 1.75}
 87% | 17590/20117 [11:13:11<1:34:10,  2.24s/it] {'loss': 0.1826, 'grad_norm': 0.3725840747356415, 'learning_rate': 7.768215249275168e-06, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 345.73, 'epoch': 1.75}
 87% | 17600/20117 [11:13:34<1:33:37,  2.23s/it] {'loss': 0.1428, 'grad_norm': 0.40016722679138184, 'learning_rate': 7.707679802627399e-06, 'memory/max_active (GiB)': 20.57, 'memory/max_allocated (GiB)': 20.57, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 304.16, 'epoch': 1.75}
 87%|██████████████████████████████████████████████████████████████████████▊          | 17600/20117 [11:13:34<1:33:37,  2.23s/it] 87%|██████████████████████████████████████████████████████████████████████▊          | 17601/20117 [11:13:36<1:33:20,  2.23s/it] 87%|██████████████████████████████████████████████████████████████████████▊          | 17602/20117 [11:13:38<1:33:10,  2.22s/it] 88%|██████████████████████████████████████████████████████████████████████▉          | 17603/20117 [11:13:40<1:32:58,  2.22s/it] 88%|██████████████████████████████████████████████████████████████████████▉          | 17604/20117 [11:13:42<1:33:16,  2.23s/it] 88%|██████████████████████████████████████████████████████████████████████▉          | 17605/20117 [11:13:45<1:33:02,  2.22s/it] 88%|██████████████████████████████████████████████████████████████████████▉          | 17606/20117 [11:13:47<1:33:30,  2.23s/it] 88%|██████████████████████████████████████████████████████████████████████▉          | 17607/20117 [11:13:49<1:33:21,  2.23s/it] 88%|██████████████████████████████████████████████████████████████████████▉          | 17608/20117 [11:13:51<1:34:22,  2.26s/it] 88%|██████████████████████████████████████████████████████████████████████▉          | 17609/20117 [11:13:54<1:36:17,  2.30s/it] 88%|██████████████████████████████████████████████████████████████████████▉          | 17610/20117 [11:13:56<1:35:59,  2.30s/it]                                                                                                                                 {'loss': 0.0966, 'grad_norm': 0.274614542722702, 'learning_rate': 7.647371691470706e-06, 'memory/max_active (GiB)': 19.68, 'memory/max_allocated (GiB)': 19.68, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 348.33, 'epoch': 1.75}
 88%|██████████████████████████████████████████████████████████████████████▉          | 17610/20117 [11:13:56<1:35:59,  2.30s/it] 88%|██████████████████████████████████████████████████████████████████████▉          | 17611/20117 [11:13:58<1:36:11,  2.30s/it] 88%|██████████████████████████████████████████████████████████████████████▉          | 17612/20117 [11:14:01<1:35:57,  2.30s/it] 88%|██████████████████████████████████████████████████████████████████████▉          | 17613/20117 [11:14:03<1:34:42,  2.27s/it] 88%|██████████████████████████████████████████████████████████████████████▉          | 17614/20117 [11:14:05<1:34:02,  2.25s/it] 88%|██████████████████████████████████████████████████████████████████████▉          | 17615/20117 [11:14:07<1:33:31,  2.24s/it] 88%|██████████████████████████████████████████████████████████████████████▉          | 17616/20117 [11:14:10<1:33:18,  2.24s/it] 88%|██████████████████████████████████████████████████████████████████████▉          | 17617/20117 [11:14:12<1:33:35,  2.25s/it] 88%|██████████████████████████████████████████████████████████████████████▉          | 17618/20117 [11:14:14<1:33:18,  2.24s/it] 88%|██████████████████████████████████████████████████████████████████████▉          | 17619/20117 [11:14:16<1:33:08,  2.24s/it] 88%|██████████████████████████████████████████████████████████████████████▉          | 17620/20117 [11:14:19<1:33:01,  2.24s/it]                                                                                                                                 {'loss': 0.1495, 'grad_norm': 0.5691856145858765, 'learning_rate': 7.587291064356716e-06, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 285.38, 'epoch': 1.75}
 88%|██████████████████████████████████████████████████████████████████████▉          | 17620/20117 [11:14:19<1:33:01,  2.24s/it] 88%|██████████████████████████████████████████████████████████████████████▉          | 17621/20117 [11:14:21<1:33:05,  2.24s/it] 88%|██████████████████████████████████████████████████████████████████████▉          | 17622/20117 [11:14:23<1:33:10,  2.24s/it] 88%|██████████████████████████████████████████████████████████████████████▉          | 17623/20117 [11:14:25<1:32:57,  2.24s/it] 88%|██████████████████████████████████████████████████████████████████████▉          | 17624/20117 [11:14:28<1:33:19,  2.25s/it] 88%|██████████████████████████████████████████████████████████████████████▉          | 17625/20117 [11:14:30<1:33:59,  2.26s/it] 88%|██████████████████████████████████████████████████████████████████████▉          | 17626/20117 [11:14:32<1:37:20,  2.34s/it] 88%|██████████████████████████████████████████████████████████████████████▉          | 17627/20117 [11:14:35<1:36:14,  2.32s/it] 88%|██████████████████████████████████████████████████████████████████████▉          | 17628/20117 [11:14:37<1:35:06,  2.29s/it] 88%|██████████████████████████████████████████████████████████████████████▉          | 17629/20117 [11:14:39<1:35:00,  2.29s/it] 88%|██████████████████████████████████████████████████████████████████████▉          | 17630/20117 [11:14:41<1:34:28,  2.28s/it]                                                                                                                                 {'loss': 0.1387, 'grad_norm': 0.529352068901062, 'learning_rate': 7.5274380692766825e-06, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 328.86, 'epoch': 1.75}
 88%|██████████████████████████████████████████████████████████████████████▉          | 17630/20117 [11:14:41<1:34:28,  2.28s/it] 88%|██████████████████████████████████████████████████████████████████████▉          | 17631/20117 [11:14:44<1:34:13,  2.27s/it] 88%|██████████████████████████████████████████████████████████████████████▉          | 17632/20117 [11:14:46<1:33:43,  2.26s/it] 88%|██████████████████████████████████████████████████████████████████████▉          | 17633/20117 [11:14:48<1:33:37,  2.26s/it] 88%|███████████████████████████████████████████████████████████████████████          | 17634/20117 [11:14:50<1:33:26,  2.26s/it] 88%|███████████████████████████████████████████████████████████████████████          | 17635/20117 [11:14:53<1:33:24,  2.26s/it] 88%|███████████████████████████████████████████████████████████████████████          | 17636/20117 [11:14:55<1:33:13,  2.25s/it] 88%|███████████████████████████████████████████████████████████████████████          | 17637/20117 [11:14:57<1:32:55,  2.25s/it] 88%|███████████████████████████████████████████████████████████████████████          | 17638/20117 [11:14:59<1:32:24,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████          | 17639/20117 [11:15:02<1:32:21,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████          | 17640/20117 [11:15:04<1:32:34,  2.24s/it]                                                                                                                                 {'loss': 0.136, 'grad_norm': 0.4417911171913147, 'learning_rate': 7.467812853661216e-06, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 350.36, 'epoch': 1.75}
 88%|███████████████████████████████████████████████████████████████████████          | 17640/20117 [11:15:04<1:32:34,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████          | 17641/20117 [11:15:06<1:33:06,  2.26s/it] 88%|███████████████████████████████████████████████████████████████████████          | 17642/20117 [11:15:08<1:32:49,  2.25s/it] 88%|███████████████████████████████████████████████████████████████████████          | 17643/20117 [11:15:11<1:32:28,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████          | 17644/20117 [11:15:13<1:32:13,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████          | 17645/20117 [11:15:15<1:32:29,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████          | 17646/20117 [11:15:17<1:32:06,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████          | 17647/20117 [11:15:20<1:32:19,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████          | 17648/20117 [11:15:22<1:31:43,  2.23s/it] 88%|███████████████████████████████████████████████████████████████████████          | 17649/20117 [11:15:24<1:32:15,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████          | 17650/20117 [11:15:26<1:32:54,  2.26s/it]                                                                                                                                 {'loss': 0.1326, 'grad_norm': 0.45288828015327454, 'learning_rate': 7.4084155643798335e-06, 'memory/max_active (GiB)': 21.4, 'memory/max_allocated (GiB)': 21.4, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 329.09, 'epoch': 1.75}
 88%|███████████████████████████████████████████████████████████████████████          | 17650/20117 [11:15:26<1:32:54,  2.26s/it] 88%|███████████████████████████████████████████████████████████████████████          | 17651/20117 [11:15:29<1:32:36,  2.25s/it] 88%|███████████████████████████████████████████████████████████████████████          | 17652/20117 [11:15:31<1:31:55,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████          | 17653/20117 [11:15:33<1:32:25,  2.25s/it] 88%|███████████████████████████████████████████████████████████████████████          | 17654/20117 [11:15:35<1:32:07,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████          | 17655/20117 [11:15:38<1:31:39,  2.23s/it] 88%|███████████████████████████████████████████████████████████████████████          | 17656/20117 [11:15:40<1:31:56,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████          | 17657/20117 [11:15:42<1:32:16,  2.25s/it] 88%|███████████████████████████████████████████████████████████████████████          | 17658/20117 [11:15:44<1:32:02,  2.25s/it] 88%|███████████████████████████████████████████████████████████████████████          | 17659/20117 [11:15:47<1:32:46,  2.26s/it] 88%|███████████████████████████████████████████████████████████████████████          | 17660/20117 [11:15:49<1:33:41,  2.29s/it]                                                                                                                                 {'loss': 0.179, 'grad_norm': 0.4209536015987396, 'learning_rate': 7.349246347740568e-06, 'memory/max_active (GiB)': 20.58, 'memory/max_allocated (GiB)': 20.58, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 426.76, 'epoch': 1.76}
 88%|███████████████████████████████████████████████████████████████████████          | 17660/20117 [11:15:49<1:33:41,  2.29s/it] 88%|███████████████████████████████████████████████████████████████████████          | 17661/20117 [11:15:51<1:33:18,  2.28s/it] 88%|███████████████████████████████████████████████████████████████████████          | 17662/20117 [11:15:53<1:32:11,  2.25s/it] 88%|███████████████████████████████████████████████████████████████████████          | 17663/20117 [11:15:56<1:31:35,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████          | 17664/20117 [11:15:58<1:32:56,  2.27s/it] 88%|███████████████████████████████████████████████████████████████████████▏         | 17665/20117 [11:16:00<1:32:46,  2.27s/it] 88%|███████████████████████████████████████████████████████████████████████▏         | 17666/20117 [11:16:02<1:32:13,  2.26s/it] 88%|███████████████████████████████████████████████████████████████████████▏         | 17667/20117 [11:16:05<1:31:59,  2.25s/it] 88%|███████████████████████████████████████████████████████████████████████▏         | 17668/20117 [11:16:07<1:32:18,  2.26s/it] 88%|███████████████████████████████████████████████████████████████████████▏         | 17669/20117 [11:16:09<1:33:19,  2.29s/it] 88%|███████████████████████████████████████████████████████████████████████▏         | 17670/20117 [11:16:12<1:33:59,  2.30s/it]                                                                                                                                 {'loss': 0.2134, 'grad_norm': 0.49393320083618164, 'learning_rate': 7.290305349489734e-06, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 411.57, 'epoch': 1.76}
 88%|███████████████████████████████████████████████████████████████████████▏         | 17670/20117 [11:16:12<1:33:59,  2.30s/it] 88%|███████████████████████████████████████████████████████████████████████▏         | 17671/20117 [11:16:14<1:32:49,  2.28s/it] 88%|███████████████████████████████████████████████████████████████████████▏         | 17672/20117 [11:16:16<1:32:11,  2.26s/it] 88%|███████████████████████████████████████████████████████████████████████▏         | 17673/20117 [11:16:18<1:31:55,  2.26s/it] 88%|███████████████████████████████████████████████████████████████████████▏         | 17674/20117 [11:16:21<1:31:36,  2.25s/it] 88%|███████████████████████████████████████████████████████████████████████▏         | 17675/20117 [11:16:23<1:31:49,  2.26s/it] 88%|███████████████████████████████████████████████████████████████████████▏         | 17676/20117 [11:16:25<1:31:18,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████▏         | 17677/20117 [11:16:27<1:31:34,  2.25s/it] 88%|███████████████████████████████████████████████████████████████████████▏         | 17678/20117 [11:16:30<1:34:38,  2.33s/it] 88%|███████████████████████████████████████████████████████████████████████▏         | 17679/20117 [11:16:32<1:33:06,  2.29s/it] 88%|███████████████████████████████████████████████████████████████████████▏         | 17680/20117 [11:16:34<1:32:02,  2.27s/it]                                                                                                                                 {'loss': 0.1676, 'grad_norm': 0.6331411600112915, 'learning_rate': 7.2315927148114635e-06, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 306.06, 'epoch': 1.76}
 88%|███████████████████████████████████████████████████████████████████████▏         | 17680/20117 [11:16:34<1:32:02,  2.27s/it] 88%|███████████████████████████████████████████████████████████████████████▏         | 17681/20117 [11:16:36<1:31:25,  2.25s/it] 88%|███████████████████████████████████████████████████████████████████████▏         | 17682/20117 [11:16:39<1:30:53,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████▏         | 17683/20117 [11:16:41<1:31:31,  2.26s/it] 88%|███████████████████████████████████████████████████████████████████████▏         | 17684/20117 [11:16:43<1:31:06,  2.25s/it] 88%|███████████████████████████████████████████████████████████████████████▏         | 17685/20117 [11:16:45<1:30:44,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████▏         | 17686/20117 [11:16:48<1:31:58,  2.27s/it] 88%|███████████████████████████████████████████████████████████████████████▏         | 17687/20117 [11:16:50<1:31:48,  2.27s/it] 88%|███████████████████████████████████████████████████████████████████████▏         | 17688/20117 [11:16:52<1:31:23,  2.26s/it] 88%|███████████████████████████████████████████████████████████████████████▏         | 17689/20117 [11:16:55<1:31:35,  2.26s/it] 88%|███████████████████████████████████████████████████████████████████████▏         | 17690/20117 [11:16:57<1:31:25,  2.26s/it]                                                                                                                                 {'loss': 0.14, 'grad_norm': 0.383672297000885, 'learning_rate': 7.173108588327415e-06, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 342.56, 'epoch': 1.76}
 88%|███████████████████████████████████████████████████████████████████████▏         | 17690/20117 [11:16:57<1:31:25,  2.26s/it] 88%|███████████████████████████████████████████████████████████████████████▏         | 17691/20117 [11:16:59<1:31:24,  2.26s/it] 88%|███████████████████████████████████████████████████████████████████████▏         | 17692/20117 [11:17:01<1:31:16,  2.26s/it] 88%|███████████████████████████████████████████████████████████████████████▏         | 17693/20117 [11:17:04<1:30:51,  2.25s/it] 88%|███████████████████████████████████████████████████████████████████████▏         | 17694/20117 [11:17:06<1:30:33,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████▏         | 17695/20117 [11:17:08<1:30:20,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████▎         | 17696/20117 [11:17:10<1:30:39,  2.25s/it] 88%|███████████████████████████████████████████████████████████████████████▎         | 17697/20117 [11:17:12<1:30:21,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████▎         | 17698/20117 [11:17:15<1:30:43,  2.25s/it] 88%|███████████████████████████████████████████████████████████████████████▎         | 17699/20117 [11:17:17<1:30:16,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████▎         | 17700/20117 [11:17:19<1:30:45,  2.25s/it]                                                                                                                                 {'loss': 0.1358, 'grad_norm': 0.4192826449871063, 'learning_rate': 7.1148531140962986e-06, 'memory/max_active (GiB)': 18.85, 'memory/max_allocated (GiB)': 18.85, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 342.72, 'epoch': 1.76}
 88%|███████████████████████████████████████████████████████████████████████▎         | 17700/20117 [11:17:19<1:30:45,  2.25s/it] 88%|███████████████████████████████████████████████████████████████████████▎         | 17701/20117 [11:17:21<1:30:37,  2.25s/it] 88%|███████████████████████████████████████████████████████████████████████▎         | 17702/20117 [11:17:24<1:31:04,  2.26s/it] 88%|███████████████████████████████████████████████████████████████████████▎         | 17703/20117 [11:17:26<1:31:49,  2.28s/it] 88%|███████████████████████████████████████████████████████████████████████▎         | 17704/20117 [11:17:28<1:30:42,  2.26s/it] 88%|███████████████████████████████████████████████████████████████████████▎         | 17705/20117 [11:17:30<1:30:05,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████▎         | 17706/20117 [11:17:33<1:29:38,  2.23s/it] 88%|███████████████████████████████████████████████████████████████████████▎         | 17707/20117 [11:17:35<1:29:55,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████▎         | 17708/20117 [11:17:37<1:29:36,  2.23s/it] 88%|███████████████████████████████████████████████████████████████████████▎         | 17709/20117 [11:17:39<1:29:58,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████▎         | 17710/20117 [11:17:42<1:31:04,  2.27s/it]                                                                                                                                 {'loss': 0.1483, 'grad_norm': 0.3609708845615387, 'learning_rate': 7.056826435613706e-06, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 378.78, 'epoch': 1.76}
 88%|███████████████████████████████████████████████████████████████████████▎         | 17710/20117 [11:17:42<1:31:04,  2.27s/it] 88%|███████████████████████████████████████████████████████████████████████▎         | 17711/20117 [11:17:44<1:30:43,  2.26s/it] 88%|███████████████████████████████████████████████████████████████████████▎         | 17712/20117 [11:17:46<1:29:54,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████▎         | 17713/20117 [11:17:48<1:29:37,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████▎         | 17714/20117 [11:17:51<1:31:59,  2.30s/it] 88%|███████████████████████████████████████████████████████████████████████▎         | 17715/20117 [11:17:53<1:31:52,  2.29s/it] 88%|███████████████████████████████████████████████████████████████████████▎         | 17716/20117 [11:17:56<1:33:03,  2.33s/it] 88%|███████████████████████████████████████████████████████████████████████▎         | 17717/20117 [11:17:58<1:33:30,  2.34s/it] 88%|███████████████████████████████████████████████████████████████████████▎         | 17718/20117 [11:18:00<1:32:05,  2.30s/it] 88%|███████████████████████████████████████████████████████████████████████▎         | 17719/20117 [11:18:02<1:30:37,  2.27s/it] 88%|███████████████████████████████████████████████████████████████████████▎         | 17720/20117 [11:18:05<1:30:32,  2.27s/it]                                                                                                                                 {'loss': 0.1457, 'grad_norm': 0.535681962966919, 'learning_rate': 6.999028695811572e-06, 'memory/max_active (GiB)': 19.22, 'memory/max_allocated (GiB)': 19.22, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 315.36, 'epoch': 1.76}
 88%|███████████████████████████████████████████████████████████████████████▎         | 17720/20117 [11:18:05<1:30:32,  2.27s/it] 88%|███████████████████████████████████████████████████████████████████████▎         | 17721/20117 [11:18:07<1:30:15,  2.26s/it] 88%|███████████████████████████████████████████████████████████████████████▎         | 17722/20117 [11:18:09<1:29:29,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████▎         | 17723/20117 [11:18:11<1:29:29,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████▎         | 17724/20117 [11:18:13<1:28:53,  2.23s/it] 88%|███████████████████████████████████████████████████████████████████████▎         | 17725/20117 [11:18:16<1:29:03,  2.23s/it] 88%|███████████████████████████████████████████████████████████████████████▎         | 17726/20117 [11:18:18<1:28:27,  2.22s/it] 88%|███████████████████████████████████████████████████████████████████████▍         | 17727/20117 [11:18:20<1:28:22,  2.22s/it] 88%|███████████████████████████████████████████████████████████████████████▍         | 17728/20117 [11:18:22<1:29:04,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████▍         | 17729/20117 [11:18:25<1:31:45,  2.31s/it] 88%|███████████████████████████████████████████████████████████████████████▍         | 17730/20117 [11:18:27<1:30:48,  2.28s/it]                                                                                                                                 {'loss': 0.187, 'grad_norm': 0.7409544587135315, 'learning_rate': 6.941460037057979e-06, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 339.48, 'epoch': 1.76}
 88%|███████████████████████████████████████████████████████████████████████▍         | 17730/20117 [11:18:27<1:30:48,  2.28s/it] 88%|███████████████████████████████████████████████████████████████████████▍         | 17731/20117 [11:18:29<1:30:12,  2.27s/it] 88%|███████████████████████████████████████████████████████████████████████▍         | 17732/20117 [11:18:32<1:30:30,  2.28s/it] 88%|███████████████████████████████████████████████████████████████████████▍         | 17733/20117 [11:18:34<1:30:11,  2.27s/it] 88%|███████████████████████████████████████████████████████████████████████▍         | 17734/20117 [11:18:36<1:29:33,  2.25s/it] 88%|███████████████████████████████████████████████████████████████████████▍         | 17735/20117 [11:18:38<1:28:58,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████▍         | 17736/20117 [11:18:41<1:29:11,  2.25s/it] 88%|███████████████████████████████████████████████████████████████████████▍         | 17737/20117 [11:18:43<1:29:51,  2.27s/it] 88%|███████████████████████████████████████████████████████████████████████▍         | 17738/20117 [11:18:45<1:29:52,  2.27s/it] 88%|███████████████████████████████████████████████████████████████████████▍         | 17739/20117 [11:18:47<1:29:08,  2.25s/it] 88%|███████████████████████████████████████████████████████████████████████▍         | 17740/20117 [11:18:50<1:28:35,  2.24s/it]                                                                                                                                 {'loss': 0.1178, 'grad_norm': 0.4647164046764374, 'learning_rate': 6.8841206011566625e-06, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 318.67, 'epoch': 1.76}
 88%|███████████████████████████████████████████████████████████████████████▍         | 17740/20117 [11:18:50<1:28:35,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████▍         | 17741/20117 [11:18:52<1:28:20,  2.23s/it] 88%|███████████████████████████████████████████████████████████████████████▍         | 17742/20117 [11:18:54<1:28:44,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████▍         | 17743/20117 [11:18:56<1:29:59,  2.27s/it] 88%|███████████████████████████████████████████████████████████████████████▍         | 17744/20117 [11:18:59<1:29:48,  2.27s/it] 88%|███████████████████████████████████████████████████████████████████████▍         | 17745/20117 [11:19:01<1:29:13,  2.26s/it] 88%|███████████████████████████████████████████████████████████████████████▍         | 17746/20117 [11:19:03<1:28:58,  2.25s/it] 88%|███████████████████████████████████████████████████████████████████████▍         | 17747/20117 [11:19:05<1:29:14,  2.26s/it] 88%|███████████████████████████████████████████████████████████████████████▍         | 17748/20117 [11:19:08<1:28:33,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████▍         | 17749/20117 [11:19:10<1:28:29,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████▍         | 17750/20117 [11:19:12<1:29:09,  2.26s/it]                                                                                                                                 {'loss': 0.1156, 'grad_norm': 0.37413302063941956, 'learning_rate': 6.827010529346822e-06, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 361.82, 'epoch': 1.76}
 88%|███████████████████████████████████████████████████████████████████████▍         | 17750/20117 [11:19:12<1:29:09,  2.26s/it] 88%|███████████████████████████████████████████████████████████████████████▍         | 17751/20117 [11:19:14<1:29:25,  2.27s/it] 88%|███████████████████████████████████████████████████████████████████████▍         | 17752/20117 [11:19:17<1:28:49,  2.25s/it] 88%|███████████████████████████████████████████████████████████████████████▍         | 17753/20117 [11:19:19<1:29:17,  2.27s/it] 88%|███████████████████████████████████████████████████████████████████████▍         | 17754/20117 [11:19:21<1:28:23,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████▍         | 17755/20117 [11:19:23<1:27:44,  2.23s/it] 88%|███████████████████████████████████████████████████████████████████████▍         | 17756/20117 [11:19:26<1:27:02,  2.21s/it] 88%|███████████████████████████████████████████████████████████████████████▍         | 17757/20117 [11:19:28<1:27:27,  2.22s/it] 88%|███████████████████████████████████████████████████████████████████████▌         | 17758/20117 [11:19:30<1:27:54,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████▌         | 17759/20117 [11:19:32<1:28:20,  2.25s/it] 88%|███████████████████████████████████████████████████████████████████████▌         | 17760/20117 [11:19:35<1:28:12,  2.25s/it]                                                                                                                                 {'loss': 0.1514, 'grad_norm': 0.7163899540901184, 'learning_rate': 6.7701299623025846e-06, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 346.76, 'epoch': 1.77}
 88%|███████████████████████████████████████████████████████████████████████▌         | 17760/20117 [11:19:35<1:28:12,  2.25s/it] 88%|███████████████████████████████████████████████████████████████████████▌         | 17761/20117 [11:19:37<1:27:59,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████▌         | 17762/20117 [11:19:39<1:27:59,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████▌         | 17763/20117 [11:19:41<1:27:47,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████▌         | 17764/20117 [11:19:44<1:27:57,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████▌         | 17765/20117 [11:19:46<1:28:09,  2.25s/it] 88%|███████████████████████████████████████████████████████████████████████▌         | 17766/20117 [11:19:48<1:27:59,  2.25s/it] 88%|███████████████████████████████████████████████████████████████████████▌         | 17767/20117 [11:19:50<1:27:51,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████▌         | 17768/20117 [11:19:53<1:27:58,  2.25s/it] 88%|███████████████████████████████████████████████████████████████████████▌         | 17769/20117 [11:19:55<1:28:02,  2.25s/it] 88%|███████████████████████████████████████████████████████████████████████▌         | 17770/20117 [11:19:57<1:28:25,  2.26s/it]                                                                                                                                 {'loss': 0.1582, 'grad_norm': 0.47301217913627625, 'learning_rate': 6.713479040132841e-06, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 346.61, 'epoch': 1.77}
 88%|███████████████████████████████████████████████████████████████████████▌         | 17770/20117 [11:19:57<1:28:25,  2.26s/it] 88%|███████████████████████████████████████████████████████████████████████▌         | 17771/20117 [11:19:59<1:28:06,  2.25s/it] 88%|███████████████████████████████████████████████████████████████████████▌         | 17772/20117 [11:20:01<1:27:20,  2.23s/it] 88%|███████████████████████████████████████████████████████████████████████▌         | 17773/20117 [11:20:04<1:27:23,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████▌         | 17774/20117 [11:20:06<1:26:57,  2.23s/it] 88%|███████████████████████████████████████████████████████████████████████▌         | 17775/20117 [11:20:08<1:27:01,  2.23s/it] 88%|███████████████████████████████████████████████████████████████████████▌         | 17776/20117 [11:20:10<1:27:01,  2.23s/it] 88%|███████████████████████████████████████████████████████████████████████▌         | 17777/20117 [11:20:13<1:27:05,  2.23s/it] 88%|███████████████████████████████████████████████████████████████████████▌         | 17778/20117 [11:20:15<1:26:36,  2.22s/it] 88%|███████████████████████████████████████████████████████████████████████▌         | 17779/20117 [11:20:17<1:27:06,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████▌         | 17780/20117 [11:20:19<1:26:22,  2.22s/it]                                                                                                                                 {'loss': 0.1119, 'grad_norm': 0.4980860948562622, 'learning_rate': 6.657057902380792e-06, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 302.2, 'epoch': 1.77}
 88%|███████████████████████████████████████████████████████████████████████▌         | 17780/20117 [11:20:19<1:26:22,  2.22s/it] 88%|███████████████████████████████████████████████████████████████████████▌         | 17781/20117 [11:20:21<1:26:10,  2.21s/it] 88%|███████████████████████████████████████████████████████████████████████▌         | 17782/20117 [11:20:24<1:29:56,  2.31s/it] 88%|███████████████████████████████████████████████████████████████████████▌         | 17783/20117 [11:20:26<1:29:29,  2.30s/it] 88%|███████████████████████████████████████████████████████████████████████▌         | 17784/20117 [11:20:29<1:28:23,  2.27s/it] 88%|███████████████████████████████████████████████████████████████████████▌         | 17785/20117 [11:20:31<1:27:48,  2.26s/it] 88%|███████████████████████████████████████████████████████████████████████▌         | 17786/20117 [11:20:33<1:27:09,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████▌         | 17787/20117 [11:20:35<1:27:23,  2.25s/it] 88%|███████████████████████████████████████████████████████████████████████▌         | 17788/20117 [11:20:37<1:27:17,  2.25s/it] 88%|███████████████████████████████████████████████████████████████████████▋         | 17789/20117 [11:20:40<1:27:17,  2.25s/it] 88%|███████████████████████████████████████████████████████████████████████▋         | 17790/20117 [11:20:42<1:27:13,  2.25s/it]                                                                                                                                 {'loss': 0.1737, 'grad_norm': 0.6229726076126099, 'learning_rate': 6.600866688023588e-06, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 408.97, 'epoch': 1.77}
 88%|███████████████████████████████████████████████████████████████████████▋         | 17790/20117 [11:20:42<1:27:13,  2.25s/it] 88%|███████████████████████████████████████████████████████████████████████▋         | 17791/20117 [11:20:44<1:27:25,  2.26s/it] 88%|███████████████████████████████████████████████████████████████████████▋         | 17792/20117 [11:20:47<1:27:57,  2.27s/it] 88%|███████████████████████████████████████████████████████████████████████▋         | 17793/20117 [11:20:49<1:27:05,  2.25s/it] 88%|███████████████████████████████████████████████████████████████████████▋         | 17794/20117 [11:20:51<1:26:20,  2.23s/it] 88%|███████████████████████████████████████████████████████████████████████▋         | 17795/20117 [11:20:53<1:26:25,  2.23s/it] 88%|███████████████████████████████████████████████████████████████████████▋         | 17796/20117 [11:20:55<1:26:30,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████▋         | 17797/20117 [11:20:58<1:26:33,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████▋         | 17798/20117 [11:21:00<1:26:16,  2.23s/it] 88%|███████████████████████████████████████████████████████████████████████▋         | 17799/20117 [11:21:02<1:25:56,  2.22s/it] 88%|███████████████████████████████████████████████████████████████████████▋         | 17800/20117 [11:21:04<1:26:22,  2.24s/it]                                                                                                                                 {'loss': 0.1809, 'grad_norm': 0.5228248834609985, 'learning_rate': 6.5449055354721125e-06, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 379.16, 'epoch': 1.77}
 88%|███████████████████████████████████████████████████████████████████████▋         | 17800/20117 [11:21:04<1:26:22,  2.24s/it] 88%|███████████████████████████████████████████████████████████████████████▋         | 17801/20117 [11:21:07<1:26:12,  2.23s/it] 88%|███████████████████████████████████████████████████████████████████████▋         | 17802/20117 [11:21:09<1:25:36,  2.22s/it] 88%|███████████████████████████████████████████████████████████████████████▋         | 17803/20117 [11:21:11<1:25:29,  2.22s/it] 89%|███████████████████████████████████████████████████████████████████████▋         | 17804/20117 [11:21:13<1:25:19,  2.21s/it] 89%|███████████████████████████████████████████████████████████████████████▋         | 17805/20117 [11:21:15<1:26:28,  2.24s/it] 89%|███████████████████████████████████████████████████████████████████████▋         | 17806/20117 [11:21:18<1:27:05,  2.26s/it] 89%|███████████████████████████████████████████████████████████████████████▋         | 17807/20117 [11:21:20<1:26:16,  2.24s/it] 89%|███████████████████████████████████████████████████████████████████████▋         | 17808/20117 [11:21:22<1:26:12,  2.24s/it] 89%|███████████████████████████████████████████████████████████████████████▋         | 17809/20117 [11:21:24<1:26:47,  2.26s/it] 89%|███████████████████████████████████████████████████████████████████████▋         | 17810/20117 [11:21:27<1:26:43,  2.26s/it]                                                                                                                                 {'loss': 0.147, 'grad_norm': 0.4032490849494934, 'learning_rate': 6.489174582570467e-06, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 378.57, 'epoch': 1.77}
 89%|███████████████████████████████████████████████████████████████████████▋         | 17810/20117 [11:21:27<1:26:43,  2.26s/it] 89%|███████████████████████████████████████████████████████████████████████▋         | 17811/20117 [11:21:29<1:26:36,  2.25s/it] 89%|███████████████████████████████████████████████████████████████████████▋         | 17812/20117 [11:21:31<1:26:11,  2.24s/it] 89%|███████████████████████████████████████████████████████████████████████▋         | 17813/20117 [11:21:33<1:25:47,  2.23s/it] 89%|███████████████████████████████████████████████████████████████████████▋         | 17814/20117 [11:21:36<1:25:33,  2.23s/it] 89%|███████████████████████████████████████████████████████████████████████▋         | 17815/20117 [11:21:38<1:25:28,  2.23s/it] 89%|███████████████████████████████████████████████████████████████████████▋         | 17816/20117 [11:21:40<1:25:01,  2.22s/it] 89%|███████████████████████████████████████████████████████████████████████▋         | 17817/20117 [11:21:42<1:25:25,  2.23s/it] 89%|███████████████████████████████████████████████████████████████████████▋         | 17818/20117 [11:21:45<1:25:39,  2.24s/it] 89%|███████████████████████████████████████████████████████████████████████▋         | 17819/20117 [11:21:47<1:25:34,  2.23s/it] 89%|███████████████████████████████████████████████████████████████████████▊         | 17820/20117 [11:21:49<1:25:48,  2.24s/it]                                                                                                                                 {'loss': 0.1556, 'grad_norm': 0.44193387031555176, 'learning_rate': 6.433673966595788e-06, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 330.37, 'epoch': 1.77}
 89%|███████████████████████████████████████████████████████████████████████▊         | 17820/20117 [11:21:49<1:25:48,  2.24s/it] 89%|███████████████████████████████████████████████████████████████████████▊         | 17821/20117 [11:21:51<1:25:26,  2.23s/it] 89%|███████████████████████████████████████████████████████████████████████▊         | 17822/20117 [11:21:54<1:25:30,  2.24s/it] 89%|███████████████████████████████████████████████████████████████████████▊         | 17823/20117 [11:21:56<1:25:43,  2.24s/it] 89%|███████████████████████████████████████████████████████████████████████▊         | 17824/20117 [11:21:58<1:26:12,  2.26s/it] 89%|███████████████████████████████████████████████████████████████████████▊         | 17825/20117 [11:22:00<1:25:49,  2.25s/it] 89%|███████████████████████████████████████████████████████████████████████▊         | 17826/20117 [11:22:02<1:25:22,  2.24s/it] 89%|███████████████████████████████████████████████████████████████████████▊         | 17827/20117 [11:22:05<1:25:05,  2.23s/it] 89%|███████████████████████████████████████████████████████████████████████▊         | 17828/20117 [11:22:07<1:25:00,  2.23s/it] 89%|███████████████████████████████████████████████████████████████████████▊         | 17829/20117 [11:22:09<1:24:32,  2.22s/it] 89%|███████████████████████████████████████████████████████████████████████▊         | 17830/20117 [11:22:11<1:24:51,  2.23s/it]                                                                                                                                 {'loss': 0.1261, 'grad_norm': 0.41534683108329773, 'learning_rate': 6.3784038242578285e-06, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 317.12, 'epoch': 1.77}
 89%|███████████████████████████████████████████████████████████████████████▊         | 17830/20117 [11:22:11<1:24:51,  2.23s/it] 89%|███████████████████████████████████████████████████████████████████████▊         | 17831/20117 [11:22:14<1:24:41,  2.22s/it] 89%|███████████████████████████████████████████████████████████████████████▊         | 17832/20117 [11:22:16<1:25:21,  2.24s/it] 89%|███████████████████████████████████████████████████████████████████████▊         | 17833/20117 [11:22:18<1:25:15,  2.24s/it] 89%|███████████████████████████████████████████████████████████████████████▊         | 17834/20117 [11:22:20<1:25:10,  2.24s/it] 89%|███████████████████████████████████████████████████████████████████████▊         | 17835/20117 [11:22:23<1:24:35,  2.22s/it] 89%|███████████████████████████████████████████████████████████████████████▊         | 17836/20117 [11:22:25<1:24:37,  2.23s/it] 89%|███████████████████████████████████████████████████████████████████████▊         | 17837/20117 [11:22:27<1:27:29,  2.30s/it] 89%|███████████████████████████████████████████████████████████████████████▊         | 17838/20117 [11:22:30<1:26:51,  2.29s/it] 89%|███████████████████████████████████████████████████████████████████████▊         | 17839/20117 [11:22:32<1:26:19,  2.27s/it] 89%|███████████████████████████████████████████████████████████████████████▊         | 17840/20117 [11:22:34<1:26:13,  2.27s/it]                                                                                                                                 {'loss': 0.1922, 'grad_norm': 0.7784578800201416, 'learning_rate': 6.323364291698642e-06, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 342.84, 'epoch': 1.77}
 89%|███████████████████████████████████████████████████████████████████████▊         | 17840/20117 [11:22:34<1:26:13,  2.27s/it] 89%|███████████████████████████████████████████████████████████████████████▊         | 17841/20117 [11:22:36<1:25:29,  2.25s/it] 89%|███████████████████████████████████████████████████████████████████████▊         | 17842/20117 [11:22:38<1:25:25,  2.25s/it] 89%|███████████████████████████████████████████████████████████████████████▊         | 17843/20117 [11:22:41<1:25:23,  2.25s/it] 89%|███████████████████████████████████████████████████████████████████████▊         | 17844/20117 [11:22:43<1:25:57,  2.27s/it] 89%|███████████████████████████████████████████████████████████████████████▊         | 17845/20117 [11:22:45<1:25:46,  2.27s/it] 89%|███████████████████████████████████████████████████████████████████████▊         | 17846/20117 [11:22:48<1:26:26,  2.28s/it] 89%|███████████████████████████████████████████████████████████████████████▊         | 17847/20117 [11:22:50<1:25:15,  2.25s/it] 89%|███████████████████████████████████████████████████████████████████████▊         | 17848/20117 [11:22:52<1:24:36,  2.24s/it] 89%|███████████████████████████████████████████████████████████████████████▊         | 17849/20117 [11:22:54<1:25:23,  2.26s/it] 89%|███████████████████████████████████████████████████████████████████████▊         | 17850/20117 [11:22:57<1:25:12,  2.26s/it]                                                                                                                                 {'loss': 0.1375, 'grad_norm': 0.7484685778617859, 'learning_rate': 6.2685555044921905e-06, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 351.93, 'epoch': 1.77}
 89%|███████████████████████████████████████████████████████████████████████▊         | 17850/20117 [11:22:57<1:25:12,  2.26s/it] 89%|███████████████████████████████████████████████████████████████████████▉         | 17851/20117 [11:22:59<1:24:29,  2.24s/it] 89%|███████████████████████████████████████████████████████████████████████▉         | 17852/20117 [11:23:01<1:24:01,  2.23s/it] 89%|███████████████████████████████████████████████████████████████████████▉         | 17853/20117 [11:23:03<1:24:09,  2.23s/it] 89%|███████████████████████████████████████████████████████████████████████▉         | 17854/20117 [11:23:05<1:24:50,  2.25s/it] 89%|███████████████████████████████████████████████████████████████████████▉         | 17855/20117 [11:23:08<1:24:32,  2.24s/it] 89%|███████████████████████████████████████████████████████████████████████▉         | 17856/20117 [11:23:10<1:24:15,  2.24s/it] 89%|███████████████████████████████████████████████████████████████████████▉         | 17857/20117 [11:23:12<1:23:54,  2.23s/it] 89%|███████████████████████████████████████████████████████████████████████▉         | 17858/20117 [11:23:14<1:24:44,  2.25s/it] 89%|███████████████████████████████████████████████████████████████████████▉         | 17859/20117 [11:23:17<1:24:46,  2.25s/it] 89%|███████████████████████████████████████████████████████████████████████▉         | 17860/20117 [11:23:19<1:24:06,  2.24s/it]                                                                                                                                 {'loss': 0.1087, 'grad_norm': 0.5791344046592712, 'learning_rate': 6.213977597644138e-06, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 337.95, 'epoch': 1.78}
 89%|███████████████████████████████████████████████████████████████████████▉         | 17860/20117 [11:23:19<1:24:06,  2.24s/it] 89%|███████████████████████████████████████████████████████████████████████▉         | 17861/20117 [11:23:21<1:24:11,  2.24s/it] 89%|███████████████████████████████████████████████████████████████████████▉         | 17862/20117 [11:23:23<1:23:48,  2.23s/it] 89%|███████████████████████████████████████████████████████████████████████▉         | 17863/20117 [11:23:26<1:24:07,  2.24s/it] 89%|███████████████████████████████████████████████████████████████████████▉         | 17864/20117 [11:23:28<1:24:09,  2.24s/it] 89%|███████████████████████████████████████████████████████████████████████▉         | 17865/20117 [11:23:30<1:23:51,  2.23s/it] 89%|███████████████████████████████████████████████████████████████████████▉         | 17866/20117 [11:23:32<1:23:57,  2.24s/it] 89%|███████████████████████████████████████████████████████████████████████▉         | 17867/20117 [11:23:35<1:24:11,  2.25s/it] 89%|███████████████████████████████████████████████████████████████████████▉         | 17868/20117 [11:23:37<1:23:29,  2.23s/it] 89%|███████████████████████████████████████████████████████████████████████▉         | 17869/20117 [11:23:39<1:23:34,  2.23s/it] 89%|███████████████████████████████████████████████████████████████████████▉         | 17870/20117 [11:23:41<1:23:18,  2.22s/it]                                                                                                                                 {'loss': 0.1257, 'grad_norm': 0.22857262194156647, 'learning_rate': 6.159630705591379e-06, 'memory/max_active (GiB)': 19.22, 'memory/max_allocated (GiB)': 19.22, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 365.98, 'epoch': 1.78}
 89%|███████████████████████████████████████████████████████████████████████▉         | 17870/20117 [11:23:41<1:23:18,  2.22s/it] 89%|███████████████████████████████████████████████████████████████████████▉         | 17871/20117 [11:23:43<1:22:58,  2.22s/it] 89%|███████████████████████████████████████████████████████████████████████▉         | 17872/20117 [11:23:46<1:23:00,  2.22s/it] 89%|███████████████████████████████████████████████████████████████████████▉         | 17873/20117 [11:23:48<1:23:16,  2.23s/it] 89%|███████████████████████████████████████████████████████████████████████▉         | 17874/20117 [11:23:50<1:23:28,  2.23s/it] 89%|███████████████████████████████████████████████████████████████████████▉         | 17875/20117 [11:23:52<1:23:38,  2.24s/it] 89%|███████████████████████████████████████████████████████████████████████▉         | 17876/20117 [11:23:55<1:23:46,  2.24s/it] 89%|███████████████████████████████████████████████████████████████████████▉         | 17877/20117 [11:23:57<1:23:42,  2.24s/it] 89%|███████████████████████████████████████████████████████████████████████▉         | 17878/20117 [11:23:59<1:23:48,  2.25s/it] 89%|███████████████████████████████████████████████████████████████████████▉         | 17879/20117 [11:24:01<1:23:30,  2.24s/it] 89%|███████████████████████████████████████████████████████████████████████▉         | 17880/20117 [11:24:04<1:23:51,  2.25s/it]                                                                                                                                 {'loss': 0.1652, 'grad_norm': 0.43469542264938354, 'learning_rate': 6.10551496220183e-06, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 371.12, 'epoch': 1.78}
 89%|███████████████████████████████████████████████████████████████████████▉         | 17880/20117 [11:24:04<1:23:51,  2.25s/it] 89%|███████████████████████████████████████████████████████████████████████▉         | 17881/20117 [11:24:06<1:23:37,  2.24s/it] 89%|████████████████████████████████████████████████████████████████████████         | 17882/20117 [11:24:08<1:23:34,  2.24s/it] 89%|████████████████████████████████████████████████████████████████████████         | 17883/20117 [11:24:10<1:23:37,  2.25s/it] 89%|████████████████████████████████████████████████████████████████████████         | 17884/20117 [11:24:13<1:23:00,  2.23s/it] 89%|████████████████████████████████████████████████████████████████████████         | 17885/20117 [11:24:15<1:22:34,  2.22s/it] 89%|████████████████████████████████████████████████████████████████████████         | 17886/20117 [11:24:17<1:24:02,  2.26s/it] 89%|████████████████████████████████████████████████████████████████████████         | 17887/20117 [11:24:19<1:23:45,  2.25s/it] 89%|████████████████████████████████████████████████████████████████████████         | 17888/20117 [11:24:22<1:23:52,  2.26s/it] 89%|████████████████████████████████████████████████████████████████████████         | 17889/20117 [11:24:24<1:26:31,  2.33s/it] 89%|████████████████████████████████████████████████████████████████████████         | 17890/20117 [11:24:26<1:25:28,  2.30s/it]                                                                                                                                 {'loss': 0.1787, 'grad_norm': 0.7687386274337769, 'learning_rate': 6.0516305007739525e-06, 'memory/max_active (GiB)': 20.72, 'memory/max_allocated (GiB)': 20.72, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 369.52, 'epoch': 1.78}
 89%|████████████████████████████████████████████████████████████████████████         | 17890/20117 [11:24:26<1:25:28,  2.30s/it] 89%|████████████████████████████████████████████████████████████████████████         | 17891/20117 [11:24:29<1:25:26,  2.30s/it] 89%|████████████████████████████████████████████████████████████████████████         | 17892/20117 [11:24:31<1:24:22,  2.28s/it] 89%|████████████████████████████████████████████████████████████████████████         | 17893/20117 [11:24:33<1:24:11,  2.27s/it] 89%|████████████████████████████████████████████████████████████████████████         | 17894/20117 [11:24:35<1:23:40,  2.26s/it] 89%|████████████████████████████████████████████████████████████████████████         | 17895/20117 [11:24:38<1:23:06,  2.24s/it] 89%|████████████████████████████████████████████████████████████████████████         | 17896/20117 [11:24:40<1:23:08,  2.25s/it] 89%|████████████████████████████████████████████████████████████████████████         | 17897/20117 [11:24:42<1:23:00,  2.24s/it] 89%|████████████████████████████████████████████████████████████████████████         | 17898/20117 [11:24:44<1:22:20,  2.23s/it] 89%|████████████████████████████████████████████████████████████████████████         | 17899/20117 [11:24:46<1:22:12,  2.22s/it] 89%|████████████████████████████████████████████████████████████████████████         | 17900/20117 [11:24:49<1:22:26,  2.23s/it]                                                                                                                                 {'loss': 0.1817, 'grad_norm': 0.3083856999874115, 'learning_rate': 5.997977454036608e-06, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 342.94, 'epoch': 1.78}
 89%|████████████████████████████████████████████████████████████████████████         | 17900/20117 [11:24:49<1:22:26,  2.23s/it] 89%|████████████████████████████████████████████████████████████████████████         | 17901/20117 [11:24:51<1:22:20,  2.23s/it] 89%|████████████████████████████████████████████████████████████████████████         | 17902/20117 [11:24:53<1:22:21,  2.23s/it] 89%|████████████████████████████████████████████████████████████████████████         | 17903/20117 [11:24:55<1:22:45,  2.24s/it] 89%|████████████████████████████████████████████████████████████████████████         | 17904/20117 [11:24:58<1:22:47,  2.24s/it] 89%|████████████████████████████████████████████████████████████████████████         | 17905/20117 [11:25:00<1:23:08,  2.26s/it] 89%|████████████████████████████████████████████████████████████████████████         | 17906/20117 [11:25:02<1:22:37,  2.24s/it] 89%|████████████████████████████████████████████████████████████████████████         | 17907/20117 [11:25:04<1:23:02,  2.25s/it] 89%|████████████████████████████████████████████████████████████████████████         | 17908/20117 [11:25:07<1:22:32,  2.24s/it] 89%|████████████████████████████████████████████████████████████████████████         | 17909/20117 [11:25:09<1:22:17,  2.24s/it] 89%|████████████████████████████████████████████████████████████████████████         | 17910/20117 [11:25:11<1:22:34,  2.24s/it]                                                                                                                                 {'loss': 0.1563, 'grad_norm': 0.3785597085952759, 'learning_rate': 5.944555954148579e-06, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 396.7, 'epoch': 1.78}
 89%|████████████████████████████████████████████████████████████████████████         | 17910/20117 [11:25:11<1:22:34,  2.24s/it] 89%|████████████████████████████████████████████████████████████████████████         | 17911/20117 [11:25:13<1:22:11,  2.24s/it] 89%|████████████████████████████████████████████████████████████████████████         | 17912/20117 [11:25:16<1:22:12,  2.24s/it] 89%|████████████████████████████████████████████████████████████████████████▏        | 17913/20117 [11:25:18<1:22:10,  2.24s/it] 89%|████████████████████████████████████████████████████████████████████████▏        | 17914/20117 [11:25:20<1:21:41,  2.22s/it] 89%|████████████████████████████████████████████████████████████████████████▏        | 17915/20117 [11:25:22<1:21:45,  2.23s/it] 89%|████████████████████████████████████████████████████████████████████████▏        | 17916/20117 [11:25:24<1:21:31,  2.22s/it] 89%|████████████████████████████████████████████████████████████████████████▏        | 17917/20117 [11:25:27<1:21:10,  2.21s/it] 89%|████████████████████████████████████████████████████████████████████████▏        | 17918/20117 [11:25:29<1:20:55,  2.21s/it] 89%|████████████████████████████████████████████████████████████████████████▏        | 17919/20117 [11:25:31<1:20:45,  2.20s/it] 89%|████████████████████████████████████████████████████████████████████████▏        | 17920/20117 [11:25:33<1:20:59,  2.21s/it]                                                                                                                                 {'loss': 0.1876, 'grad_norm': 0.5601521730422974, 'learning_rate': 5.891366132698295e-06, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 357.52, 'epoch': 1.78}
 89%|████████████████████████████████████████████████████████████████████████▏        | 17920/20117 [11:25:33<1:20:59,  2.21s/it] 89%|████████████████████████████████████████████████████████████████████████▏        | 17921/20117 [11:25:36<1:21:06,  2.22s/it] 89%|████████████████████████████████████████████████████████████████████████▏        | 17922/20117 [11:25:38<1:21:35,  2.23s/it] 89%|████████████████████████████████████████████████████████████████████████▏        | 17923/20117 [11:25:40<1:21:32,  2.23s/it] 89%|████████████████████████████████████████████████████████████████████████▏        | 17924/20117 [11:25:42<1:21:24,  2.23s/it] 89%|████████████████████████████████████████████████████████████████████████▏        | 17925/20117 [11:25:44<1:21:27,  2.23s/it] 89%|████████████████████████████████████████████████████████████████████████▏        | 17926/20117 [11:25:47<1:20:58,  2.22s/it] 89%|████████████████████████████████████████████████████████████████████████▏        | 17927/20117 [11:25:49<1:21:03,  2.22s/it] 89%|████████████████████████████████████████████████████████████████████████▏        | 17928/20117 [11:25:51<1:21:30,  2.23s/it] 89%|████████████████████████████████████████████████████████████████████████▏        | 17929/20117 [11:25:53<1:21:11,  2.23s/it] 89%|████████████████████████████████████████████████████████████████████████▏        | 17930/20117 [11:25:56<1:21:12,  2.23s/it]                                                                                                                                 {'loss': 0.1495, 'grad_norm': 0.6354232430458069, 'learning_rate': 5.838408120703554e-06, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 367.19, 'epoch': 1.78}
 89%|████████████████████████████████████████████████████████████████████████▏        | 17930/20117 [11:25:56<1:21:12,  2.23s/it] 89%|████████████████████████████████████████████████████████████████████████▏        | 17931/20117 [11:25:58<1:21:38,  2.24s/it] 89%|████████████████████████████████████████████████████████████████████████▏        | 17932/20117 [11:26:00<1:22:12,  2.26s/it] 89%|████████████████████████████████████████████████████████████████████████▏        | 17933/20117 [11:26:02<1:22:11,  2.26s/it] 89%|████████████████████████████████████████████████████████████████████████▏        | 17934/20117 [11:26:05<1:21:56,  2.25s/it] 89%|████████████████████████████████████████████████████████████████████████▏        | 17935/20117 [11:26:07<1:21:48,  2.25s/it] 89%|████████████████████████████████████████████████████████████████████████▏        | 17936/20117 [11:26:09<1:21:10,  2.23s/it] 89%|████████████████████████████████████████████████████████████████████████▏        | 17937/20117 [11:26:11<1:21:10,  2.23s/it] 89%|████████████████████████████████████████████████████████████████████████▏        | 17938/20117 [11:26:14<1:21:24,  2.24s/it] 89%|████████████████████████████████████████████████████████████████████████▏        | 17939/20117 [11:26:16<1:21:04,  2.23s/it] 89%|████████████████████████████████████████████████████████████████████████▏        | 17940/20117 [11:26:18<1:23:38,  2.31s/it]                                                                                                                                 {'loss': 0.172, 'grad_norm': 0.6532102227210999, 'learning_rate': 5.785682048611097e-06, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 391.52, 'epoch': 1.78}
 89%|████████████████████████████████████████████████████████████████████████▏        | 17940/20117 [11:26:18<1:23:38,  2.31s/it] 89%|████████████████████████████████████████████████████████████████████████▏        | 17941/20117 [11:26:21<1:22:57,  2.29s/it] 89%|████████████████████████████████████████████████████████████████████████▏        | 17942/20117 [11:26:23<1:23:03,  2.29s/it] 89%|████████████████████████████████████████████████████████████████████████▏        | 17943/20117 [11:26:25<1:23:16,  2.30s/it] 89%|████████████████████████████████████████████████████████████████████████▎        | 17944/20117 [11:26:27<1:22:32,  2.28s/it] 89%|████████████████████████████████████████████████████████████████████████▎        | 17945/20117 [11:26:30<1:22:21,  2.28s/it] 89%|████████████████████████████████████████████████████████████████████████▎        | 17946/20117 [11:26:32<1:21:40,  2.26s/it] 89%|████████████████████████████████████████████████████████████████████████▎        | 17947/20117 [11:26:34<1:21:23,  2.25s/it] 89%|████████████████████████████████████████████████████████████████████████▎        | 17948/20117 [11:26:36<1:21:02,  2.24s/it] 89%|████████████████████████████████████████████████████████████████████████▎        | 17949/20117 [11:26:39<1:21:39,  2.26s/it] 89%|████████████████████████████████████████████████████████████████████████▎        | 17950/20117 [11:26:41<1:21:26,  2.25s/it]                                                                                                                                 {'loss': 0.1516, 'grad_norm': 0.4626152515411377, 'learning_rate': 5.733188046296423e-06, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 343.51, 'epoch': 1.78}
 89%|████████████████████████████████████████████████████████████████████████▎        | 17950/20117 [11:26:41<1:21:26,  2.25s/it] 89%|████████████████████████████████████████████████████████████████████████▎        | 17951/20117 [11:26:43<1:21:47,  2.27s/it] 89%|████████████████████████████████████████████████████████████████████████▎        | 17952/20117 [11:26:45<1:21:10,  2.25s/it] 89%|████████████████████████████████████████████████████████████████████████▎        | 17953/20117 [11:26:48<1:21:22,  2.26s/it] 89%|████████████████████████████████████████████████████████████████████████▎        | 17954/20117 [11:26:50<1:20:46,  2.24s/it] 89%|████████████████████████████████████████████████████████████████████████▎        | 17955/20117 [11:26:52<1:20:28,  2.23s/it] 89%|████████████████████████████████████████████████████████████████████████▎        | 17956/20117 [11:26:54<1:20:20,  2.23s/it] 89%|████████████████████████████████████████████████████████████████████████▎        | 17957/20117 [11:26:57<1:20:22,  2.23s/it] 89%|████████████████████████████████████████████████████████████████████████▎        | 17958/20117 [11:26:59<1:21:05,  2.25s/it] 89%|████████████████████████████████████████████████████████████████████████▎        | 17959/20117 [11:27:01<1:20:48,  2.25s/it] 89%|████████████████████████████████████████████████████████████████████████▎        | 17960/20117 [11:27:03<1:21:18,  2.26s/it]                                                                                                                                 {'loss': 0.1487, 'grad_norm': 0.3500106930732727, 'learning_rate': 5.680926243063322e-06, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 395.4, 'epoch': 1.79}
 89%|████████████████████████████████████████████████████████████████████████▎        | 17960/20117 [11:27:03<1:21:18,  2.26s/it] 89%|████████████████████████████████████████████████████████████████████████▎        | 17961/20117 [11:27:06<1:21:12,  2.26s/it] 89%|████████████████████████████████████████████████████████████████████████▎        | 17962/20117 [11:27:08<1:21:34,  2.27s/it] 89%|████████████████████████████████████████████████████████████████████████▎        | 17963/20117 [11:27:10<1:21:37,  2.27s/it] 89%|████████████████████████████████████████████████████████████████████████▎        | 17964/20117 [11:27:12<1:20:41,  2.25s/it] 89%|████████████████████████████████████████████████████████████████████████▎        | 17965/20117 [11:27:15<1:21:21,  2.27s/it] 89%|████████████████████████████████████████████████████████████████████████▎        | 17966/20117 [11:27:17<1:21:24,  2.27s/it] 89%|████████████████████████████████████████████████████████████████████████▎        | 17967/20117 [11:27:19<1:21:07,  2.26s/it] 89%|████████████████████████████████████████████████████████████████████████▎        | 17968/20117 [11:27:21<1:21:07,  2.27s/it] 89%|████████████████████████████████████████████████████████████████████████▎        | 17969/20117 [11:27:24<1:22:14,  2.30s/it] 89%|████████████████████████████████████████████████████████████████████████▎        | 17970/20117 [11:27:26<1:22:06,  2.29s/it]                                                                                                                                 {'loss': 0.1401, 'grad_norm': 0.5653957724571228, 'learning_rate': 5.628896767643677e-06, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 372.67, 'epoch': 1.79}
 89%|████████████████████████████████████████████████████████████████████████▎        | 17970/20117 [11:27:26<1:22:06,  2.29s/it] 89%|████████████████████████████████████████████████████████████████████████▎        | 17971/20117 [11:27:28<1:21:03,  2.27s/it] 89%|████████████████████████████████████████████████████████████████████████▎        | 17972/20117 [11:27:31<1:20:43,  2.26s/it] 89%|████████████████████████████████████████████████████████████████████████▎        | 17973/20117 [11:27:33<1:20:27,  2.25s/it] 89%|████████████████████████████████████████████████████████████████████████▎        | 17974/20117 [11:27:35<1:20:00,  2.24s/it] 89%|████████████████████████████████████████████████████████████████████████▍        | 17975/20117 [11:27:37<1:20:39,  2.26s/it] 89%|████████████████████████████████████████████████████████████████████████▍        | 17976/20117 [11:27:40<1:21:00,  2.27s/it] 89%|████████████████████████████████████████████████████████████████████████▍        | 17977/20117 [11:27:42<1:20:07,  2.25s/it] 89%|████████████████████████████████████████████████████████████████████████▍        | 17978/20117 [11:27:44<1:19:47,  2.24s/it] 89%|████████████████████████████████████████████████████████████████████████▍        | 17979/20117 [11:27:46<1:19:47,  2.24s/it] 89%|████████████████████████████████████████████████████████████████████████▍        | 17980/20117 [11:27:48<1:19:21,  2.23s/it]                                                                                                                                 {'loss': 0.1497, 'grad_norm': 0.4583251476287842, 'learning_rate': 5.577099748197079e-06, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 319.79, 'epoch': 1.79}
 89%|████████████████████████████████████████████████████████████████████████▍        | 17980/20117 [11:27:48<1:19:21,  2.23s/it] 89%|████████████████████████████████████████████████████████████████████████▍        | 17981/20117 [11:27:51<1:19:01,  2.22s/it] 89%|████████████████████████████████████████████████████████████████████████▍        | 17982/20117 [11:27:53<1:19:20,  2.23s/it] 89%|████████████████████████████████████████████████████████████████████████▍        | 17983/20117 [11:27:55<1:19:14,  2.23s/it] 89%|████████████████████████████████████████████████████████████████████████▍        | 17984/20117 [11:27:57<1:19:11,  2.23s/it] 89%|████████████████████████████████████████████████████████████████████████▍        | 17985/20117 [11:28:00<1:18:51,  2.22s/it] 89%|████████████████████████████████████████████████████████████████████████▍        | 17986/20117 [11:28:02<1:19:00,  2.22s/it] 89%|████████████████████████████████████████████████████████████████████████▍        | 17987/20117 [11:28:04<1:19:18,  2.23s/it] 89%|████████████████████████████████████████████████████████████████████████▍        | 17988/20117 [11:28:06<1:19:03,  2.23s/it] 89%|████████████████████████████████████████████████████████████████████████▍        | 17989/20117 [11:28:08<1:18:36,  2.22s/it] 89%|████████████████████████████████████████████████████████████████████████▍        | 17990/20117 [11:28:11<1:19:05,  2.23s/it]                                                                                                                                 {'loss': 0.1878, 'grad_norm': 0.7779932022094727, 'learning_rate': 5.525535312310559e-06, 'memory/max_active (GiB)': 19.23, 'memory/max_allocated (GiB)': 19.23, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 330.6, 'epoch': 1.79}
 89%|████████████████████████████████████████████████████████████████████████▍        | 17990/20117 [11:28:11<1:19:05,  2.23s/it] 89%|████████████████████████████████████████████████████████████████████████▍        | 17991/20117 [11:28:13<1:22:13,  2.32s/it] 89%|████████████████████████████████████████████████████████████████████████▍        | 17992/20117 [11:28:15<1:20:56,  2.29s/it] 89%|████████████████████████████████████████████████████████████████████████▍        | 17993/20117 [11:28:18<1:20:04,  2.26s/it] 89%|████████████████████████████████████████████████████████████████████████▍        | 17994/20117 [11:28:20<1:19:53,  2.26s/it] 89%|████████████████████████████████████████████████████████████████████████▍        | 17995/20117 [11:28:22<1:19:03,  2.24s/it] 89%|████████████████████████████████████████████████████████████████████████▍        | 17996/20117 [11:28:24<1:19:08,  2.24s/it] 89%|████████████████████████████████████████████████████████████████████████▍        | 17997/20117 [11:28:27<1:19:17,  2.24s/it] 89%|████████████████████████████████████████████████████████████████████████▍        | 17998/20117 [11:28:29<1:19:14,  2.24s/it] 89%|████████████████████████████████████████████████████████████████████████▍        | 17999/20117 [11:28:31<1:19:00,  2.24s/it] 89%|████████████████████████████████████████████████████████████████████████▍        | 18000/20117 [11:28:33<1:18:33,  2.23s/it]                                                                                                                                 {'loss': 0.0951, 'grad_norm': 0.5615857243537903, 'learning_rate': 5.4742035869981726e-06, 'memory/max_active (GiB)': 19.22, 'memory/max_allocated (GiB)': 19.22, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 310.54, 'epoch': 1.79}
 89%|████████████████████████████████████████████████████████████████████████▍        | 18000/20117 [11:28:33<1:18:33,  2.23s/it] 89%|████████████████████████████████████████████████████████████████████████▍        | 18001/20117 [11:28:36<1:18:25,  2.22s/it] 89%|████████████████████████████████████████████████████████████████████████▍        | 18002/20117 [11:28:38<1:18:35,  2.23s/it] 89%|████████████████████████████████████████████████████████████████████████▍        | 18003/20117 [11:28:40<1:18:22,  2.22s/it] 89%|████████████████████████████████████████████████████████████████████████▍        | 18004/20117 [11:28:42<1:18:05,  2.22s/it] 90%|████████████████████████████████████████████████████████████████████████▍        | 18005/20117 [11:28:44<1:18:12,  2.22s/it] 90%|████████████████████████████████████████████████████████████████████████▌        | 18006/20117 [11:28:47<1:18:03,  2.22s/it] 90%|████████████████████████████████████████████████████████████████████████▌        | 18007/20117 [11:28:49<1:17:52,  2.21s/it] 90%|████████████████████████████████████████████████████████████████████████▌        | 18008/20117 [11:28:51<1:18:11,  2.22s/it] 90%|████████████████████████████████████████████████████████████████████████▌        | 18009/20117 [11:28:53<1:18:45,  2.24s/it] 90%|████████████████████████████████████████████████████████████████████████▌        | 18010/20117 [11:28:56<1:18:40,  2.24s/it]                                                                                                                                 {'loss': 0.15, 'grad_norm': 0.6377988457679749, 'learning_rate': 5.423104698700853e-06, 'memory/max_active (GiB)': 19.81, 'memory/max_allocated (GiB)': 19.81, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 340.19, 'epoch': 1.79}
 90%|████████████████████████████████████████████████████████████████████████▌        | 18010/20117 [11:28:56<1:18:40,  2.24s/it] 90%|████████████████████████████████████████████████████████████████████████▌        | 18011/20117 [11:28:58<1:18:17,  2.23s/it] 90%|████████████████████████████████████████████████████████████████████████▌        | 18012/20117 [11:29:00<1:18:37,  2.24s/it] 90%|████████████████████████████████████████████████████████████████████████▌        | 18013/20117 [11:29:02<1:18:23,  2.24s/it] 90%|████████████████████████████████████████████████████████████████████████▌        | 18014/20117 [11:29:05<1:18:20,  2.23s/it] 90%|████████████████████████████████████████████████████████████████████████▌        | 18015/20117 [11:29:07<1:17:49,  2.22s/it] 90%|████████████████████████████████████████████████████████████████████████▌        | 18016/20117 [11:29:09<1:18:04,  2.23s/it] 90%|████████████████████████████████████████████████████████████████████████▌        | 18017/20117 [11:29:11<1:18:44,  2.25s/it] 90%|████████████████████████████████████████████████████████████████████████▌        | 18018/20117 [11:29:13<1:18:08,  2.23s/it] 90%|████████████████████████████████████████████████████████████████████████▌        | 18019/20117 [11:29:16<1:18:29,  2.24s/it] 90%|████████████████████████████████████████████████████████████████████████▌        | 18020/20117 [11:29:18<1:17:52,  2.23s/it]                                                                                                                                 {'loss': 0.1748, 'grad_norm': 0.6940702199935913, 'learning_rate': 5.372238773285931e-06, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 323.58, 'epoch': 1.79}
 90%|████████████████████████████████████████████████████████████████████████▌        | 18020/20117 [11:29:18<1:17:52,  2.23s/it] 90%|████████████████████████████████████████████████████████████████████████▌        | 18021/20117 [11:29:20<1:17:56,  2.23s/it] 90%|████████████████████████████████████████████████████████████████████████▌        | 18022/20117 [11:29:22<1:17:40,  2.22s/it] 90%|████████████████████████████████████████████████████████████████████████▌        | 18023/20117 [11:29:25<1:18:32,  2.25s/it] 90%|████████████████████████████████████████████████████████████████████████▌        | 18024/20117 [11:29:27<1:17:56,  2.23s/it] 90%|████████████████████████████████████████████████████████████████████████▌        | 18025/20117 [11:29:29<1:18:05,  2.24s/it] 90%|████████████████████████████████████████████████████████████████████████▌        | 18026/20117 [11:29:31<1:17:34,  2.23s/it] 90%|████████████████████████████████████████████████████████████████████████▌        | 18027/20117 [11:29:34<1:17:33,  2.23s/it] 90%|████████████████████████████████████████████████████████████████████████▌        | 18028/20117 [11:29:36<1:17:30,  2.23s/it] 90%|████████████████████████████████████████████████████████████████████████▌        | 18029/20117 [11:29:38<1:17:22,  2.22s/it] 90%|████████████████████████████████████████████████████████████████████████▌        | 18030/20117 [11:29:40<1:17:33,  2.23s/it]                                                                                                                                 {'loss': 0.1468, 'grad_norm': 0.558322548866272, 'learning_rate': 5.321605936046947e-06, 'memory/max_active (GiB)': 19.8, 'memory/max_allocated (GiB)': 19.8, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 305.22, 'epoch': 1.79}
 90%|████████████████████████████████████████████████████████████████████████▌        | 18030/20117 [11:29:40<1:17:33,  2.23s/it] 90%|████████████████████████████████████████████████████████████████████████▌        | 18031/20117 [11:29:42<1:17:48,  2.24s/it] 90%|████████████████████████████████████████████████████████████████████████▌        | 18032/20117 [11:29:45<1:17:59,  2.24s/it] 90%|████████████████████████████████████████████████████████████████████████▌        | 18033/20117 [11:29:47<1:17:25,  2.23s/it] 90%|████████████████████████████████████████████████████████████████████████▌        | 18034/20117 [11:29:49<1:17:10,  2.22s/it] 90%|████████████████████████████████████████████████████████████████████████▌        | 18035/20117 [11:29:51<1:17:05,  2.22s/it] 90%|████████████████████████████████████████████████████████████████████████▌        | 18036/20117 [11:29:54<1:17:13,  2.23s/it] 90%|████████████████████████████████████████████████████████████████████████▌        | 18037/20117 [11:29:56<1:16:56,  2.22s/it] 90%|████████████████████████████████████████████████████████████████████████▋        | 18038/20117 [11:29:58<1:17:23,  2.23s/it] 90%|████████████████████████████████████████████████████████████████████████▋        | 18039/20117 [11:30:00<1:17:50,  2.25s/it] 90%|████████████████████████████████████████████████████████████████████████▋        | 18040/20117 [11:30:03<1:17:44,  2.25s/it]                                                                                                                                 {'loss': 0.1207, 'grad_norm': 0.3536572754383087, 'learning_rate': 5.271206311703281e-06, 'memory/max_active (GiB)': 19.19, 'memory/max_allocated (GiB)': 19.19, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 336.32, 'epoch': 1.79}
 90%|████████████████████████████████████████████████████████████████████████▋        | 18040/20117 [11:30:03<1:17:44,  2.25s/it] 90%|████████████████████████████████████████████████████████████████████████▋        | 18041/20117 [11:30:05<1:17:55,  2.25s/it] 90%|████████████████████████████████████████████████████████████████████████▋        | 18042/20117 [11:30:07<1:17:48,  2.25s/it] 90%|████████████████████████████████████████████████████████████████████████▋        | 18043/20117 [11:30:09<1:17:19,  2.24s/it] 90%|████████████████████████████████████████████████████████████████████████▋        | 18044/20117 [11:30:12<1:19:53,  2.31s/it] 90%|████████████████████████████████████████████████████████████████████████▋        | 18045/20117 [11:30:14<1:18:54,  2.28s/it] 90%|████████████████████████████████████████████████████████████████████████▋        | 18046/20117 [11:30:16<1:18:56,  2.29s/it] 90%|████████████████████████████████████████████████████████████████████████▋        | 18047/20117 [11:30:19<1:18:06,  2.26s/it] 90%|████████████████████████████████████████████████████████████████████████▋        | 18048/20117 [11:30:21<1:17:33,  2.25s/it] 90%|████████████████████████████████████████████████████████████████████████▋        | 18049/20117 [11:30:23<1:17:06,  2.24s/it] 90%|████████████████████████████████████████████████████████████████████████▋        | 18050/20117 [11:30:25<1:16:43,  2.23s/it]                                                                                                                                 {'loss': 0.1664, 'grad_norm': 0.5200352668762207, 'learning_rate': 5.221040024399848e-06, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 281.71, 'epoch': 1.79}
 90%|████████████████████████████████████████████████████████████████████████▋        | 18050/20117 [11:30:25<1:16:43,  2.23s/it] 90%|████████████████████████████████████████████████████████████████████████▋        | 18051/20117 [11:30:27<1:16:48,  2.23s/it] 90%|████████████████████████████████████████████████████████████████████████▋        | 18052/20117 [11:30:30<1:16:43,  2.23s/it] 90%|████████████████████████████████████████████████████████████████████████▋        | 18053/20117 [11:30:32<1:17:10,  2.24s/it] 90%|████████████████████████████████████████████████████████████████████████▋        | 18054/20117 [11:30:34<1:16:33,  2.23s/it] 90%|████████████████████████████████████████████████████████████████████████▋        | 18055/20117 [11:30:36<1:16:46,  2.23s/it] 90%|████████████████████████████████████████████████████████████████████████▋        | 18056/20117 [11:30:39<1:16:39,  2.23s/it] 90%|████████████████████████████████████████████████████████████████████████▋        | 18057/20117 [11:30:41<1:16:54,  2.24s/it] 90%|████████████████████████████████████████████████████████████████████████▋        | 18058/20117 [11:30:43<1:17:01,  2.24s/it] 90%|████████████████████████████████████████████████████████████████████████▋        | 18059/20117 [11:30:45<1:17:06,  2.25s/it] 90%|████████████████████████████████████████████████████████████████████████▋        | 18060/20117 [11:30:48<1:17:09,  2.25s/it]                                                                                                                                 {'loss': 0.1725, 'grad_norm': 0.3343510329723358, 'learning_rate': 5.171107197706837e-06, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 345.94, 'epoch': 1.8}
 90%|████████████████████████████████████████████████████████████████████████▋        | 18060/20117 [11:30:48<1:17:09,  2.25s/it] 90%|████████████████████████████████████████████████████████████████████████▋        | 18061/20117 [11:30:50<1:16:34,  2.23s/it] 90%|████████████████████████████████████████████████████████████████████████▋        | 18062/20117 [11:30:52<1:16:01,  2.22s/it] 90%|████████████████████████████████████████████████████████████████████████▋        | 18063/20117 [11:30:54<1:15:37,  2.21s/it] 90%|████████████████████████████████████████████████████████████████████████▋        | 18064/20117 [11:30:56<1:15:59,  2.22s/it] 90%|████████████████████████████████████████████████████████████████████████▋        | 18065/20117 [11:30:59<1:15:40,  2.21s/it] 90%|████████████████████████████████████████████████████████████████████████▋        | 18066/20117 [11:31:01<1:16:40,  2.24s/it] 90%|████████████████████████████████████████████████████████████████████████▋        | 18067/20117 [11:31:03<1:16:29,  2.24s/it] 90%|████████████████████████████████████████████████████████████████████████▋        | 18068/20117 [11:31:05<1:17:25,  2.27s/it] 90%|████████████████████████████████████████████████████████████████████████▊        | 18069/20117 [11:31:08<1:16:48,  2.25s/it] 90%|████████████████████████████████████████████████████████████████████████▊        | 18070/20117 [11:31:10<1:16:28,  2.24s/it]                                                                                                                                 {'loss': 0.1521, 'grad_norm': 0.4875069856643677, 'learning_rate': 5.121407954619339e-06, 'memory/max_active (GiB)': 19.81, 'memory/max_allocated (GiB)': 19.81, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 368.58, 'epoch': 1.8}
 90%|████████████████████████████████████████████████████████████████████████▊        | 18070/20117 [11:31:10<1:16:28,  2.24s/it] 90%|████████████████████████████████████████████████████████████████████████▊        | 18071/20117 [11:31:12<1:16:08,  2.23s/it] 90%|████████████████████████████████████████████████████████████████████████▊        | 18072/20117 [11:31:14<1:17:02,  2.26s/it] 90%|████████████████████████████████████████████████████████████████████████▊        | 18073/20117 [11:31:17<1:16:38,  2.25s/it] 90%|████████████████████████████████████████████████████████████████████████▊        | 18074/20117 [11:31:19<1:16:28,  2.25s/it] 90%|████████████████████████████████████████████████████████████████████████▊        | 18075/20117 [11:31:21<1:16:21,  2.24s/it] 90%|████████████████████████████████████████████████████████████████████████▊        | 18076/20117 [11:31:23<1:15:59,  2.23s/it] 90%|████████████████████████████████████████████████████████████████████████▊        | 18077/20117 [11:31:26<1:15:33,  2.22s/it] 90%|████████████████████████████████████████████████████████████████████████▊        | 18078/20117 [11:31:28<1:15:07,  2.21s/it] 90%|████████████████████████████████████████████████████████████████████████▊        | 18079/20117 [11:31:30<1:15:41,  2.23s/it] 90%|████████████████████████████████████████████████████████████████████████▊        | 18080/20117 [11:31:32<1:15:15,  2.22s/it]                                                                                                                                 {'loss': 0.1604, 'grad_norm': 0.524474024772644, 'learning_rate': 5.071942417557096e-06, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 386.02, 'epoch': 1.8}
 90%|████████████████████████████████████████████████████████████████████████▊        | 18080/20117 [11:31:32<1:15:15,  2.22s/it] 90%|████████████████████████████████████████████████████████████████████████▊        | 18081/20117 [11:31:34<1:15:52,  2.24s/it] 90%|████████████████████████████████████████████████████████████████████████▊        | 18082/20117 [11:31:37<1:15:36,  2.23s/it] 90%|████████████████████████████████████████████████████████████████████████▊        | 18083/20117 [11:31:39<1:15:21,  2.22s/it] 90%|████████████████████████████████████████████████████████████████████████▊        | 18084/20117 [11:31:41<1:15:28,  2.23s/it] 90%|████████████████████████████████████████████████████████████████████████▊        | 18085/20117 [11:31:43<1:15:46,  2.24s/it] 90%|████████████████████████████████████████████████████████████████████████▊        | 18086/20117 [11:31:46<1:15:18,  2.22s/it] 90%|████████████████████████████████████████████████████████████████████████▊        | 18087/20117 [11:31:48<1:15:09,  2.22s/it] 90%|████████████████████████████████████████████████████████████████████████▊        | 18088/20117 [11:31:50<1:14:52,  2.21s/it] 90%|████████████████████████████████████████████████████████████████████████▊        | 18089/20117 [11:31:52<1:15:25,  2.23s/it] 90%|████████████████████████████████████████████████████████████████████████▊        | 18090/20117 [11:31:54<1:15:18,  2.23s/it]                                                                                                                                 {'loss': 0.1682, 'grad_norm': 0.3819403052330017, 'learning_rate': 5.0227107083641756e-06, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 397.67, 'epoch': 1.8}
 90%|████████████████████████████████████████████████████████████████████████▊        | 18090/20117 [11:31:54<1:15:18,  2.23s/it] 90%|████████████████████████████████████████████████████████████████████████▊        | 18091/20117 [11:31:57<1:14:57,  2.22s/it] 90%|████████████████████████████████████████████████████████████████████████▊        | 18092/20117 [11:31:59<1:14:33,  2.21s/it] 90%|████████████████████████████████████████████████████████████████████████▊        | 18093/20117 [11:32:01<1:15:14,  2.23s/it] 90%|████████████████████████████████████████████████████████████████████████▊        | 18094/20117 [11:32:03<1:15:14,  2.23s/it] 90%|████████████████████████████████████████████████████████████████████████▊        | 18095/20117 [11:32:06<1:15:08,  2.23s/it] 90%|████████████████████████████████████████████████████████████████████████▊        | 18096/20117 [11:32:08<1:15:46,  2.25s/it] 90%|████████████████████████████████████████████████████████████████████████▊        | 18097/20117 [11:32:10<1:18:18,  2.33s/it] 90%|████████████████████████████████████████████████████████████████████████▊        | 18098/20117 [11:32:13<1:17:28,  2.30s/it] 90%|████████████████████████████████████████████████████████████████████████▊        | 18099/20117 [11:32:15<1:16:16,  2.27s/it] 90%|████████████████████████████████████████████████████████████████████████▉        | 18100/20117 [11:32:17<1:15:53,  2.26s/it]                                                                                                                                 {'loss': 0.1726, 'grad_norm': 0.4870153069496155, 'learning_rate': 4.9737129483086845e-06, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 409.71, 'epoch': 1.8}
 90%|████████████████████████████████████████████████████████████████████████▉        | 18100/20117 [11:32:17<1:15:53,  2.26s/it] 90%|████████████████████████████████████████████████████████████████████████▉        | 18101/20117 [11:32:19<1:15:39,  2.25s/it] 90%|████████████████████████████████████████████████████████████████████████▉        | 18102/20117 [11:32:22<1:16:04,  2.27s/it] 90%|████████████████████████████████████████████████████████████████████████▉        | 18103/20117 [11:32:24<1:15:39,  2.25s/it] 90%|████████████████████████████████████████████████████████████████████████▉        | 18104/20117 [11:32:26<1:15:15,  2.24s/it] 90%|████████████████████████████████████████████████████████████████████████▉        | 18105/20117 [11:32:28<1:15:03,  2.24s/it] 90%|████████████████████████████████████████████████████████████████████████▉        | 18106/20117 [11:32:31<1:15:30,  2.25s/it] 90%|████████████████████████████████████████████████████████████████████████▉        | 18107/20117 [11:32:33<1:15:16,  2.25s/it] 90%|████████████████████████████████████████████████████████████████████████▉        | 18108/20117 [11:32:35<1:15:13,  2.25s/it] 90%|████████████████████████████████████████████████████████████████████████▉        | 18109/20117 [11:32:37<1:15:20,  2.25s/it] 90%|████████████████████████████████████████████████████████████████████████▉        | 18110/20117 [11:32:40<1:15:29,  2.26s/it]                                                                                                                                 {'loss': 0.1807, 'grad_norm': 0.48295846581459045, 'learning_rate': 4.924949258082468e-06, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 386.2, 'epoch': 1.8}
 90%|████████████████████████████████████████████████████████████████████████▉        | 18110/20117 [11:32:40<1:15:29,  2.26s/it] 90%|████████████████████████████████████████████████████████████████████████▉        | 18111/20117 [11:32:42<1:15:08,  2.25s/it] 90%|████████████████████████████████████████████████████████████████████████▉        | 18112/20117 [11:32:44<1:15:06,  2.25s/it] 90%|████████████████████████████████████████████████████████████████████████▉        | 18113/20117 [11:32:46<1:14:46,  2.24s/it] 90%|████████████████████████████████████████████████████████████████████████▉        | 18114/20117 [11:32:49<1:15:05,  2.25s/it] 90%|████████████████████████████████████████████████████████████████████████▉        | 18115/20117 [11:32:51<1:14:45,  2.24s/it] 90%|████████████████████████████████████████████████████████████████████████▉        | 18116/20117 [11:32:53<1:14:41,  2.24s/it] 90%|████████████████████████████████████████████████████████████████████████▉        | 18117/20117 [11:32:55<1:14:23,  2.23s/it] 90%|████████████████████████████████████████████████████████████████████████▉        | 18118/20117 [11:32:57<1:14:34,  2.24s/it] 90%|████████████████████████████████████████████████████████████████████████▉        | 18119/20117 [11:33:00<1:14:44,  2.24s/it] 90%|████████████████████████████████████████████████████████████████████████▉        | 18120/20117 [11:33:02<1:14:32,  2.24s/it]                                                                                                                                 {'loss': 0.1368, 'grad_norm': 0.43393832445144653, 'learning_rate': 4.8764197578008095e-06, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 348.61, 'epoch': 1.8}
 90%|████████████████████████████████████████████████████████████████████████▉        | 18120/20117 [11:33:02<1:14:32,  2.24s/it] 90%|████████████████████████████████████████████████████████████████████████▉        | 18121/20117 [11:33:04<1:14:37,  2.24s/it] 90%|████████████████████████████████████████████████████████████████████████▉        | 18122/20117 [11:33:06<1:14:12,  2.23s/it] 90%|████████████████████████████████████████████████████████████████████████▉        | 18123/20117 [11:33:09<1:13:56,  2.23s/it] 90%|████████████████████████████████████████████████████████████████████████▉        | 18124/20117 [11:33:11<1:14:17,  2.24s/it] 90%|████████████████████████████████████████████████████████████████████████▉        | 18125/20117 [11:33:13<1:14:45,  2.25s/it] 90%|████████████████████████████████████████████████████████████████████████▉        | 18126/20117 [11:33:15<1:14:07,  2.23s/it] 90%|████████████████████████████████████████████████████████████████████████▉        | 18127/20117 [11:33:18<1:14:37,  2.25s/it] 90%|████████████████████████████████████████████████████████████████████████▉        | 18128/20117 [11:33:20<1:14:02,  2.23s/it] 90%|████████████████████████████████████████████████████████████████████████▉        | 18129/20117 [11:33:22<1:14:09,  2.24s/it] 90%|████████████████████████████████████████████████████████████████████████▉        | 18130/20117 [11:33:24<1:14:20,  2.25s/it]                                                                                                                                 {'loss': 0.141, 'grad_norm': 0.4767451584339142, 'learning_rate': 4.828124567002113e-06, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 333.81, 'epoch': 1.8}
 90%|████████████████████████████████████████████████████████████████████████▉        | 18130/20117 [11:33:24<1:14:20,  2.25s/it] 90%|█████████████████████████████████████████████████████████████████████████        | 18131/20117 [11:33:27<1:14:59,  2.27s/it] 90%|█████████████████████████████████████████████████████████████████████████        | 18132/20117 [11:33:29<1:14:22,  2.25s/it] 90%|█████████████████████████████████████████████████████████████████████████        | 18133/20117 [11:33:31<1:13:50,  2.23s/it] 90%|█████████████████████████████████████████████████████████████████████████        | 18134/20117 [11:33:33<1:13:24,  2.22s/it] 90%|█████████████████████████████████████████████████████████████████████████        | 18135/20117 [11:33:36<1:13:38,  2.23s/it] 90%|█████████████████████████████████████████████████████████████████████████        | 18136/20117 [11:33:38<1:13:21,  2.22s/it] 90%|█████████████████████████████████████████████████████████████████████████        | 18137/20117 [11:33:40<1:13:34,  2.23s/it] 90%|█████████████████████████████████████████████████████████████████████████        | 18138/20117 [11:33:42<1:13:39,  2.23s/it] 90%|█████████████████████████████████████████████████████████████████████████        | 18139/20117 [11:33:44<1:13:55,  2.24s/it] 90%|█████████████████████████████████████████████████████████████████████████        | 18140/20117 [11:33:47<1:14:09,  2.25s/it]                                                                                                                                 {'loss': 0.1356, 'grad_norm': 0.4560125768184662, 'learning_rate': 4.780063804647639e-06, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 331.92, 'epoch': 1.8}
 90%|█████████████████████████████████████████████████████████████████████████        | 18140/20117 [11:33:47<1:14:09,  2.25s/it] 90%|█████████████████████████████████████████████████████████████████████████        | 18141/20117 [11:33:49<1:14:00,  2.25s/it] 90%|█████████████████████████████████████████████████████████████████████████        | 18142/20117 [11:33:51<1:14:12,  2.25s/it] 90%|█████████████████████████████████████████████████████████████████████████        | 18143/20117 [11:33:53<1:14:01,  2.25s/it] 90%|█████████████████████████████████████████████████████████████████████████        | 18144/20117 [11:33:56<1:13:22,  2.23s/it] 90%|█████████████████████████████████████████████████████████████████████████        | 18145/20117 [11:33:58<1:13:27,  2.24s/it] 90%|█████████████████████████████████████████████████████████████████████████        | 18146/20117 [11:34:00<1:13:30,  2.24s/it] 90%|█████████████████████████████████████████████████████████████████████████        | 18147/20117 [11:34:02<1:13:13,  2.23s/it] 90%|█████████████████████████████████████████████████████████████████████████        | 18148/20117 [11:34:05<1:13:58,  2.25s/it] 90%|█████████████████████████████████████████████████████████████████████████        | 18149/20117 [11:34:07<1:13:57,  2.25s/it] 90%|█████████████████████████████████████████████████████████████████████████        | 18150/20117 [11:34:10<1:17:21,  2.36s/it]                                                                                                                                 {'loss': 0.1273, 'grad_norm': 0.23658177256584167, 'learning_rate': 4.732237589121202e-06, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 289.7, 'epoch': 1.8}
 90%|█████████████████████████████████████████████████████████████████████████        | 18150/20117 [11:34:10<1:17:21,  2.36s/it] 90%|█████████████████████████████████████████████████████████████████████████        | 18151/20117 [11:34:12<1:16:47,  2.34s/it] 90%|█████████████████████████████████████████████████████████████████████████        | 18152/20117 [11:34:14<1:15:15,  2.30s/it] 90%|█████████████████████████████████████████████████████████████████████████        | 18153/20117 [11:34:16<1:14:18,  2.27s/it] 90%|█████████████████████████████████████████████████████████████████████████        | 18154/20117 [11:34:18<1:13:49,  2.26s/it] 90%|█████████████████████████████████████████████████████████████████████████        | 18155/20117 [11:34:21<1:14:28,  2.28s/it] 90%|█████████████████████████████████████████████████████████████████████████        | 18156/20117 [11:34:23<1:15:03,  2.30s/it] 90%|█████████████████████████████████████████████████████████████████████████        | 18157/20117 [11:34:25<1:13:56,  2.26s/it] 90%|█████████████████████████████████████████████████████████████████████████        | 18158/20117 [11:34:28<1:13:41,  2.26s/it] 90%|█████████████████████████████████████████████████████████████████████████        | 18159/20117 [11:34:30<1:13:00,  2.24s/it] 90%|█████████████████████████████████████████████████████████████████████████        | 18160/20117 [11:34:32<1:13:12,  2.24s/it]                                                                                                                                 {'loss': 0.1614, 'grad_norm': 0.310188353061676, 'learning_rate': 4.684646038228891e-06, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 406.62, 'epoch': 1.81}
 90%|█████████████████████████████████████████████████████████████████████████        | 18160/20117 [11:34:32<1:13:12,  2.24s/it] 90%|█████████████████████████████████████████████████████████████████████████        | 18161/20117 [11:34:34<1:12:53,  2.24s/it] 90%|█████████████████████████████████████████████████████████████████████████▏       | 18162/20117 [11:34:36<1:12:21,  2.22s/it] 90%|█████████████████████████████████████████████████████████████████████████▏       | 18163/20117 [11:34:39<1:12:04,  2.21s/it] 90%|█████████████████████████████████████████████████████████████████████████▏       | 18164/20117 [11:34:41<1:11:53,  2.21s/it] 90%|█████████████████████████████████████████████████████████████████████████▏       | 18165/20117 [11:34:43<1:12:19,  2.22s/it] 90%|█████████████████████████████████████████████████████████████████████████▏       | 18166/20117 [11:34:45<1:12:13,  2.22s/it] 90%|█████████████████████████████████████████████████████████████████████████▏       | 18167/20117 [11:34:48<1:12:16,  2.22s/it] 90%|█████████████████████████████████████████████████████████████████████████▏       | 18168/20117 [11:34:50<1:12:17,  2.23s/it] 90%|█████████████████████████████████████████████████████████████████████████▏       | 18169/20117 [11:34:52<1:12:08,  2.22s/it] 90%|█████████████████████████████████████████████████████████████████████████▏       | 18170/20117 [11:34:54<1:11:55,  2.22s/it]                                                                                                                                 {'loss': 0.1765, 'grad_norm': 0.49459826946258545, 'learning_rate': 4.6372892691987525e-06, 'memory/max_active (GiB)': 19.67, 'memory/max_allocated (GiB)': 19.67, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 340.47, 'epoch': 1.81}
 90%|█████████████████████████████████████████████████████████████████████████▏       | 18170/20117 [11:34:54<1:11:55,  2.22s/it] 90%|█████████████████████████████████████████████████████████████████████████▏       | 18171/20117 [11:34:56<1:11:48,  2.21s/it] 90%|█████████████████████████████████████████████████████████████████████████▏       | 18172/20117 [11:34:59<1:11:40,  2.21s/it] 90%|█████████████████████████████████████████████████████████████████████████▏       | 18173/20117 [11:35:01<1:11:57,  2.22s/it] 90%|█████████████████████████████████████████████████████████████████████████▏       | 18174/20117 [11:35:03<1:11:59,  2.22s/it] 90%|█████████████████████████████████████████████████████████████████████████▏       | 18175/20117 [11:35:05<1:12:14,  2.23s/it] 90%|█████████████████████████████████████████████████████████████████████████▏       | 18176/20117 [11:35:08<1:12:27,  2.24s/it] 90%|█████████████████████████████████████████████████████████████████████████▏       | 18177/20117 [11:35:10<1:12:41,  2.25s/it] 90%|█████████████████████████████████████████████████████████████████████████▏       | 18178/20117 [11:35:12<1:12:44,  2.25s/it] 90%|█████████████████████████████████████████████████████████████████████████▏       | 18179/20117 [11:35:14<1:12:40,  2.25s/it] 90%|█████████████████████████████████████████████████████████████████████████▏       | 18180/20117 [11:35:17<1:12:36,  2.25s/it]                                                                                                                                 {'loss': 0.1767, 'grad_norm': 0.6684255599975586, 'learning_rate': 4.5901673986804896e-06, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 361.86, 'epoch': 1.81}
 90%|█████████████████████████████████████████████████████████████████████████▏       | 18180/20117 [11:35:17<1:12:36,  2.25s/it] 90%|█████████████████████████████████████████████████████████████████████████▏       | 18181/20117 [11:35:19<1:12:46,  2.26s/it] 90%|█████████████████████████████████████████████████████████████████████████▏       | 18182/20117 [11:35:21<1:12:21,  2.24s/it] 90%|█████████████████████████████████████████████████████████████████████████▏       | 18183/20117 [11:35:23<1:12:20,  2.24s/it] 90%|█████████████████████████████████████████████████████████████████████████▏       | 18184/20117 [11:35:26<1:11:49,  2.23s/it] 90%|█████████████████████████████████████████████████████████████████████████▏       | 18185/20117 [11:35:28<1:12:08,  2.24s/it] 90%|█████████████████████████████████████████████████████████████████████████▏       | 18186/20117 [11:35:30<1:12:27,  2.25s/it] 90%|█████████████████████████████████████████████████████████████████████████▏       | 18187/20117 [11:35:32<1:12:19,  2.25s/it] 90%|█████████████████████████████████████████████████████████████████████████▏       | 18188/20117 [11:35:35<1:11:55,  2.24s/it] 90%|█████████████████████████████████████████████████████████████████████████▏       | 18189/20117 [11:35:37<1:11:44,  2.23s/it] 90%|█████████████████████████████████████████████████████████████████████████▏       | 18190/20117 [11:35:39<1:11:40,  2.23s/it]                                                                                                                                 {'loss': 0.131, 'grad_norm': 0.46647799015045166, 'learning_rate': 4.5432805427452765e-06, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 337.2, 'epoch': 1.81}
 90%|█████████████████████████████████████████████████████████████████████████▏       | 18190/20117 [11:35:39<1:11:40,  2.23s/it] 90%|█████████████████████████████████████████████████████████████████████████▏       | 18191/20117 [11:35:41<1:11:50,  2.24s/it] 90%|█████████████████████████████████████████████████████████████████████████▏       | 18192/20117 [11:35:44<1:12:29,  2.26s/it] 90%|█████████████████████████████████████████████████████████████████████████▎       | 18193/20117 [11:35:46<1:12:11,  2.25s/it] 90%|█████████████████████████████████████████████████████████████████████████▎       | 18194/20117 [11:35:48<1:12:42,  2.27s/it] 90%|█████████████████████████████████████████████████████████████████████████▎       | 18195/20117 [11:35:50<1:12:47,  2.27s/it] 90%|█████████████████████████████████████████████████████████████████████████▎       | 18196/20117 [11:35:53<1:12:24,  2.26s/it] 90%|█████████████████████████████████████████████████████████████████████████▎       | 18197/20117 [11:35:55<1:11:45,  2.24s/it] 90%|█████████████████████████████████████████████████████████████████████████▎       | 18198/20117 [11:35:57<1:11:42,  2.24s/it] 90%|█████████████████████████████████████████████████████████████████████████▎       | 18199/20117 [11:35:59<1:11:02,  2.22s/it] 90%|█████████████████████████████████████████████████████████████████████████▎       | 18200/20117 [11:36:01<1:10:58,  2.22s/it]                                                                                                                                 {'loss': 0.145, 'grad_norm': 0.5966955423355103, 'learning_rate': 4.496628816885318e-06, 'memory/max_active (GiB)': 20.45, 'memory/max_allocated (GiB)': 20.45, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 401.16, 'epoch': 1.81}
 90%|█████████████████████████████████████████████████████████████████████████▎       | 18200/20117 [11:36:01<1:10:58,  2.22s/it] 90%|█████████████████████████████████████████████████████████████████████████▎       | 18201/20117 [11:36:04<1:10:58,  2.22s/it] 90%|█████████████████████████████████████████████████████████████████████████▎       | 18202/20117 [11:36:06<1:11:40,  2.25s/it] 90%|█████████████████████████████████████████████████████████████████████████▎       | 18203/20117 [11:36:08<1:13:38,  2.31s/it] 90%|█████████████████████████████████████████████████████████████████████████▎       | 18204/20117 [11:36:11<1:12:48,  2.28s/it] 90%|█████████████████████████████████████████████████████████████████████████▎       | 18205/20117 [11:36:13<1:12:52,  2.29s/it] 91%|█████████████████████████████████████████████████████████████████████████▎       | 18206/20117 [11:36:15<1:12:30,  2.28s/it] 91%|█████████████████████████████████████████████████████████████████████████▎       | 18207/20117 [11:36:17<1:11:51,  2.26s/it] 91%|█████████████████████████████████████████████████████████████████████████▎       | 18208/20117 [11:36:20<1:11:13,  2.24s/it] 91%|█████████████████████████████████████████████████████████████████████████▎       | 18209/20117 [11:36:22<1:10:55,  2.23s/it] 91%|█████████████████████████████████████████████████████████████████████████▎       | 18210/20117 [11:36:24<1:10:35,  2.22s/it]                                                                                                                                 {'loss': 0.1557, 'grad_norm': 0.37769100069999695, 'learning_rate': 4.450212336013681e-06, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 335.53, 'epoch': 1.81}
 91%|█████████████████████████████████████████████████████████████████████████▎       | 18210/20117 [11:36:24<1:10:35,  2.22s/it] 91%|█████████████████████████████████████████████████████████████████████████▎       | 18211/20117 [11:36:26<1:10:59,  2.23s/it] 91%|█████████████████████████████████████████████████████████████████████████▎       | 18212/20117 [11:36:29<1:11:06,  2.24s/it] 91%|█████████████████████████████████████████████████████████████████████████▎       | 18213/20117 [11:36:31<1:10:33,  2.22s/it] 91%|█████████████████████████████████████████████████████████████████████████▎       | 18214/20117 [11:36:33<1:10:21,  2.22s/it] 91%|█████████████████████████████████████████████████████████████████████████▎       | 18215/20117 [11:36:35<1:09:58,  2.21s/it] 91%|█████████████████████████████████████████████████████████████████████████▎       | 18216/20117 [11:36:37<1:10:08,  2.21s/it] 91%|█████████████████████████████████████████████████████████████████████████▎       | 18217/20117 [11:36:40<1:10:29,  2.23s/it] 91%|█████████████████████████████████████████████████████████████████████████▎       | 18218/20117 [11:36:42<1:11:05,  2.25s/it] 91%|█████████████████████████████████████████████████████████████████████████▎       | 18219/20117 [11:36:44<1:10:55,  2.24s/it] 91%|█████████████████████████████████████████████████████████████████████████▎       | 18220/20117 [11:36:46<1:11:11,  2.25s/it]                                                                                                                                 {'loss': 0.1208, 'grad_norm': 0.4889051616191864, 'learning_rate': 4.404031214463966e-06, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 339.5, 'epoch': 1.81}
 91%|█████████████████████████████████████████████████████████████████████████▎       | 18220/20117 [11:36:46<1:11:11,  2.25s/it] 91%|█████████████████████████████████████████████████████████████████████████▎       | 18221/20117 [11:36:49<1:10:43,  2.24s/it] 91%|█████████████████████████████████████████████████████████████████████████▎       | 18222/20117 [11:36:51<1:10:29,  2.23s/it] 91%|█████████████████████████████████████████████████████████████████████████▎       | 18223/20117 [11:36:53<1:11:01,  2.25s/it] 91%|█████████████████████████████████████████████████████████████████████████▍       | 18224/20117 [11:36:55<1:11:02,  2.25s/it] 91%|█████████████████████████████████████████████████████████████████████████▍       | 18225/20117 [11:36:58<1:10:53,  2.25s/it] 91%|█████████████████████████████████████████████████████████████████████████▍       | 18226/20117 [11:37:00<1:10:41,  2.24s/it] 91%|█████████████████████████████████████████████████████████████████████████▍       | 18227/20117 [11:37:02<1:10:17,  2.23s/it] 91%|█████████████████████████████████████████████████████████████████████████▍       | 18228/20117 [11:37:04<1:09:57,  2.22s/it] 91%|█████████████████████████████████████████████████████████████████████████▍       | 18229/20117 [11:37:06<1:09:39,  2.21s/it] 91%|█████████████████████████████████████████████████████████████████████████▍       | 18230/20117 [11:37:09<1:09:17,  2.20s/it]                                                                                                                                 {'loss': 0.1608, 'grad_norm': 0.4089222848415375, 'learning_rate': 4.358085565990044e-06, 'memory/max_active (GiB)': 20.64, 'memory/max_allocated (GiB)': 20.64, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 362.98, 'epoch': 1.81}
 91%|█████████████████████████████████████████████████████████████████████████▍       | 18230/20117 [11:37:09<1:09:17,  2.20s/it] 91%|█████████████████████████████████████████████████████████████████████████▍       | 18231/20117 [11:37:11<1:09:16,  2.20s/it] 91%|█████████████████████████████████████████████████████████████████████████▍       | 18232/20117 [11:37:13<1:08:55,  2.19s/it] 91%|█████████████████████████████████████████████████████████████████████████▍       | 18233/20117 [11:37:15<1:08:59,  2.20s/it] 91%|█████████████████████████████████████████████████████████████████████████▍       | 18234/20117 [11:37:17<1:09:51,  2.23s/it] 91%|█████████████████████████████████████████████████████████████████████████▍       | 18235/20117 [11:37:20<1:10:04,  2.23s/it] 91%|█████████████████████████████████████████████████████████████████████████▍       | 18236/20117 [11:37:22<1:10:03,  2.23s/it] 91%|█████████████████████████████████████████████████████████████████████████▍       | 18237/20117 [11:37:24<1:09:44,  2.23s/it] 91%|█████████████████████████████████████████████████████████████████████████▍       | 18238/20117 [11:37:26<1:09:32,  2.22s/it] 91%|█████████████████████████████████████████████████████████████████████████▍       | 18239/20117 [11:37:29<1:09:35,  2.22s/it] 91%|█████████████████████████████████████████████████████████████████████████▍       | 18240/20117 [11:37:31<1:10:08,  2.24s/it]                                                                                                                                 {'loss': 0.1807, 'grad_norm': 0.7150986194610596, 'learning_rate': 4.312375503765742e-06, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 348.29, 'epoch': 1.81}
 91%|█████████████████████████████████████████████████████████████████████████▍       | 18240/20117 [11:37:31<1:10:08,  2.24s/it] 91%|█████████████████████████████████████████████████████████████████████████▍       | 18241/20117 [11:37:33<1:09:49,  2.23s/it] 91%|█████████████████████████████████████████████████████████████████████████▍       | 18242/20117 [11:37:35<1:09:24,  2.22s/it] 91%|█████████████████████████████████████████████████████████████████████████▍       | 18243/20117 [11:37:38<1:09:38,  2.23s/it] 91%|█████████████████████████████████████████████████████████████████████████▍       | 18244/20117 [11:37:40<1:09:11,  2.22s/it] 91%|█████████████████████████████████████████████████████████████████████████▍       | 18245/20117 [11:37:42<1:09:14,  2.22s/it] 91%|█████████████████████████████████████████████████████████████████████████▍       | 18246/20117 [11:37:44<1:09:20,  2.22s/it] 91%|█████████████████████████████████████████████████████████████████████████▍       | 18247/20117 [11:37:46<1:08:58,  2.21s/it] 91%|█████████████████████████████████████████████████████████████████████████▍       | 18248/20117 [11:37:49<1:09:21,  2.23s/it] 91%|█████████████████████████████████████████████████████████████████████████▍       | 18249/20117 [11:37:51<1:09:48,  2.24s/it] 91%|█████████████████████████████████████████████████████████████████████████▍       | 18250/20117 [11:37:53<1:10:18,  2.26s/it]                                                                                                                                 {'loss': 0.145, 'grad_norm': 0.5326732397079468, 'learning_rate': 4.266901140384616e-06, 'memory/max_active (GiB)': 21.4, 'memory/max_allocated (GiB)': 21.4, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 346.23, 'epoch': 1.81}
 91%|█████████████████████████████████████████████████████████████████████████▍       | 18250/20117 [11:37:53<1:10:18,  2.26s/it] 91%|█████████████████████████████████████████████████████████████████████████▍       | 18251/20117 [11:37:55<1:09:44,  2.24s/it] 91%|█████████████████████████████████████████████████████████████████████████▍       | 18252/20117 [11:37:58<1:09:33,  2.24s/it] 91%|█████████████████████████████████████████████████████████████████████████▍       | 18253/20117 [11:38:00<1:09:06,  2.22s/it] 91%|█████████████████████████████████████████████████████████████████████████▍       | 18254/20117 [11:38:02<1:09:22,  2.23s/it] 91%|█████████████████████████████████████████████████████████████████████████▌       | 18255/20117 [11:38:04<1:09:00,  2.22s/it] 91%|█████████████████████████████████████████████████████████████████████████▌       | 18256/20117 [11:38:06<1:08:45,  2.22s/it] 91%|█████████████████████████████████████████████████████████████████████████▌       | 18257/20117 [11:38:09<1:08:36,  2.21s/it] 91%|█████████████████████████████████████████████████████████████████████████▌       | 18258/20117 [11:38:11<1:11:05,  2.29s/it] 91%|█████████████████████████████████████████████████████████████████████████▌       | 18259/20117 [11:38:13<1:10:30,  2.28s/it] 91%|█████████████████████████████████████████████████████████████████████████▌       | 18260/20117 [11:38:16<1:09:51,  2.26s/it]                                                                                                                                 {'loss': 0.1534, 'grad_norm': 0.5794299840927124, 'learning_rate': 4.221662587859631e-06, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 334.07, 'epoch': 1.82}
 91%|█████████████████████████████████████████████████████████████████████████▌       | 18260/20117 [11:38:16<1:09:51,  2.26s/it] 91%|█████████████████████████████████████████████████████████████████████████▌       | 18261/20117 [11:38:18<1:09:16,  2.24s/it] 91%|█████████████████████████████████████████████████████████████████████████▌       | 18262/20117 [11:38:20<1:09:17,  2.24s/it] 91%|█████████████████████████████████████████████████████████████████████████▌       | 18263/20117 [11:38:22<1:08:56,  2.23s/it] 91%|█████████████████████████████████████████████████████████████████████████▌       | 18264/20117 [11:38:24<1:08:37,  2.22s/it] 91%|█████████████████████████████████████████████████████████████████████████▌       | 18265/20117 [11:38:27<1:08:36,  2.22s/it] 91%|█████████████████████████████████████████████████████████████████████████▌       | 18266/20117 [11:38:29<1:08:30,  2.22s/it] 91%|█████████████████████████████████████████████████████████████████████████▌       | 18267/20117 [11:38:31<1:08:10,  2.21s/it] 91%|█████████████████████████████████████████████████████████████████████████▌       | 18268/20117 [11:38:33<1:07:54,  2.20s/it] 91%|█████████████████████████████████████████████████████████████████████████▌       | 18269/20117 [11:38:35<1:07:49,  2.20s/it] 91%|█████████████████████████████████████████████████████████████████████████▌       | 18270/20117 [11:38:38<1:08:24,  2.22s/it]                                                                                                                                 {'loss': 0.189, 'grad_norm': 0.4839019477367401, 'learning_rate': 4.1766599576229195e-06, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 345.54, 'epoch': 1.82}
 91%|█████████████████████████████████████████████████████████████████████████▌       | 18270/20117 [11:38:38<1:08:24,  2.22s/it] 91%|█████████████████████████████████████████████████████████████████████████▌       | 18271/20117 [11:38:40<1:08:19,  2.22s/it] 91%|█████████████████████████████████████████████████████████████████████████▌       | 18272/20117 [11:38:42<1:08:01,  2.21s/it] 91%|█████████████████████████████████████████████████████████████████████████▌       | 18273/20117 [11:38:44<1:08:01,  2.21s/it] 91%|█████████████████████████████████████████████████████████████████████████▌       | 18274/20117 [11:38:47<1:07:44,  2.21s/it] 91%|█████████████████████████████████████████████████████████████████████████▌       | 18275/20117 [11:38:49<1:07:36,  2.20s/it] 91%|█████████████████████████████████████████████████████████████████████████▌       | 18276/20117 [11:38:51<1:07:39,  2.21s/it] 91%|█████████████████████████████████████████████████████████████████████████▌       | 18277/20117 [11:38:53<1:07:31,  2.20s/it] 91%|█████████████████████████████████████████████████████████████████████████▌       | 18278/20117 [11:38:55<1:07:31,  2.20s/it] 91%|█████████████████████████████████████████████████████████████████████████▌       | 18279/20117 [11:38:58<1:07:55,  2.22s/it] 91%|█████████████████████████████████████████████████████████████████████████▌       | 18280/20117 [11:39:00<1:08:30,  2.24s/it]                                                                                                                                 {'loss': 0.1432, 'grad_norm': 0.5688499212265015, 'learning_rate': 4.131893360525452e-06, 'memory/max_active (GiB)': 18.84, 'memory/max_allocated (GiB)': 18.84, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 337.5, 'epoch': 1.82}
 91%|█████████████████████████████████████████████████████████████████████████▌       | 18280/20117 [11:39:00<1:08:30,  2.24s/it] 91%|█████████████████████████████████████████████████████████████████████████▌       | 18281/20117 [11:39:02<1:08:00,  2.22s/it] 91%|█████████████████████████████████████████████████████████████████████████▌       | 18282/20117 [11:39:04<1:08:37,  2.24s/it] 91%|█████████████████████████████████████████████████████████████████████████▌       | 18283/20117 [11:39:07<1:08:56,  2.26s/it] 91%|█████████████████████████████████████████████████████████████████████████▌       | 18284/20117 [11:39:09<1:08:59,  2.26s/it] 91%|█████████████████████████████████████████████████████████████████████████▌       | 18285/20117 [11:39:11<1:09:16,  2.27s/it] 91%|█████████████████████████████████████████████████████████████████████████▋       | 18286/20117 [11:39:14<1:09:18,  2.27s/it] 91%|█████████████████████████████████████████████████████████████████████████▋       | 18287/20117 [11:39:16<1:09:34,  2.28s/it] 91%|█████████████████████████████████████████████████████████████████████████▋       | 18288/20117 [11:39:18<1:08:49,  2.26s/it] 91%|█████████████████████████████████████████████████████████████████████████▋       | 18289/20117 [11:39:20<1:08:35,  2.25s/it] 91%|█████████████████████████████████████████████████████████████████████████▋       | 18290/20117 [11:39:22<1:08:14,  2.24s/it]                                                                                                                                 {'loss': 0.1351, 'grad_norm': 0.5413910150527954, 'learning_rate': 4.087362906836812e-06, 'memory/max_active (GiB)': 20.63, 'memory/max_allocated (GiB)': 20.63, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 383.78, 'epoch': 1.82}
 91%|█████████████████████████████████████████████████████████████████████████▋       | 18290/20117 [11:39:22<1:08:14,  2.24s/it] 91%|█████████████████████████████████████████████████████████████████████████▋       | 18291/20117 [11:39:25<1:07:49,  2.23s/it] 91%|█████████████████████████████████████████████████████████████████████████▋       | 18292/20117 [11:39:27<1:08:00,  2.24s/it] 91%|█████████████████████████████████████████████████████████████████████████▋       | 18293/20117 [11:39:29<1:07:37,  2.22s/it] 91%|█████████████████████████████████████████████████████████████████████████▋       | 18294/20117 [11:39:31<1:07:54,  2.23s/it] 91%|█████████████████████████████████████████████████████████████████████████▋       | 18295/20117 [11:39:34<1:07:54,  2.24s/it] 91%|█████████████████████████████████████████████████████████████████████████▋       | 18296/20117 [11:39:36<1:07:39,  2.23s/it] 91%|█████████████████████████████████████████████████████████████████████████▋       | 18297/20117 [11:39:38<1:07:33,  2.23s/it] 91%|█████████████████████████████████████████████████████████████████████████▋       | 18298/20117 [11:39:40<1:07:14,  2.22s/it] 91%|█████████████████████████████████████████████████████████████████████████▋       | 18299/20117 [11:39:42<1:07:20,  2.22s/it] 91%|█████████████████████████████████████████████████████████████████████████▋       | 18300/20117 [11:39:45<1:07:02,  2.21s/it]                                                                                                                                 {'loss': 0.1253, 'grad_norm': 0.4739817678928375, 'learning_rate': 4.043068706244957e-06, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 356.75, 'epoch': 1.82}
 91%|█████████████████████████████████████████████████████████████████████████▋       | 18310/20117 [11:40:07<1:08:50,  2.29s/it] {'loss': 0.1211, 'grad_norm': 0.34818801283836365, 'learning_rate': 3.999010867855812e-06, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 319.55, 'epoch': 1.82}
 91%|█████████████████████████████████████████████████████████████████████████▊       | 18320/20117 [11:40:30<1:06:58,  2.24s/it] {'loss': 0.1494, 'grad_norm': 0.692557692527771, 'learning_rate': 3.955189500193191e-06, 'memory/max_active (GiB)': 21.39, 'memory/max_allocated (GiB)': 21.39, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 378.18, 'epoch': 1.82}
 91%|█████████████████████████████████████████████████████████████████████████▊       | 18330/20117 [11:40:52<1:06:00,  2.22s/it] {'loss': 0.1781, 'grad_norm': 0.713403582572937, 'learning_rate': 3.911604711198358e-06, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 376.29, 'epoch': 1.82}
 91%|█████████████████████████████████████████████████████████████████████████▊       | 18340/20117 [11:41:15<1:05:59,  2.23s/it] {'loss': 0.1802, 'grad_norm': 0.5273980498313904, 'learning_rate': 3.8682566082298695e-06, 'memory/max_active (GiB)': 18.84, 'memory/max_allocated (GiB)': 18.84, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 353.26, 'epoch': 1.82}
 91%|█████████████████████████████████████████████████████████████████████████▉       | 18350/20117 [11:41:37<1:05:32,  2.23s/it] {'loss': 0.1553, 'grad_norm': 0.4756382703781128, 'learning_rate': 3.825145298063249e-06, 'memory/max_active (GiB)': 20.64, 'memory/max_allocated (GiB)': 20.64, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 343.84, 'epoch': 1.82}
 91%|█████████████████████████████████████████████████████████████████████████▉       | 18360/20117 [11:41:59<1:05:20,  2.23s/it] {'loss': 0.1831, 'grad_norm': 0.27914801239967346, 'learning_rate': 3.782270886890793e-06, 'memory/max_active (GiB)': 21.37, 'memory/max_allocated (GiB)': 21.37, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 341.0, 'epoch': 1.83}
 91%|█████████████████████████████████████████████████████████████████████████▉       | 18370/20117 [11:42:22<1:05:37,  2.25s/it] {'loss': 0.2023, 'grad_norm': 0.6465624570846558, 'learning_rate': 3.739633480321214e-06, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 395.46, 'epoch': 1.83}
 91%|██████████████████████████████████████████████████████████████████████████       | 18380/20117 [11:42:44<1:04:45,  2.24s/it] {'loss': 0.189, 'grad_norm': 0.5615578293800354, 'learning_rate': 3.697233183379467e-06, 'memory/max_active (GiB)': 19.22, 'memory/max_allocated (GiB)': 19.22, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 345.47, 'epoch': 1.83}
 91%|██████████████████████████████████████████████████████████████████████████       | 18390/20117 [11:43:06<1:04:33,  2.24s/it] {'loss': 0.1616, 'grad_norm': 0.5241686701774597, 'learning_rate': 3.6550701005064413e-06, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 381.72, 'epoch': 1.83}
 91%|██████████████████████████████████████████████████████████████████████████       | 18400/20117 [11:43:29<1:04:30,  2.25s/it] {'loss': 0.1789, 'grad_norm': 0.37639015913009644, 'learning_rate': 3.613144335558738e-06, 'memory/max_active (GiB)': 18.85, 'memory/max_allocated (GiB)': 18.85, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 353.78, 'epoch': 1.83}
 92%|██████████████████████████████████████████████████████████████████████████▏      | 18410/20117 [11:43:51<1:03:31,  2.23s/it] {'loss': 0.1619, 'grad_norm': 0.3994421064853668, 'learning_rate': 3.571455991808348e-06, 'memory/max_active (GiB)': 20.77, 'memory/max_allocated (GiB)': 20.77, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 387.6, 'epoch': 1.83}
 92%|██████████████████████████████████████████████████████████████████████████▏      | 18420/20117 [11:44:14<1:03:19,  2.24s/it] {'loss': 0.137, 'grad_norm': 0.6425352692604065, 'learning_rate': 3.5300051719424854e-06, 'memory/max_active (GiB)': 19.23, 'memory/max_allocated (GiB)': 19.23, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 346.83, 'epoch': 1.83}
 92%|██████████████████████████████████████████████████████████████████████████▏      | 18430/20117 [11:44:36<1:03:08,  2.25s/it] {'loss': 0.1569, 'grad_norm': 0.4775727391242981, 'learning_rate': 3.4887919780632995e-06, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 349.11, 'epoch': 1.83}
 92%|██████████████████████████████████████████████████████████████████████████▏      | 18440/20117 [11:44:58<1:02:32,  2.24s/it] {'loss': 0.1745, 'grad_norm': 0.37590330839157104, 'learning_rate': 3.4478165116875626e-06, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 426.6, 'epoch': 1.83}
 92%|██████████████████████████████████████████████████████████████████████████▎      | 18450/20117 [11:45:21<1:02:12,  2.24s/it] {'loss': 0.1438, 'grad_norm': 0.47229719161987305, 'learning_rate': 3.4070788737465497e-06, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 354.96, 'epoch': 1.83}
 92%|██████████████████████████████████████████████████████████████████████████▎      | 18460/20117 [11:45:43<1:01:39,  2.23s/it] {'loss': 0.151, 'grad_norm': 0.5609657764434814, 'learning_rate': 3.3665791645856258e-06, 'memory/max_active (GiB)': 21.38, 'memory/max_allocated (GiB)': 21.38, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 349.0, 'epoch': 1.84}
 92%|██████████████████████████████████████████████████████████████████████████▎      | 18470/20117 [11:46:06<1:02:52,  2.29s/it] {'loss': 0.1469, 'grad_norm': 0.5257097482681274, 'learning_rate': 3.326317483964181e-06, 'memory/max_active (GiB)': 21.39, 'memory/max_allocated (GiB)': 21.39, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 425.66, 'epoch': 1.84}
 92%|██████████████████████████████████████████████████████████████████████████▍      | 18480/20117 [11:46:28<1:00:49,  2.23s/it] {'loss': 0.1525, 'grad_norm': 0.9030627608299255, 'learning_rate': 3.2862939310552065e-06, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 370.97, 'epoch': 1.84}
 92%|██████████████████████████████████████████████████████████████████████████▍      | 18480/20117 [11:46:28<1:00:49,  2.23s/it] 92%|██████████████████████████████████████████████████████████████████████████▍      | 18481/20117 [11:46:30<1:00:28,  2.22s/it] 92%|██████████████████████████████████████████████████████████████████████████▍      | 18482/20117 [11:46:33<1:00:37,  2.22s/it] 92%|██████████████████████████████████████████████████████████████████████████▍      | 18483/20117 [11:46:35<1:00:29,  2.22s/it] 92%|██████████████████████████████████████████████████████████████████████████▍      | 18484/20117 [11:46:37<1:00:27,  2.22s/it] 92%|██████████████████████████████████████████████████████████████████████████▍      | 18485/20117 [11:46:39<1:00:15,  2.22s/it] 92%|████████████████████████████████████████████████████████████████████████████▎      | 18486/20117 [11:46:41<59:57,  2.21s/it] 92%|████████████████████████████████████████████████████████████████████████████▎      | 18487/20117 [11:46:44<59:57,  2.21s/it] 92%|██████████████████████████████████████████████████████████████████████████▍      | 18488/20117 [11:46:46<1:00:23,  2.22s/it] 92%|██████████████████████████████████████████████████████████████████████████▍      | 18489/20117 [11:46:48<1:00:14,  2.22s/it] 92%|██████████████████████████████████████████████████████████████████████████▍      | 18490/20117 [11:46:50<1:01:04,  2.25s/it]                                                                                                                                 {'loss': 0.1649, 'grad_norm': 0.3405112326145172, 'learning_rate': 3.2465086044451976e-06, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 368.6, 'epoch': 1.84}
 92%|██████████████████████████████████████████████████████████████████████████▍      | 18490/20117 [11:46:50<1:01:04,  2.25s/it] 92%|██████████████████████████████████████████████████████████████████████████▍      | 18491/20117 [11:46:53<1:00:51,  2.25s/it] 92%|██████████████████████████████████████████████████████████████████████████▍      | 18492/20117 [11:46:55<1:00:22,  2.23s/it] 92%|██████████████████████████████████████████████████████████████████████████▍      | 18493/20117 [11:46:57<1:00:26,  2.23s/it] 92%|██████████████████████████████████████████████████████████████████████████▍      | 18494/20117 [11:46:59<1:00:32,  2.24s/it] 92%|██████████████████████████████████████████████████████████████████████████▍      | 18495/20117 [11:47:02<1:00:26,  2.24s/it] 92%|██████████████████████████████████████████████████████████████████████████▍      | 18496/20117 [11:47:04<1:00:13,  2.23s/it] 92%|██████████████████████████████████████████████████████████████████████████▍      | 18497/20117 [11:47:06<1:00:06,  2.23s/it] 92%|██████████████████████████████████████████████████████████████████████████▍      | 18498/20117 [11:47:08<1:00:06,  2.23s/it] 92%|██████████████████████████████████████████████████████████████████████████▍      | 18499/20117 [11:47:11<1:00:08,  2.23s/it] 92%|██████████████████████████████████████████████████████████████████████████▍      | 18500/20117 [11:47:13<1:00:30,  2.25s/it]                                                                                                                                 {'loss': 0.1433, 'grad_norm': 0.3417605459690094, 'learning_rate': 3.206961602133807e-06, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 373.55, 'epoch': 1.84}
 92%|██████████████████████████████████████████████████████████████████████████▍      | 18500/20117 [11:47:13<1:00:30,  2.25s/it] 92%|██████████████████████████████████████████████████████████████████████████▍      | 18501/20117 [11:47:15<1:00:56,  2.26s/it] 92%|██████████████████████████████████████████████████████████████████████████▍      | 18502/20117 [11:47:17<1:00:45,  2.26s/it] 92%|██████████████████████████████████████████████████████████████████████████▌      | 18503/20117 [11:47:20<1:00:38,  2.25s/it] 92%|██████████████████████████████████████████████████████████████████████████▌      | 18504/20117 [11:47:22<1:00:24,  2.25s/it] 92%|██████████████████████████████████████████████████████████████████████████▌      | 18505/20117 [11:47:24<1:00:22,  2.25s/it] 92%|██████████████████████████████████████████████████████████████████████████▌      | 18506/20117 [11:47:26<1:00:17,  2.25s/it] 92%|██████████████████████████████████████████████████████████████████████████▌      | 18507/20117 [11:47:29<1:00:02,  2.24s/it] 92%|████████████████████████████████████████████████████████████████████████████▎      | 18508/20117 [11:47:31<59:49,  2.23s/it] 92%|████████████████████████████████████████████████████████████████████████████▎      | 18509/20117 [11:47:33<59:49,  2.23s/it] 92%|████████████████████████████████████████████████████████████████████████████▎      | 18510/20117 [11:47:35<59:33,  2.22s/it]                                                                                                                                 {'loss': 0.1569, 'grad_norm': 0.6065083742141724, 'learning_rate': 3.1676530215336675e-06, 'memory/max_active (GiB)': 21.4, 'memory/max_allocated (GiB)': 21.4, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 327.3, 'epoch': 1.84}
 92%|████████████████████████████████████████████████████████████████████████████▎      | 18510/20117 [11:47:35<59:33,  2.22s/it] 92%|████████████████████████████████████████████████████████████████████████████▎      | 18511/20117 [11:47:37<59:30,  2.22s/it] 92%|████████████████████████████████████████████████████████████████████████████▍      | 18512/20117 [11:47:40<59:53,  2.24s/it] 92%|████████████████████████████████████████████████████████████████████████████▍      | 18513/20117 [11:47:42<59:34,  2.23s/it] 92%|████████████████████████████████████████████████████████████████████████████▍      | 18514/20117 [11:47:44<59:23,  2.22s/it] 92%|████████████████████████████████████████████████████████████████████████████▍      | 18515/20117 [11:47:46<59:06,  2.21s/it] 92%|████████████████████████████████████████████████████████████████████████████▍      | 18516/20117 [11:47:48<59:02,  2.21s/it] 92%|████████████████████████████████████████████████████████████████████████████▍      | 18517/20117 [11:47:51<59:05,  2.22s/it] 92%|██████████████████████████████████████████████████████████████████████████▌      | 18518/20117 [11:47:53<1:01:34,  2.31s/it] 92%|██████████████████████████████████████████████████████████████████████████▌      | 18519/20117 [11:47:55<1:00:44,  2.28s/it] 92%|██████████████████████████████████████████████████████████████████████████▌      | 18520/20117 [11:47:58<1:00:02,  2.26s/it]                                                                                                                                 {'loss': 0.1324, 'grad_norm': 0.43219438195228577, 'learning_rate': 3.1285829594701165e-06, 'memory/max_active (GiB)': 18.86, 'memory/max_allocated (GiB)': 18.86, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 301.88, 'epoch': 1.84}
 92%|██████████████████████████████████████████████████████████████████████████▌      | 18520/20117 [11:47:58<1:00:02,  2.26s/it] 92%|████████████████████████████████████████████████████████████████████████████▍      | 18521/20117 [11:48:00<59:50,  2.25s/it] 92%|████████████████████████████████████████████████████████████████████████████▍      | 18522/20117 [11:48:02<59:27,  2.24s/it] 92%|████████████████████████████████████████████████████████████████████████████▍      | 18523/20117 [11:48:04<59:26,  2.24s/it] 92%|████████████████████████████████████████████████████████████████████████████▍      | 18524/20117 [11:48:07<59:37,  2.25s/it] 92%|████████████████████████████████████████████████████████████████████████████▍      | 18525/20117 [11:48:09<59:21,  2.24s/it] 92%|████████████████████████████████████████████████████████████████████████████▍      | 18526/20117 [11:48:11<59:06,  2.23s/it] 92%|████████████████████████████████████████████████████████████████████████████▍      | 18527/20117 [11:48:13<58:43,  2.22s/it] 92%|████████████████████████████████████████████████████████████████████████████▍      | 18528/20117 [11:48:15<58:53,  2.22s/it] 92%|████████████████████████████████████████████████████████████████████████████▍      | 18529/20117 [11:48:18<58:46,  2.22s/it] 92%|████████████████████████████████████████████████████████████████████████████▍      | 18530/20117 [11:48:20<59:10,  2.24s/it]                                                                                                                                 {'loss': 0.1842, 'grad_norm': 0.58506840467453, 'learning_rate': 3.089751512180972e-06, 'memory/max_active (GiB)': 19.8, 'memory/max_allocated (GiB)': 19.8, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 414.46, 'epoch': 1.84}
 92%|████████████████████████████████████████████████████████████████████████████▍      | 18530/20117 [11:48:20<59:10,  2.24s/it] 92%|████████████████████████████████████████████████████████████████████████████▍      | 18531/20117 [11:48:22<59:09,  2.24s/it] 92%|████████████████████████████████████████████████████████████████████████████▍      | 18532/20117 [11:48:24<59:02,  2.23s/it] 92%|████████████████████████████████████████████████████████████████████████████▍      | 18533/20117 [11:48:27<58:41,  2.22s/it] 92%|████████████████████████████████████████████████████████████████████████████▍      | 18534/20117 [11:48:29<58:52,  2.23s/it] 92%|████████████████████████████████████████████████████████████████████████████▍      | 18535/20117 [11:48:31<58:37,  2.22s/it] 92%|████████████████████████████████████████████████████████████████████████████▍      | 18536/20117 [11:48:33<58:31,  2.22s/it] 92%|████████████████████████████████████████████████████████████████████████████▍      | 18537/20117 [11:48:35<58:10,  2.21s/it] 92%|████████████████████████████████████████████████████████████████████████████▍      | 18538/20117 [11:48:38<58:05,  2.21s/it] 92%|████████████████████████████████████████████████████████████████████████████▍      | 18539/20117 [11:48:40<58:43,  2.23s/it] 92%|████████████████████████████████████████████████████████████████████████████▍      | 18540/20117 [11:48:42<58:40,  2.23s/it]                                                                                                                                 {'loss': 0.1406, 'grad_norm': 0.4577626883983612, 'learning_rate': 3.0511587753163094e-06, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 423.64, 'epoch': 1.84}
 92%|████████████████████████████████████████████████████████████████████████████▍      | 18540/20117 [11:48:42<58:40,  2.23s/it] 92%|████████████████████████████████████████████████████████████████████████████▍      | 18541/20117 [11:48:44<58:53,  2.24s/it] 92%|████████████████████████████████████████████████████████████████████████████▌      | 18542/20117 [11:48:47<58:17,  2.22s/it] 92%|████████████████████████████████████████████████████████████████████████████▌      | 18543/20117 [11:48:49<58:02,  2.21s/it] 92%|████████████████████████████████████████████████████████████████████████████▌      | 18544/20117 [11:48:51<58:07,  2.22s/it] 92%|████████████████████████████████████████████████████████████████████████████▌      | 18545/20117 [11:48:53<58:18,  2.23s/it] 92%|████████████████████████████████████████████████████████████████████████████▌      | 18546/20117 [11:48:55<58:05,  2.22s/it] 92%|████████████████████████████████████████████████████████████████████████████▌      | 18547/20117 [11:48:58<58:31,  2.24s/it] 92%|████████████████████████████████████████████████████████████████████████████▌      | 18548/20117 [11:49:00<58:15,  2.23s/it] 92%|████████████████████████████████████████████████████████████████████████████▌      | 18549/20117 [11:49:02<58:32,  2.24s/it] 92%|████████████████████████████████████████████████████████████████████████████▌      | 18550/20117 [11:49:05<58:51,  2.25s/it]                                                                                                                                 {'loss': 0.1355, 'grad_norm': 0.6649712324142456, 'learning_rate': 3.0128048439381886e-06, 'memory/max_active (GiB)': 19.23, 'memory/max_allocated (GiB)': 19.23, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 401.52, 'epoch': 1.84}
 92%|████████████████████████████████████████████████████████████████████████████▌      | 18550/20117 [11:49:05<58:51,  2.25s/it] 92%|████████████████████████████████████████████████████████████████████████████▌      | 18551/20117 [11:49:07<58:59,  2.26s/it] 92%|████████████████████████████████████████████████████████████████████████████▌      | 18552/20117 [11:49:09<58:43,  2.25s/it] 92%|████████████████████████████████████████████████████████████████████████████▌      | 18553/20117 [11:49:11<58:39,  2.25s/it] 92%|████████████████████████████████████████████████████████████████████████████▌      | 18554/20117 [11:49:13<58:11,  2.23s/it] 92%|████████████████████████████████████████████████████████████████████████████▌      | 18555/20117 [11:49:16<57:53,  2.22s/it] 92%|████████████████████████████████████████████████████████████████████████████▌      | 18556/20117 [11:49:18<58:01,  2.23s/it] 92%|████████████████████████████████████████████████████████████████████████████▌      | 18557/20117 [11:49:20<58:04,  2.23s/it] 92%|████████████████████████████████████████████████████████████████████████████▌      | 18558/20117 [11:49:22<57:38,  2.22s/it] 92%|████████████████████████████████████████████████████████████████████████████▌      | 18559/20117 [11:49:25<57:17,  2.21s/it] 92%|████████████████████████████████████████████████████████████████████████████▌      | 18560/20117 [11:49:27<57:40,  2.22s/it]                                                                                                                                 {'loss': 0.1429, 'grad_norm': 0.3550589084625244, 'learning_rate': 2.974689812520448e-06, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 331.54, 'epoch': 1.85}
 92%|████████████████████████████████████████████████████████████████████████████▌      | 18560/20117 [11:49:27<57:40,  2.22s/it] 92%|████████████████████████████████████████████████████████████████████████████▌      | 18561/20117 [11:49:29<57:37,  2.22s/it] 92%|████████████████████████████████████████████████████████████████████████████▌      | 18562/20117 [11:49:31<57:34,  2.22s/it] 92%|████████████████████████████████████████████████████████████████████████████▌      | 18563/20117 [11:49:33<57:16,  2.21s/it] 92%|████████████████████████████████████████████████████████████████████████████▌      | 18564/20117 [11:49:36<57:02,  2.20s/it] 92%|████████████████████████████████████████████████████████████████████████████▌      | 18565/20117 [11:49:38<56:54,  2.20s/it] 92%|████████████████████████████████████████████████████████████████████████████▌      | 18566/20117 [11:49:40<57:05,  2.21s/it] 92%|████████████████████████████████████████████████████████████████████████████▌      | 18567/20117 [11:49:42<57:01,  2.21s/it] 92%|████████████████████████████████████████████████████████████████████████████▌      | 18568/20117 [11:49:44<57:09,  2.21s/it] 92%|████████████████████████████████████████████████████████████████████████████▌      | 18569/20117 [11:49:47<57:11,  2.22s/it] 92%|████████████████████████████████████████████████████████████████████████████▌      | 18570/20117 [11:49:49<57:15,  2.22s/it]                                                                                                                                 {'loss': 0.185, 'grad_norm': 0.5486629605293274, 'learning_rate': 2.9368137749484547e-06, 'memory/max_active (GiB)': 19.8, 'memory/max_allocated (GiB)': 19.8, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 371.95, 'epoch': 1.85}
 92%|████████████████████████████████████████████████████████████████████████████▌      | 18570/20117 [11:49:49<57:15,  2.22s/it] 92%|████████████████████████████████████████████████████████████████████████████▌      | 18571/20117 [11:49:51<57:13,  2.22s/it] 92%|████████████████████████████████████████████████████████████████████████████▋      | 18572/20117 [11:49:53<57:15,  2.22s/it] 92%|████████████████████████████████████████████████████████████████████████████▋      | 18573/20117 [11:49:56<59:31,  2.31s/it] 92%|████████████████████████████████████████████████████████████████████████████▋      | 18574/20117 [11:49:58<58:58,  2.29s/it] 92%|████████████████████████████████████████████████████████████████████████████▋      | 18575/20117 [11:50:00<58:19,  2.27s/it] 92%|████████████████████████████████████████████████████████████████████████████▋      | 18576/20117 [11:50:03<57:37,  2.24s/it] 92%|████████████████████████████████████████████████████████████████████████████▋      | 18577/20117 [11:50:05<58:11,  2.27s/it] 92%|████████████████████████████████████████████████████████████████████████████▋      | 18578/20117 [11:50:07<57:25,  2.24s/it] 92%|████████████████████████████████████████████████████████████████████████████▋      | 18579/20117 [11:50:09<57:11,  2.23s/it] 92%|████████████████████████████████████████████████████████████████████████████▋      | 18580/20117 [11:50:11<56:56,  2.22s/it]                                                                                                                                 {'loss': 0.1472, 'grad_norm': 0.45112037658691406, 'learning_rate': 2.8991768245189232e-06, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 388.42, 'epoch': 1.85}
 92%|████████████████████████████████████████████████████████████████████████████▋      | 18580/20117 [11:50:11<56:56,  2.22s/it] 92%|████████████████████████████████████████████████████████████████████████████▋      | 18581/20117 [11:50:14<57:05,  2.23s/it] 92%|████████████████████████████████████████████████████████████████████████████▋      | 18582/20117 [11:50:16<56:50,  2.22s/it] 92%|████████████████████████████████████████████████████████████████████████████▋      | 18583/20117 [11:50:18<56:30,  2.21s/it] 92%|████████████████████████████████████████████████████████████████████████████▋      | 18584/20117 [11:50:20<56:39,  2.22s/it] 92%|████████████████████████████████████████████████████████████████████████████▋      | 18585/20117 [11:50:23<56:52,  2.23s/it] 92%|████████████████████████████████████████████████████████████████████████████▋      | 18586/20117 [11:50:25<56:54,  2.23s/it] 92%|████████████████████████████████████████████████████████████████████████████▋      | 18587/20117 [11:50:27<56:47,  2.23s/it] 92%|████████████████████████████████████████████████████████████████████████████▋      | 18588/20117 [11:50:29<56:40,  2.22s/it] 92%|████████████████████████████████████████████████████████████████████████████▋      | 18589/20117 [11:50:31<56:16,  2.21s/it] 92%|████████████████████████████████████████████████████████████████████████████▋      | 18590/20117 [11:50:34<56:18,  2.21s/it]                                                                                                                                 {'loss': 0.1218, 'grad_norm': 0.40423816442489624, 'learning_rate': 2.861779053939595e-06, 'memory/max_active (GiB)': 18.85, 'memory/max_allocated (GiB)': 18.85, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 357.26, 'epoch': 1.85}
 92%|████████████████████████████████████████████████████████████████████████████▋      | 18590/20117 [11:50:34<56:18,  2.21s/it] 92%|████████████████████████████████████████████████████████████████████████████▋      | 18591/20117 [11:50:36<56:20,  2.22s/it] 92%|████████████████████████████████████████████████████████████████████████████▋      | 18592/20117 [11:50:38<56:25,  2.22s/it] 92%|████████████████████████████████████████████████████████████████████████████▋      | 18593/20117 [11:50:40<56:23,  2.22s/it] 92%|████████████████████████████████████████████████████████████████████████████▋      | 18594/20117 [11:50:43<56:26,  2.22s/it] 92%|████████████████████████████████████████████████████████████████████████████▋      | 18595/20117 [11:50:45<56:30,  2.23s/it] 92%|████████████████████████████████████████████████████████████████████████████▋      | 18596/20117 [11:50:47<56:15,  2.22s/it] 92%|████████████████████████████████████████████████████████████████████████████▋      | 18597/20117 [11:50:49<56:12,  2.22s/it] 92%|████████████████████████████████████████████████████████████████████████████▋      | 18598/20117 [11:50:51<55:54,  2.21s/it] 92%|████████████████████████████████████████████████████████████████████████████▋      | 18599/20117 [11:50:54<56:07,  2.22s/it] 92%|████████████████████████████████████████████████████████████████████████████▋      | 18600/20117 [11:50:56<56:03,  2.22s/it]                                                                                                                                 {'loss': 0.181, 'grad_norm': 0.49648118019104004, 'learning_rate': 2.824620555329094e-06, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 349.07, 'epoch': 1.85}
 92%|████████████████████████████████████████████████████████████████████████████▋      | 18600/20117 [11:50:56<56:03,  2.22s/it] 92%|████████████████████████████████████████████████████████████████████████████▋      | 18601/20117 [11:50:58<55:47,  2.21s/it] 92%|████████████████████████████████████████████████████████████████████████████▋      | 18602/20117 [11:51:00<56:00,  2.22s/it] 92%|████████████████████████████████████████████████████████████████████████████▊      | 18603/20117 [11:51:03<56:10,  2.23s/it] 92%|████████████████████████████████████████████████████████████████████████████▊      | 18604/20117 [11:51:05<56:04,  2.22s/it] 92%|████████████████████████████████████████████████████████████████████████████▊      | 18605/20117 [11:51:07<55:56,  2.22s/it] 92%|████████████████████████████████████████████████████████████████████████████▊      | 18606/20117 [11:51:09<56:02,  2.23s/it] 92%|████████████████████████████████████████████████████████████████████████████▊      | 18607/20117 [11:51:11<56:03,  2.23s/it] 92%|████████████████████████████████████████████████████████████████████████████▊      | 18608/20117 [11:51:14<56:03,  2.23s/it] 93%|████████████████████████████████████████████████████████████████████████████▊      | 18609/20117 [11:51:16<56:06,  2.23s/it] 93%|████████████████████████████████████████████████████████████████████████████▊      | 18610/20117 [11:51:18<56:27,  2.25s/it]                                                                                                                                 {'loss': 0.1188, 'grad_norm': 0.4115513265132904, 'learning_rate': 2.7877014202166372e-06, 'memory/max_active (GiB)': 20.57, 'memory/max_allocated (GiB)': 20.57, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 344.58, 'epoch': 1.85}
 93%|████████████████████████████████████████████████████████████████████████████▊      | 18610/20117 [11:51:18<56:27,  2.25s/it] 93%|████████████████████████████████████████████████████████████████████████████▊      | 18611/20117 [11:51:20<56:14,  2.24s/it] 93%|████████████████████████████████████████████████████████████████████████████▊      | 18612/20117 [11:51:23<56:22,  2.25s/it] 93%|████████████████████████████████████████████████████████████████████████████▊      | 18613/20117 [11:51:25<56:00,  2.23s/it] 93%|████████████████████████████████████████████████████████████████████████████▊      | 18614/20117 [11:51:27<55:40,  2.22s/it] 93%|████████████████████████████████████████████████████████████████████████████▊      | 18615/20117 [11:51:29<55:27,  2.22s/it] 93%|████████████████████████████████████████████████████████████████████████████▊      | 18616/20117 [11:51:32<55:51,  2.23s/it] 93%|████████████████████████████████████████████████████████████████████████████▊      | 18617/20117 [11:51:34<56:16,  2.25s/it] 93%|████████████████████████████████████████████████████████████████████████████▊      | 18618/20117 [11:51:36<55:41,  2.23s/it] 93%|████████████████████████████████████████████████████████████████████████████▊      | 18619/20117 [11:51:38<55:52,  2.24s/it] 93%|████████████████████████████████████████████████████████████████████████████▊      | 18620/20117 [11:51:41<56:21,  2.26s/it]                                                                                                                                 {'loss': 0.127, 'grad_norm': 0.5676191449165344, 'learning_rate': 2.7510217395418815e-06, 'memory/max_active (GiB)': 21.4, 'memory/max_allocated (GiB)': 21.4, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 435.5, 'epoch': 1.85}
 93%|████████████████████████████████████████████████████████████████████████████▊      | 18620/20117 [11:51:41<56:21,  2.26s/it] 93%|████████████████████████████████████████████████████████████████████████████▊      | 18621/20117 [11:51:43<55:55,  2.24s/it] 93%|████████████████████████████████████████████████████████████████████████████▊      | 18622/20117 [11:51:45<55:45,  2.24s/it] 93%|████████████████████████████████████████████████████████████████████████████▊      | 18623/20117 [11:51:47<55:37,  2.23s/it] 93%|████████████████████████████████████████████████████████████████████████████▊      | 18624/20117 [11:51:49<55:34,  2.23s/it] 93%|████████████████████████████████████████████████████████████████████████████▊      | 18625/20117 [11:51:52<55:16,  2.22s/it] 93%|████████████████████████████████████████████████████████████████████████████▊      | 18626/20117 [11:51:54<55:04,  2.22s/it] 93%|████████████████████████████████████████████████████████████████████████████▊      | 18627/20117 [11:51:56<57:34,  2.32s/it] 93%|████████████████████████████████████████████████████████████████████████████▊      | 18628/20117 [11:51:59<56:32,  2.28s/it] 93%|████████████████████████████████████████████████████████████████████████████▊      | 18629/20117 [11:52:01<56:11,  2.27s/it] 93%|████████████████████████████████████████████████████████████████████████████▊      | 18630/20117 [11:52:03<55:48,  2.25s/it]                                                                                                                                 {'loss': 0.14, 'grad_norm': 0.5204625725746155, 'learning_rate': 2.714581603654609e-06, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 374.45, 'epoch': 1.85}
 93%|████████████████████████████████████████████████████████████████████████████▊      | 18630/20117 [11:52:03<55:48,  2.25s/it] 93%|████████████████████████████████████████████████████████████████████████████▊      | 18631/20117 [11:52:05<55:34,  2.24s/it] 93%|████████████████████████████████████████████████████████████████████████████▊      | 18632/20117 [11:52:07<55:25,  2.24s/it] 93%|████████████████████████████████████████████████████████████████████████████▉      | 18633/20117 [11:52:10<55:05,  2.23s/it] 93%|████████████████████████████████████████████████████████████████████████████▉      | 18634/20117 [11:52:12<54:47,  2.22s/it] 93%|████████████████████████████████████████████████████████████████████████████▉      | 18635/20117 [11:52:14<54:51,  2.22s/it] 93%|████████████████████████████████████████████████████████████████████████████▉      | 18636/20117 [11:52:16<54:37,  2.21s/it] 93%|████████████████████████████████████████████████████████████████████████████▉      | 18637/20117 [11:52:19<55:07,  2.23s/it] 93%|████████████████████████████████████████████████████████████████████████████▉      | 18638/20117 [11:52:21<55:01,  2.23s/it] 93%|████████████████████████████████████████████████████████████████████████████▉      | 18639/20117 [11:52:23<54:39,  2.22s/it] 93%|████████████████████████████████████████████████████████████████████████████▉      | 18640/20117 [11:52:25<54:47,  2.23s/it]                                                                                                                                 {'loss': 0.1432, 'grad_norm': 0.5703979134559631, 'learning_rate': 2.6783811023145977e-06, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 393.22, 'epoch': 1.85}
 93%|████████████████████████████████████████████████████████████████████████████▉      | 18640/20117 [11:52:25<54:47,  2.23s/it] 93%|████████████████████████████████████████████████████████████████████████████▉      | 18641/20117 [11:52:27<54:22,  2.21s/it] 93%|████████████████████████████████████████████████████████████████████████████▉      | 18642/20117 [11:52:30<54:18,  2.21s/it] 93%|████████████████████████████████████████████████████████████████████████████▉      | 18643/20117 [11:52:32<54:03,  2.20s/it] 93%|████████████████████████████████████████████████████████████████████████████▉      | 18644/20117 [11:52:34<54:05,  2.20s/it] 93%|████████████████████████████████████████████████████████████████████████████▉      | 18645/20117 [11:52:36<54:31,  2.22s/it] 93%|████████████████████████████████████████████████████████████████████████████▉      | 18646/20117 [11:52:39<54:57,  2.24s/it] 93%|████████████████████████████████████████████████████████████████████████████▉      | 18647/20117 [11:52:41<54:37,  2.23s/it] 93%|████████████████████████████████████████████████████████████████████████████▉      | 18648/20117 [11:52:43<54:21,  2.22s/it] 93%|████████████████████████████████████████████████████████████████████████████▉      | 18649/20117 [11:52:45<54:40,  2.23s/it] 93%|████████████████████████████████████████████████████████████████████████████▉      | 18650/20117 [11:52:47<54:16,  2.22s/it]                                                                                                                                 {'loss': 0.1862, 'grad_norm': 1.1127846240997314, 'learning_rate': 2.6424203246913194e-06, 'memory/max_active (GiB)': 21.53, 'memory/max_allocated (GiB)': 21.53, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 385.6, 'epoch': 1.85}
 93%|████████████████████████████████████████████████████████████████████████████▉      | 18650/20117 [11:52:47<54:16,  2.22s/it] 93%|████████████████████████████████████████████████████████████████████████████▉      | 18651/20117 [11:52:50<54:20,  2.22s/it] 93%|████████████████████████████████████████████████████████████████████████████▉      | 18652/20117 [11:52:52<54:20,  2.23s/it] 93%|████████████████████████████████████████████████████████████████████████████▉      | 18653/20117 [11:52:54<54:10,  2.22s/it] 93%|████████████████████████████████████████████████████████████████████████████▉      | 18654/20117 [11:52:56<54:20,  2.23s/it] 93%|████████████████████████████████████████████████████████████████████████████▉      | 18655/20117 [11:52:59<54:41,  2.24s/it] 93%|████████████████████████████████████████████████████████████████████████████▉      | 18656/20117 [11:53:01<54:27,  2.24s/it] 93%|████████████████████████████████████████████████████████████████████████████▉      | 18657/20117 [11:53:03<54:16,  2.23s/it] 93%|████████████████████████████████████████████████████████████████████████████▉      | 18658/20117 [11:53:05<54:34,  2.24s/it] 93%|████████████████████████████████████████████████████████████████████████████▉      | 18659/20117 [11:53:08<54:24,  2.24s/it] 93%|████████████████████████████████████████████████████████████████████████████▉      | 18660/20117 [11:53:10<54:09,  2.23s/it]                                                                                                                                 {'loss': 0.1694, 'grad_norm': 0.37595993280410767, 'learning_rate': 2.6066993593637844e-06, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 387.27, 'epoch': 1.86}
 93%|████████████████████████████████████████████████████████████████████████████▉      | 18660/20117 [11:53:10<54:09,  2.23s/it] 93%|████████████████████████████████████████████████████████████████████████████▉      | 18661/20117 [11:53:12<54:02,  2.23s/it] 93%|████████████████████████████████████████████████████████████████████████████▉      | 18662/20117 [11:53:14<54:28,  2.25s/it] 93%|█████████████████████████████████████████████████████████████████████████████      | 18663/20117 [11:53:16<53:57,  2.23s/it] 93%|█████████████████████████████████████████████████████████████████████████████      | 18664/20117 [11:53:19<53:52,  2.22s/it] 93%|█████████████████████████████████████████████████████████████████████████████      | 18665/20117 [11:53:21<53:46,  2.22s/it] 93%|█████████████████████████████████████████████████████████████████████████████      | 18666/20117 [11:53:23<54:19,  2.25s/it] 93%|█████████████████████████████████████████████████████████████████████████████      | 18667/20117 [11:53:25<54:07,  2.24s/it] 93%|█████████████████████████████████████████████████████████████████████████████      | 18668/20117 [11:53:28<53:54,  2.23s/it] 93%|█████████████████████████████████████████████████████████████████████████████      | 18669/20117 [11:53:30<53:58,  2.24s/it] 93%|█████████████████████████████████████████████████████████████████████████████      | 18670/20117 [11:53:32<53:59,  2.24s/it]                                                                                                                                 {'loss': 0.1011, 'grad_norm': 0.35933300852775574, 'learning_rate': 2.571218294320266e-06, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 354.13, 'epoch': 1.86}
 93%|█████████████████████████████████████████████████████████████████████████████      | 18670/20117 [11:53:32<53:59,  2.24s/it] 93%|█████████████████████████████████████████████████████████████████████████████      | 18671/20117 [11:53:34<53:50,  2.23s/it] 93%|█████████████████████████████████████████████████████████████████████████████      | 18672/20117 [11:53:37<53:30,  2.22s/it] 93%|█████████████████████████████████████████████████████████████████████████████      | 18673/20117 [11:53:39<53:30,  2.22s/it] 93%|█████████████████████████████████████████████████████████████████████████████      | 18674/20117 [11:53:41<53:26,  2.22s/it] 93%|█████████████████████████████████████████████████████████████████████████████      | 18675/20117 [11:53:43<53:50,  2.24s/it] 93%|█████████████████████████████████████████████████████████████████████████████      | 18676/20117 [11:53:45<53:23,  2.22s/it] 93%|█████████████████████████████████████████████████████████████████████████████      | 18677/20117 [11:53:48<53:21,  2.22s/it] 93%|█████████████████████████████████████████████████████████████████████████████      | 18678/20117 [11:53:50<55:12,  2.30s/it] 93%|█████████████████████████████████████████████████████████████████████████████      | 18679/20117 [11:53:52<54:29,  2.27s/it] 93%|█████████████████████████████████████████████████████████████████████████████      | 18680/20117 [11:53:55<53:56,  2.25s/it]                                                                                                                                 {'loss': 0.1991, 'grad_norm': 0.5620598196983337, 'learning_rate': 2.5359772169581297e-06, 'memory/max_active (GiB)': 19.7, 'memory/max_allocated (GiB)': 19.7, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 389.06, 'epoch': 1.86}
 93%|█████████████████████████████████████████████████████████████████████████████      | 18680/20117 [11:53:55<53:56,  2.25s/it] 93%|█████████████████████████████████████████████████████████████████████████████      | 18681/20117 [11:53:57<53:44,  2.25s/it] 93%|█████████████████████████████████████████████████████████████████████████████      | 18682/20117 [11:53:59<53:53,  2.25s/it] 93%|█████████████████████████████████████████████████████████████████████████████      | 18683/20117 [11:54:01<53:52,  2.25s/it] 93%|█████████████████████████████████████████████████████████████████████████████      | 18684/20117 [11:54:04<53:45,  2.25s/it] 93%|█████████████████████████████████████████████████████████████████████████████      | 18685/20117 [11:54:06<53:11,  2.23s/it] 93%|█████████████████████████████████████████████████████████████████████████████      | 18686/20117 [11:54:08<53:19,  2.24s/it] 93%|█████████████████████████████████████████████████████████████████████████████      | 18687/20117 [11:54:10<52:59,  2.22s/it] 93%|█████████████████████████████████████████████████████████████████████████████      | 18688/20117 [11:54:12<52:59,  2.23s/it] 93%|█████████████████████████████████████████████████████████████████████████████      | 18689/20117 [11:54:15<52:59,  2.23s/it] 93%|█████████████████████████████████████████████████████████████████████████████      | 18690/20117 [11:54:17<53:14,  2.24s/it]                                                                                                                                 {'loss': 0.1497, 'grad_norm': 0.29674777388572693, 'learning_rate': 2.5009762140835947e-06, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 373.51, 'epoch': 1.86}
 93%|█████████████████████████████████████████████████████████████████████████████      | 18690/20117 [11:54:17<53:14,  2.24s/it] 93%|█████████████████████████████████████████████████████████████████████████████      | 18691/20117 [11:54:19<53:08,  2.24s/it] 93%|█████████████████████████████████████████████████████████████████████████████      | 18692/20117 [11:54:21<52:58,  2.23s/it] 93%|█████████████████████████████████████████████████████████████████████████████      | 18693/20117 [11:54:24<52:36,  2.22s/it] 93%|█████████████████████████████████████████████████████████████████████████████▏     | 18694/20117 [11:54:26<52:52,  2.23s/it] 93%|█████████████████████████████████████████████████████████████████████████████▏     | 18695/20117 [11:54:28<52:45,  2.23s/it] 93%|█████████████████████████████████████████████████████████████████████████████▏     | 18696/20117 [11:54:30<52:28,  2.22s/it] 93%|█████████████████████████████████████████████████████████████████████████████▏     | 18697/20117 [11:54:32<52:31,  2.22s/it] 93%|█████████████████████████████████████████████████████████████████████████████▏     | 18698/20117 [11:54:35<52:26,  2.22s/it] 93%|█████████████████████████████████████████████████████████████████████████████▏     | 18699/20117 [11:54:37<52:26,  2.22s/it] 93%|█████████████████████████████████████████████████████████████████████████████▏     | 18700/20117 [11:54:39<52:23,  2.22s/it]                                                                                                                                 {'loss': 0.1743, 'grad_norm': 0.4018515348434448, 'learning_rate': 2.4662153719115398e-06, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 373.54, 'epoch': 1.86}
 93%|█████████████████████████████████████████████████████████████████████████████▏     | 18700/20117 [11:54:39<52:23,  2.22s/it] 93%|█████████████████████████████████████████████████████████████████████████████▏     | 18701/20117 [11:54:41<52:20,  2.22s/it] 93%|█████████████████████████████████████████████████████████████████████████████▏     | 18702/20117 [11:54:44<51:58,  2.20s/it] 93%|█████████████████████████████████████████████████████████████████████████████▏     | 18703/20117 [11:54:46<52:06,  2.21s/it] 93%|█████████████████████████████████████████████████████████████████████████████▏     | 18704/20117 [11:54:48<52:12,  2.22s/it] 93%|█████████████████████████████████████████████████████████████████████████████▏     | 18705/20117 [11:54:50<52:29,  2.23s/it] 93%|█████████████████████████████████████████████████████████████████████████████▏     | 18706/20117 [11:54:52<52:17,  2.22s/it] 93%|█████████████████████████████████████████████████████████████████████████████▏     | 18707/20117 [11:54:55<51:54,  2.21s/it] 93%|█████████████████████████████████████████████████████████████████████████████▏     | 18708/20117 [11:54:57<52:00,  2.21s/it] 93%|█████████████████████████████████████████████████████████████████████████████▏     | 18709/20117 [11:54:59<52:08,  2.22s/it] 93%|█████████████████████████████████████████████████████████████████████████████▏     | 18710/20117 [11:55:01<51:55,  2.21s/it]                                                                                                                                 {'loss': 0.1169, 'grad_norm': 0.5389109253883362, 'learning_rate': 2.431694776065263e-06, 'memory/max_active (GiB)': 20.53, 'memory/max_allocated (GiB)': 20.53, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 340.4, 'epoch': 1.86}
 93%|█████████████████████████████████████████████████████████████████████████████▏     | 18710/20117 [11:55:01<51:55,  2.21s/it] 93%|█████████████████████████████████████████████████████████████████████████████▏     | 18711/20117 [11:55:04<52:30,  2.24s/it] 93%|█████████████████████████████████████████████████████████████████████████████▏     | 18712/20117 [11:55:06<52:38,  2.25s/it] 93%|█████████████████████████████████████████████████████████████████████████████▏     | 18713/20117 [11:55:08<52:18,  2.24s/it] 93%|█████████████████████████████████████████████████████████████████████████████▏     | 18714/20117 [11:55:10<52:03,  2.23s/it] 93%|█████████████████████████████████████████████████████████████████████████████▏     | 18715/20117 [11:55:12<51:48,  2.22s/it] 93%|█████████████████████████████████████████████████████████████████████████████▏     | 18716/20117 [11:55:15<51:34,  2.21s/it] 93%|█████████████████████████████████████████████████████████████████████████████▏     | 18717/20117 [11:55:17<51:20,  2.20s/it] 93%|█████████████████████████████████████████████████████████████████████████████▏     | 18718/20117 [11:55:19<51:40,  2.22s/it] 93%|█████████████████████████████████████████████████████████████████████████████▏     | 18719/20117 [11:55:21<52:04,  2.23s/it] 93%|█████████████████████████████████████████████████████████████████████████████▏     | 18720/20117 [11:55:24<51:42,  2.22s/it]                                                                                                                                 {'loss': 0.1841, 'grad_norm': 0.526465654373169, 'learning_rate': 2.397414511576268e-06, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 348.59, 'epoch': 1.86}
 93%|█████████████████████████████████████████████████████████████████████████████▏     | 18720/20117 [11:55:24<51:42,  2.22s/it] 93%|█████████████████████████████████████████████████████████████████████████████▏     | 18721/20117 [11:55:26<51:30,  2.21s/it] 93%|█████████████████████████████████████████████████████████████████████████████▏     | 18722/20117 [11:55:28<51:18,  2.21s/it] 93%|█████████████████████████████████████████████████████████████████████████████▏     | 18723/20117 [11:55:30<51:21,  2.21s/it] 93%|█████████████████████████████████████████████████████████████████████████████▎     | 18724/20117 [11:55:32<51:11,  2.21s/it] 93%|█████████████████████████████████████████████████████████████████████████████▎     | 18725/20117 [11:55:35<51:17,  2.21s/it] 93%|█████████████████████████████████████████████████████████████████████████████▎     | 18726/20117 [11:55:37<51:38,  2.23s/it] 93%|█████████████████████████████████████████████████████████████████████████████▎     | 18727/20117 [11:55:39<51:24,  2.22s/it] 93%|█████████████████████████████████████████████████████████████████████████████▎     | 18728/20117 [11:55:41<51:21,  2.22s/it] 93%|█████████████████████████████████████████████████████████████████████████████▎     | 18729/20117 [11:55:43<51:33,  2.23s/it] 93%|█████████████████████████████████████████████████████████████████████████████▎     | 18730/20117 [11:55:46<53:18,  2.31s/it]                                                                                                                                 {'loss': 0.1741, 'grad_norm': 0.3093705177307129, 'learning_rate': 2.3633746628841325e-06, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 313.63, 'epoch': 1.86}
 93%|█████████████████████████████████████████████████████████████████████████████▎     | 18730/20117 [11:55:46<53:18,  2.31s/it] 93%|█████████████████████████████████████████████████████████████████████████████▎     | 18731/20117 [11:55:48<52:37,  2.28s/it] 93%|█████████████████████████████████████████████████████████████████████████████▎     | 18732/20117 [11:55:50<52:08,  2.26s/it] 93%|█████████████████████████████████████████████████████████████████████████████▎     | 18733/20117 [11:55:53<52:12,  2.26s/it] 93%|█████████████████████████████████████████████████████████████████████████████▎     | 18734/20117 [11:55:55<51:37,  2.24s/it] 93%|█████████████████████████████████████████████████████████████████████████████▎     | 18735/20117 [11:55:57<51:52,  2.25s/it] 93%|█████████████████████████████████████████████████████████████████████████████▎     | 18736/20117 [11:55:59<51:44,  2.25s/it] 93%|█████████████████████████████████████████████████████████████████████████████▎     | 18737/20117 [11:56:02<51:44,  2.25s/it] 93%|█████████████████████████████████████████████████████████████████████████████▎     | 18738/20117 [11:56:04<51:39,  2.25s/it] 93%|█████████████████████████████████████████████████████████████████████████████▎     | 18739/20117 [11:56:06<51:28,  2.24s/it] 93%|█████████████████████████████████████████████████████████████████████████████▎     | 18740/20117 [11:56:08<51:10,  2.23s/it]                                                                                                                                 {'loss': 0.2199, 'grad_norm': 0.9190280437469482, 'learning_rate': 2.329575313836152e-06, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 400.19, 'epoch': 1.86}
 93%|█████████████████████████████████████████████████████████████████████████████▎     | 18740/20117 [11:56:08<51:10,  2.23s/it] 93%|█████████████████████████████████████████████████████████████████████████████▎     | 18741/20117 [11:56:11<50:50,  2.22s/it] 93%|█████████████████████████████████████████████████████████████████████████████▎     | 18742/20117 [11:56:13<50:55,  2.22s/it] 93%|█████████████████████████████████████████████████████████████████████████████▎     | 18743/20117 [11:56:15<51:15,  2.24s/it] 93%|█████████████████████████████████████████████████████████████████████████████▎     | 18744/20117 [11:56:17<51:09,  2.24s/it] 93%|█████████████████████████████████████████████████████████████████████████████▎     | 18745/20117 [11:56:19<50:46,  2.22s/it] 93%|█████████████████████████████████████████████████████████████████████████████▎     | 18746/20117 [11:56:22<50:44,  2.22s/it] 93%|█████████████████████████████████████████████████████████████████████████████▎     | 18747/20117 [11:56:24<50:48,  2.23s/it] 93%|█████████████████████████████████████████████████████████████████████████████▎     | 18748/20117 [11:56:26<50:37,  2.22s/it] 93%|█████████████████████████████████████████████████████████████████████████████▎     | 18749/20117 [11:56:28<51:08,  2.24s/it] 93%|█████████████████████████████████████████████████████████████████████████████▎     | 18750/20117 [11:56:31<50:43,  2.23s/it]                                                                                                                                 {'loss': 0.133, 'grad_norm': 0.5890634655952454, 'learning_rate': 2.2960165476873076e-06, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 344.18, 'epoch': 1.86}
 93%|█████████████████████████████████████████████████████████████████████████████▎     | 18750/20117 [11:56:31<50:43,  2.23s/it] 93%|█████████████████████████████████████████████████████████████████████████████▎     | 18751/20117 [11:56:33<50:40,  2.23s/it] 93%|█████████████████████████████████████████████████████████████████████████████▎     | 18752/20117 [11:56:35<50:24,  2.22s/it] 93%|█████████████████████████████████████████████████████████████████████████████▎     | 18753/20117 [11:56:37<50:33,  2.22s/it] 93%|█████████████████████████████████████████████████████████████████████████████▍     | 18754/20117 [11:56:39<50:22,  2.22s/it] 93%|█████████████████████████████████████████████████████████████████████████████▍     | 18755/20117 [11:56:42<50:11,  2.21s/it] 93%|█████████████████████████████████████████████████████████████████████████████▍     | 18756/20117 [11:56:44<50:09,  2.21s/it] 93%|█████████████████████████████████████████████████████████████████████████████▍     | 18757/20117 [11:56:46<50:14,  2.22s/it] 93%|█████████████████████████████████████████████████████████████████████████████▍     | 18758/20117 [11:56:48<50:12,  2.22s/it] 93%|█████████████████████████████████████████████████████████████████████████████▍     | 18759/20117 [11:56:51<50:44,  2.24s/it] 93%|█████████████████████████████████████████████████████████████████████████████▍     | 18760/20117 [11:56:53<50:26,  2.23s/it]                                                                                                                                 {'loss': 0.1459, 'grad_norm': 0.7027098536491394, 'learning_rate': 2.262698447099898e-06, 'memory/max_active (GiB)': 20.57, 'memory/max_allocated (GiB)': 20.57, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 288.79, 'epoch': 1.87}
 93%|█████████████████████████████████████████████████████████████████████████████▍     | 18760/20117 [11:56:53<50:26,  2.23s/it] 93%|█████████████████████████████████████████████████████████████████████████████▍     | 18761/20117 [11:56:55<50:27,  2.23s/it] 93%|█████████████████████████████████████████████████████████████████████████████▍     | 18762/20117 [11:56:57<50:16,  2.23s/it] 93%|█████████████████████████████████████████████████████████████████████████████▍     | 18763/20117 [11:57:00<50:32,  2.24s/it] 93%|█████████████████████████████████████████████████████████████████████████████▍     | 18764/20117 [11:57:02<50:37,  2.25s/it] 93%|█████████████████████████████████████████████████████████████████████████████▍     | 18765/20117 [11:57:04<50:19,  2.23s/it] 93%|█████████████████████████████████████████████████████████████████████████████▍     | 18766/20117 [11:57:06<50:11,  2.23s/it] 93%|█████████████████████████████████████████████████████████████████████████████▍     | 18767/20117 [11:57:08<50:30,  2.24s/it] 93%|█████████████████████████████████████████████████████████████████████████████▍     | 18768/20117 [11:57:11<50:22,  2.24s/it] 93%|█████████████████████████████████████████████████████████████████████████████▍     | 18769/20117 [11:57:13<50:34,  2.25s/it] 93%|█████████████████████████████████████████████████████████████████████████████▍     | 18770/20117 [11:57:15<50:24,  2.25s/it]                                                                                                                                 {'loss': 0.1699, 'grad_norm': 0.7816129922866821, 'learning_rate': 2.2296210941434745e-06, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 369.86, 'epoch': 1.87}
 93%|█████████████████████████████████████████████████████████████████████████████▍     | 18770/20117 [11:57:15<50:24,  2.25s/it] 93%|█████████████████████████████████████████████████████████████████████████████▍     | 18771/20117 [11:57:17<50:14,  2.24s/it] 93%|█████████████████████████████████████████████████████████████████████████████▍     | 18772/20117 [11:57:20<50:23,  2.25s/it] 93%|█████████████████████████████████████████████████████████████████████████████▍     | 18773/20117 [11:57:22<50:19,  2.25s/it] 93%|█████████████████████████████████████████████████████████████████████████████▍     | 18774/20117 [11:57:24<50:27,  2.25s/it] 93%|█████████████████████████████████████████████████████████████████████████████▍     | 18775/20117 [11:57:26<50:23,  2.25s/it] 93%|█████████████████████████████████████████████████████████████████████████████▍     | 18776/20117 [11:57:29<50:01,  2.24s/it] 93%|█████████████████████████████████████████████████████████████████████████████▍     | 18777/20117 [11:57:31<49:42,  2.23s/it] 93%|█████████████████████████████████████████████████████████████████████████████▍     | 18778/20117 [11:57:33<49:32,  2.22s/it] 93%|█████████████████████████████████████████████████████████████████████████████▍     | 18779/20117 [11:57:35<49:40,  2.23s/it] 93%|█████████████████████████████████████████████████████████████████████████████▍     | 18780/20117 [11:57:38<49:43,  2.23s/it]                                                                                                                                 {'loss': 0.1324, 'grad_norm': 0.4923355281352997, 'learning_rate': 2.19678457029453e-06, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 316.49, 'epoch': 1.87}
 93%|█████████████████████████████████████████████████████████████████████████████▍     | 18780/20117 [11:57:38<49:43,  2.23s/it] 93%|█████████████████████████████████████████████████████████████████████████████▍     | 18781/20117 [11:57:40<49:52,  2.24s/it] 93%|█████████████████████████████████████████████████████████████████████████████▍     | 18782/20117 [11:57:42<49:46,  2.24s/it] 93%|█████████████████████████████████████████████████████████████████████████████▍     | 18783/20117 [11:57:44<49:40,  2.23s/it] 93%|█████████████████████████████████████████████████████████████████████████████▌     | 18784/20117 [11:57:47<49:48,  2.24s/it] 93%|█████████████████████████████████████████████████████████████████████████████▌     | 18785/20117 [11:57:49<51:35,  2.32s/it] 93%|█████████████████████████████████████████████████████████████████████████████▌     | 18786/20117 [11:57:51<50:49,  2.29s/it] 93%|█████████████████████████████████████████████████████████████████████████████▌     | 18787/20117 [11:57:54<50:48,  2.29s/it] 93%|█████████████████████████████████████████████████████████████████████████████▌     | 18788/20117 [11:57:56<50:14,  2.27s/it] 93%|█████████████████████████████████████████████████████████████████████████████▌     | 18789/20117 [11:57:58<49:46,  2.25s/it] 93%|█████████████████████████████████████████████████████████████████████████████▌     | 18790/20117 [11:58:00<49:52,  2.26s/it]                                                                                                                                 {'loss': 0.126, 'grad_norm': 0.27154669165611267, 'learning_rate': 2.164188956436386e-06, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 363.75, 'epoch': 1.87}
 93%|█████████████████████████████████████████████████████████████████████████████▌     | 18790/20117 [11:58:00<49:52,  2.26s/it] 93%|█████████████████████████████████████████████████████████████████████████████▌     | 18791/20117 [11:58:03<49:57,  2.26s/it] 93%|█████████████████████████████████████████████████████████████████████████████▌     | 18792/20117 [11:58:05<49:35,  2.25s/it] 93%|█████████████████████████████████████████████████████████████████████████████▌     | 18793/20117 [11:58:07<49:09,  2.23s/it] 93%|█████████████████████████████████████████████████████████████████████████████▌     | 18794/20117 [11:58:09<48:56,  2.22s/it] 93%|█████████████████████████████████████████████████████████████████████████████▌     | 18795/20117 [11:58:11<49:06,  2.23s/it] 93%|█████████████████████████████████████████████████████████████████████████████▌     | 18796/20117 [11:58:14<49:34,  2.25s/it] 93%|█████████████████████████████████████████████████████████████████████████████▌     | 18797/20117 [11:58:16<49:11,  2.24s/it] 93%|█████████████████████████████████████████████████████████████████████████████▌     | 18798/20117 [11:58:18<48:57,  2.23s/it] 93%|█████████████████████████████████████████████████████████████████████████████▌     | 18799/20117 [11:58:20<48:39,  2.22s/it] 93%|█████████████████████████████████████████████████████████████████████████████▌     | 18800/20117 [11:58:23<48:40,  2.22s/it]                                                                                                                                 {'loss': 0.161, 'grad_norm': 0.5623666644096375, 'learning_rate': 2.1318343328588953e-06, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 332.31, 'epoch': 1.87}
 93%|█████████████████████████████████████████████████████████████████████████████▌     | 18800/20117 [11:58:23<48:40,  2.22s/it] 93%|█████████████████████████████████████████████████████████████████████████████▌     | 18801/20117 [11:58:25<49:19,  2.25s/it] 93%|█████████████████████████████████████████████████████████████████████████████▌     | 18802/20117 [11:58:27<48:56,  2.23s/it] 93%|█████████████████████████████████████████████████████████████████████████████▌     | 18803/20117 [11:58:29<48:38,  2.22s/it] 93%|█████████████████████████████████████████████████████████████████████████████▌     | 18804/20117 [11:58:31<48:30,  2.22s/it] 93%|█████████████████████████████████████████████████████████████████████████████▌     | 18805/20117 [11:58:34<48:15,  2.21s/it] 93%|█████████████████████████████████████████████████████████████████████████████▌     | 18806/20117 [11:58:36<48:19,  2.21s/it] 93%|█████████████████████████████████████████████████████████████████████████████▌     | 18807/20117 [11:58:38<48:11,  2.21s/it] 93%|█████████████████████████████████████████████████████████████████████████████▌     | 18808/20117 [11:58:40<48:35,  2.23s/it] 93%|█████████████████████████████████████████████████████████████████████████████▌     | 18809/20117 [11:58:42<48:21,  2.22s/it] 94%|█████████████████████████████████████████████████████████████████████████████▌     | 18810/20117 [11:58:45<48:50,  2.24s/it]                                                                                                                                 {'loss': 0.1738, 'grad_norm': 0.5379459261894226, 'learning_rate': 2.099720779258352e-06, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 410.92, 'epoch': 1.87}
 94%|█████████████████████████████████████████████████████████████████████████████▌     | 18810/20117 [11:58:45<48:50,  2.24s/it] 94%|█████████████████████████████████████████████████████████████████████████████▌     | 18811/20117 [11:58:47<48:27,  2.23s/it] 94%|█████████████████████████████████████████████████████████████████████████████▌     | 18812/20117 [11:58:49<48:44,  2.24s/it] 94%|█████████████████████████████████████████████████████████████████████████████▌     | 18813/20117 [11:58:51<48:21,  2.22s/it] 94%|█████████████████████████████████████████████████████████████████████████████▌     | 18814/20117 [11:58:54<48:26,  2.23s/it] 94%|█████████████████████████████████████████████████████████████████████████████▋     | 18815/20117 [11:58:56<48:28,  2.23s/it] 94%|█████████████████████████████████████████████████████████████████████████████▋     | 18816/20117 [11:58:58<48:13,  2.22s/it] 94%|█████████████████████████████████████████████████████████████████████████████▋     | 18817/20117 [11:59:00<48:02,  2.22s/it] 94%|█████████████████████████████████████████████████████████████████████████████▋     | 18818/20117 [11:59:03<47:49,  2.21s/it] 94%|█████████████████████████████████████████████████████████████████████████████▋     | 18819/20117 [11:59:05<47:54,  2.21s/it] 94%|█████████████████████████████████████████████████████████████████████████████▋     | 18820/20117 [11:59:07<47:57,  2.22s/it]                                                                                                                                 {'loss': 0.1164, 'grad_norm': 0.37512362003326416, 'learning_rate': 2.0678483747372247e-06, 'memory/max_active (GiB)': 20.57, 'memory/max_allocated (GiB)': 20.57, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 351.41, 'epoch': 1.87}
 94%|█████████████████████████████████████████████████████████████████████████████▋     | 18820/20117 [11:59:07<47:57,  2.22s/it] 94%|█████████████████████████████████████████████████████████████████████████████▋     | 18821/20117 [11:59:09<48:02,  2.22s/it] 94%|█████████████████████████████████████████████████████████████████████████████▋     | 18822/20117 [11:59:11<47:47,  2.21s/it] 94%|█████████████████████████████████████████████████████████████████████████████▋     | 18823/20117 [11:59:14<47:36,  2.21s/it] 94%|█████████████████████████████████████████████████████████████████████████████▋     | 18824/20117 [11:59:16<47:45,  2.22s/it] 94%|█████████████████████████████████████████████████████████████████████████████▋     | 18825/20117 [11:59:18<47:55,  2.23s/it] 94%|█████████████████████████████████████████████████████████████████████████████▋     | 18826/20117 [11:59:20<47:47,  2.22s/it] 94%|█████████████████████████████████████████████████████████████████████████████▋     | 18827/20117 [11:59:23<47:55,  2.23s/it] 94%|█████████████████████████████████████████████████████████████████████████████▋     | 18828/20117 [11:59:25<47:45,  2.22s/it] 94%|█████████████████████████████████████████████████████████████████████████████▋     | 18829/20117 [11:59:27<47:57,  2.23s/it] 94%|█████████████████████████████████████████████████████████████████████████████▋     | 18830/20117 [11:59:29<47:48,  2.23s/it]                                                                                                                                 {'loss': 0.1444, 'grad_norm': 0.6244227886199951, 'learning_rate': 2.03621719780398e-06, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 364.33, 'epoch': 1.87}
 94%|█████████████████████████████████████████████████████████████████████████████▋     | 18830/20117 [11:59:29<47:48,  2.23s/it] 94%|█████████████████████████████████████████████████████████████████████████████▋     | 18831/20117 [11:59:31<48:02,  2.24s/it] 94%|█████████████████████████████████████████████████████████████████████████████▋     | 18832/20117 [11:59:34<47:55,  2.24s/it] 94%|█████████████████████████████████████████████████████████████████████████████▋     | 18833/20117 [11:59:36<47:30,  2.22s/it] 94%|█████████████████████████████████████████████████████████████████████████████▋     | 18834/20117 [11:59:38<47:23,  2.22s/it] 94%|█████████████████████████████████████████████████████████████████████████████▋     | 18835/20117 [11:59:40<47:14,  2.21s/it] 94%|█████████████████████████████████████████████████████████████████████████████▋     | 18836/20117 [11:59:43<49:23,  2.31s/it] 94%|█████████████████████████████████████████████████████████████████████████████▋     | 18837/20117 [11:59:45<48:34,  2.28s/it] 94%|█████████████████████████████████████████████████████████████████████████████▋     | 18838/20117 [11:59:47<48:18,  2.27s/it] 94%|█████████████████████████████████████████████████████████████████████████████▋     | 18839/20117 [11:59:49<47:48,  2.24s/it] 94%|█████████████████████████████████████████████████████████████████████████████▋     | 18840/20117 [11:59:52<47:33,  2.23s/it]                                                                                                                                 {'loss': 0.1737, 'grad_norm': 0.3308027386665344, 'learning_rate': 2.0048273263729046e-06, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 346.54, 'epoch': 1.87}
 94%|█████████████████████████████████████████████████████████████████████████████▋     | 18840/20117 [11:59:52<47:33,  2.23s/it] 94%|█████████████████████████████████████████████████████████████████████████████▋     | 18841/20117 [11:59:54<47:26,  2.23s/it] 94%|█████████████████████████████████████████████████████████████████████████████▋     | 18842/20117 [11:59:56<47:14,  2.22s/it] 94%|█████████████████████████████████████████████████████████████████████████████▋     | 18843/20117 [11:59:58<47:24,  2.23s/it] 94%|█████████████████████████████████████████████████████████████████████████████▋     | 18844/20117 [12:00:01<47:17,  2.23s/it] 94%|█████████████████████████████████████████████████████████████████████████████▊     | 18845/20117 [12:00:03<47:11,  2.23s/it] 94%|█████████████████████████████████████████████████████████████████████████████▊     | 18846/20117 [12:00:05<47:28,  2.24s/it] 94%|█████████████████████████████████████████████████████████████████████████████▊     | 18847/20117 [12:00:07<47:07,  2.23s/it] 94%|█████████████████████████████████████████████████████████████████████████████▊     | 18848/20117 [12:00:09<46:56,  2.22s/it] 94%|█████████████████████████████████████████████████████████████████████████████▊     | 18849/20117 [12:00:12<47:00,  2.22s/it] 94%|█████████████████████████████████████████████████████████████████████████████▊     | 18850/20117 [12:00:14<47:27,  2.25s/it]                                                                                                                                 {'loss': 0.1432, 'grad_norm': 0.2430552840232849, 'learning_rate': 1.9736788377638705e-06, 'memory/max_active (GiB)': 19.21, 'memory/max_allocated (GiB)': 19.21, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 310.33, 'epoch': 1.87}
 94%|█████████████████████████████████████████████████████████████████████████████▊     | 18850/20117 [12:00:14<47:27,  2.25s/it] 94%|█████████████████████████████████████████████████████████████████████████████▊     | 18851/20117 [12:00:16<47:00,  2.23s/it] 94%|█████████████████████████████████████████████████████████████████████████████▊     | 18852/20117 [12:00:18<47:01,  2.23s/it] 94%|█████████████████████████████████████████████████████████████████████████████▊     | 18853/20117 [12:00:21<46:44,  2.22s/it] 94%|█████████████████████████████████████████████████████████████████████████████▊     | 18854/20117 [12:00:23<46:26,  2.21s/it] 94%|█████████████████████████████████████████████████████████████████████████████▊     | 18855/20117 [12:00:25<46:16,  2.20s/it] 94%|█████████████████████████████████████████████████████████████████████████████▊     | 18856/20117 [12:00:27<46:22,  2.21s/it] 94%|█████████████████████████████████████████████████████████████████████████████▊     | 18857/20117 [12:00:29<46:16,  2.20s/it] 94%|█████████████████████████████████████████████████████████████████████████████▊     | 18858/20117 [12:00:32<46:08,  2.20s/it] 94%|█████████████████████████████████████████████████████████████████████████████▊     | 18859/20117 [12:00:34<46:09,  2.20s/it] 94%|█████████████████████████████████████████████████████████████████████████████▊     | 18860/20117 [12:00:36<46:02,  2.20s/it]                                                                                                                                 {'loss': 0.1859, 'grad_norm': 0.4946417212486267, 'learning_rate': 1.942771808702204e-06, 'memory/max_active (GiB)': 19.22, 'memory/max_allocated (GiB)': 19.22, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 314.75, 'epoch': 1.87}
 94%|█████████████████████████████████████████████████████████████████████████████▊     | 18860/20117 [12:00:36<46:02,  2.20s/it] 94%|█████████████████████████████████████████████████████████████████████████████▊     | 18861/20117 [12:00:38<46:24,  2.22s/it] 94%|█████████████████████████████████████████████████████████████████████████████▊     | 18862/20117 [12:00:40<46:14,  2.21s/it] 94%|█████████████████████████████████████████████████████████████████████████████▊     | 18863/20117 [12:00:43<46:19,  2.22s/it] 94%|█████████████████████████████████████████████████████████████████████████████▊     | 18864/20117 [12:00:45<46:05,  2.21s/it] 94%|█████████████████████████████████████████████████████████████████████████████▊     | 18865/20117 [12:00:47<45:51,  2.20s/it] 94%|█████████████████████████████████████████████████████████████████████████████▊     | 18866/20117 [12:00:49<45:58,  2.21s/it] 94%|█████████████████████████████████████████████████████████████████████████████▊     | 18867/20117 [12:00:52<46:07,  2.21s/it] 94%|█████████████████████████████████████████████████████████████████████████████▊     | 18868/20117 [12:00:54<46:02,  2.21s/it] 94%|█████████████████████████████████████████████████████████████████████████████▊     | 18869/20117 [12:00:56<46:03,  2.21s/it] 94%|█████████████████████████████████████████████████████████████████████████████▊     | 18870/20117 [12:00:58<45:54,  2.21s/it]                                                                                                                                 {'loss': 0.1639, 'grad_norm': 0.5932151079177856, 'learning_rate': 1.9121063153184293e-06, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 378.07, 'epoch': 1.88}
 94%|█████████████████████████████████████████████████████████████████████████████▊     | 18870/20117 [12:00:58<45:54,  2.21s/it] 94%|█████████████████████████████████████████████████████████████████████████████▊     | 18871/20117 [12:01:00<46:04,  2.22s/it] 94%|█████████████████████████████████████████████████████████████████████████████▊     | 18872/20117 [12:01:03<46:02,  2.22s/it] 94%|█████████████████████████████████████████████████████████████████████████████▊     | 18873/20117 [12:01:05<46:32,  2.24s/it] 94%|█████████████████████████████████████████████████████████████████████████████▊     | 18874/20117 [12:01:07<46:07,  2.23s/it] 94%|█████████████████████████████████████████████████████████████████████████████▉     | 18875/20117 [12:01:09<46:20,  2.24s/it] 94%|█████████████████████████████████████████████████████████████████████████████▉     | 18876/20117 [12:01:12<46:23,  2.24s/it] 94%|█████████████████████████████████████████████████████████████████████████████▉     | 18877/20117 [12:01:14<46:02,  2.23s/it] 94%|█████████████████████████████████████████████████████████████████████████████▉     | 18878/20117 [12:01:16<46:13,  2.24s/it] 94%|█████████████████████████████████████████████████████████████████████████████▉     | 18879/20117 [12:01:18<46:01,  2.23s/it] 94%|█████████████████████████████████████████████████████████████████████████████▉     | 18880/20117 [12:01:20<45:51,  2.22s/it]                                                                                                                                 {'loss': 0.1325, 'grad_norm': 0.6450536847114563, 'learning_rate': 1.8816824331481575e-06, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 353.93, 'epoch': 1.88}
 94%|█████████████████████████████████████████████████████████████████████████████▉     | 18880/20117 [12:01:20<45:51,  2.22s/it] 94%|█████████████████████████████████████████████████████████████████████████████▉     | 18881/20117 [12:01:23<45:38,  2.22s/it] 94%|█████████████████████████████████████████████████████████████████████████████▉     | 18882/20117 [12:01:25<46:00,  2.23s/it] 94%|█████████████████████████████████████████████████████████████████████████████▉     | 18883/20117 [12:01:27<46:15,  2.25s/it] 94%|█████████████████████████████████████████████████████████████████████████████▉     | 18884/20117 [12:01:29<45:59,  2.24s/it] 94%|█████████████████████████████████████████████████████████████████████████████▉     | 18885/20117 [12:01:32<45:38,  2.22s/it] 94%|█████████████████████████████████████████████████████████████████████████████▉     | 18886/20117 [12:01:34<45:37,  2.22s/it] 94%|█████████████████████████████████████████████████████████████████████████████▉     | 18887/20117 [12:01:36<45:45,  2.23s/it] 94%|█████████████████████████████████████████████████████████████████████████████▉     | 18888/20117 [12:01:38<46:01,  2.25s/it] 94%|█████████████████████████████████████████████████████████████████████████████▉     | 18889/20117 [12:01:41<47:50,  2.34s/it] 94%|█████████████████████████████████████████████████████████████████████████████▉     | 18890/20117 [12:01:43<47:02,  2.30s/it]                                                                                                                                 {'loss': 0.1227, 'grad_norm': 0.498829185962677, 'learning_rate': 1.8515002371318312e-06, 'memory/max_active (GiB)': 19.8, 'memory/max_allocated (GiB)': 19.8, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 357.94, 'epoch': 1.88}
 94%|█████████████████████████████████████████████████████████████████████████████▉     | 18890/20117 [12:01:43<47:02,  2.30s/it] 94%|█████████████████████████████████████████████████████████████████████████████▉     | 18891/20117 [12:01:45<46:42,  2.29s/it] 94%|█████████████████████████████████████████████████████████████████████████████▉     | 18892/20117 [12:01:48<46:19,  2.27s/it] 94%|█████████████████████████████████████████████████████████████████████████████▉     | 18893/20117 [12:01:50<46:00,  2.26s/it] 94%|█████████████████████████████████████████████████████████████████████████████▉     | 18894/20117 [12:01:52<45:29,  2.23s/it] 94%|█████████████████████████████████████████████████████████████████████████████▉     | 18895/20117 [12:01:54<45:19,  2.23s/it] 94%|█████████████████████████████████████████████████████████████████████████████▉     | 18896/20117 [12:01:57<45:33,  2.24s/it] 94%|█████████████████████████████████████████████████████████████████████████████▉     | 18897/20117 [12:01:59<45:38,  2.24s/it] 94%|█████████████████████████████████████████████████████████████████████████████▉     | 18898/20117 [12:02:01<45:27,  2.24s/it] 94%|█████████████████████████████████████████████████████████████████████████████▉     | 18899/20117 [12:02:03<45:27,  2.24s/it] 94%|█████████████████████████████████████████████████████████████████████████████▉     | 18900/20117 [12:02:06<45:40,  2.25s/it]                                                                                                                                 {'loss': 0.202, 'grad_norm': 0.5611813068389893, 'learning_rate': 1.8215598016145807e-06, 'memory/max_active (GiB)': 19.11, 'memory/max_allocated (GiB)': 19.11, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 402.84, 'epoch': 1.88}
 94%|█████████████████████████████████████████████████████████████████████████████▉     | 18900/20117 [12:02:06<45:40,  2.25s/it] 94%|█████████████████████████████████████████████████████████████████████████████▉     | 18901/20117 [12:02:08<45:49,  2.26s/it] 94%|█████████████████████████████████████████████████████████████████████████████▉     | 18902/20117 [12:02:10<45:30,  2.25s/it] 94%|█████████████████████████████████████████████████████████████████████████████▉     | 18903/20117 [12:02:12<45:05,  2.23s/it] 94%|█████████████████████████████████████████████████████████████████████████████▉     | 18904/20117 [12:02:14<45:03,  2.23s/it] 94%|█████████████████████████████████████████████████████████████████████████████▉     | 18905/20117 [12:02:17<45:06,  2.23s/it] 94%|██████████████████████████████████████████████████████████████████████████████     | 18906/20117 [12:02:19<45:09,  2.24s/it] 94%|██████████████████████████████████████████████████████████████████████████████     | 18907/20117 [12:02:21<45:08,  2.24s/it] 94%|██████████████████████████████████████████████████████████████████████████████     | 18908/20117 [12:02:23<45:01,  2.23s/it] 94%|██████████████████████████████████████████████████████████████████████████████     | 18909/20117 [12:02:26<44:59,  2.23s/it] 94%|██████████████████████████████████████████████████████████████████████████████     | 18910/20117 [12:02:28<45:10,  2.25s/it]                                                                                                                                 {'loss': 0.1108, 'grad_norm': 0.42488303780555725, 'learning_rate': 1.7918612003460234e-06, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 336.67, 'epoch': 1.88}
 94%|██████████████████████████████████████████████████████████████████████████████     | 18910/20117 [12:02:28<45:10,  2.25s/it] 94%|██████████████████████████████████████████████████████████████████████████████     | 18911/20117 [12:02:30<45:15,  2.25s/it] 94%|██████████████████████████████████████████████████████████████████████████████     | 18912/20117 [12:02:32<45:20,  2.26s/it] 94%|██████████████████████████████████████████████████████████████████████████████     | 18913/20117 [12:02:35<45:02,  2.24s/it] 94%|██████████████████████████████████████████████████████████████████████████████     | 18914/20117 [12:02:37<45:02,  2.25s/it] 94%|██████████████████████████████████████████████████████████████████████████████     | 18915/20117 [12:02:39<45:10,  2.26s/it] 94%|██████████████████████████████████████████████████████████████████████████████     | 18916/20117 [12:02:41<45:02,  2.25s/it] 94%|██████████████████████████████████████████████████████████████████████████████     | 18917/20117 [12:02:44<44:56,  2.25s/it] 94%|██████████████████████████████████████████████████████████████████████████████     | 18918/20117 [12:02:46<44:56,  2.25s/it] 94%|██████████████████████████████████████████████████████████████████████████████     | 18919/20117 [12:02:48<44:39,  2.24s/it] 94%|██████████████████████████████████████████████████████████████████████████████     | 18920/20117 [12:02:50<44:25,  2.23s/it]                                                                                                                                 {'loss': 0.1591, 'grad_norm': 0.7830858826637268, 'learning_rate': 1.7624045064800975e-06, 'memory/max_active (GiB)': 20.45, 'memory/max_allocated (GiB)': 20.45, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 323.72, 'epoch': 1.88}
 94%|██████████████████████████████████████████████████████████████████████████████     | 18920/20117 [12:02:50<44:25,  2.23s/it] 94%|██████████████████████████████████████████████████████████████████████████████     | 18921/20117 [12:02:53<44:53,  2.25s/it] 94%|██████████████████████████████████████████████████████████████████████████████     | 18922/20117 [12:02:55<44:49,  2.25s/it] 94%|██████████████████████████████████████████████████████████████████████████████     | 18923/20117 [12:02:57<44:28,  2.23s/it] 94%|██████████████████████████████████████████████████████████████████████████████     | 18924/20117 [12:02:59<44:26,  2.24s/it] 94%|██████████████████████████████████████████████████████████████████████████████     | 18925/20117 [12:03:02<44:27,  2.24s/it] 94%|██████████████████████████████████████████████████████████████████████████████     | 18926/20117 [12:03:04<44:38,  2.25s/it] 94%|██████████████████████████████████████████████████████████████████████████████     | 18927/20117 [12:03:06<44:32,  2.25s/it] 94%|██████████████████████████████████████████████████████████████████████████████     | 18928/20117 [12:03:08<44:38,  2.25s/it] 94%|██████████████████████████████████████████████████████████████████████████████     | 18929/20117 [12:03:11<44:46,  2.26s/it] 94%|██████████████████████████████████████████████████████████████████████████████     | 18930/20117 [12:03:13<44:31,  2.25s/it]                                                                                                                                 {'loss': 0.1812, 'grad_norm': 0.4499165415763855, 'learning_rate': 1.7331897925748518e-06, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 380.35, 'epoch': 1.88}
 94%|██████████████████████████████████████████████████████████████████████████████     | 18930/20117 [12:03:13<44:31,  2.25s/it] 94%|██████████████████████████████████████████████████████████████████████████████     | 18931/20117 [12:03:15<44:18,  2.24s/it] 94%|██████████████████████████████████████████████████████████████████████████████     | 18932/20117 [12:03:17<44:07,  2.23s/it] 94%|██████████████████████████████████████████████████████████████████████████████     | 18933/20117 [12:03:19<43:51,  2.22s/it] 94%|██████████████████████████████████████████████████████████████████████████████     | 18934/20117 [12:03:22<43:36,  2.21s/it] 94%|██████████████████████████████████████████████████████████████████████████████     | 18935/20117 [12:03:24<43:38,  2.22s/it] 94%|██████████████████████████████████████████████████████████████████████████████▏    | 18936/20117 [12:03:26<43:46,  2.22s/it] 94%|██████████████████████████████████████████████████████████████████████████████▏    | 18937/20117 [12:03:28<43:47,  2.23s/it] 94%|██████████████████████████████████████████████████████████████████████████████▏    | 18938/20117 [12:03:31<43:54,  2.23s/it] 94%|██████████████████████████████████████████████████████████████████████████████▏    | 18939/20117 [12:03:33<44:11,  2.25s/it] 94%|██████████████████████████████████████████████████████████████████████████████▏    | 18940/20117 [12:03:35<44:12,  2.25s/it]                                                                                                                                 {'loss': 0.0976, 'grad_norm': 0.6066138744354248, 'learning_rate': 1.7042171305923115e-06, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 327.67, 'epoch': 1.88}
 94%|██████████████████████████████████████████████████████████████████████████████▏    | 18940/20117 [12:03:35<44:12,  2.25s/it] 94%|██████████████████████████████████████████████████████████████████████████████▏    | 18941/20117 [12:03:37<43:53,  2.24s/it] 94%|██████████████████████████████████████████████████████████████████████████████▏    | 18942/20117 [12:03:40<43:41,  2.23s/it] 94%|██████████████████████████████████████████████████████████████████████████████▏    | 18943/20117 [12:03:42<43:34,  2.23s/it] 94%|██████████████████████████████████████████████████████████████████████████████▏    | 18944/20117 [12:03:44<45:56,  2.35s/it] 94%|██████████████████████████████████████████████████████████████████████████████▏    | 18945/20117 [12:03:47<45:24,  2.32s/it] 94%|██████████████████████████████████████████████████████████████████████████████▏    | 18946/20117 [12:03:49<44:50,  2.30s/it] 94%|██████████████████████████████████████████████████████████████████████████████▏    | 18947/20117 [12:03:51<44:26,  2.28s/it] 94%|██████████████████████████████████████████████████████████████████████████████▏    | 18948/20117 [12:03:53<44:06,  2.26s/it] 94%|██████████████████████████████████████████████████████████████████████████████▏    | 18949/20117 [12:03:56<43:54,  2.26s/it] 94%|██████████████████████████████████████████████████████████████████████████████▏    | 18950/20117 [12:03:58<43:38,  2.24s/it]                                                                                                                                 {'loss': 0.1591, 'grad_norm': 0.30252575874328613, 'learning_rate': 1.6754865918982677e-06, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 301.08, 'epoch': 1.88}
 94%|██████████████████████████████████████████████████████████████████████████████▏    | 18950/20117 [12:03:58<43:38,  2.24s/it] 94%|██████████████████████████████████████████████████████████████████████████████▏    | 18951/20117 [12:04:00<43:41,  2.25s/it] 94%|██████████████████████████████████████████████████████████████████████████████▏    | 18952/20117 [12:04:02<43:48,  2.26s/it] 94%|██████████████████████████████████████████████████████████████████████████████▏    | 18953/20117 [12:04:05<43:38,  2.25s/it] 94%|██████████████████████████████████████████████████████████████████████████████▏    | 18954/20117 [12:04:07<43:17,  2.23s/it] 94%|██████████████████████████████████████████████████████████████████████████████▏    | 18955/20117 [12:04:09<43:26,  2.24s/it] 94%|██████████████████████████████████████████████████████████████████████████████▏    | 18956/20117 [12:04:11<43:22,  2.24s/it] 94%|██████████████████████████████████████████████████████████████████████████████▏    | 18957/20117 [12:04:14<43:09,  2.23s/it] 94%|██████████████████████████████████████████████████████████████████████████████▏    | 18958/20117 [12:04:16<42:54,  2.22s/it] 94%|██████████████████████████████████████████████████████████████████████████████▏    | 18959/20117 [12:04:18<43:25,  2.25s/it] 94%|██████████████████████████████████████████████████████████████████████████████▏    | 18960/20117 [12:04:20<43:06,  2.24s/it]                                                                                                                                 {'loss': 0.146, 'grad_norm': 0.575855016708374, 'learning_rate': 1.6469982472621103e-06, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 341.39, 'epoch': 1.88}
 94%|██████████████████████████████████████████████████████████████████████████████▏    | 18960/20117 [12:04:20<43:06,  2.24s/it] 94%|██████████████████████████████████████████████████████████████████████████████▏    | 18961/20117 [12:04:22<42:51,  2.22s/it] 94%|██████████████████████████████████████████████████████████████████████████████▏    | 18962/20117 [12:04:25<43:02,  2.24s/it] 94%|██████████████████████████████████████████████████████████████████████████████▏    | 18963/20117 [12:04:27<43:20,  2.25s/it] 94%|██████████████████████████████████████████████████████████████████████████████▏    | 18964/20117 [12:04:29<43:04,  2.24s/it] 94%|██████████████████████████████████████████████████████████████████████████████▏    | 18965/20117 [12:04:32<43:20,  2.26s/it] 94%|██████████████████████████████████████████████████████████████████████████████▎    | 18966/20117 [12:04:34<43:03,  2.24s/it] 94%|██████████████████████████████████████████████████████████████████████████████▎    | 18967/20117 [12:04:36<42:49,  2.23s/it] 94%|██████████████████████████████████████████████████████████████████████████████▎    | 18968/20117 [12:04:38<43:10,  2.25s/it] 94%|██████████████████████████████████████████████████████████████████████████████▎    | 18969/20117 [12:04:40<42:55,  2.24s/it] 94%|██████████████████████████████████████████████████████████████████████████████▎    | 18970/20117 [12:04:43<42:50,  2.24s/it]                                                                                                                                 {'loss': 0.1789, 'grad_norm': 0.7902649641036987, 'learning_rate': 1.6187521668566518e-06, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 370.72, 'epoch': 1.89}
 94%|██████████████████████████████████████████████████████████████████████████████▎    | 18970/20117 [12:04:43<42:50,  2.24s/it] 94%|██████████████████████████████████████████████████████████████████████████████▎    | 18971/20117 [12:04:45<42:49,  2.24s/it] 94%|██████████████████████████████████████████████████████████████████████████████▎    | 18972/20117 [12:04:47<42:34,  2.23s/it] 94%|██████████████████████████████████████████████████████████████████████████████▎    | 18973/20117 [12:04:49<42:17,  2.22s/it] 94%|██████████████████████████████████████████████████████████████████████████████▎    | 18974/20117 [12:04:52<42:28,  2.23s/it] 94%|██████████████████████████████████████████████████████████████████████████████▎    | 18975/20117 [12:04:54<42:41,  2.24s/it] 94%|██████████████████████████████████████████████████████████████████████████████▎    | 18976/20117 [12:04:56<42:32,  2.24s/it] 94%|██████████████████████████████████████████████████████████████████████████████▎    | 18977/20117 [12:04:58<42:37,  2.24s/it] 94%|██████████████████████████████████████████████████████████████████████████████▎    | 18978/20117 [12:05:01<42:34,  2.24s/it] 94%|██████████████████████████████████████████████████████████████████████████████▎    | 18979/20117 [12:05:03<42:29,  2.24s/it] 94%|██████████████████████████████████████████████████████████████████████████████▎    | 18980/20117 [12:05:05<43:05,  2.27s/it]                                                                                                                                 {'loss': 0.1358, 'grad_norm': 0.6112450957298279, 'learning_rate': 1.5907484202579482e-06, 'memory/max_active (GiB)': 20.58, 'memory/max_allocated (GiB)': 20.58, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 340.31, 'epoch': 1.89}
 94%|██████████████████████████████████████████████████████████████████████████████▎    | 18980/20117 [12:05:05<43:05,  2.27s/it] 94%|██████████████████████████████████████████████████████████████████████████████▎    | 18981/20117 [12:05:07<43:08,  2.28s/it] 94%|██████████████████████████████████████████████████████████████████████████████▎    | 18982/20117 [12:05:10<43:07,  2.28s/it] 94%|██████████████████████████████████████████████████████████████████████████████▎    | 18983/20117 [12:05:12<42:41,  2.26s/it] 94%|██████████████████████████████████████████████████████████████████████████████▎    | 18984/20117 [12:05:14<42:45,  2.26s/it] 94%|██████████████████████████████████████████████████████████████████████████████▎    | 18985/20117 [12:05:16<42:36,  2.26s/it] 94%|██████████████████████████████████████████████████████████████████████████████▎    | 18986/20117 [12:05:19<42:37,  2.26s/it] 94%|██████████████████████████████████████████████████████████████████████████████▎    | 18987/20117 [12:05:21<42:22,  2.25s/it] 94%|██████████████████████████████████████████████████████████████████████████████▎    | 18988/20117 [12:05:23<42:14,  2.24s/it] 94%|██████████████████████████████████████████████████████████████████████████████▎    | 18989/20117 [12:05:25<42:00,  2.23s/it] 94%|██████████████████████████████████████████████████████████████████████████████▎    | 18990/20117 [12:05:28<42:10,  2.24s/it]                                                                                                                                 {'loss': 0.1604, 'grad_norm': 0.3243188261985779, 'learning_rate': 1.562987076445177e-06, 'memory/max_active (GiB)': 21.54, 'memory/max_allocated (GiB)': 21.54, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 383.91, 'epoch': 1.89}
 94%|██████████████████████████████████████████████████████████████████████████████▎    | 18990/20117 [12:05:28<42:10,  2.24s/it] 94%|██████████████████████████████████████████████████████████████████████████████▎    | 18991/20117 [12:05:30<42:07,  2.24s/it] 94%|██████████████████████████████████████████████████████████████████████████████▎    | 18992/20117 [12:05:32<42:19,  2.26s/it] 94%|██████████████████████████████████████████████████████████████████████████████▎    | 18993/20117 [12:05:34<42:06,  2.25s/it] 94%|██████████████████████████████████████████████████████████████████████████████▎    | 18994/20117 [12:05:37<42:01,  2.25s/it] 94%|██████████████████████████████████████████████████████████████████████████████▎    | 18995/20117 [12:05:39<42:08,  2.25s/it] 94%|██████████████████████████████████████████████████████████████████████████████▎    | 18996/20117 [12:05:41<42:23,  2.27s/it] 94%|██████████████████████████████████████████████████████████████████████████████▍    | 18997/20117 [12:05:44<43:37,  2.34s/it] 94%|██████████████████████████████████████████████████████████████████████████████▍    | 18998/20117 [12:05:46<42:58,  2.30s/it] 94%|██████████████████████████████████████████████████████████████████████████████▍    | 18999/20117 [12:05:48<42:41,  2.29s/it] 94%|██████████████████████████████████████████████████████████████████████████████▍    | 19000/20117 [12:05:51<42:30,  2.28s/it]                                                                                                                                 {'loss': 0.1408, 'grad_norm': 0.3223695158958435, 'learning_rate': 1.53546820380035e-06, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 346.51, 'epoch': 1.89}
 94%|██████████████████████████████████████████████████████████████████████████████▍    | 19000/20117 [12:05:51<42:30,  2.28s/it] 94%|██████████████████████████████████████████████████████████████████████████████▍    | 19001/20117 [12:05:53<42:23,  2.28s/it] 94%|██████████████████████████████████████████████████████████████████████████████▍    | 19002/20117 [12:05:55<42:10,  2.27s/it] 94%|██████████████████████████████████████████████████████████████████████████████▍    | 19003/20117 [12:05:57<42:16,  2.28s/it] 94%|██████████████████████████████████████████████████████████████████████████████▍    | 19004/20117 [12:06:00<41:59,  2.26s/it] 94%|██████████████████████████████████████████████████████████████████████████████▍    | 19005/20117 [12:06:02<42:02,  2.27s/it] 94%|██████████████████████████████████████████████████████████████████████████████▍    | 19006/20117 [12:06:04<41:42,  2.25s/it] 94%|██████████████████████████████████████████████████████████████████████████████▍    | 19007/20117 [12:06:06<41:31,  2.24s/it] 94%|██████████████████████████████████████████████████████████████████████████████▍    | 19008/20117 [12:06:09<41:30,  2.25s/it] 94%|██████████████████████████████████████████████████████████████████████████████▍    | 19009/20117 [12:06:11<41:40,  2.26s/it] 94%|██████████████████████████████████████████████████████████████████████████████▍    | 19010/20117 [12:06:13<41:35,  2.25s/it]                                                                                                                                 {'loss': 0.1379, 'grad_norm': 0.49110448360443115, 'learning_rate': 1.508191870108311e-06, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 399.45, 'epoch': 1.89}
 94%|██████████████████████████████████████████████████████████████████████████████▍    | 19010/20117 [12:06:13<41:35,  2.25s/it] 95%|██████████████████████████████████████████████████████████████████████████████▍    | 19011/20117 [12:06:15<41:19,  2.24s/it] 95%|██████████████████████████████████████████████████████████████████████████████▍    | 19012/20117 [12:06:17<41:04,  2.23s/it] 95%|██████████████████████████████████████████████████████████████████████████████▍    | 19013/20117 [12:06:20<41:20,  2.25s/it] 95%|██████████████████████████████████████████████████████████████████████████████▍    | 19014/20117 [12:06:22<41:15,  2.24s/it] 95%|██████████████████████████████████████████████████████████████████████████████▍    | 19015/20117 [12:06:24<41:23,  2.25s/it] 95%|██████████████████████████████████████████████████████████████████████████████▍    | 19016/20117 [12:06:26<41:08,  2.24s/it] 95%|██████████████████████████████████████████████████████████████████████████████▍    | 19017/20117 [12:06:29<41:07,  2.24s/it] 95%|██████████████████████████████████████████████████████████████████████████████▍    | 19018/20117 [12:06:31<41:22,  2.26s/it] 95%|██████████████████████████████████████████████████████████████████████████████▍    | 19019/20117 [12:06:33<41:10,  2.25s/it] 95%|██████████████████████████████████████████████████████████████████████████████▍    | 19020/20117 [12:06:36<41:07,  2.25s/it]                                                                                                                                 {'loss': 0.1755, 'grad_norm': 0.3531555235385895, 'learning_rate': 1.4811581425563936e-06, 'memory/max_active (GiB)': 20.57, 'memory/max_allocated (GiB)': 20.57, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 360.95, 'epoch': 1.89}
 95%|██████████████████████████████████████████████████████████████████████████████▍    | 19020/20117 [12:06:36<41:07,  2.25s/it] 95%|██████████████████████████████████████████████████████████████████████████████▍    | 19021/20117 [12:06:38<41:13,  2.26s/it] 95%|██████████████████████████████████████████████████████████████████████████████▍    | 19022/20117 [12:06:40<41:03,  2.25s/it] 95%|██████████████████████████████████████████████████████████████████████████████▍    | 19023/20117 [12:06:42<41:27,  2.27s/it] 95%|██████████████████████████████████████████████████████████████████████████████▍    | 19024/20117 [12:06:45<41:13,  2.26s/it] 95%|██████████████████████████████████████████████████████████████████████████████▍    | 19025/20117 [12:06:47<40:58,  2.25s/it] 95%|██████████████████████████████████████████████████████████████████████████████▍    | 19026/20117 [12:06:49<40:43,  2.24s/it] 95%|██████████████████████████████████████████████████████████████████████████████▌    | 19027/20117 [12:06:51<40:40,  2.24s/it] 95%|██████████████████████████████████████████████████████████████████████████████▌    | 19028/20117 [12:06:54<40:42,  2.24s/it] 95%|██████████████████████████████████████████████████████████████████████████████▌    | 19029/20117 [12:06:56<40:31,  2.23s/it] 95%|██████████████████████████████████████████████████████████████████████████████▌    | 19030/20117 [12:06:58<40:26,  2.23s/it]                                                                                                                                 {'loss': 0.1661, 'grad_norm': 0.3664921522140503, 'learning_rate': 1.4543670877344207e-06, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 397.9, 'epoch': 1.89}
 95%|██████████████████████████████████████████████████████████████████████████████▌    | 19030/20117 [12:06:58<40:26,  2.23s/it] 95%|██████████████████████████████████████████████████████████████████████████████▌    | 19031/20117 [12:07:00<40:29,  2.24s/it] 95%|██████████████████████████████████████████████████████████████████████████████▌    | 19032/20117 [12:07:02<40:16,  2.23s/it] 95%|██████████████████████████████████████████████████████████████████████████████▌    | 19033/20117 [12:07:05<40:36,  2.25s/it] 95%|██████████████████████████████████████████████████████████████████████████████▌    | 19034/20117 [12:07:07<40:20,  2.23s/it] 95%|██████████████████████████████████████████████████████████████████████████████▌    | 19035/20117 [12:07:09<39:57,  2.22s/it] 95%|██████████████████████████████████████████████████████████████████████████████▌    | 19036/20117 [12:07:11<40:02,  2.22s/it] 95%|██████████████████████████████████████████████████████████████████████████████▌    | 19037/20117 [12:07:14<40:09,  2.23s/it] 95%|██████████████████████████████████████████████████████████████████████████████▌    | 19038/20117 [12:07:16<40:31,  2.25s/it] 95%|██████████████████████████████████████████████████████████████████████████████▌    | 19039/20117 [12:07:18<40:17,  2.24s/it] 95%|██████████████████████████████████████████████████████████████████████████████▌    | 19040/20117 [12:07:20<40:00,  2.23s/it]                                                                                                                                 {'loss': 0.174, 'grad_norm': 0.6378931403160095, 'learning_rate': 1.4278187716344039e-06, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 417.45, 'epoch': 1.89}
 95%|██████████████████████████████████████████████████████████████████████████████▌    | 19040/20117 [12:07:20<40:00,  2.23s/it] 95%|██████████████████████████████████████████████████████████████████████████████▌    | 19041/20117 [12:07:22<39:44,  2.22s/it] 95%|██████████████████████████████████████████████████████████████████████████████▌    | 19042/20117 [12:07:25<39:51,  2.22s/it] 95%|██████████████████████████████████████████████████████████████████████████████▌    | 19043/20117 [12:07:27<39:55,  2.23s/it] 95%|██████████████████████████████████████████████████████████████████████████████▌    | 19044/20117 [12:07:29<39:51,  2.23s/it] 95%|██████████████████████████████████████████████████████████████████████████████▌    | 19045/20117 [12:07:31<39:53,  2.23s/it] 95%|██████████████████████████████████████████████████████████████████████████████▌    | 19046/20117 [12:07:34<39:32,  2.22s/it] 95%|██████████████████████████████████████████████████████████████████████████████▌    | 19047/20117 [12:07:36<39:46,  2.23s/it] 95%|██████████████████████████████████████████████████████████████████████████████▌    | 19048/20117 [12:07:38<40:07,  2.25s/it] 95%|██████████████████████████████████████████████████████████████████████████████▌    | 19049/20117 [12:07:41<41:21,  2.32s/it] 95%|██████████████████████████████████████████████████████████████████████████████▌    | 19050/20117 [12:07:43<40:45,  2.29s/it]                                                                                                                                 {'loss': 0.1896, 'grad_norm': 0.1918243020772934, 'learning_rate': 1.4015132596504554e-06, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 407.04, 'epoch': 1.89}
 95%|██████████████████████████████████████████████████████████████████████████████▌    | 19050/20117 [12:07:43<40:45,  2.29s/it] 95%|██████████████████████████████████████████████████████████████████████████████▌    | 19051/20117 [12:07:45<40:37,  2.29s/it] 95%|██████████████████████████████████████████████████████████████████████████████▌    | 19052/20117 [12:07:47<40:19,  2.27s/it] 95%|██████████████████████████████████████████████████████████████████████████████▌    | 19053/20117 [12:07:50<40:07,  2.26s/it] 95%|██████████████████████████████████████████████████████████████████████████████▌    | 19054/20117 [12:07:52<39:48,  2.25s/it] 95%|██████████████████████████████████████████████████████████████████████████████▌    | 19055/20117 [12:07:54<39:41,  2.24s/it] 95%|██████████████████████████████████████████████████████████████████████████████▌    | 19056/20117 [12:07:56<39:53,  2.26s/it] 95%|██████████████████████████████████████████████████████████████████████████████▋    | 19057/20117 [12:07:59<39:50,  2.26s/it] 95%|██████████████████████████████████████████████████████████████████████████████▋    | 19058/20117 [12:08:01<39:37,  2.24s/it] 95%|██████████████████████████████████████████████████████████████████████████████▋    | 19059/20117 [12:08:03<39:34,  2.24s/it] 95%|██████████████████████████████████████████████████████████████████████████████▋    | 19060/20117 [12:08:05<39:37,  2.25s/it]                                                                                                                                 {'loss': 0.1789, 'grad_norm': 0.5390235781669617, 'learning_rate': 1.3754506165786108e-06, 'memory/max_active (GiB)': 19.77, 'memory/max_allocated (GiB)': 19.77, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 441.99, 'epoch': 1.89}
 95%|██████████████████████████████████████████████████████████████████████████████▋    | 19060/20117 [12:08:05<39:37,  2.25s/it] 95%|██████████████████████████████████████████████████████████████████████████████▋    | 19061/20117 [12:08:08<39:53,  2.27s/it] 95%|██████████████████████████████████████████████████████████████████████████████▋    | 19062/20117 [12:08:10<39:45,  2.26s/it] 95%|██████████████████████████████████████████████████████████████████████████████▋    | 19063/20117 [12:08:12<39:25,  2.24s/it] 95%|██████████████████████████████████████████████████████████████████████████████▋    | 19064/20117 [12:08:14<39:04,  2.23s/it] 95%|██████████████████████████████████████████████████████████████████████████████▋    | 19065/20117 [12:08:16<38:55,  2.22s/it] 95%|██████████████████████████████████████████████████████████████████████████████▋    | 19066/20117 [12:08:19<39:16,  2.24s/it] 95%|██████████████████████████████████████████████████████████████████████████████▋    | 19067/20117 [12:08:21<39:21,  2.25s/it] 95%|██████████████████████████████████████████████████████████████████████████████▋    | 19068/20117 [12:08:23<39:19,  2.25s/it] 95%|██████████████████████████████████████████████████████████████████████████████▋    | 19069/20117 [12:08:26<39:22,  2.25s/it] 95%|██████████████████████████████████████████████████████████████████████████████▋    | 19070/20117 [12:08:28<39:36,  2.27s/it]                                                                                                                                 {'loss': 0.145, 'grad_norm': 0.5331469178199768, 'learning_rate': 1.3496309066166724e-06, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 392.93, 'epoch': 1.9}
 95%|██████████████████████████████████████████████████████████████████████████████▋    | 19070/20117 [12:08:28<39:36,  2.27s/it] 95%|██████████████████████████████████████████████████████████████████████████████▋    | 19071/20117 [12:08:30<39:22,  2.26s/it] 95%|██████████████████████████████████████████████████████████████████████████████▋    | 19072/20117 [12:08:32<39:16,  2.25s/it] 95%|██████████████████████████████████████████████████████████████████████████████▋    | 19073/20117 [12:08:35<39:09,  2.25s/it] 95%|██████████████████████████████████████████████████████████████████████████████▋    | 19074/20117 [12:08:37<39:02,  2.25s/it] 95%|██████████████████████████████████████████████████████████████████████████████▋    | 19075/20117 [12:08:39<38:46,  2.23s/it] 95%|██████████████████████████████████████████████████████████████████████████████▋    | 19076/20117 [12:08:41<38:43,  2.23s/it] 95%|██████████████████████████████████████████████████████████████████████████████▋    | 19077/20117 [12:08:43<38:37,  2.23s/it] 95%|██████████████████████████████████████████████████████████████████████████████▋    | 19078/20117 [12:08:46<38:40,  2.23s/it] 95%|██████████████████████████████████████████████████████████████████████████████▋    | 19079/20117 [12:08:48<38:30,  2.23s/it] 95%|██████████████████████████████████████████████████████████████████████████████▋    | 19080/20117 [12:08:50<38:41,  2.24s/it]                                                                                                                                 {'loss': 0.159, 'grad_norm': 0.48252302408218384, 'learning_rate': 1.3240541933640439e-06, 'memory/max_active (GiB)': 19.22, 'memory/max_allocated (GiB)': 19.22, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 330.47, 'epoch': 1.9}
 95%|██████████████████████████████████████████████████████████████████████████████▋    | 19080/20117 [12:08:50<38:41,  2.24s/it] 95%|██████████████████████████████████████████████████████████████████████████████▋    | 19081/20117 [12:08:52<38:46,  2.25s/it] 95%|██████████████████████████████████████████████████████████████████████████████▋    | 19082/20117 [12:08:55<38:47,  2.25s/it] 95%|██████████████████████████████████████████████████████████████████████████████▋    | 19083/20117 [12:08:57<38:40,  2.24s/it] 95%|██████████████████████████████████████████████████████████████████████████████▋    | 19084/20117 [12:08:59<38:30,  2.24s/it] 95%|██████████████████████████████████████████████████████████████████████████████▋    | 19085/20117 [12:09:01<38:15,  2.22s/it] 95%|██████████████████████████████████████████████████████████████████████████████▋    | 19086/20117 [12:09:04<38:26,  2.24s/it] 95%|██████████████████████████████████████████████████████████████████████████████▊    | 19087/20117 [12:09:06<38:35,  2.25s/it] 95%|██████████████████████████████████████████████████████████████████████████████▊    | 19088/20117 [12:09:08<38:34,  2.25s/it] 95%|██████████████████████████████████████████████████████████████████████████████▊    | 19089/20117 [12:09:10<38:33,  2.25s/it] 95%|██████████████████████████████████████████████████████████████████████████████▊    | 19090/20117 [12:09:13<38:30,  2.25s/it]                                                                                                                                 {'loss': 0.1252, 'grad_norm': 0.5958520174026489, 'learning_rate': 1.298720539821563e-06, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 325.74, 'epoch': 1.9}
 95%|██████████████████████████████████████████████████████████████████████████████▊    | 19090/20117 [12:09:13<38:30,  2.25s/it] 95%|██████████████████████████████████████████████████████████████████████████████▊    | 19091/20117 [12:09:15<38:16,  2.24s/it] 95%|██████████████████████████████████████████████████████████████████████████████▊    | 19092/20117 [12:09:17<38:24,  2.25s/it] 95%|██████████████████████████████████████████████████████████████████████████████▊    | 19093/20117 [12:09:19<38:34,  2.26s/it] 95%|██████████████████████████████████████████████████████████████████████████████▊    | 19094/20117 [12:09:22<38:53,  2.28s/it] 95%|██████████████████████████████████████████████████████████████████████████████▊    | 19095/20117 [12:09:24<38:40,  2.27s/it] 95%|██████████████████████████████████████████████████████████████████████████████▊    | 19096/20117 [12:09:26<38:28,  2.26s/it] 95%|██████████████████████████████████████████████████████████████████████████████▊    | 19097/20117 [12:09:28<38:19,  2.25s/it] 95%|██████████████████████████████████████████████████████████████████████████████▊    | 19098/20117 [12:09:31<38:14,  2.25s/it] 95%|██████████████████████████████████████████████████████████████████████████████▊    | 19099/20117 [12:09:33<38:02,  2.24s/it] 95%|██████████████████████████████████████████████████████████████████████████████▊    | 19100/20117 [12:09:35<39:26,  2.33s/it]                                                                                                                                 {'loss': 0.1731, 'grad_norm': 0.675596296787262, 'learning_rate': 1.273630008391402e-06, 'memory/max_active (GiB)': 20.77, 'memory/max_allocated (GiB)': 20.77, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 356.72, 'epoch': 1.9}
 95%|██████████████████████████████████████████████████████████████████████████████▊    | 19100/20117 [12:09:35<39:26,  2.33s/it] 95%|██████████████████████████████████████████████████████████████████████████████▊    | 19101/20117 [12:09:38<39:05,  2.31s/it] 95%|██████████████████████████████████████████████████████████████████████████████▊    | 19102/20117 [12:09:40<38:58,  2.30s/it] 95%|██████████████████████████████████████████████████████████████████████████████▊    | 19103/20117 [12:09:42<38:29,  2.28s/it] 95%|██████████████████████████████████████████████████████████████████████████████▊    | 19104/20117 [12:09:45<38:26,  2.28s/it] 95%|██████████████████████████████████████████████████████████████████████████████▊    | 19105/20117 [12:09:47<38:11,  2.26s/it] 95%|██████████████████████████████████████████████████████████████████████████████▊    | 19106/20117 [12:09:49<38:26,  2.28s/it] 95%|██████████████████████████████████████████████████████████████████████████████▊    | 19107/20117 [12:09:51<38:06,  2.26s/it] 95%|██████████████████████████████████████████████████████████████████████████████▊    | 19108/20117 [12:09:54<37:46,  2.25s/it] 95%|██████████████████████████████████████████████████████████████████████████████▊    | 19109/20117 [12:09:56<37:36,  2.24s/it] 95%|██████████████████████████████████████████████████████████████████████████████▊    | 19110/20117 [12:09:58<37:50,  2.25s/it]                                                                                                                                 {'loss': 0.149, 'grad_norm': 0.5923382043838501, 'learning_rate': 1.2487826608768127e-06, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 323.65, 'epoch': 1.9}
 95%|██████████████████████████████████████████████████████████████████████████████▊    | 19110/20117 [12:09:58<37:50,  2.25s/it] 95%|██████████████████████████████████████████████████████████████████████████████▊    | 19111/20117 [12:10:00<37:40,  2.25s/it] 95%|██████████████████████████████████████████████████████████████████████████████▊    | 19112/20117 [12:10:03<37:43,  2.25s/it] 95%|██████████████████████████████████████████████████████████████████████████████▊    | 19113/20117 [12:10:05<37:48,  2.26s/it] 95%|██████████████████████████████████████████████████████████████████████████████▊    | 19114/20117 [12:10:07<37:41,  2.25s/it] 95%|██████████████████████████████████████████████████████████████████████████████▊    | 19115/20117 [12:10:09<37:35,  2.25s/it] 95%|██████████████████████████████████████████████████████████████████████████████▊    | 19116/20117 [12:10:12<37:53,  2.27s/it] 95%|██████████████████████████████████████████████████████████████████████████████▊    | 19117/20117 [12:10:14<38:02,  2.28s/it] 95%|██████████████████████████████████████████████████████████████████████████████▉    | 19118/20117 [12:10:16<37:58,  2.28s/it] 95%|██████████████████████████████████████████████████████████████████████████████▉    | 19119/20117 [12:10:18<38:09,  2.29s/it] 95%|██████████████████████████████████████████████████████████████████████████████▉    | 19120/20117 [12:10:21<37:52,  2.28s/it]                                                                                                                                 {'loss': 0.1449, 'grad_norm': 0.3214609920978546, 'learning_rate': 1.2241785584820808e-06, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 397.11, 'epoch': 1.9}
 95%|██████████████████████████████████████████████████████████████████████████████▉    | 19120/20117 [12:10:21<37:52,  2.28s/it] 95%|██████████████████████████████████████████████████████████████████████████████▉    | 19121/20117 [12:10:23<37:58,  2.29s/it] 95%|██████████████████████████████████████████████████████████████████████████████▉    | 19122/20117 [12:10:25<37:52,  2.28s/it] 95%|██████████████████████████████████████████████████████████████████████████████▉    | 19123/20117 [12:10:28<37:39,  2.27s/it] 95%|██████████████████████████████████████████████████████████████████████████████▉    | 19124/20117 [12:10:30<37:22,  2.26s/it] 95%|██████████████████████████████████████████████████████████████████████████████▉    | 19125/20117 [12:10:32<37:18,  2.26s/it] 95%|██████████████████████████████████████████████████████████████████████████████▉    | 19126/20117 [12:10:34<37:13,  2.25s/it] 95%|██████████████████████████████████████████████████████████████████████████████▉    | 19127/20117 [12:10:37<37:09,  2.25s/it] 95%|██████████████████████████████████████████████████████████████████████████████▉    | 19128/20117 [12:10:39<36:59,  2.24s/it] 95%|██████████████████████████████████████████████████████████████████████████████▉    | 19129/20117 [12:10:41<36:51,  2.24s/it] 95%|██████████████████████████████████████████████████████████████████████████████▉    | 19130/20117 [12:10:43<36:32,  2.22s/it]                                                                                                                                 {'loss': 0.0914, 'grad_norm': 0.47794198989868164, 'learning_rate': 1.199817761812294e-06, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 309.82, 'epoch': 1.9}
 95%|██████████████████████████████████████████████████████████████████████████████▉    | 19130/20117 [12:10:43<36:32,  2.22s/it] 95%|██████████████████████████████████████████████████████████████████████████████▉    | 19131/20117 [12:10:45<36:37,  2.23s/it] 95%|██████████████████████████████████████████████████████████████████████████████▉    | 19132/20117 [12:10:48<36:50,  2.24s/it] 95%|██████████████████████████████████████████████████████████████████████████████▉    | 19133/20117 [12:10:50<36:53,  2.25s/it] 95%|██████████████████████████████████████████████████████████████████████████████▉    | 19134/20117 [12:10:52<37:05,  2.26s/it] 95%|██████████████████████████████████████████████████████████████████████████████▉    | 19135/20117 [12:10:55<36:56,  2.26s/it] 95%|██████████████████████████████████████████████████████████████████████████████▉    | 19136/20117 [12:10:57<36:40,  2.24s/it] 95%|██████████████████████████████████████████████████████████████████████████████▉    | 19137/20117 [12:10:59<36:53,  2.26s/it] 95%|██████████████████████████████████████████████████████████████████████████████▉    | 19138/20117 [12:11:01<37:05,  2.27s/it] 95%|██████████████████████████████████████████████████████████████████████████████▉    | 19139/20117 [12:11:04<36:57,  2.27s/it] 95%|██████████████████████████████████████████████████████████████████████████████▉    | 19140/20117 [12:11:06<36:40,  2.25s/it]                                                                                                                                 {'loss': 0.1356, 'grad_norm': 0.5440109372138977, 'learning_rate': 1.175700330873275e-06, 'memory/max_active (GiB)': 19.22, 'memory/max_allocated (GiB)': 19.22, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 385.7, 'epoch': 1.9}
 95%|██████████████████████████████████████████████████████████████████████████████▉    | 19140/20117 [12:11:06<36:40,  2.25s/it] 95%|██████████████████████████████████████████████████████████████████████████████▉    | 19141/20117 [12:11:08<36:42,  2.26s/it] 95%|██████████████████████████████████████████████████████████████████████████████▉    | 19142/20117 [12:11:10<36:48,  2.26s/it] 95%|██████████████████████████████████████████████████████████████████████████████▉    | 19143/20117 [12:11:13<36:39,  2.26s/it] 95%|██████████████████████████████████████████████████████████████████████████████▉    | 19144/20117 [12:11:15<36:23,  2.24s/it] 95%|██████████████████████████████████████████████████████████████████████████████▉    | 19145/20117 [12:11:17<36:32,  2.26s/it] 95%|██████████████████████████████████████████████████████████████████████████████▉    | 19146/20117 [12:11:19<36:19,  2.25s/it] 95%|██████████████████████████████████████████████████████████████████████████████▉    | 19147/20117 [12:11:22<36:29,  2.26s/it] 95%|███████████████████████████████████████████████████████████████████████████████    | 19148/20117 [12:11:24<36:39,  2.27s/it] 95%|███████████████████████████████████████████████████████████████████████████████    | 19149/20117 [12:11:26<36:52,  2.29s/it] 95%|███████████████████████████████████████████████████████████████████████████████    | 19150/20117 [12:11:28<36:36,  2.27s/it]                                                                                                                                 {'loss': 0.1127, 'grad_norm': 0.5214424729347229, 'learning_rate': 1.1518263250713147e-06, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 314.47, 'epoch': 1.9}
 95%|███████████████████████████████████████████████████████████████████████████████    | 19150/20117 [12:11:28<36:36,  2.27s/it] 95%|███████████████████████████████████████████████████████████████████████████████    | 19151/20117 [12:11:31<36:18,  2.25s/it] 95%|███████████████████████████████████████████████████████████████████████████████    | 19152/20117 [12:11:33<37:43,  2.35s/it] 95%|███████████████████████████████████████████████████████████████████████████████    | 19153/20117 [12:11:36<37:26,  2.33s/it] 95%|███████████████████████████████████████████████████████████████████████████████    | 19154/20117 [12:11:38<36:51,  2.30s/it] 95%|███████████████████████████████████████████████████████████████████████████████    | 19155/20117 [12:11:40<36:36,  2.28s/it] 95%|███████████████████████████████████████████████████████████████████████████████    | 19156/20117 [12:11:42<36:24,  2.27s/it] 95%|███████████████████████████████████████████████████████████████████████████████    | 19157/20117 [12:11:44<36:18,  2.27s/it] 95%|███████████████████████████████████████████████████████████████████████████████    | 19158/20117 [12:11:47<36:12,  2.27s/it] 95%|███████████████████████████████████████████████████████████████████████████████    | 19159/20117 [12:11:49<36:20,  2.28s/it] 95%|███████████████████████████████████████████████████████████████████████████████    | 19160/20117 [12:11:51<36:03,  2.26s/it]                                                                                                                                 {'loss': 0.1387, 'grad_norm': 0.42965662479400635, 'learning_rate': 1.1281958032131611e-06, 'memory/max_active (GiB)': 20.57, 'memory/max_allocated (GiB)': 20.57, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 381.94, 'epoch': 1.9}
 95%|███████████████████████████████████████████████████████████████████████████████    | 19160/20117 [12:11:51<36:03,  2.26s/it] 95%|███████████████████████████████████████████████████████████████████████████████    | 19161/20117 [12:11:54<36:10,  2.27s/it] 95%|███████████████████████████████████████████████████████████████████████████████    | 19162/20117 [12:11:56<36:12,  2.27s/it] 95%|███████████████████████████████████████████████████████████████████████████████    | 19163/20117 [12:11:58<36:02,  2.27s/it] 95%|███████████████████████████████████████████████████████████████████████████████    | 19164/20117 [12:12:00<35:43,  2.25s/it] 95%|███████████████████████████████████████████████████████████████████████████████    | 19165/20117 [12:12:03<35:36,  2.24s/it] 95%|███████████████████████████████████████████████████████████████████████████████    | 19166/20117 [12:12:05<35:21,  2.23s/it] 95%|███████████████████████████████████████████████████████████████████████████████    | 19167/20117 [12:12:07<35:23,  2.24s/it] 95%|███████████████████████████████████████████████████████████████████████████████    | 19168/20117 [12:12:09<35:24,  2.24s/it] 95%|███████████████████████████████████████████████████████████████████████████████    | 19169/20117 [12:12:11<35:25,  2.24s/it] 95%|███████████████████████████████████████████████████████████████████████████████    | 19170/20117 [12:12:14<35:20,  2.24s/it]                                                                                                                                 {'loss': 0.1781, 'grad_norm': 0.2793689966201782, 'learning_rate': 1.1048088235057762e-06, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 312.99, 'epoch': 1.91}
 95%|███████████████████████████████████████████████████████████████████████████████    | 19170/20117 [12:12:14<35:20,  2.24s/it] 95%|███████████████████████████████████████████████████████████████████████████████    | 19171/20117 [12:12:16<35:34,  2.26s/it] 95%|███████████████████████████████████████████████████████████████████████████████    | 19172/20117 [12:12:18<35:20,  2.24s/it] 95%|███████████████████████████████████████████████████████████████████████████████    | 19173/20117 [12:12:20<35:13,  2.24s/it] 95%|███████████████████████████████████████████████████████████████████████████████    | 19174/20117 [12:12:23<34:55,  2.22s/it] 95%|███████████████████████████████████████████████████████████████████████████████    | 19175/20117 [12:12:25<35:12,  2.24s/it] 95%|███████████████████████████████████████████████████████████████████████████████    | 19176/20117 [12:12:27<35:10,  2.24s/it] 95%|███████████████████████████████████████████████████████████████████████████████    | 19177/20117 [12:12:29<35:05,  2.24s/it] 95%|███████████████████████████████████████████████████████████████████████████████▏   | 19178/20117 [12:12:32<34:59,  2.24s/it] 95%|███████████████████████████████████████████████████████████████████████████████▏   | 19179/20117 [12:12:34<34:57,  2.24s/it] 95%|███████████████████████████████████████████████████████████████████████████████▏   | 19180/20117 [12:12:36<35:08,  2.25s/it]                                                                                                                                 {'loss': 0.1757, 'grad_norm': 0.8887554407119751, 'learning_rate': 1.0816654435562458e-06, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 381.31, 'epoch': 1.91}
 95%|███████████████████████████████████████████████████████████████████████████████▏   | 19180/20117 [12:12:36<35:08,  2.25s/it] 95%|███████████████████████████████████████████████████████████████████████████████▏   | 19181/20117 [12:12:38<35:05,  2.25s/it] 95%|███████████████████████████████████████████████████████████████████████████████▏   | 19182/20117 [12:12:41<34:50,  2.24s/it] 95%|███████████████████████████████████████████████████████████████████████████████▏   | 19183/20117 [12:12:43<34:39,  2.23s/it] 95%|███████████████████████████████████████████████████████████████████████████████▏   | 19184/20117 [12:12:45<34:42,  2.23s/it] 95%|███████████████████████████████████████████████████████████████████████████████▏   | 19185/20117 [12:12:47<34:36,  2.23s/it] 95%|███████████████████████████████████████████████████████████████████████████████▏   | 19186/20117 [12:12:50<34:43,  2.24s/it] 95%|███████████████████████████████████████████████████████████████████████████████▏   | 19187/20117 [12:12:52<34:39,  2.24s/it] 95%|███████████████████████████████████████████████████████████████████████████████▏   | 19188/20117 [12:12:54<34:27,  2.23s/it] 95%|███████████████████████████████████████████████████████████████████████████████▏   | 19189/20117 [12:12:56<34:28,  2.23s/it] 95%|███████████████████████████████████████████████████████████████████████████████▏   | 19190/20117 [12:12:58<34:21,  2.22s/it]                                                                                                                                 {'loss': 0.1417, 'grad_norm': 0.4591097831726074, 'learning_rate': 1.0587657203715795e-06, 'memory/max_active (GiB)': 18.17, 'memory/max_allocated (GiB)': 18.17, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 326.93, 'epoch': 1.91}
 95%|███████████████████████████████████████████████████████████████████████████████▏   | 19190/20117 [12:12:58<34:21,  2.22s/it] 95%|███████████████████████████████████████████████████████████████████████████████▏   | 19191/20117 [12:13:01<34:05,  2.21s/it] 95%|███████████████████████████████████████████████████████████████████████████████▏   | 19192/20117 [12:13:03<34:22,  2.23s/it] 95%|███████████████████████████████████████████████████████████████████████████████▏   | 19193/20117 [12:13:05<34:23,  2.23s/it] 95%|███████████████████████████████████████████████████████████████████████████████▏   | 19194/20117 [12:13:07<34:10,  2.22s/it] 95%|███████████████████████████████████████████████████████████████████████████████▏   | 19195/20117 [12:13:10<34:07,  2.22s/it] 95%|███████████████████████████████████████████████████████████████████████████████▏   | 19196/20117 [12:13:12<34:16,  2.23s/it] 95%|███████████████████████████████████████████████████████████████████████████████▏   | 19197/20117 [12:13:14<34:13,  2.23s/it] 95%|███████████████████████████████████████████████████████████████████████████████▏   | 19198/20117 [12:13:16<34:18,  2.24s/it] 95%|███████████████████████████████████████████████████████████████████████████████▏   | 19199/20117 [12:13:19<34:26,  2.25s/it] 95%|███████████████████████████████████████████████████████████████████████████████▏   | 19200/20117 [12:13:21<34:11,  2.24s/it]                                                                                                                                 {'loss': 0.1935, 'grad_norm': 0.4055291414260864, 'learning_rate': 1.036109710358657e-06, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 363.7, 'epoch': 1.91}
 95%|███████████████████████████████████████████████████████████████████████████████▏   | 19200/20117 [12:13:21<34:11,  2.24s/it] 95%|███████████████████████████████████████████████████████████████████████████████▏   | 19201/20117 [12:13:23<34:06,  2.23s/it] 95%|███████████████████████████████████████████████████████████████████████████████▏   | 19202/20117 [12:13:25<33:57,  2.23s/it] 95%|███████████████████████████████████████████████████████████████████████████████▏   | 19203/20117 [12:13:27<33:49,  2.22s/it] 95%|███████████████████████████████████████████████████████████████████████████████▏   | 19204/20117 [12:13:30<33:50,  2.22s/it] 95%|███████████████████████████████████████████████████████████████████████████████▏   | 19205/20117 [12:13:32<33:53,  2.23s/it] 95%|███████████████████████████████████████████████████████████████████████████████▏   | 19206/20117 [12:13:34<35:07,  2.31s/it] 95%|███████████████████████████████████████████████████████████████████████████████▏   | 19207/20117 [12:13:37<34:44,  2.29s/it] 95%|███████████████████████████████████████████████████████████████████████████████▏   | 19208/20117 [12:13:39<34:31,  2.28s/it] 95%|███████████████████████████████████████████████████████████████████████████████▎   | 19209/20117 [12:13:41<34:28,  2.28s/it] 95%|███████████████████████████████████████████████████████████████████████████████▎   | 19210/20117 [12:13:43<34:09,  2.26s/it]                                                                                                                                 {'loss': 0.1551, 'grad_norm': 0.6825937628746033, 'learning_rate': 1.0136974693240153e-06, 'memory/max_active (GiB)': 17.41, 'memory/max_allocated (GiB)': 17.41, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 291.2, 'epoch': 1.91}
 95%|███████████████████████████████████████████████████████████████████████████████▎   | 19210/20117 [12:13:43<34:09,  2.26s/it] 95%|███████████████████████████████████████████████████████████████████████████████▎   | 19211/20117 [12:13:46<33:59,  2.25s/it] 96%|███████████████████████████████████████████████████████████████████████████████▎   | 19212/20117 [12:13:48<33:54,  2.25s/it] 96%|███████████████████████████████████████████████████████████████████████████████▎   | 19213/20117 [12:13:50<33:43,  2.24s/it] 96%|███████████████████████████████████████████████████████████████████████████████▎   | 19214/20117 [12:13:52<33:34,  2.23s/it] 96%|███████████████████████████████████████████████████████████████████████████████▎   | 19215/20117 [12:13:55<33:41,  2.24s/it] 96%|███████████████████████████████████████████████████████████████████████████████▎   | 19216/20117 [12:13:57<33:30,  2.23s/it] 96%|███████████████████████████████████████████████████████████████████████████████▎   | 19217/20117 [12:13:59<33:44,  2.25s/it] 96%|███████████████████████████████████████████████████████████████████████████████▎   | 19218/20117 [12:14:01<33:28,  2.23s/it] 96%|███████████████████████████████████████████████████████████████████████████████▎   | 19219/20117 [12:14:03<33:29,  2.24s/it] 96%|███████████████████████████████████████████████████████████████████████████████▎   | 19220/20117 [12:14:06<33:25,  2.24s/it]                                                                                                                                 {'loss': 0.1015, 'grad_norm': 0.43678414821624756, 'learning_rate': 9.915290524737274e-07, 'memory/max_active (GiB)': 19.69, 'memory/max_allocated (GiB)': 19.69, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 363.79, 'epoch': 1.91}
 96%|███████████████████████████████████████████████████████████████████████████████▎   | 19220/20117 [12:14:06<33:25,  2.24s/it] 96%|███████████████████████████████████████████████████████████████████████████████▎   | 19221/20117 [12:14:08<33:20,  2.23s/it] 96%|███████████████████████████████████████████████████████████████████████████████▎   | 19222/20117 [12:14:10<33:36,  2.25s/it] 96%|███████████████████████████████████████████████████████████████████████████████▎   | 19223/20117 [12:14:12<33:32,  2.25s/it] 96%|███████████████████████████████████████████████████████████████████████████████▎   | 19224/20117 [12:14:15<33:23,  2.24s/it] 96%|███████████████████████████████████████████████████████████████████████████████▎   | 19225/20117 [12:14:17<33:08,  2.23s/it] 96%|███████████████████████████████████████████████████████████████████████████████▎   | 19226/20117 [12:14:19<33:05,  2.23s/it] 96%|███████████████████████████████████████████████████████████████████████████████▎   | 19227/20117 [12:14:21<32:54,  2.22s/it] 96%|███████████████████████████████████████████████████████████████████████████████▎   | 19228/20117 [12:14:24<32:44,  2.21s/it] 96%|███████████████████████████████████████████████████████████████████████████████▎   | 19229/20117 [12:14:26<32:51,  2.22s/it] 96%|███████████████████████████████████████████████████████████████████████████████▎   | 19230/20117 [12:14:28<33:01,  2.23s/it]                                                                                                                                 {'loss': 0.1326, 'grad_norm': 0.24454043805599213, 'learning_rate': 9.696045144133136e-07, 'memory/max_active (GiB)': 18.19, 'memory/max_allocated (GiB)': 18.19, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 314.11, 'epoch': 1.91}
 96%|███████████████████████████████████████████████████████████████████████████████▎   | 19230/20117 [12:14:28<33:01,  2.23s/it] 96%|███████████████████████████████████████████████████████████████████████████████▎   | 19231/20117 [12:14:30<33:28,  2.27s/it] 96%|███████████████████████████████████████████████████████████████████████████████▎   | 19232/20117 [12:14:33<33:09,  2.25s/it] 96%|███████████████████████████████████████████████████████████████████████████████▎   | 19233/20117 [12:14:35<33:11,  2.25s/it] 96%|███████████████████████████████████████████████████████████████████████████████▎   | 19234/20117 [12:14:37<33:02,  2.25s/it] 96%|███████████████████████████████████████████████████████████████████████████████▎   | 19235/20117 [12:14:39<33:07,  2.25s/it] 96%|███████████████████████████████████████████████████████████████████████████████▎   | 19236/20117 [12:14:42<33:20,  2.27s/it] 96%|███████████████████████████████████████████████████████████████████████████████▎   | 19237/20117 [12:14:44<33:13,  2.26s/it] 96%|███████████████████████████████████████████████████████████████████████████████▎   | 19238/20117 [12:14:46<33:00,  2.25s/it] 96%|███████████████████████████████████████████████████████████████████████████████▍   | 19239/20117 [12:14:48<32:43,  2.24s/it] 96%|███████████████████████████████████████████████████████████████████████████████▍   | 19240/20117 [12:14:51<32:44,  2.24s/it]                                                                                                                                 {'loss': 0.1666, 'grad_norm': 0.40022847056388855, 'learning_rate': 9.4792390914753e-07, 'memory/max_active (GiB)': 20.65, 'memory/max_allocated (GiB)': 20.65, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 377.05, 'epoch': 1.91}
 96%|███████████████████████████████████████████████████████████████████████████████▍   | 19240/20117 [12:14:51<32:44,  2.24s/it] 96%|███████████████████████████████████████████████████████████████████████████████▍   | 19241/20117 [12:14:53<32:59,  2.26s/it] 96%|███████████████████████████████████████████████████████████████████████████████▍   | 19242/20117 [12:14:55<32:58,  2.26s/it] 96%|███████████████████████████████████████████████████████████████████████████████▍   | 19243/20117 [12:14:57<32:55,  2.26s/it] 96%|███████████████████████████████████████████████████████████████████████████████▍   | 19244/20117 [12:15:00<32:46,  2.25s/it] 96%|███████████████████████████████████████████████████████████████████████████████▍   | 19245/20117 [12:15:02<32:49,  2.26s/it] 96%|███████████████████████████████████████████████████████████████████████████████▍   | 19246/20117 [12:15:04<32:41,  2.25s/it] 96%|███████████████████████████████████████████████████████████████████████████████▍   | 19247/20117 [12:15:06<32:32,  2.24s/it] 96%|███████████████████████████████████████████████████████████████████████████████▍   | 19248/20117 [12:15:09<32:45,  2.26s/it] 96%|███████████████████████████████████████████████████████████████████████████████▍   | 19249/20117 [12:15:11<32:50,  2.27s/it] 96%|███████████████████████████████████████████████████████████████████████████████▍   | 19250/20117 [12:15:13<32:36,  2.26s/it]                                                                                                                                 {'loss': 0.2076, 'grad_norm': 0.4418630003929138, 'learning_rate': 9.264872900802912e-07, 'memory/max_active (GiB)': 21.39, 'memory/max_allocated (GiB)': 21.39, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 374.03, 'epoch': 1.91}
 96%|███████████████████████████████████████████████████████████████████████████████▍   | 19250/20117 [12:15:13<32:36,  2.26s/it] 96%|███████████████████████████████████████████████████████████████████████████████▍   | 19251/20117 [12:15:15<32:20,  2.24s/it] 96%|███████████████████████████████████████████████████████████████████████████████▍   | 19252/20117 [12:15:18<32:23,  2.25s/it] 96%|███████████████████████████████████████████████████████████████████████████████▍   | 19253/20117 [12:15:20<32:31,  2.26s/it] 96%|███████████████████████████████████████████████████████████████████████████████▍   | 19254/20117 [12:15:22<32:41,  2.27s/it] 96%|███████████████████████████████████████████████████████████████████████████████▍   | 19255/20117 [12:15:25<32:39,  2.27s/it] 96%|███████████████████████████████████████████████████████████████████████████████▍   | 19256/20117 [12:15:27<32:35,  2.27s/it] 96%|███████████████████████████████████████████████████████████████████████████████▍   | 19257/20117 [12:15:29<33:40,  2.35s/it] 96%|███████████████████████████████████████████████████████████████████████████████▍   | 19258/20117 [12:15:32<33:11,  2.32s/it] 96%|███████████████████████████████████████████████████████████████████████████████▍   | 19259/20117 [12:15:34<32:46,  2.29s/it] 96%|███████████████████████████████████████████████████████████████████████████████▍   | 19260/20117 [12:15:36<32:27,  2.27s/it]                                                                                                                                 {'loss': 0.1309, 'grad_norm': 0.1951112598180771, 'learning_rate': 9.052947100145149e-07, 'memory/max_active (GiB)': 19.8, 'memory/max_allocated (GiB)': 19.8, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 372.81, 'epoch': 1.91}
 96%|███████████████████████████████████████████████████████████████████████████████▍   | 19260/20117 [12:15:36<32:27,  2.27s/it] 96%|███████████████████████████████████████████████████████████████████████████████▍   | 19261/20117 [12:15:38<32:25,  2.27s/it] 96%|███████████████████████████████████████████████████████████████████████████████▍   | 19262/20117 [12:15:41<32:15,  2.26s/it] 96%|███████████████████████████████████████████████████████████████████████████████▍   | 19263/20117 [12:15:43<32:17,  2.27s/it] 96%|███████████████████████████████████████████████████████████████████████████████▍   | 19264/20117 [12:15:45<31:58,  2.25s/it] 96%|███████████████████████████████████████████████████████████████████████████████▍   | 19265/20117 [12:15:47<31:46,  2.24s/it] 96%|███████████████████████████████████████████████████████████████████████████████▍   | 19266/20117 [12:15:49<31:53,  2.25s/it] 96%|███████████████████████████████████████████████████████████████████████████████▍   | 19267/20117 [12:15:52<32:02,  2.26s/it] 96%|███████████████████████████████████████████████████████████████████████████████▍   | 19268/20117 [12:15:54<31:53,  2.25s/it] 96%|███████████████████████████████████████████████████████████████████████████████▌   | 19269/20117 [12:15:56<31:52,  2.26s/it] 96%|███████████████████████████████████████████████████████████████████████████████▌   | 19270/20117 [12:15:58<31:37,  2.24s/it]                                                                                                                                 {'loss': 0.159, 'grad_norm': 0.4262109100818634, 'learning_rate': 8.843462211520215e-07, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 352.44, 'epoch': 1.92}
 96%|███████████████████████████████████████████████████████████████████████████████▌   | 19270/20117 [12:15:58<31:37,  2.24s/it] 96%|███████████████████████████████████████████████████████████████████████████████▌   | 19271/20117 [12:16:01<31:55,  2.26s/it] 96%|███████████████████████████████████████████████████████████████████████████████▌   | 19272/20117 [12:16:03<31:42,  2.25s/it] 96%|███████████████████████████████████████████████████████████████████████████████▌   | 19273/20117 [12:16:05<31:56,  2.27s/it] 96%|███████████████████████████████████████████████████████████████████████████████▌   | 19274/20117 [12:16:08<31:57,  2.27s/it] 96%|███████████████████████████████████████████████████████████████████████████████▌   | 19275/20117 [12:16:10<31:53,  2.27s/it] 96%|███████████████████████████████████████████████████████████████████████████████▌   | 19276/20117 [12:16:12<31:53,  2.27s/it] 96%|███████████████████████████████████████████████████████████████████████████████▌   | 19277/20117 [12:16:14<31:43,  2.27s/it] 96%|███████████████████████████████████████████████████████████████████████████████▌   | 19278/20117 [12:16:17<31:40,  2.27s/it] 96%|███████████████████████████████████████████████████████████████████████████████▌   | 19279/20117 [12:16:19<31:19,  2.24s/it] 96%|███████████████████████████████████████████████████████████████████████████████▌   | 19280/20117 [12:16:21<31:11,  2.24s/it]                                                                                                                                 {'loss': 0.1805, 'grad_norm': 0.35612645745277405, 'learning_rate': 8.636418750933461e-07, 'memory/max_active (GiB)': 20.78, 'memory/max_allocated (GiB)': 20.78, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 423.92, 'epoch': 1.92}
 96%|███████████████████████████████████████████████████████████████████████████████▌   | 19280/20117 [12:16:21<31:11,  2.24s/it] 96%|███████████████████████████████████████████████████████████████████████████████▌   | 19281/20117 [12:16:23<31:32,  2.26s/it] 96%|███████████████████████████████████████████████████████████████████████████████▌   | 19282/20117 [12:16:26<31:15,  2.25s/it] 96%|███████████████████████████████████████████████████████████████████████████████▌   | 19283/20117 [12:16:28<31:02,  2.23s/it] 96%|███████████████████████████████████████████████████████████████████████████████▌   | 19284/20117 [12:16:30<31:03,  2.24s/it] 96%|███████████████████████████████████████████████████████████████████████████████▌   | 19285/20117 [12:16:32<30:52,  2.23s/it] 96%|███████████████████████████████████████████████████████████████████████████████▌   | 19286/20117 [12:16:34<30:42,  2.22s/it] 96%|███████████████████████████████████████████████████████████████████████████████▌   | 19287/20117 [12:16:37<30:52,  2.23s/it] 96%|███████████████████████████████████████████████████████████████████████████████▌   | 19288/20117 [12:16:39<30:39,  2.22s/it] 96%|███████████████████████████████████████████████████████████████████████████████▌   | 19289/20117 [12:16:41<30:41,  2.22s/it] 96%|███████████████████████████████████████████████████████████████████████████████▌   | 19290/20117 [12:16:43<30:51,  2.24s/it]                                                                                                                                 {'loss': 0.151, 'grad_norm': 0.4936760365962982, 'learning_rate': 8.431817228376937e-07, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 382.85, 'epoch': 1.92}
 96%|███████████████████████████████████████████████████████████████████████████████▌   | 19290/20117 [12:16:43<30:51,  2.24s/it] 96%|███████████████████████████████████████████████████████████████████████████████▌   | 19291/20117 [12:16:46<30:46,  2.24s/it] 96%|███████████████████████████████████████████████████████████████████████████████▌   | 19292/20117 [12:16:48<30:50,  2.24s/it] 96%|███████████████████████████████████████████████████████████████████████████████▌   | 19293/20117 [12:16:50<30:56,  2.25s/it] 96%|███████████████████████████████████████████████████████████████████████████████▌   | 19294/20117 [12:16:53<31:08,  2.27s/it] 96%|███████████████████████████████████████████████████████████████████████████████▌   | 19295/20117 [12:16:55<30:57,  2.26s/it] 96%|███████████████████████████████████████████████████████████████████████████████▌   | 19296/20117 [12:16:57<30:52,  2.26s/it] 96%|███████████████████████████████████████████████████████████████████████████████▌   | 19297/20117 [12:16:59<30:58,  2.27s/it] 96%|███████████████████████████████████████████████████████████████████████████████▌   | 19298/20117 [12:17:01<30:40,  2.25s/it] 96%|███████████████████████████████████████████████████████████████████████████████▋   | 19299/20117 [12:17:04<30:47,  2.26s/it] 96%|███████████████████████████████████████████████████████████████████████████████▋   | 19300/20117 [12:17:06<30:39,  2.25s/it]                                                                                                                                 {'loss': 0.1807, 'grad_norm': 0.6932761669158936, 'learning_rate': 8.229658147827169e-07, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 369.43, 'epoch': 1.92}
 96%|███████████████████████████████████████████████████████████████████████████████▋   | 19300/20117 [12:17:06<30:39,  2.25s/it] 96%|███████████████████████████████████████████████████████████████████████████████▋   | 19301/20117 [12:17:08<30:42,  2.26s/it] 96%|███████████████████████████████████████████████████████████████████████████████▋   | 19302/20117 [12:17:11<30:34,  2.25s/it] 96%|███████████████████████████████████████████████████████████████████████████████▋   | 19303/20117 [12:17:13<30:24,  2.24s/it] 96%|███████████████████████████████████████████████████████████████████████████████▋   | 19304/20117 [12:17:15<30:23,  2.24s/it] 96%|███████████████████████████████████████████████████████████████████████████████▋   | 19305/20117 [12:17:17<30:10,  2.23s/it] 96%|███████████████████████████████████████████████████████████████████████████████▋   | 19306/20117 [12:17:19<29:59,  2.22s/it] 96%|███████████████████████████████████████████████████████████████████████████████▋   | 19307/20117 [12:17:22<30:05,  2.23s/it] 96%|███████████████████████████████████████████████████████████████████████████████▋   | 19308/20117 [12:17:24<30:01,  2.23s/it] 96%|███████████████████████████████████████████████████████████████████████████████▋   | 19309/20117 [12:17:26<30:00,  2.23s/it] 96%|███████████████████████████████████████████████████████████████████████████████▋   | 19310/20117 [12:17:28<29:50,  2.22s/it]                                                                                                                                 {'loss': 0.0993, 'grad_norm': 0.40733301639556885, 'learning_rate': 8.02994200724494e-07, 'memory/max_active (GiB)': 21.4, 'memory/max_allocated (GiB)': 21.4, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 290.15, 'epoch': 1.92}
 96%|███████████████████████████████████████████████████████████████████████████████▋   | 19310/20117 [12:17:28<29:50,  2.22s/it] 96%|███████████████████████████████████████████████████████████████████████████████▋   | 19311/20117 [12:17:31<31:09,  2.32s/it] 96%|███████████████████████████████████████████████████████████████████████████████▋   | 19312/20117 [12:17:33<30:55,  2.31s/it] 96%|███████████████████████████████████████████████████████████████████████████████▋   | 19313/20117 [12:17:35<30:33,  2.28s/it] 96%|███████████████████████████████████████████████████████████████████████████████▋   | 19314/20117 [12:17:38<30:29,  2.28s/it] 96%|███████████████████████████████████████████████████████████████████████████████▋   | 19315/20117 [12:17:40<30:24,  2.27s/it] 96%|███████████████████████████████████████████████████████████████████████████████▋   | 19316/20117 [12:17:42<30:19,  2.27s/it] 96%|███████████████████████████████████████████████████████████████████████████████▋   | 19317/20117 [12:17:44<30:08,  2.26s/it] 96%|███████████████████████████████████████████████████████████████████████████████▋   | 19318/20117 [12:17:47<29:58,  2.25s/it] 96%|███████████████████████████████████████████████████████████████████████████████▋   | 19319/20117 [12:17:49<29:51,  2.24s/it] 96%|███████████████████████████████████████████████████████████████████████████████▋   | 19320/20117 [12:17:51<29:51,  2.25s/it]                                                                                                                                 {'loss': 0.1942, 'grad_norm': 0.49551132321357727, 'learning_rate': 7.83266929857307e-07, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 361.74, 'epoch': 1.92}
 96%|███████████████████████████████████████████████████████████████████████████████▋   | 19320/20117 [12:17:51<29:51,  2.25s/it] 96%|███████████████████████████████████████████████████████████████████████████████▋   | 19321/20117 [12:17:53<29:53,  2.25s/it] 96%|███████████████████████████████████████████████████████████████████████████████▋   | 19322/20117 [12:17:56<29:52,  2.25s/it] 96%|███████████████████████████████████████████████████████████████████████████████▋   | 19323/20117 [12:17:58<29:38,  2.24s/it] 96%|███████████████████████████████████████████████████████████████████████████████▋   | 19324/20117 [12:18:00<29:33,  2.24s/it] 96%|███████████████████████████████████████████████████████████████████████████████▋   | 19325/20117 [12:18:02<29:27,  2.23s/it] 96%|███████████████████████████████████████████████████████████████████████████████▋   | 19326/20117 [12:18:05<29:33,  2.24s/it] 96%|███████████████████████████████████████████████████████████████████████████████▋   | 19327/20117 [12:18:07<29:21,  2.23s/it] 96%|███████████████████████████████████████████████████████████████████████████████▋   | 19328/20117 [12:18:09<29:19,  2.23s/it] 96%|███████████████████████████████████████████████████████████████████████████████▋   | 19329/20117 [12:18:11<29:19,  2.23s/it] 96%|███████████████████████████████████████████████████████████████████████████████▊   | 19330/20117 [12:18:13<29:16,  2.23s/it]                                                                                                                                 {'loss': 0.2019, 'grad_norm': 0.5282984972000122, 'learning_rate': 7.637840507736194e-07, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 354.99, 'epoch': 1.92}
 96%|███████████████████████████████████████████████████████████████████████████████▊   | 19330/20117 [12:18:13<29:16,  2.23s/it] 96%|███████████████████████████████████████████████████████████████████████████████▊   | 19331/20117 [12:18:16<29:09,  2.23s/it] 96%|███████████████████████████████████████████████████████████████████████████████▊   | 19332/20117 [12:18:18<29:07,  2.23s/it] 96%|███████████████████████████████████████████████████████████████████████████████▊   | 19333/20117 [12:18:20<29:12,  2.24s/it] 96%|███████████████████████████████████████████████████████████████████████████████▊   | 19334/20117 [12:18:22<29:02,  2.23s/it] 96%|███████████████████████████████████████████████████████████████████████████████▊   | 19335/20117 [12:18:25<28:55,  2.22s/it] 96%|███████████████████████████████████████████████████████████████████████████████▊   | 19336/20117 [12:18:27<28:52,  2.22s/it] 96%|███████████████████████████████████████████████████████████████████████████████▊   | 19337/20117 [12:18:29<29:08,  2.24s/it] 96%|███████████████████████████████████████████████████████████████████████████████▊   | 19338/20117 [12:18:31<29:00,  2.23s/it] 96%|███████████████████████████████████████████████████████████████████████████████▊   | 19339/20117 [12:18:33<28:50,  2.22s/it] 96%|███████████████████████████████████████████████████████████████████████████████▊   | 19340/20117 [12:18:36<28:52,  2.23s/it]                                                                                                                                 {'loss': 0.1387, 'grad_norm': 0.6770442724227905, 'learning_rate': 7.445456114638539e-07, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 297.34, 'epoch': 1.92}
 96%|███████████████████████████████████████████████████████████████████████████████▊   | 19340/20117 [12:18:36<28:52,  2.23s/it] 96%|███████████████████████████████████████████████████████████████████████████████▊   | 19341/20117 [12:18:38<29:00,  2.24s/it] 96%|███████████████████████████████████████████████████████████████████████████████▊   | 19342/20117 [12:18:40<29:00,  2.25s/it] 96%|███████████████████████████████████████████████████████████████████████████████▊   | 19343/20117 [12:18:42<28:51,  2.24s/it] 96%|███████████████████████████████████████████████████████████████████████████████▊   | 19344/20117 [12:18:45<28:50,  2.24s/it] 96%|███████████████████████████████████████████████████████████████████████████████▊   | 19345/20117 [12:18:47<28:55,  2.25s/it] 96%|███████████████████████████████████████████████████████████████████████████████▊   | 19346/20117 [12:18:49<28:45,  2.24s/it] 96%|███████████████████████████████████████████████████████████████████████████████▊   | 19347/20117 [12:18:51<28:57,  2.26s/it] 96%|███████████████████████████████████████████████████████████████████████████████▊   | 19348/20117 [12:18:54<28:53,  2.25s/it] 96%|███████████████████████████████████████████████████████████████████████████████▊   | 19349/20117 [12:18:56<28:58,  2.26s/it] 96%|███████████████████████████████████████████████████████████████████████████████▊   | 19350/20117 [12:18:58<28:53,  2.26s/it]                                                                                                                                 {'loss': 0.1626, 'grad_norm': 0.35707518458366394, 'learning_rate': 7.255516593163703e-07, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 361.77, 'epoch': 1.92}
 96%|███████████████████████████████████████████████████████████████████████████████▊   | 19350/20117 [12:18:58<28:53,  2.26s/it] 96%|███████████████████████████████████████████████████████████████████████████████▊   | 19351/20117 [12:19:00<28:41,  2.25s/it] 96%|███████████████████████████████████████████████████████████████████████████████▊   | 19352/20117 [12:19:03<28:31,  2.24s/it] 96%|███████████████████████████████████████████████████████████████████████████████▊   | 19353/20117 [12:19:05<28:25,  2.23s/it] 96%|███████████████████████████████████████████████████████████████████████████████▊   | 19354/20117 [12:19:07<28:20,  2.23s/it] 96%|███████████████████████████████████████████████████████████████████████████████▊   | 19355/20117 [12:19:09<28:17,  2.23s/it] 96%|███████████████████████████████████████████████████████████████████████████████▊   | 19356/20117 [12:19:12<28:30,  2.25s/it] 96%|███████████████████████████████████████████████████████████████████████████████▊   | 19357/20117 [12:19:14<28:25,  2.24s/it] 96%|███████████████████████████████████████████████████████████████████████████████▊   | 19358/20117 [12:19:16<28:30,  2.25s/it] 96%|███████████████████████████████████████████████████████████████████████████████▊   | 19359/20117 [12:19:18<28:35,  2.26s/it] 96%|███████████████████████████████████████████████████████████████████████████████▉   | 19360/20117 [12:19:21<28:32,  2.26s/it]                                                                                                                                 {'loss': 0.19, 'grad_norm': 0.5072949528694153, 'learning_rate': 7.06802241117288e-07, 'memory/max_active (GiB)': 20.45, 'memory/max_allocated (GiB)': 20.45, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 382.78, 'epoch': 1.92}
 96%|███████████████████████████████████████████████████████████████████████████████▉   | 19360/20117 [12:19:21<28:32,  2.26s/it] 96%|███████████████████████████████████████████████████████████████████████████████▉   | 19361/20117 [12:19:23<28:19,  2.25s/it] 96%|███████████████████████████████████████████████████████████████████████████████▉   | 19362/20117 [12:19:25<28:13,  2.24s/it] 96%|███████████████████████████████████████████████████████████████████████████████▉   | 19363/20117 [12:19:27<28:06,  2.24s/it] 96%|███████████████████████████████████████████████████████████████████████████████▉   | 19364/20117 [12:19:30<29:26,  2.35s/it] 96%|███████████████████████████████████████████████████████████████████████████████▉   | 19365/20117 [12:19:32<29:11,  2.33s/it] 96%|███████████████████████████████████████████████████████████████████████████████▉   | 19366/20117 [12:19:34<28:38,  2.29s/it] 96%|███████████████████████████████████████████████████████████████████████████████▉   | 19367/20117 [12:19:37<28:22,  2.27s/it] 96%|███████████████████████████████████████████████████████████████████████████████▉   | 19368/20117 [12:19:39<28:03,  2.25s/it] 96%|███████████████████████████████████████████████████████████████████████████████▉   | 19369/20117 [12:19:41<27:59,  2.25s/it] 96%|███████████████████████████████████████████████████████████████████████████████▉   | 19370/20117 [12:19:43<28:08,  2.26s/it]                                                                                                                                 {'loss': 0.1228, 'grad_norm': 0.5360376834869385, 'learning_rate': 6.882974030503863e-07, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 318.51, 'epoch': 1.93}
 96%|███████████████████████████████████████████████████████████████████████████████▉   | 19370/20117 [12:19:43<28:08,  2.26s/it] 96%|███████████████████████████████████████████████████████████████████████████████▉   | 19371/20117 [12:19:46<28:01,  2.25s/it] 96%|███████████████████████████████████████████████████████████████████████████████▉   | 19372/20117 [12:19:48<28:10,  2.27s/it] 96%|███████████████████████████████████████████████████████████████████████████████▉   | 19373/20117 [12:19:50<28:06,  2.27s/it] 96%|███████████████████████████████████████████████████████████████████████████████▉   | 19374/20117 [12:19:52<27:51,  2.25s/it] 96%|███████████████████████████████████████████████████████████████████████████████▉   | 19375/20117 [12:19:55<27:41,  2.24s/it] 96%|███████████████████████████████████████████████████████████████████████████████▉   | 19376/20117 [12:19:57<27:46,  2.25s/it] 96%|███████████████████████████████████████████████████████████████████████████████▉   | 19377/20117 [12:19:59<27:44,  2.25s/it] 96%|███████████████████████████████████████████████████████████████████████████████▉   | 19378/20117 [12:20:01<27:40,  2.25s/it] 96%|███████████████████████████████████████████████████████████████████████████████▉   | 19379/20117 [12:20:04<27:30,  2.24s/it] 96%|███████████████████████████████████████████████████████████████████████████████▉   | 19380/20117 [12:20:06<27:31,  2.24s/it]                                                                                                                                 {'loss': 0.1511, 'grad_norm': 0.5876972079277039, 'learning_rate': 6.700371906969815e-07, 'memory/max_active (GiB)': 19.8, 'memory/max_allocated (GiB)': 19.8, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 409.23, 'epoch': 1.93}
 96%|███████████████████████████████████████████████████████████████████████████████▉   | 19380/20117 [12:20:06<27:31,  2.24s/it] 96%|███████████████████████████████████████████████████████████████████████████████▉   | 19381/20117 [12:20:08<27:26,  2.24s/it] 96%|███████████████████████████████████████████████████████████████████████████████▉   | 19382/20117 [12:20:10<27:36,  2.25s/it] 96%|███████████████████████████████████████████████████████████████████████████████▉   | 19383/20117 [12:20:13<27:32,  2.25s/it] 96%|███████████████████████████████████████████████████████████████████████████████▉   | 19384/20117 [12:20:15<27:17,  2.23s/it] 96%|███████████████████████████████████████████████████████████████████████████████▉   | 19385/20117 [12:20:17<27:18,  2.24s/it] 96%|███████████████████████████████████████████████████████████████████████████████▉   | 19386/20117 [12:20:19<27:15,  2.24s/it] 96%|███████████████████████████████████████████████████████████████████████████████▉   | 19387/20117 [12:20:22<27:06,  2.23s/it] 96%|███████████████████████████████████████████████████████████████████████████████▉   | 19388/20117 [12:20:24<27:05,  2.23s/it] 96%|███████████████████████████████████████████████████████████████████████████████▉   | 19389/20117 [12:20:26<26:57,  2.22s/it] 96%|████████████████████████████████████████████████████████████████████████████████   | 19390/20117 [12:20:28<26:51,  2.22s/it]                                                                                                                                 {'loss': 0.1781, 'grad_norm': 0.5715491771697998, 'learning_rate': 6.520216490358388e-07, 'memory/max_active (GiB)': 18.16, 'memory/max_allocated (GiB)': 18.16, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 394.12, 'epoch': 1.93}
 96%|████████████████████████████████████████████████████████████████████████████████   | 19390/20117 [12:20:28<26:51,  2.22s/it] 96%|████████████████████████████████████████████████████████████████████████████████   | 19391/20117 [12:20:30<26:51,  2.22s/it] 96%|████████████████████████████████████████████████████████████████████████████████   | 19392/20117 [12:20:33<26:58,  2.23s/it] 96%|████████████████████████████████████████████████████████████████████████████████   | 19393/20117 [12:20:35<27:07,  2.25s/it] 96%|████████████████████████████████████████████████████████████████████████████████   | 19394/20117 [12:20:37<26:59,  2.24s/it] 96%|████████████████████████████████████████████████████████████████████████████████   | 19395/20117 [12:20:39<26:48,  2.23s/it] 96%|████████████████████████████████████████████████████████████████████████████████   | 19396/20117 [12:20:42<27:07,  2.26s/it] 96%|████████████████████████████████████████████████████████████████████████████████   | 19397/20117 [12:20:44<27:02,  2.25s/it] 96%|████████████████████████████████████████████████████████████████████████████████   | 19398/20117 [12:20:46<26:59,  2.25s/it] 96%|████████████████████████████████████████████████████████████████████████████████   | 19399/20117 [12:20:48<26:52,  2.25s/it] 96%|████████████████████████████████████████████████████████████████████████████████   | 19400/20117 [12:20:51<26:58,  2.26s/it]                                                                                                                                 {'loss': 0.1633, 'grad_norm': 0.47656625509262085, 'learning_rate': 6.342508224430499e-07, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 406.05, 'epoch': 1.93}
 96%|████████████████████████████████████████████████████████████████████████████████   | 19400/20117 [12:20:51<26:58,  2.26s/it] 96%|████████████████████████████████████████████████████████████████████████████████   | 19401/20117 [12:20:53<26:55,  2.26s/it] 96%|████████████████████████████████████████████████████████████████████████████████   | 19402/20117 [12:20:55<26:44,  2.24s/it] 96%|████████████████████████████████████████████████████████████████████████████████   | 19403/20117 [12:20:57<26:45,  2.25s/it] 96%|████████████████████████████████████████████████████████████████████████████████   | 19404/20117 [12:21:00<26:48,  2.26s/it] 96%|████████████████████████████████████████████████████████████████████████████████   | 19405/20117 [12:21:02<26:40,  2.25s/it] 96%|████████████████████████████████████████████████████████████████████████████████   | 19406/20117 [12:21:04<26:32,  2.24s/it] 96%|████████████████████████████████████████████████████████████████████████████████   | 19407/20117 [12:21:06<26:22,  2.23s/it] 96%|████████████████████████████████████████████████████████████████████████████████   | 19408/20117 [12:21:09<26:15,  2.22s/it] 96%|████████████████████████████████████████████████████████████████████████████████   | 19409/20117 [12:21:11<26:20,  2.23s/it] 96%|████████████████████████████████████████████████████████████████████████████████   | 19410/20117 [12:21:13<26:21,  2.24s/it]                                                                                                                                 {'loss': 0.1678, 'grad_norm': 0.531021237373352, 'learning_rate': 6.167247546919219e-07, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 428.32, 'epoch': 1.93}
 96%|████████████████████████████████████████████████████████████████████████████████   | 19410/20117 [12:21:13<26:21,  2.24s/it] 96%|████████████████████████████████████████████████████████████████████████████████   | 19411/20117 [12:21:15<26:12,  2.23s/it] 96%|████████████████████████████████████████████████████████████████████████████████   | 19412/20117 [12:21:18<26:16,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████   | 19413/20117 [12:21:20<26:21,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████   | 19414/20117 [12:21:22<26:11,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████   | 19415/20117 [12:21:24<26:11,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████   | 19416/20117 [12:21:26<26:05,  2.23s/it] 97%|████████████████████████████████████████████████████████████████████████████████   | 19417/20117 [12:21:29<26:02,  2.23s/it] 97%|████████████████████████████████████████████████████████████████████████████████   | 19418/20117 [12:21:31<26:18,  2.26s/it] 97%|████████████████████████████████████████████████████████████████████████████████   | 19419/20117 [12:21:34<27:18,  2.35s/it] 97%|████████████████████████████████████████████████████████████████████████████████   | 19420/20117 [12:21:36<26:47,  2.31s/it]                                                                                                                                 {'loss': 0.1532, 'grad_norm': 0.5396085381507874, 'learning_rate': 5.994434889528556e-07, 'memory/max_active (GiB)': 21.4, 'memory/max_allocated (GiB)': 21.4, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 393.0, 'epoch': 1.93}
 97%|████████████████████████████████████████████████████████████████████████████████   | 19420/20117 [12:21:36<26:47,  2.31s/it] 97%|████████████████████████████████████████████████████████████████████████████████▏  | 19421/20117 [12:21:38<26:27,  2.28s/it] 97%|████████████████████████████████████████████████████████████████████████████████▏  | 19422/20117 [12:21:40<26:18,  2.27s/it] 97%|████████████████████████████████████████████████████████████████████████████████▏  | 19423/20117 [12:21:43<26:16,  2.27s/it] 97%|████████████████████████████████████████████████████████████████████████████████▏  | 19424/20117 [12:21:45<26:07,  2.26s/it] 97%|████████████████████████████████████████████████████████████████████████████████▏  | 19425/20117 [12:21:47<26:07,  2.27s/it] 97%|████████████████████████████████████████████████████████████████████████████████▏  | 19426/20117 [12:21:49<25:55,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▏  | 19427/20117 [12:21:51<25:43,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▏  | 19428/20117 [12:21:54<25:51,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▏  | 19429/20117 [12:21:56<25:46,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▏  | 19430/20117 [12:21:58<25:51,  2.26s/it]                                                                                                                                 {'loss': 0.1298, 'grad_norm': 0.33545199036598206, 'learning_rate': 5.824070677932558e-07, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 361.14, 'epoch': 1.93}
 97%|████████████████████████████████████████████████████████████████████████████████▏  | 19430/20117 [12:21:58<25:51,  2.26s/it] 97%|████████████████████████████████████████████████████████████████████████████████▏  | 19431/20117 [12:22:00<25:39,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▏  | 19432/20117 [12:22:03<25:40,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▏  | 19433/20117 [12:22:05<25:38,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▏  | 19434/20117 [12:22:07<25:37,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▏  | 19435/20117 [12:22:09<25:28,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▏  | 19436/20117 [12:22:12<25:21,  2.23s/it] 97%|████████████████████████████████████████████████████████████████████████████████▏  | 19437/20117 [12:22:14<25:28,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▏  | 19438/20117 [12:22:16<25:25,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▏  | 19439/20117 [12:22:18<25:23,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▏  | 19440/20117 [12:22:21<25:29,  2.26s/it]                                                                                                                                 {'loss': 0.155, 'grad_norm': 0.3300628066062927, 'learning_rate': 5.656155331774437e-07, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 350.34, 'epoch': 1.93}
 97%|████████████████████████████████████████████████████████████████████████████████▏  | 19440/20117 [12:22:21<25:29,  2.26s/it] 97%|████████████████████████████████████████████████████████████████████████████████▏  | 19441/20117 [12:22:23<25:17,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▏  | 19442/20117 [12:22:25<25:19,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▏  | 19443/20117 [12:22:27<25:05,  2.23s/it] 97%|████████████████████████████████████████████████████████████████████████████████▏  | 19444/20117 [12:22:30<25:04,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▏  | 19445/20117 [12:22:32<25:01,  2.23s/it] 97%|████████████████████████████████████████████████████████████████████████████████▏  | 19446/20117 [12:22:34<24:50,  2.22s/it] 97%|████████████████████████████████████████████████████████████████████████████████▏  | 19447/20117 [12:22:36<24:55,  2.23s/it] 97%|████████████████████████████████████████████████████████████████████████████████▏  | 19448/20117 [12:22:39<25:07,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▏  | 19449/20117 [12:22:41<25:03,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▏  | 19450/20117 [12:22:43<24:54,  2.24s/it]                                                                                                                                 {'loss': 0.1298, 'grad_norm': 0.46381351351737976, 'learning_rate': 5.490689264665117e-07, 'memory/max_active (GiB)': 21.48, 'memory/max_allocated (GiB)': 21.48, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 357.6, 'epoch': 1.93}
 97%|████████████████████████████████████████████████████████████████████████████████▏  | 19450/20117 [12:22:43<24:54,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▎  | 19451/20117 [12:22:45<24:53,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▎  | 19452/20117 [12:22:48<24:50,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▎  | 19453/20117 [12:22:50<24:57,  2.26s/it] 97%|████████████████████████████████████████████████████████████████████████████████▎  | 19454/20117 [12:22:52<24:49,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▎  | 19455/20117 [12:22:54<24:46,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▎  | 19456/20117 [12:22:57<24:41,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▎  | 19457/20117 [12:22:59<24:37,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▎  | 19458/20117 [12:23:01<24:39,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▎  | 19459/20117 [12:23:03<24:33,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▎  | 19460/20117 [12:23:05<24:24,  2.23s/it]                                                                                                                                 {'loss': 0.1173, 'grad_norm': 0.5388414263725281, 'learning_rate': 5.327672884182455e-07, 'memory/max_active (GiB)': 19.79, 'memory/max_allocated (GiB)': 19.79, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 354.37, 'epoch': 1.93}
 97%|████████████████████████████████████████████████████████████████████████████████▎  | 19460/20117 [12:23:05<24:24,  2.23s/it] 97%|████████████████████████████████████████████████████████████████████████████████▎  | 19461/20117 [12:23:08<24:37,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▎  | 19462/20117 [12:23:10<24:31,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▎  | 19463/20117 [12:23:12<24:37,  2.26s/it] 97%|████████████████████████████████████████████████████████████████████████████████▎  | 19464/20117 [12:23:15<24:30,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▎  | 19465/20117 [12:23:17<24:17,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▎  | 19466/20117 [12:23:19<24:16,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▎  | 19467/20117 [12:23:21<24:17,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▎  | 19468/20117 [12:23:23<24:15,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▎  | 19469/20117 [12:23:26<24:08,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▎  | 19470/20117 [12:23:28<25:01,  2.32s/it]                                                                                                                                 {'loss': 0.1224, 'grad_norm': 0.3915001153945923, 'learning_rate': 5.167106591870252e-07, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 298.76, 'epoch': 1.94}
 97%|████████████████████████████████████████████████████████████████████████████████▎  | 19470/20117 [12:23:28<25:01,  2.32s/it] 97%|████████████████████████████████████████████████████████████████████████████████▎  | 19471/20117 [12:23:30<24:50,  2.31s/it] 97%|████████████████████████████████████████████████████████████████████████████████▎  | 19472/20117 [12:23:33<24:26,  2.27s/it] 97%|████████████████████████████████████████████████████████████████████████████████▎  | 19473/20117 [12:23:35<24:10,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▎  | 19474/20117 [12:23:37<24:17,  2.27s/it] 97%|████████████████████████████████████████████████████████████████████████████████▎  | 19475/20117 [12:23:39<24:01,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▎  | 19476/20117 [12:23:42<23:54,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▎  | 19477/20117 [12:23:44<23:49,  2.23s/it] 97%|████████████████████████████████████████████████████████████████████████████████▎  | 19478/20117 [12:23:46<24:04,  2.26s/it] 97%|████████████████████████████████████████████████████████████████████████████████▎  | 19479/20117 [12:23:48<23:59,  2.26s/it] 97%|████████████████████████████████████████████████████████████████████████████████▎  | 19480/20117 [12:23:51<23:55,  2.25s/it]                                                                                                                                 {'loss': 0.1175, 'grad_norm': 0.3780313730239868, 'learning_rate': 5.008990783237244e-07, 'memory/max_active (GiB)': 21.52, 'memory/max_allocated (GiB)': 21.52, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 300.87, 'epoch': 1.94}
 97%|████████████████████████████████████████████████████████████████████████████████▎  | 19480/20117 [12:23:51<23:55,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▍  | 19481/20117 [12:23:53<24:03,  2.27s/it] 97%|████████████████████████████████████████████████████████████████████████████████▍  | 19482/20117 [12:23:55<23:59,  2.27s/it] 97%|████████████████████████████████████████████████████████████████████████████████▍  | 19483/20117 [12:23:57<23:54,  2.26s/it] 97%|████████████████████████████████████████████████████████████████████████████████▍  | 19484/20117 [12:24:00<23:41,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▍  | 19485/20117 [12:24:02<23:29,  2.23s/it] 97%|████████████████████████████████████████████████████████████████████████████████▍  | 19486/20117 [12:24:04<23:32,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▍  | 19487/20117 [12:24:06<23:25,  2.23s/it] 97%|████████████████████████████████████████████████████████████████████████████████▍  | 19488/20117 [12:24:09<23:27,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▍  | 19489/20117 [12:24:11<23:44,  2.27s/it] 97%|████████████████████████████████████████████████████████████████████████████████▍  | 19490/20117 [12:24:13<23:33,  2.25s/it]                                                                                                                                 {'loss': 0.144, 'grad_norm': 0.5083069801330566, 'learning_rate': 4.853325847755997e-07, 'memory/max_active (GiB)': 20.57, 'memory/max_allocated (GiB)': 20.57, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 391.62, 'epoch': 1.94}
 97%|████████████████████████████████████████████████████████████████████████████████▍  | 19490/20117 [12:24:13<23:33,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▍  | 19491/20117 [12:24:15<23:20,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▍  | 19492/20117 [12:24:18<23:22,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▍  | 19493/20117 [12:24:20<23:35,  2.27s/it] 97%|████████████████████████████████████████████████████████████████████████████████▍  | 19494/20117 [12:24:22<23:30,  2.26s/it] 97%|████████████████████████████████████████████████████████████████████████████████▍  | 19495/20117 [12:24:25<23:38,  2.28s/it] 97%|████████████████████████████████████████████████████████████████████████████████▍  | 19496/20117 [12:24:27<23:28,  2.27s/it] 97%|████████████████████████████████████████████████████████████████████████████████▍  | 19497/20117 [12:24:29<23:25,  2.27s/it] 97%|████████████████████████████████████████████████████████████████████████████████▍  | 19498/20117 [12:24:31<23:26,  2.27s/it] 97%|████████████████████████████████████████████████████████████████████████████████▍  | 19499/20117 [12:24:34<23:12,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▍  | 19500/20117 [12:24:36<22:59,  2.24s/it]                                                                                                                                 {'loss': 0.1368, 'grad_norm': 0.8569310307502747, 'learning_rate': 4.700112168862347e-07, 'memory/max_active (GiB)': 19.8, 'memory/max_allocated (GiB)': 19.8, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 364.34, 'epoch': 1.94}
 97%|████████████████████████████████████████████████████████████████████████████████▍  | 19500/20117 [12:24:36<22:59,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▍  | 19501/20117 [12:24:38<23:03,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▍  | 19502/20117 [12:24:40<22:55,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▍  | 19503/20117 [12:24:42<22:54,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▍  | 19504/20117 [12:24:45<23:01,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▍  | 19505/20117 [12:24:47<22:56,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▍  | 19506/20117 [12:24:49<23:06,  2.27s/it] 97%|████████████████████████████████████████████████████████████████████████████████▍  | 19507/20117 [12:24:52<22:57,  2.26s/it] 97%|████████████████████████████████████████████████████████████████████████████████▍  | 19508/20117 [12:24:54<22:52,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▍  | 19509/20117 [12:24:56<22:44,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▍  | 19510/20117 [12:24:58<22:36,  2.23s/it]                                                                                                                                 {'loss': 0.1282, 'grad_norm': 0.6087594032287598, 'learning_rate': 4.549350123953855e-07, 'memory/max_active (GiB)': 20.77, 'memory/max_allocated (GiB)': 20.77, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 351.22, 'epoch': 1.94}
 97%|████████████████████████████████████████████████████████████████████████████████▍  | 19510/20117 [12:24:58<22:36,  2.23s/it] 97%|████████████████████████████████████████████████████████████████████████████████▍  | 19511/20117 [12:25:00<22:30,  2.23s/it] 97%|████████████████████████████████████████████████████████████████████████████████▌  | 19512/20117 [12:25:03<22:33,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▌  | 19513/20117 [12:25:05<22:42,  2.26s/it] 97%|████████████████████████████████████████████████████████████████████████████████▌  | 19514/20117 [12:25:07<22:39,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▌  | 19515/20117 [12:25:09<22:26,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▌  | 19516/20117 [12:25:12<22:21,  2.23s/it] 97%|████████████████████████████████████████████████████████████████████████████████▌  | 19517/20117 [12:25:14<22:21,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▌  | 19518/20117 [12:25:16<22:26,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▌  | 19519/20117 [12:25:18<22:30,  2.26s/it] 97%|████████████████████████████████████████████████████████████████████████████████▌  | 19520/20117 [12:25:21<22:17,  2.24s/it]                                                                                                                                 {'loss': 0.1441, 'grad_norm': 0.5610264539718628, 'learning_rate': 4.4010400843892407e-07, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 344.44, 'epoch': 1.94}
 97%|████████████████████████████████████████████████████████████████████████████████▌  | 19520/20117 [12:25:21<22:17,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▌  | 19521/20117 [12:25:23<22:17,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▌  | 19522/20117 [12:25:25<23:12,  2.34s/it] 97%|████████████████████████████████████████████████████████████████████████████████▌  | 19523/20117 [12:25:28<22:58,  2.32s/it] 97%|████████████████████████████████████████████████████████████████████████████████▌  | 19524/20117 [12:25:30<22:45,  2.30s/it] 97%|████████████████████████████████████████████████████████████████████████████████▌  | 19525/20117 [12:25:32<22:32,  2.28s/it] 97%|████████████████████████████████████████████████████████████████████████████████▌  | 19526/20117 [12:25:34<22:21,  2.27s/it] 97%|████████████████████████████████████████████████████████████████████████████████▌  | 19527/20117 [12:25:37<22:11,  2.26s/it] 97%|████████████████████████████████████████████████████████████████████████████████▌  | 19528/20117 [12:25:39<22:00,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▌  | 19529/20117 [12:25:41<21:56,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▌  | 19530/20117 [12:25:43<21:46,  2.23s/it]                                                                                                                                 {'loss': 0.1601, 'grad_norm': 0.3209472894668579, 'learning_rate': 4.255182415487613e-07, 'memory/max_active (GiB)': 20.57, 'memory/max_allocated (GiB)': 20.57, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 371.35, 'epoch': 1.94}
 97%|████████████████████████████████████████████████████████████████████████████████▌  | 19530/20117 [12:25:43<21:46,  2.23s/it] 97%|████████████████████████████████████████████████████████████████████████████████▌  | 19531/20117 [12:25:46<21:50,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▌  | 19532/20117 [12:25:48<21:42,  2.23s/it] 97%|████████████████████████████████████████████████████████████████████████████████▌  | 19533/20117 [12:25:50<21:43,  2.23s/it] 97%|████████████████████████████████████████████████████████████████████████████████▌  | 19534/20117 [12:25:52<21:35,  2.22s/it] 97%|████████████████████████████████████████████████████████████████████████████████▌  | 19535/20117 [12:25:54<21:39,  2.23s/it] 97%|████████████████████████████████████████████████████████████████████████████████▌  | 19536/20117 [12:25:57<21:32,  2.22s/it] 97%|████████████████████████████████████████████████████████████████████████████████▌  | 19537/20117 [12:25:59<21:23,  2.21s/it] 97%|████████████████████████████████████████████████████████████████████████████████▌  | 19538/20117 [12:26:01<21:21,  2.21s/it] 97%|████████████████████████████████████████████████████████████████████████████████▌  | 19539/20117 [12:26:03<21:15,  2.21s/it] 97%|████████████████████████████████████████████████████████████████████████████████▌  | 19540/20117 [12:26:06<21:27,  2.23s/it]                                                                                                                                 {'loss': 0.1846, 'grad_norm': 0.34549176692962646, 'learning_rate': 4.1117774765270235e-07, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 392.28, 'epoch': 1.94}
 97%|████████████████████████████████████████████████████████████████████████████████▌  | 19540/20117 [12:26:06<21:27,  2.23s/it] 97%|████████████████████████████████████████████████████████████████████████████████▌  | 19541/20117 [12:26:08<21:28,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▋  | 19542/20117 [12:26:10<21:22,  2.23s/it] 97%|████████████████████████████████████████████████████████████████████████████████▋  | 19543/20117 [12:26:12<21:19,  2.23s/it] 97%|████████████████████████████████████████████████████████████████████████████████▋  | 19544/20117 [12:26:14<21:11,  2.22s/it] 97%|████████████████████████████████████████████████████████████████████████████████▋  | 19545/20117 [12:26:17<21:15,  2.23s/it] 97%|████████████████████████████████████████████████████████████████████████████████▋  | 19546/20117 [12:26:19<21:11,  2.23s/it] 97%|████████████████████████████████████████████████████████████████████████████████▋  | 19547/20117 [12:26:21<21:06,  2.22s/it] 97%|████████████████████████████████████████████████████████████████████████████████▋  | 19548/20117 [12:26:23<21:10,  2.23s/it] 97%|████████████████████████████████████████████████████████████████████████████████▋  | 19549/20117 [12:26:26<21:06,  2.23s/it] 97%|████████████████████████████████████████████████████████████████████████████████▋  | 19550/20117 [12:26:28<21:02,  2.23s/it]                                                                                                                                 {'loss': 0.1392, 'grad_norm': 0.610714852809906, 'learning_rate': 3.970825620744467e-07, 'memory/max_active (GiB)': 18.83, 'memory/max_allocated (GiB)': 18.83, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 300.36, 'epoch': 1.94}
 97%|████████████████████████████████████████████████████████████████████████████████▋  | 19550/20117 [12:26:28<21:02,  2.23s/it] 97%|████████████████████████████████████████████████████████████████████████████████▋  | 19551/20117 [12:26:30<21:06,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▋  | 19552/20117 [12:26:32<20:58,  2.23s/it] 97%|████████████████████████████████████████████████████████████████████████████████▋  | 19553/20117 [12:26:35<20:50,  2.22s/it] 97%|████████████████████████████████████████████████████████████████████████████████▋  | 19554/20117 [12:26:37<20:41,  2.21s/it] 97%|████████████████████████████████████████████████████████████████████████████████▋  | 19555/20117 [12:26:39<20:43,  2.21s/it] 97%|████████████████████████████████████████████████████████████████████████████████▋  | 19556/20117 [12:26:41<20:47,  2.22s/it] 97%|████████████████████████████████████████████████████████████████████████████████▋  | 19557/20117 [12:26:43<20:48,  2.23s/it] 97%|████████████████████████████████████████████████████████████████████████████████▋  | 19558/20117 [12:26:46<20:41,  2.22s/it] 97%|████████████████████████████████████████████████████████████████████████████████▋  | 19559/20117 [12:26:48<20:50,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▋  | 19560/20117 [12:26:50<20:55,  2.25s/it]                                                                                                                                 {'loss': 0.2258, 'grad_norm': 0.42442601919174194, 'learning_rate': 3.8323271953338844e-07, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 434.15, 'epoch': 1.94}
 97%|████████████████████████████████████████████████████████████████████████████████▋  | 19560/20117 [12:26:50<20:55,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▋  | 19561/20117 [12:26:52<20:53,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▋  | 19562/20117 [12:26:55<21:01,  2.27s/it] 97%|████████████████████████████████████████████████████████████████████████████████▋  | 19563/20117 [12:26:57<21:02,  2.28s/it] 97%|████████████████████████████████████████████████████████████████████████████████▋  | 19564/20117 [12:26:59<20:48,  2.26s/it] 97%|████████████████████████████████████████████████████████████████████████████████▋  | 19565/20117 [12:27:02<20:45,  2.26s/it] 97%|████████████████████████████████████████████████████████████████████████████████▋  | 19566/20117 [12:27:04<20:51,  2.27s/it] 97%|████████████████████████████████████████████████████████████████████████████████▋  | 19567/20117 [12:27:06<20:40,  2.26s/it] 97%|████████████████████████████████████████████████████████████████████████████████▋  | 19568/20117 [12:27:08<20:34,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▋  | 19569/20117 [12:27:11<20:35,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▋  | 19570/20117 [12:27:13<20:46,  2.28s/it]                                                                                                                                 {'loss': 0.1478, 'grad_norm': 0.30905285477638245, 'learning_rate': 3.696282541446272e-07, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 368.05, 'epoch': 1.95}
 97%|████████████████████████████████████████████████████████████████████████████████▋  | 19570/20117 [12:27:13<20:46,  2.28s/it] 97%|████████████████████████████████████████████████████████████████████████████████▋  | 19571/20117 [12:27:15<20:33,  2.26s/it] 97%|████████████████████████████████████████████████████████████████████████████████▊  | 19572/20117 [12:27:17<20:29,  2.26s/it] 97%|████████████████████████████████████████████████████████████████████████████████▊  | 19573/20117 [12:27:20<20:23,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▊  | 19574/20117 [12:27:22<20:18,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▊  | 19575/20117 [12:27:24<20:13,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▊  | 19576/20117 [12:27:26<20:16,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▊  | 19577/20117 [12:27:29<21:09,  2.35s/it] 97%|████████████████████████████████████████████████████████████████████████████████▊  | 19578/20117 [12:27:31<21:01,  2.34s/it] 97%|████████████████████████████████████████████████████████████████████████████████▊  | 19579/20117 [12:27:33<20:47,  2.32s/it] 97%|████████████████████████████████████████████████████████████████████████████████▊  | 19580/20117 [12:27:36<20:37,  2.30s/it]                                                                                                                                 {'loss': 0.1548, 'grad_norm': 0.29837194085121155, 'learning_rate': 3.56269199418835e-07, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 314.0, 'epoch': 1.95}
 97%|████████████████████████████████████████████████████████████████████████████████▊  | 19580/20117 [12:27:36<20:37,  2.30s/it] 97%|████████████████████████████████████████████████████████████████████████████████▊  | 19581/20117 [12:27:38<20:22,  2.28s/it] 97%|████████████████████████████████████████████████████████████████████████████████▊  | 19582/20117 [12:27:40<20:10,  2.26s/it] 97%|████████████████████████████████████████████████████████████████████████████████▊  | 19583/20117 [12:27:42<20:03,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▊  | 19584/20117 [12:27:45<20:12,  2.28s/it] 97%|████████████████████████████████████████████████████████████████████████████████▊  | 19585/20117 [12:27:47<20:16,  2.29s/it] 97%|████████████████████████████████████████████████████████████████████████████████▊  | 19586/20117 [12:27:49<20:02,  2.26s/it] 97%|████████████████████████████████████████████████████████████████████████████████▊  | 19587/20117 [12:27:51<19:53,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▊  | 19588/20117 [12:27:54<19:39,  2.23s/it] 97%|████████████████████████████████████████████████████████████████████████████████▊  | 19589/20117 [12:27:56<19:42,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▊  | 19590/20117 [12:27:58<19:42,  2.24s/it]                                                                                                                                 {'loss': 0.1615, 'grad_norm': 0.2777169346809387, 'learning_rate': 3.431555882621895e-07, 'memory/max_active (GiB)': 20.63, 'memory/max_allocated (GiB)': 20.63, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 391.63, 'epoch': 1.95}
 97%|████████████████████████████████████████████████████████████████████████████████▊  | 19590/20117 [12:27:58<19:42,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▊  | 19591/20117 [12:28:00<19:44,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▊  | 19592/20117 [12:28:03<19:36,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▊  | 19593/20117 [12:28:05<19:37,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▊  | 19594/20117 [12:28:07<19:24,  2.23s/it] 97%|████████████████████████████████████████████████████████████████████████████████▊  | 19595/20117 [12:28:09<19:32,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▊  | 19596/20117 [12:28:12<19:31,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▊  | 19597/20117 [12:28:14<19:22,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▊  | 19598/20117 [12:28:16<19:18,  2.23s/it] 97%|████████████████████████████████████████████████████████████████████████████████▊  | 19599/20117 [12:28:18<19:12,  2.22s/it] 97%|████████████████████████████████████████████████████████████████████████████████▊  | 19600/20117 [12:28:20<19:06,  2.22s/it]                                                                                                                                 {'loss': 0.1598, 'grad_norm': 0.7126250267028809, 'learning_rate': 3.302874529762745e-07, 'memory/max_active (GiB)': 19.22, 'memory/max_allocated (GiB)': 19.22, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 376.12, 'epoch': 1.95}
 97%|████████████████████████████████████████████████████████████████████████████████▊  | 19600/20117 [12:28:20<19:06,  2.22s/it] 97%|████████████████████████████████████████████████████████████████████████████████▊  | 19601/20117 [12:28:23<19:21,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▉  | 19602/20117 [12:28:25<19:17,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▉  | 19603/20117 [12:28:27<19:21,  2.26s/it] 97%|████████████████████████████████████████████████████████████████████████████████▉  | 19604/20117 [12:28:30<19:13,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▉  | 19605/20117 [12:28:32<19:05,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▉  | 19606/20117 [12:28:34<19:10,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▉  | 19607/20117 [12:28:36<19:07,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▉  | 19608/20117 [12:28:39<19:03,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▉  | 19609/20117 [12:28:41<19:02,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▉  | 19610/20117 [12:28:43<18:59,  2.25s/it]                                                                                                                                 {'loss': 0.1509, 'grad_norm': 0.5951281785964966, 'learning_rate': 3.176648252580461e-07, 'memory/max_active (GiB)': 21.53, 'memory/max_allocated (GiB)': 21.53, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 361.14, 'epoch': 1.95}
 97%|████████████████████████████████████████████████████████████████████████████████▉  | 19610/20117 [12:28:43<18:59,  2.25s/it] 97%|████████████████████████████████████████████████████████████████████████████████▉  | 19611/20117 [12:28:45<18:55,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▉  | 19612/20117 [12:28:47<18:49,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▉  | 19613/20117 [12:28:50<18:48,  2.24s/it] 97%|████████████████████████████████████████████████████████████████████████████████▉  | 19614/20117 [12:28:52<18:42,  2.23s/it] 98%|████████████████████████████████████████████████████████████████████████████████▉  | 19615/20117 [12:28:54<18:33,  2.22s/it] 98%|████████████████████████████████████████████████████████████████████████████████▉  | 19616/20117 [12:28:56<18:30,  2.22s/it] 98%|████████████████████████████████████████████████████████████████████████████████▉  | 19617/20117 [12:28:59<18:37,  2.23s/it] 98%|████████████████████████████████████████████████████████████████████████████████▉  | 19618/20117 [12:29:01<18:37,  2.24s/it] 98%|████████████████████████████████████████████████████████████████████████████████▉  | 19619/20117 [12:29:03<18:33,  2.24s/it] 98%|████████████████████████████████████████████████████████████████████████████████▉  | 19620/20117 [12:29:05<18:39,  2.25s/it]                                                                                                                                 {'loss': 0.1458, 'grad_norm': 0.5065922737121582, 'learning_rate': 3.0528773619969977e-07, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 343.47, 'epoch': 1.95}
 98%|████████████████████████████████████████████████████████████████████████████████▉  | 19620/20117 [12:29:05<18:39,  2.25s/it] 98%|████████████████████████████████████████████████████████████████████████████████▉  | 19621/20117 [12:29:08<18:43,  2.26s/it] 98%|████████████████████████████████████████████████████████████████████████████████▉  | 19622/20117 [12:29:10<18:39,  2.26s/it] 98%|████████████████████████████████████████████████████████████████████████████████▉  | 19623/20117 [12:29:12<18:41,  2.27s/it] 98%|████████████████████████████████████████████████████████████████████████████████▉  | 19624/20117 [12:29:15<18:41,  2.28s/it] 98%|████████████████████████████████████████████████████████████████████████████████▉  | 19625/20117 [12:29:17<18:35,  2.27s/it] 98%|████████████████████████████████████████████████████████████████████████████████▉  | 19626/20117 [12:29:19<18:34,  2.27s/it] 98%|████████████████████████████████████████████████████████████████████████████████▉  | 19627/20117 [12:29:21<18:23,  2.25s/it] 98%|████████████████████████████████████████████████████████████████████████████████▉  | 19628/20117 [12:29:23<18:11,  2.23s/it] 98%|████████████████████████████████████████████████████████████████████████████████▉  | 19629/20117 [12:29:26<18:05,  2.22s/it] 98%|████████████████████████████████████████████████████████████████████████████████▉  | 19630/20117 [12:29:28<17:58,  2.21s/it]                                                                                                                                 {'loss': 0.1204, 'grad_norm': 0.463041216135025, 'learning_rate': 2.9315621628860366e-07, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 303.97, 'epoch': 1.95}
 98%|████████████████████████████████████████████████████████████████████████████████▉  | 19630/20117 [12:29:28<17:58,  2.21s/it] 98%|████████████████████████████████████████████████████████████████████████████████▉  | 19631/20117 [12:29:30<18:42,  2.31s/it] 98%|████████████████████████████████████████████████████████████████████████████████▉  | 19632/20117 [12:29:33<18:22,  2.27s/it] 98%|█████████████████████████████████████████████████████████████████████████████████  | 19633/20117 [12:29:35<18:15,  2.26s/it] 98%|█████████████████████████████████████████████████████████████████████████████████  | 19634/20117 [12:29:37<18:03,  2.24s/it] 98%|█████████████████████████████████████████████████████████████████████████████████  | 19635/20117 [12:29:39<17:54,  2.23s/it] 98%|█████████████████████████████████████████████████████████████████████████████████  | 19636/20117 [12:29:41<17:55,  2.23s/it] 98%|█████████████████████████████████████████████████████████████████████████████████  | 19637/20117 [12:29:44<17:46,  2.22s/it] 98%|█████████████████████████████████████████████████████████████████████████████████  | 19638/20117 [12:29:46<17:47,  2.23s/it] 98%|█████████████████████████████████████████████████████████████████████████████████  | 19639/20117 [12:29:48<17:44,  2.23s/it] 98%|█████████████████████████████████████████████████████████████████████████████████  | 19640/20117 [12:29:50<17:41,  2.23s/it]                                                                                                                                 {'loss': 0.1774, 'grad_norm': 0.7519829869270325, 'learning_rate': 2.812702954072877e-07, 'memory/max_active (GiB)': 21.38, 'memory/max_allocated (GiB)': 21.38, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 321.53, 'epoch': 1.95}
 98%|█████████████████████████████████████████████████████████████████████████████████  | 19640/20117 [12:29:50<17:41,  2.23s/it] 98%|█████████████████████████████████████████████████████████████████████████████████  | 19641/20117 [12:29:53<17:39,  2.23s/it] 98%|█████████████████████████████████████████████████████████████████████████████████  | 19642/20117 [12:29:55<17:29,  2.21s/it] 98%|█████████████████████████████████████████████████████████████████████████████████  | 19643/20117 [12:29:57<17:22,  2.20s/it] 98%|█████████████████████████████████████████████████████████████████████████████████  | 19644/20117 [12:29:59<17:28,  2.22s/it] 98%|█████████████████████████████████████████████████████████████████████████████████  | 19645/20117 [12:30:01<17:31,  2.23s/it] 98%|█████████████████████████████████████████████████████████████████████████████████  | 19646/20117 [12:30:04<17:37,  2.24s/it] 98%|█████████████████████████████████████████████████████████████████████████████████  | 19647/20117 [12:30:06<17:29,  2.23s/it] 98%|█████████████████████████████████████████████████████████████████████████████████  | 19648/20117 [12:30:08<17:18,  2.22s/it] 98%|█████████████████████████████████████████████████████████████████████████████████  | 19649/20117 [12:30:10<17:17,  2.22s/it] 98%|█████████████████████████████████████████████████████████████████████████████████  | 19650/20117 [12:30:13<17:20,  2.23s/it]                                                                                                                                 {'loss': 0.1509, 'grad_norm': 0.52646404504776, 'learning_rate': 2.6963000283325434e-07, 'memory/max_active (GiB)': 20.57, 'memory/max_allocated (GiB)': 20.57, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 311.88, 'epoch': 1.95}
 98%|█████████████████████████████████████████████████████████████████████████████████  | 19650/20117 [12:30:13<17:20,  2.23s/it] 98%|█████████████████████████████████████████████████████████████████████████████████  | 19651/20117 [12:30:15<17:25,  2.24s/it] 98%|█████████████████████████████████████████████████████████████████████████████████  | 19652/20117 [12:30:17<17:18,  2.23s/it] 98%|█████████████████████████████████████████████████████████████████████████████████  | 19653/20117 [12:30:19<17:15,  2.23s/it] 98%|█████████████████████████████████████████████████████████████████████████████████  | 19654/20117 [12:30:22<17:16,  2.24s/it] 98%|█████████████████████████████████████████████████████████████████████████████████  | 19655/20117 [12:30:24<17:07,  2.22s/it] 98%|█████████████████████████████████████████████████████████████████████████████████  | 19656/20117 [12:30:26<16:58,  2.21s/it] 98%|█████████████████████████████████████████████████████████████████████████████████  | 19657/20117 [12:30:28<16:50,  2.20s/it] 98%|█████████████████████████████████████████████████████████████████████████████████  | 19658/20117 [12:30:30<16:51,  2.20s/it] 98%|█████████████████████████████████████████████████████████████████████████████████  | 19659/20117 [12:30:32<16:50,  2.21s/it] 98%|█████████████████████████████████████████████████████████████████████████████████  | 19660/20117 [12:30:35<16:48,  2.21s/it]                                                                                                                                 {'loss': 0.1782, 'grad_norm': 0.4693813920021057, 'learning_rate': 2.5823536723902364e-07, 'memory/max_active (GiB)': 19.8, 'memory/max_allocated (GiB)': 19.8, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 393.51, 'epoch': 1.95}
 98%|█████████████████████████████████████████████████████████████████████████████████  | 19660/20117 [12:30:35<16:48,  2.21s/it] 98%|█████████████████████████████████████████████████████████████████████████████████  | 19661/20117 [12:30:37<16:50,  2.22s/it] 98%|█████████████████████████████████████████████████████████████████████████████████  | 19662/20117 [12:30:39<16:49,  2.22s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▏ | 19663/20117 [12:30:41<16:46,  2.22s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▏ | 19664/20117 [12:30:44<16:45,  2.22s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▏ | 19665/20117 [12:30:46<16:39,  2.21s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▏ | 19666/20117 [12:30:48<16:38,  2.21s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▏ | 19667/20117 [12:30:50<16:41,  2.23s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▏ | 19668/20117 [12:30:53<16:42,  2.23s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▏ | 19669/20117 [12:30:55<16:50,  2.26s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▏ | 19670/20117 [12:30:57<16:42,  2.24s/it]                                                                                                                                 {'loss': 0.16, 'grad_norm': 0.5138804912567139, 'learning_rate': 2.470864166919884e-07, 'memory/max_active (GiB)': 20.54, 'memory/max_allocated (GiB)': 20.54, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 343.73, 'epoch': 1.96}
 98%|█████████████████████████████████████████████████████████████████████████████████▏ | 19670/20117 [12:30:57<16:42,  2.24s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▏ | 19671/20117 [12:30:59<16:39,  2.24s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▏ | 19672/20117 [12:31:01<16:34,  2.23s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▏ | 19673/20117 [12:31:04<16:34,  2.24s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▏ | 19674/20117 [12:31:06<16:27,  2.23s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▏ | 19675/20117 [12:31:08<16:26,  2.23s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▏ | 19676/20117 [12:31:10<16:30,  2.25s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▏ | 19677/20117 [12:31:13<16:29,  2.25s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▏ | 19678/20117 [12:31:15<16:36,  2.27s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▏ | 19679/20117 [12:31:17<16:28,  2.26s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▏ | 19680/20117 [12:31:20<16:26,  2.26s/it]                                                                                                                                 {'loss': 0.0975, 'grad_norm': 0.5430607795715332, 'learning_rate': 2.361831786543589e-07, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 321.72, 'epoch': 1.96}
 98%|█████████████████████████████████████████████████████████████████████████████████▏ | 19680/20117 [12:31:20<16:26,  2.26s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▏ | 19681/20117 [12:31:22<16:24,  2.26s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▏ | 19682/20117 [12:31:24<16:18,  2.25s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▏ | 19683/20117 [12:31:27<16:57,  2.34s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▏ | 19684/20117 [12:31:29<16:41,  2.31s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▏ | 19685/20117 [12:31:31<16:27,  2.29s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▏ | 19686/20117 [12:31:33<16:18,  2.27s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▏ | 19687/20117 [12:31:35<16:08,  2.25s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▏ | 19688/20117 [12:31:38<16:05,  2.25s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▏ | 19689/20117 [12:31:40<16:07,  2.26s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▏ | 19690/20117 [12:31:42<16:00,  2.25s/it]                                                                                                                                 {'loss': 0.1112, 'grad_norm': 0.39533936977386475, 'learning_rate': 2.2552567998312957e-07, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 342.61, 'epoch': 1.96}
 98%|█████████████████████████████████████████████████████████████████████████████████▏ | 19690/20117 [12:31:42<16:00,  2.25s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▏ | 19691/20117 [12:31:44<15:48,  2.23s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▏ | 19692/20117 [12:31:47<15:49,  2.23s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▎ | 19693/20117 [12:31:49<15:48,  2.24s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▎ | 19694/20117 [12:31:51<15:43,  2.23s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▎ | 19695/20117 [12:31:53<15:47,  2.24s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▎ | 19696/20117 [12:31:56<15:54,  2.27s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▎ | 19697/20117 [12:31:58<15:52,  2.27s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▎ | 19698/20117 [12:32:00<15:44,  2.25s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▎ | 19699/20117 [12:32:03<15:56,  2.29s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▎ | 19700/20117 [12:32:05<15:54,  2.29s/it]                                                                                                                                 {'loss': 0.1444, 'grad_norm': 0.4892284870147705, 'learning_rate': 2.151139469299679e-07, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 337.57, 'epoch': 1.96}
 98%|█████████████████████████████████████████████████████████████████████████████████▎ | 19700/20117 [12:32:05<15:54,  2.29s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▎ | 19701/20117 [12:32:07<15:42,  2.26s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▎ | 19702/20117 [12:32:09<15:41,  2.27s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▎ | 19703/20117 [12:32:12<15:36,  2.26s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▎ | 19704/20117 [12:32:14<15:28,  2.25s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▎ | 19705/20117 [12:32:16<15:23,  2.24s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▎ | 19706/20117 [12:32:18<15:16,  2.23s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▎ | 19707/20117 [12:32:20<15:11,  2.22s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▎ | 19708/20117 [12:32:23<15:11,  2.23s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▎ | 19709/20117 [12:32:25<15:13,  2.24s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▎ | 19710/20117 [12:32:27<15:11,  2.24s/it]                                                                                                                                 {'loss': 0.1839, 'grad_norm': 0.6313555836677551, 'learning_rate': 2.0494800514117007e-07, 'memory/max_active (GiB)': 20.43, 'memory/max_allocated (GiB)': 20.43, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 379.84, 'epoch': 1.96}
 98%|█████████████████████████████████████████████████████████████████████████████████▎ | 19710/20117 [12:32:27<15:11,  2.24s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▎ | 19711/20117 [12:32:29<15:10,  2.24s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▎ | 19712/20117 [12:32:32<15:11,  2.25s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▎ | 19713/20117 [12:32:34<15:06,  2.24s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▎ | 19714/20117 [12:32:36<14:58,  2.23s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▎ | 19715/20117 [12:32:38<14:53,  2.22s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▎ | 19716/20117 [12:32:41<14:47,  2.21s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▎ | 19717/20117 [12:32:43<14:52,  2.23s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▎ | 19718/20117 [12:32:45<14:45,  2.22s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▎ | 19719/20117 [12:32:47<14:38,  2.21s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▎ | 19720/20117 [12:32:49<14:34,  2.20s/it]                                                                                                                                 {'loss': 0.1649, 'grad_norm': 0.3936193585395813, 'learning_rate': 1.950278796576055e-07, 'memory/max_active (GiB)': 21.42, 'memory/max_allocated (GiB)': 21.42, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 352.38, 'epoch': 1.96}
 98%|█████████████████████████████████████████████████████████████████████████████████▎ | 19720/20117 [12:32:49<14:34,  2.20s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▎ | 19721/20117 [12:32:52<14:30,  2.20s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▎ | 19722/20117 [12:32:54<14:38,  2.23s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▎ | 19723/20117 [12:32:56<14:36,  2.23s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▍ | 19724/20117 [12:32:58<14:37,  2.23s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▍ | 19725/20117 [12:33:01<14:37,  2.24s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▍ | 19726/20117 [12:33:03<14:33,  2.23s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▍ | 19727/20117 [12:33:05<14:28,  2.23s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▍ | 19728/20117 [12:33:07<14:22,  2.22s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▍ | 19729/20117 [12:33:09<14:20,  2.22s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▍ | 19730/20117 [12:33:12<14:24,  2.23s/it]                                                                                                                                 {'loss': 0.1267, 'grad_norm': 0.4717642068862915, 'learning_rate': 1.8535359491462789e-07, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 297.29, 'epoch': 1.96}
 98%|█████████████████████████████████████████████████████████████████████████████████▍ | 19730/20117 [12:33:12<14:24,  2.23s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▍ | 19731/20117 [12:33:14<14:58,  2.33s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▍ | 19732/20117 [12:33:17<15:47,  2.46s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▍ | 19733/20117 [12:33:20<16:19,  2.55s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▍ | 19734/20117 [12:33:22<16:36,  2.60s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▍ | 19735/20117 [12:33:25<17:19,  2.72s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▍ | 19736/20117 [12:33:28<17:08,  2.70s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▍ | 19737/20117 [12:33:31<16:48,  2.65s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▍ | 19738/20117 [12:33:33<16:07,  2.55s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▍ | 19739/20117 [12:33:35<15:40,  2.49s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▍ | 19740/20117 [12:33:38<15:08,  2.41s/it]                                                                                                                                 {'loss': 0.1551, 'grad_norm': 0.574255108833313, 'learning_rate': 1.7592517474205317e-07, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 367.19, 'epoch': 1.96}
 98%|█████████████████████████████████████████████████████████████████████████████████▍ | 19740/20117 [12:33:38<15:08,  2.41s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▍ | 19741/20117 [12:33:40<14:55,  2.38s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▍ | 19742/20117 [12:33:42<15:16,  2.44s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▍ | 19743/20117 [12:33:45<15:31,  2.49s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▍ | 19744/20117 [12:33:48<15:36,  2.51s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▍ | 19745/20117 [12:33:50<16:04,  2.59s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▍ | 19746/20117 [12:33:53<16:17,  2.64s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▍ | 19747/20117 [12:33:56<16:26,  2.67s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▍ | 19748/20117 [12:33:59<16:30,  2.69s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▍ | 19749/20117 [12:34:01<16:32,  2.70s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▍ | 19750/20117 [12:34:04<16:27,  2.69s/it]                                                                                                                                 {'loss': 0.1482, 'grad_norm': 0.6956222057342529, 'learning_rate': 1.6674264236408165e-07, 'memory/max_active (GiB)': 20.57, 'memory/max_allocated (GiB)': 20.57, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 278.73, 'epoch': 1.96}
 98%|█████████████████████████████████████████████████████████████████████████████████▍ | 19750/20117 [12:34:04<16:27,  2.69s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▍ | 19751/20117 [12:34:07<16:17,  2.67s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▍ | 19752/20117 [12:34:09<16:15,  2.67s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▍ | 19753/20117 [12:34:12<16:14,  2.68s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▌ | 19754/20117 [12:34:15<16:13,  2.68s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▌ | 19755/20117 [12:34:17<16:12,  2.69s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▌ | 19756/20117 [12:34:20<16:14,  2.70s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▌ | 19757/20117 [12:34:23<16:18,  2.72s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▌ | 19758/20117 [12:34:26<16:16,  2.72s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▌ | 19759/20117 [12:34:28<16:17,  2.73s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▌ | 19760/20117 [12:34:31<16:19,  2.74s/it]                                                                                                                                 {'loss': 0.1813, 'grad_norm': 0.5839533805847168, 'learning_rate': 1.5780602039920932e-07, 'memory/max_active (GiB)': 21.54, 'memory/max_allocated (GiB)': 21.54, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 344.8, 'epoch': 1.96}
 98%|█████████████████████████████████████████████████████████████████████████████████▌ | 19760/20117 [12:34:31<16:19,  2.74s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▌ | 19761/20117 [12:34:34<16:06,  2.71s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▌ | 19762/20117 [12:34:37<16:04,  2.72s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▌ | 19763/20117 [12:34:39<15:56,  2.70s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▌ | 19764/20117 [12:34:42<15:52,  2.70s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▌ | 19765/20117 [12:34:45<15:49,  2.70s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▌ | 19766/20117 [12:34:47<15:45,  2.69s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▌ | 19767/20117 [12:34:50<15:30,  2.66s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▌ | 19768/20117 [12:34:53<15:34,  2.68s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▌ | 19769/20117 [12:34:55<15:13,  2.62s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▌ | 19770/20117 [12:34:58<15:04,  2.61s/it]                                                                                                                                 {'loss': 0.1851, 'grad_norm': 0.4014669954776764, 'learning_rate': 1.4911533086024997e-07, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 318.26, 'epoch': 1.97}
 98%|█████████████████████████████████████████████████████████████████████████████████▌ | 19770/20117 [12:34:58<15:04,  2.61s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▌ | 19771/20117 [12:35:00<14:45,  2.56s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▌ | 19772/20117 [12:35:02<14:23,  2.50s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▌ | 19773/20117 [12:35:05<14:07,  2.46s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▌ | 19774/20117 [12:35:07<13:52,  2.43s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▌ | 19775/20117 [12:35:10<13:47,  2.42s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▌ | 19776/20117 [12:35:12<13:23,  2.36s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▌ | 19777/20117 [12:35:14<13:07,  2.32s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▌ | 19778/20117 [12:35:16<12:52,  2.28s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▌ | 19779/20117 [12:35:18<12:42,  2.26s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▌ | 19780/20117 [12:35:21<12:33,  2.24s/it]                                                                                                                                 {'loss': 0.1547, 'grad_norm': 0.33930712938308716, 'learning_rate': 1.406705951541909e-07, 'memory/max_active (GiB)': 21.41, 'memory/max_allocated (GiB)': 21.41, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 422.7, 'epoch': 1.97}
 98%|█████████████████████████████████████████████████████████████████████████████████▌ | 19780/20117 [12:35:21<12:33,  2.24s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▌ | 19781/20117 [12:35:23<12:37,  2.25s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▌ | 19782/20117 [12:35:25<12:34,  2.25s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▌ | 19783/20117 [12:35:27<12:28,  2.24s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▋ | 19784/20117 [12:35:30<12:26,  2.24s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▋ | 19785/20117 [12:35:32<12:22,  2.24s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▋ | 19786/20117 [12:35:34<12:16,  2.23s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▋ | 19787/20117 [12:35:36<12:17,  2.23s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▋ | 19788/20117 [12:35:39<12:45,  2.33s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▋ | 19789/20117 [12:35:41<12:30,  2.29s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▋ | 19790/20117 [12:35:43<12:23,  2.27s/it]                                                                                                                                 {'loss': 0.1737, 'grad_norm': 0.7784512639045715, 'learning_rate': 1.32471834082204e-07, 'memory/max_active (GiB)': 21.38, 'memory/max_allocated (GiB)': 21.38, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 379.19, 'epoch': 1.97}
 98%|█████████████████████████████████████████████████████████████████████████████████▋ | 19790/20117 [12:35:43<12:23,  2.27s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▋ | 19791/20117 [12:35:46<12:20,  2.27s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▋ | 19792/20117 [12:35:48<12:12,  2.25s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▋ | 19793/20117 [12:35:50<12:11,  2.26s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▋ | 19794/20117 [12:35:52<12:16,  2.28s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▋ | 19795/20117 [12:35:55<12:10,  2.27s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▋ | 19796/20117 [12:35:57<12:12,  2.28s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▋ | 19797/20117 [12:35:59<12:11,  2.29s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▋ | 19798/20117 [12:36:01<12:03,  2.27s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▋ | 19799/20117 [12:36:04<12:05,  2.28s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▋ | 19800/20117 [12:36:06<11:59,  2.27s/it]                                                                                                                                 {'loss': 0.1701, 'grad_norm': 0.5413442850112915, 'learning_rate': 1.2451906783957912e-07, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 413.24, 'epoch': 1.97}
 98%|█████████████████████████████████████████████████████████████████████████████████▋ | 19800/20117 [12:36:06<11:59,  2.27s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▋ | 19801/20117 [12:36:08<11:53,  2.26s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▋ | 19802/20117 [12:36:10<11:51,  2.26s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▋ | 19803/20117 [12:36:13<11:43,  2.24s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▋ | 19804/20117 [12:36:15<11:40,  2.24s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▋ | 19805/20117 [12:36:17<11:34,  2.23s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▋ | 19806/20117 [12:36:19<11:34,  2.23s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▋ | 19807/20117 [12:36:22<11:30,  2.23s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▋ | 19808/20117 [12:36:24<11:28,  2.23s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▋ | 19809/20117 [12:36:26<11:23,  2.22s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▋ | 19810/20117 [12:36:28<11:25,  2.23s/it]                                                                                                                                 {'loss': 0.1503, 'grad_norm': 0.49978771805763245, 'learning_rate': 1.1681231601564647e-07, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 325.42, 'epoch': 1.97}
 98%|█████████████████████████████████████████████████████████████████████████████████▋ | 19810/20117 [12:36:28<11:25,  2.23s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▋ | 19811/20117 [12:36:30<11:26,  2.24s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▋ | 19812/20117 [12:36:33<11:18,  2.22s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▋ | 19813/20117 [12:36:35<11:19,  2.23s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▋ | 19814/20117 [12:36:37<11:15,  2.23s/it] 98%|█████████████████████████████████████████████████████████████████████████████████▊ | 19815/20117 [12:36:39<11:10,  2.22s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▊ | 19816/20117 [12:36:42<11:10,  2.23s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▊ | 19817/20117 [12:36:44<11:12,  2.24s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▊ | 19818/20117 [12:36:46<11:09,  2.24s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▊ | 19819/20117 [12:36:48<11:14,  2.26s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▊ | 19820/20117 [12:36:51<11:14,  2.27s/it]                                                                                                                                 {'loss': 0.1421, 'grad_norm': 0.5488161444664001, 'learning_rate': 1.0935159759378755e-07, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 347.6, 'epoch': 1.97}
 99%|█████████████████████████████████████████████████████████████████████████████████▊ | 19820/20117 [12:36:51<11:14,  2.27s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▊ | 19821/20117 [12:36:53<11:06,  2.25s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▊ | 19822/20117 [12:36:55<11:10,  2.27s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▊ | 19823/20117 [12:36:57<11:03,  2.26s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▊ | 19824/20117 [12:37:00<10:56,  2.24s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▊ | 19825/20117 [12:37:02<10:48,  2.22s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▊ | 19826/20117 [12:37:04<10:43,  2.21s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▊ | 19827/20117 [12:37:06<10:38,  2.20s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▊ | 19828/20117 [12:37:08<10:38,  2.21s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▊ | 19829/20117 [12:37:11<10:35,  2.21s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▊ | 19830/20117 [12:37:13<10:34,  2.21s/it]                                                                                                                                 {'loss': 0.1309, 'grad_norm': 0.36043041944503784, 'learning_rate': 1.0213693095130206e-07, 'memory/max_active (GiB)': 19.22, 'memory/max_allocated (GiB)': 19.22, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 286.15, 'epoch': 1.97}
 99%|█████████████████████████████████████████████████████████████████████████████████▊ | 19830/20117 [12:37:13<10:34,  2.21s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▊ | 19831/20117 [12:37:15<10:32,  2.21s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▊ | 19832/20117 [12:37:17<10:34,  2.23s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▊ | 19833/20117 [12:37:20<10:33,  2.23s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▊ | 19834/20117 [12:37:22<10:31,  2.23s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▊ | 19835/20117 [12:37:24<10:28,  2.23s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▊ | 19836/20117 [12:37:26<10:22,  2.22s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▊ | 19837/20117 [12:37:28<10:21,  2.22s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▊ | 19838/20117 [12:37:31<10:25,  2.24s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▊ | 19839/20117 [12:37:33<10:20,  2.23s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▊ | 19840/20117 [12:37:35<10:25,  2.26s/it]                                                                                                                                 {'loss': 0.1344, 'grad_norm': 0.30269375443458557, 'learning_rate': 9.516833385945224e-08, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 411.14, 'epoch': 1.97}
 99%|█████████████████████████████████████████████████████████████████████████████████▊ | 19840/20117 [12:37:35<10:25,  2.26s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▊ | 19841/20117 [12:37:37<10:19,  2.25s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▊ | 19842/20117 [12:37:40<10:46,  2.35s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▊ | 19843/20117 [12:37:42<10:31,  2.30s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▊ | 19844/20117 [12:37:44<10:21,  2.28s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▉ | 19845/20117 [12:37:47<10:17,  2.27s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▉ | 19846/20117 [12:37:49<10:13,  2.26s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▉ | 19847/20117 [12:37:51<10:06,  2.25s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▉ | 19848/20117 [12:37:53<10:01,  2.24s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▉ | 19849/20117 [12:37:56<09:56,  2.22s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▉ | 19850/20117 [12:37:58<09:55,  2.23s/it]                                                                                                                                 {'loss': 0.1754, 'grad_norm': 0.6402490735054016, 'learning_rate': 8.844582348336294e-08, 'memory/max_active (GiB)': 20.58, 'memory/max_allocated (GiB)': 20.58, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 333.73, 'epoch': 1.97}
 99%|█████████████████████████████████████████████████████████████████████████████████▉ | 19850/20117 [12:37:58<09:55,  2.23s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▉ | 19851/20117 [12:38:00<09:52,  2.23s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▉ | 19852/20117 [12:38:02<09:50,  2.23s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▉ | 19853/20117 [12:38:05<09:56,  2.26s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▉ | 19854/20117 [12:38:07<09:49,  2.24s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▉ | 19855/20117 [12:38:09<09:50,  2.25s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▉ | 19856/20117 [12:38:11<09:46,  2.25s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▉ | 19857/20117 [12:38:14<09:42,  2.24s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▉ | 19858/20117 [12:38:16<09:38,  2.23s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▉ | 19859/20117 [12:38:18<09:35,  2.23s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▉ | 19860/20117 [12:38:20<09:35,  2.24s/it]                                                                                                                                 {'loss': 0.1245, 'grad_norm': 0.3337123394012451, 'learning_rate': 8.196941638199951e-08, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 365.58, 'epoch': 1.97}
 99%|█████████████████████████████████████████████████████████████████████████████████▉ | 19860/20117 [12:38:20<09:35,  2.24s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▉ | 19861/20117 [12:38:22<09:31,  2.23s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▉ | 19862/20117 [12:38:25<09:31,  2.24s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▉ | 19863/20117 [12:38:27<09:26,  2.23s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▉ | 19864/20117 [12:38:29<09:24,  2.23s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▉ | 19865/20117 [12:38:31<09:22,  2.23s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▉ | 19866/20117 [12:38:34<09:20,  2.23s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▉ | 19867/20117 [12:38:36<09:14,  2.22s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▉ | 19868/20117 [12:38:38<09:11,  2.22s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▉ | 19869/20117 [12:38:40<09:14,  2.23s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▉ | 19870/20117 [12:38:43<09:10,  2.23s/it]                                                                                                                                 {'loss': 0.1501, 'grad_norm': 0.4003879427909851, 'learning_rate': 7.573912850812326e-08, 'memory/max_active (GiB)': 21.39, 'memory/max_allocated (GiB)': 21.39, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 344.89, 'epoch': 1.98}
 99%|█████████████████████████████████████████████████████████████████████████████████▉ | 19870/20117 [12:38:43<09:10,  2.23s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▉ | 19871/20117 [12:38:45<09:10,  2.24s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▉ | 19872/20117 [12:38:47<09:05,  2.22s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▉ | 19873/20117 [12:38:49<09:00,  2.22s/it] 99%|█████████████████████████████████████████████████████████████████████████████████▉ | 19874/20117 [12:38:51<09:00,  2.22s/it] 99%|██████████████████████████████████████████████████████████████████████████████████ | 19875/20117 [12:38:54<09:03,  2.25s/it] 99%|██████████████████████████████████████████████████████████████████████████████████ | 19876/20117 [12:38:56<08:59,  2.24s/it] 99%|██████████████████████████████████████████████████████████████████████████████████ | 19877/20117 [12:38:58<08:56,  2.24s/it] 99%|██████████████████████████████████████████████████████████████████████████████████ | 19878/20117 [12:39:00<08:52,  2.23s/it] 99%|██████████████████████████████████████████████████████████████████████████████████ | 19879/20117 [12:39:03<08:51,  2.23s/it] 99%|██████████████████████████████████████████████████████████████████████████████████ | 19880/20117 [12:39:05<08:48,  2.23s/it]                                                                                                                                 {'loss': 0.1815, 'grad_norm': 0.21886324882507324, 'learning_rate': 6.975497520824715e-08, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 419.3, 'epoch': 1.98}
 99%|██████████████████████████████████████████████████████████████████████████████████ | 19880/20117 [12:39:05<08:48,  2.23s/it] 99%|██████████████████████████████████████████████████████████████████████████████████ | 19881/20117 [12:39:07<08:47,  2.24s/it] 99%|██████████████████████████████████████████████████████████████████████████████████ | 19882/20117 [12:39:09<08:44,  2.23s/it] 99%|██████████████████████████████████████████████████████████████████████████████████ | 19883/20117 [12:39:12<08:40,  2.22s/it] 99%|██████████████████████████████████████████████████████████████████████████████████ | 19884/20117 [12:39:14<08:35,  2.21s/it] 99%|██████████████████████████████████████████████████████████████████████████████████ | 19885/20117 [12:39:16<08:33,  2.21s/it] 99%|██████████████████████████████████████████████████████████████████████████████████ | 19886/20117 [12:39:18<08:31,  2.22s/it] 99%|██████████████████████████████████████████████████████████████████████████████████ | 19887/20117 [12:39:20<08:27,  2.21s/it] 99%|██████████████████████████████████████████████████████████████████████████████████ | 19888/20117 [12:39:23<08:24,  2.20s/it] 99%|██████████████████████████████████████████████████████████████████████████████████ | 19889/20117 [12:39:25<08:22,  2.20s/it] 99%|██████████████████████████████████████████████████████████████████████████████████ | 19890/20117 [12:39:27<08:19,  2.20s/it]                                                                                                                                 {'loss': 0.1328, 'grad_norm': 0.7370038628578186, 'learning_rate': 6.401697122260241e-08, 'memory/max_active (GiB)': 18.18, 'memory/max_allocated (GiB)': 18.18, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 332.95, 'epoch': 1.98}
 99%|██████████████████████████████████████████████████████████████████████████████████ | 19890/20117 [12:39:27<08:19,  2.20s/it] 99%|██████████████████████████████████████████████████████████████████████████████████ | 19891/20117 [12:39:29<08:16,  2.20s/it] 99%|██████████████████████████████████████████████████████████████████████████████████ | 19892/20117 [12:39:31<08:22,  2.23s/it] 99%|██████████████████████████████████████████████████████████████████████████████████ | 19893/20117 [12:39:34<08:16,  2.21s/it] 99%|██████████████████████████████████████████████████████████████████████████████████ | 19894/20117 [12:39:36<08:33,  2.30s/it] 99%|██████████████████████████████████████████████████████████████████████████████████ | 19895/20117 [12:39:38<08:23,  2.27s/it] 99%|██████████████████████████████████████████████████████████████████████████████████ | 19896/20117 [12:39:40<08:17,  2.25s/it] 99%|██████████████████████████████████████████████████████████████████████████████████ | 19897/20117 [12:39:43<08:10,  2.23s/it] 99%|██████████████████████████████████████████████████████████████████████████████████ | 19898/20117 [12:39:45<08:10,  2.24s/it] 99%|██████████████████████████████████████████████████████████████████████████████████ | 19899/20117 [12:39:47<08:04,  2.22s/it] 99%|██████████████████████████████████████████████████████████████████████████████████ | 19900/20117 [12:39:49<08:03,  2.23s/it]                                                                                                                                 {'loss': 0.1096, 'grad_norm': 0.46660006046295166, 'learning_rate': 5.852513068511645e-08, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 298.67, 'epoch': 1.98}
 99%|██████████████████████████████████████████████████████████████████████████████████ | 19900/20117 [12:39:49<08:03,  2.23s/it] 99%|██████████████████████████████████████████████████████████████████████████████████ | 19901/20117 [12:39:52<08:00,  2.23s/it] 99%|██████████████████████████████████████████████████████████████████████████████████ | 19902/20117 [12:39:54<07:59,  2.23s/it] 99%|██████████████████████████████████████████████████████████████████████████████████ | 19903/20117 [12:39:56<07:58,  2.23s/it] 99%|██████████████████████████████████████████████████████████████████████████████████ | 19904/20117 [12:39:58<07:57,  2.24s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▏| 19905/20117 [12:40:01<07:52,  2.23s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▏| 19906/20117 [12:40:03<07:48,  2.22s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▏| 19907/20117 [12:40:05<07:47,  2.23s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▏| 19908/20117 [12:40:07<07:44,  2.22s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▏| 19909/20117 [12:40:09<07:44,  2.24s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▏| 19910/20117 [12:40:12<07:41,  2.23s/it]                                                                                                                                 {'loss': 0.1374, 'grad_norm': 0.5484210848808289, 'learning_rate': 5.3279467123346086e-08, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 340.44, 'epoch': 1.98}
 99%|██████████████████████████████████████████████████████████████████████████████████▏| 19910/20117 [12:40:12<07:41,  2.23s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▏| 19911/20117 [12:40:14<07:44,  2.26s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▏| 19912/20117 [12:40:16<07:40,  2.25s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▏| 19913/20117 [12:40:18<07:37,  2.24s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▏| 19914/20117 [12:40:21<07:33,  2.23s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▏| 19915/20117 [12:40:23<07:28,  2.22s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▏| 19916/20117 [12:40:25<07:27,  2.23s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▏| 19917/20117 [12:40:27<07:24,  2.22s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▏| 19918/20117 [12:40:30<07:32,  2.27s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▏| 19919/20117 [12:40:32<07:27,  2.26s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▏| 19920/20117 [12:40:34<07:22,  2.25s/it]                                                                                                                                 {'loss': 0.1858, 'grad_norm': 0.4510185122489929, 'learning_rate': 4.827999345846657e-08, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 407.5, 'epoch': 1.98}
 99%|██████████████████████████████████████████████████████████████████████████████████▏| 19920/20117 [12:40:34<07:22,  2.25s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▏| 19921/20117 [12:40:36<07:16,  2.23s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▏| 19922/20117 [12:40:39<07:13,  2.22s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▏| 19923/20117 [12:40:41<07:13,  2.23s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▏| 19924/20117 [12:40:43<07:08,  2.22s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▏| 19925/20117 [12:40:45<07:08,  2.23s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▏| 19926/20117 [12:40:47<07:04,  2.22s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▏| 19927/20117 [12:40:50<07:03,  2.23s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▏| 19928/20117 [12:40:52<07:00,  2.22s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▏| 19929/20117 [12:40:54<06:56,  2.22s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▏| 19930/20117 [12:40:56<06:52,  2.21s/it]                                                                                                                                 {'loss': 0.1675, 'grad_norm': 0.4410419464111328, 'learning_rate': 4.352672200523822e-08, 'memory/max_active (GiB)': 19.8, 'memory/max_allocated (GiB)': 19.8, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 308.63, 'epoch': 1.98}
 99%|██████████████████████████████████████████████████████████████████████████████████▏| 19930/20117 [12:40:56<06:52,  2.21s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▏| 19931/20117 [12:40:59<06:53,  2.22s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▏| 19932/20117 [12:41:01<06:52,  2.23s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▏| 19933/20117 [12:41:03<06:49,  2.22s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▏| 19934/20117 [12:41:05<06:46,  2.22s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▏| 19935/20117 [12:41:07<06:44,  2.22s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▎| 19936/20117 [12:41:10<06:41,  2.22s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▎| 19937/20117 [12:41:12<06:39,  2.22s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▎| 19938/20117 [12:41:14<06:35,  2.21s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▎| 19939/20117 [12:41:16<06:33,  2.21s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▎| 19940/20117 [12:41:18<06:31,  2.21s/it]                                                                                                                                 {'loss': 0.1349, 'grad_norm': 0.5038071274757385, 'learning_rate': 3.901966447197314e-08, 'memory/max_active (GiB)': 19.78, 'memory/max_allocated (GiB)': 19.78, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 351.5, 'epoch': 1.98}
 99%|██████████████████████████████████████████████████████████████████████████████████▎| 19940/20117 [12:41:18<06:31,  2.21s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▎| 19941/20117 [12:41:21<06:29,  2.22s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▎| 19942/20117 [12:41:23<06:27,  2.22s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▎| 19943/20117 [12:41:25<06:25,  2.22s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▎| 19944/20117 [12:41:27<06:21,  2.21s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▎| 19945/20117 [12:41:30<06:21,  2.22s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▎| 19946/20117 [12:41:32<06:20,  2.23s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▎| 19947/20117 [12:41:34<06:34,  2.32s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▎| 19948/20117 [12:41:37<06:30,  2.31s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▎| 19949/20117 [12:41:39<06:23,  2.28s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▎| 19950/20117 [12:41:41<06:17,  2.26s/it]                                                                                                                                 {'loss': 0.1633, 'grad_norm': 1.0024924278259277, 'learning_rate': 3.4758831960524095e-08, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 326.94, 'epoch': 1.98}
 99%|██████████████████████████████████████████████████████████████████████████████████▎| 19950/20117 [12:41:41<06:17,  2.26s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▎| 19951/20117 [12:41:43<06:14,  2.26s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▎| 19952/20117 [12:41:46<06:13,  2.26s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▎| 19953/20117 [12:41:48<06:08,  2.25s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▎| 19954/20117 [12:41:50<06:04,  2.24s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▎| 19955/20117 [12:41:52<06:00,  2.23s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▎| 19956/20117 [12:41:54<05:56,  2.21s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▎| 19957/20117 [12:41:57<05:55,  2.22s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▎| 19958/20117 [12:41:59<05:54,  2.23s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▎| 19959/20117 [12:42:01<05:52,  2.23s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▎| 19960/20117 [12:42:03<05:48,  2.22s/it]                                                                                                                                 {'loss': 0.1853, 'grad_norm': 0.5408278107643127, 'learning_rate': 3.0744234966195715e-08, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 380.23, 'epoch': 1.98}
 99%|██████████████████████████████████████████████████████████████████████████████████▎| 19960/20117 [12:42:03<05:48,  2.22s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▎| 19961/20117 [12:42:06<05:52,  2.26s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▎| 19962/20117 [12:42:08<05:50,  2.26s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▎| 19963/20117 [12:42:10<05:46,  2.25s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▎| 19964/20117 [12:42:12<05:42,  2.24s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▎| 19965/20117 [12:42:15<05:37,  2.22s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▍| 19966/20117 [12:42:17<05:33,  2.21s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▍| 19967/20117 [12:42:19<05:34,  2.23s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▍| 19968/20117 [12:42:21<05:31,  2.23s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▍| 19969/20117 [12:42:23<05:29,  2.22s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▍| 19970/20117 [12:42:26<05:29,  2.24s/it]                                                                                                                                 {'loss': 0.159, 'grad_norm': 0.6154715418815613, 'learning_rate': 2.6975883377799993e-08, 'memory/max_active (GiB)': 21.5, 'memory/max_allocated (GiB)': 21.5, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 345.31, 'epoch': 1.99}
 99%|██████████████████████████████████████████████████████████████████████████████████▍| 19970/20117 [12:42:26<05:29,  2.24s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▍| 19971/20117 [12:42:28<05:25,  2.23s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▍| 19972/20117 [12:42:30<05:23,  2.23s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▍| 19973/20117 [12:42:32<05:21,  2.23s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▍| 19974/20117 [12:42:35<05:20,  2.24s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▍| 19975/20117 [12:42:37<05:19,  2.25s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▍| 19976/20117 [12:42:39<05:16,  2.24s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▍| 19977/20117 [12:42:41<05:17,  2.27s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▍| 19978/20117 [12:42:44<05:13,  2.26s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▍| 19979/20117 [12:42:46<05:12,  2.27s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▍| 19980/20117 [12:42:48<05:08,  2.25s/it]                                                                                                                                 {'loss': 0.1489, 'grad_norm': 0.52059006690979, 'learning_rate': 2.3453786477589668e-08, 'memory/max_active (GiB)': 20.64, 'memory/max_allocated (GiB)': 20.64, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 365.37, 'epoch': 1.99}
 99%|██████████████████████████████████████████████████████████████████████████████████▍| 19980/20117 [12:42:48<05:08,  2.25s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▍| 19981/20117 [12:42:50<05:04,  2.24s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▍| 19982/20117 [12:42:53<05:02,  2.24s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▍| 19983/20117 [12:42:55<05:00,  2.24s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▍| 19984/20117 [12:42:57<04:56,  2.23s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▍| 19985/20117 [12:42:59<04:53,  2.22s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▍| 19986/20117 [12:43:02<04:52,  2.23s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▍| 19987/20117 [12:43:04<04:51,  2.24s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▍| 19988/20117 [12:43:06<04:47,  2.23s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▍| 19989/20117 [12:43:08<04:44,  2.22s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▍| 19990/20117 [12:43:10<04:42,  2.23s/it]                                                                                                                                 {'loss': 0.1679, 'grad_norm': 0.5241625905036926, 'learning_rate': 2.0177952941224932e-08, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 361.81, 'epoch': 1.99}
 99%|██████████████████████████████████████████████████████████████████████████████████▍| 19990/20117 [12:43:10<04:42,  2.23s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▍| 19991/20117 [12:43:13<04:39,  2.22s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▍| 19992/20117 [12:43:15<04:37,  2.22s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▍| 19993/20117 [12:43:17<04:38,  2.25s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▍| 19994/20117 [12:43:19<04:37,  2.25s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▍| 19995/20117 [12:43:22<04:34,  2.25s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▌| 19996/20117 [12:43:24<04:33,  2.26s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▌| 19997/20117 [12:43:26<04:28,  2.24s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▌| 19998/20117 [12:43:28<04:25,  2.23s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▌| 19999/20117 [12:43:31<04:23,  2.24s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▌| 20000/20117 [12:43:33<04:21,  2.23s/it]                                                                                                                                 {'loss': 0.1612, 'grad_norm': 0.5427475571632385, 'learning_rate': 1.7148390837784523e-08, 'memory/max_active (GiB)': 20.75, 'memory/max_allocated (GiB)': 20.75, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 405.51, 'epoch': 1.99}
 99%|██████████████████████████████████████████████████████████████████████████████████▌| 20000/20117 [12:43:33<04:21,  2.23s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▌| 20001/20117 [12:43:35<04:28,  2.32s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▌| 20002/20117 [12:43:38<04:23,  2.29s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▌| 20003/20117 [12:43:40<04:20,  2.28s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▌| 20004/20117 [12:43:42<04:16,  2.27s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▌| 20005/20117 [12:43:44<04:13,  2.27s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▌| 20006/20117 [12:43:47<04:13,  2.28s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▌| 20007/20117 [12:43:49<04:08,  2.26s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▌| 20008/20117 [12:43:51<04:04,  2.24s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▌| 20009/20117 [12:43:53<04:02,  2.24s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▌| 20010/20117 [12:43:56<04:02,  2.26s/it]                                                                                                                                 {'loss': 0.1309, 'grad_norm': 0.4156535267829895, 'learning_rate': 1.4365107629710218e-08, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 371.64, 'epoch': 1.99}
 99%|██████████████████████████████████████████████████████████████████████████████████▌| 20010/20117 [12:43:56<04:02,  2.26s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▌| 20011/20117 [12:43:58<03:57,  2.24s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▌| 20012/20117 [12:44:00<03:55,  2.24s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▌| 20013/20117 [12:44:02<03:52,  2.24s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▌| 20014/20117 [12:44:05<03:48,  2.22s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▌| 20015/20117 [12:44:07<03:46,  2.22s/it] 99%|██████████████████████████████████████████████████████████████████████████████████▌| 20016/20117 [12:44:09<03:46,  2.24s/it]100%|██████████████████████████████████████████████████████████████████████████████████▌| 20017/20117 [12:44:11<03:42,  2.23s/it]100%|██████████████████████████████████████████████████████████████████████████████████▌| 20018/20117 [12:44:13<03:40,  2.23s/it]100%|██████████████████████████████████████████████████████████████████████████████████▌| 20019/20117 [12:44:16<03:38,  2.23s/it]100%|██████████████████████████████████████████████████████████████████████████████████▌| 20020/20117 [12:44:18<03:38,  2.26s/it]                                                                                                                                 {'loss': 0.1373, 'grad_norm': 0.5698441863059998, 'learning_rate': 1.182811017281793e-08, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 339.51, 'epoch': 1.99}
100%|██████████████████████████████████████████████████████████████████████████████████▌| 20020/20117 [12:44:18<03:38,  2.26s/it]100%|██████████████████████████████████████████████████████████████████████████████████▌| 20021/20117 [12:44:20<03:34,  2.24s/it]100%|██████████████████████████████████████████████████████████████████████████████████▌| 20022/20117 [12:44:22<03:31,  2.23s/it]100%|██████████████████████████████████████████████████████████████████████████████████▌| 20023/20117 [12:44:25<03:28,  2.22s/it]100%|██████████████████████████████████████████████████████████████████████████████████▌| 20024/20117 [12:44:27<03:27,  2.23s/it]100%|██████████████████████████████████████████████████████████████████████████████████▌| 20025/20117 [12:44:29<03:25,  2.23s/it]100%|██████████████████████████████████████████████████████████████████████████████████▌| 20026/20117 [12:44:31<03:23,  2.23s/it]100%|██████████████████████████████████████████████████████████████████████████████████▋| 20027/20117 [12:44:34<03:21,  2.24s/it]100%|██████████████████████████████████████████████████████████████████████████████████▋| 20028/20117 [12:44:36<03:18,  2.24s/it]100%|██████████████████████████████████████████████████████████████████████████████████▋| 20029/20117 [12:44:38<03:15,  2.23s/it]100%|██████████████████████████████████████████████████████████████████████████████████▋| 20030/20117 [12:44:40<03:13,  2.22s/it]                                                                                                                                 {'loss': 0.1489, 'grad_norm': 0.7525358200073242, 'learning_rate': 9.537404716286613e-09, 'memory/max_active (GiB)': 19.67, 'memory/max_allocated (GiB)': 19.67, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 322.13, 'epoch': 1.99}
100%|██████████████████████████████████████████████████████████████████████████████████▋| 20030/20117 [12:44:40<03:13,  2.22s/it]100%|██████████████████████████████████████████████████████████████████████████████████▋| 20031/20117 [12:44:42<03:10,  2.22s/it]100%|██████████████████████████████████████████████████████████████████████████████████▋| 20032/20117 [12:44:45<03:08,  2.22s/it]100%|██████████████████████████████████████████████████████████████████████████████████▋| 20033/20117 [12:44:47<03:07,  2.24s/it]100%|██████████████████████████████████████████████████████████████████████████████████▋| 20034/20117 [12:44:49<03:04,  2.22s/it]100%|██████████████████████████████████████████████████████████████████████████████████▋| 20035/20117 [12:44:51<03:02,  2.23s/it]100%|██████████████████████████████████████████████████████████████████████████████████▋| 20036/20117 [12:44:54<03:00,  2.23s/it]100%|██████████████████████████████████████████████████████████████████████████████████▋| 20037/20117 [12:44:56<02:59,  2.24s/it]100%|██████████████████████████████████████████████████████████████████████████████████▋| 20038/20117 [12:44:58<02:56,  2.23s/it]100%|██████████████████████████████████████████████████████████████████████████████████▋| 20039/20117 [12:45:00<02:56,  2.26s/it]100%|██████████████████████████████████████████████████████████████████████████████████▋| 20040/20117 [12:45:03<02:52,  2.24s/it]                                                                                                                                 {'loss': 0.1278, 'grad_norm': 0.6561192870140076, 'learning_rate': 7.49299690258054e-09, 'memory/max_active (GiB)': 21.51, 'memory/max_allocated (GiB)': 21.51, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 405.65, 'epoch': 1.99}
100%|██████████████████████████████████████████████████████████████████████████████████▋| 20040/20117 [12:45:03<02:52,  2.24s/it]100%|██████████████████████████████████████████████████████████████████████████████████▋| 20041/20117 [12:45:05<02:49,  2.23s/it]100%|██████████████████████████████████████████████████████████████████████████████████▋| 20042/20117 [12:45:07<02:47,  2.23s/it]100%|██████████████████████████████████████████████████████████████████████████████████▋| 20043/20117 [12:45:09<02:45,  2.23s/it]100%|██████████████████████████████████████████████████████████████████████████████████▋| 20044/20117 [12:45:12<02:43,  2.23s/it]100%|██████████████████████████████████████████████████████████████████████████████████▋| 20045/20117 [12:45:14<02:40,  2.23s/it]100%|██████████████████████████████████████████████████████████████████████████████████▋| 20046/20117 [12:45:16<02:38,  2.24s/it]100%|██████████████████████████████████████████████████████████████████████████████████▋| 20047/20117 [12:45:18<02:36,  2.23s/it]100%|██████████████████████████████████████████████████████████████████████████████████▋| 20048/20117 [12:45:20<02:33,  2.22s/it]100%|██████████████████████████████████████████████████████████████████████████████████▋| 20049/20117 [12:45:23<02:32,  2.24s/it]100%|██████████████████████████████████████████████████████████████████████████████████▋| 20050/20117 [12:45:25<02:29,  2.23s/it]                                                                                                                                 {'loss': 0.1298, 'grad_norm': 0.28374719619750977, 'learning_rate': 5.694891767527022e-09, 'memory/max_active (GiB)': 20.76, 'memory/max_allocated (GiB)': 20.76, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 347.78, 'epoch': 1.99}
100%|██████████████████████████████████████████████████████████████████████████████████▋| 20050/20117 [12:45:25<02:29,  2.23s/it]100%|██████████████████████████████████████████████████████████████████████████████████▋| 20051/20117 [12:45:27<02:27,  2.24s/it]100%|██████████████████████████████████████████████████████████████████████████████████▋| 20052/20117 [12:45:30<02:30,  2.31s/it]100%|██████████████████████████████████████████████████████████████████████████████████▋| 20053/20117 [12:45:32<02:26,  2.28s/it]100%|██████████████████████████████████████████████████████████████████████████████████▋| 20054/20117 [12:45:34<02:21,  2.25s/it]100%|██████████████████████████████████████████████████████████████████████████████████▋| 20055/20117 [12:45:36<02:19,  2.26s/it]100%|██████████████████████████████████████████████████████████████████████████████████▋| 20056/20117 [12:45:38<02:16,  2.23s/it]100%|██████████████████████████████████████████████████████████████████████████████████▊| 20057/20117 [12:45:41<02:13,  2.23s/it]100%|██████████████████████████████████████████████████████████████████████████████████▊| 20058/20117 [12:45:43<02:11,  2.23s/it]100%|██████████████████████████████████████████████████████████████████████████████████▊| 20059/20117 [12:45:45<02:11,  2.26s/it]100%|██████████████████████████████████████████████████████████████████████████████████▊| 20060/20117 [12:45:47<02:08,  2.25s/it]                                                                                                                                 {'loss': 0.105, 'grad_norm': 0.46046513319015503, 'learning_rate': 4.1430937402275885e-09, 'memory/max_active (GiB)': 21.53, 'memory/max_allocated (GiB)': 21.53, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 354.72, 'epoch': 1.99}
100%|██████████████████████████████████████████████████████████████████████████████████▊| 20060/20117 [12:45:47<02:08,  2.25s/it]100%|██████████████████████████████████████████████████████████████████████████████████▊| 20061/20117 [12:45:50<02:05,  2.24s/it]100%|██████████████████████████████████████████████████████████████████████████████████▊| 20062/20117 [12:45:52<02:02,  2.23s/it]100%|██████████████████████████████████████████████████████████████████████████████████▊| 20063/20117 [12:45:54<01:59,  2.22s/it]100%|██████████████████████████████████████████████████████████████████████████████████▊| 20064/20117 [12:45:56<01:58,  2.24s/it]100%|██████████████████████████████████████████████████████████████████████████████████▊| 20065/20117 [12:45:59<01:55,  2.22s/it]100%|██████████████████████████████████████████████████████████████████████████████████▊| 20066/20117 [12:46:01<01:53,  2.22s/it]100%|██████████████████████████████████████████████████████████████████████████████████▊| 20067/20117 [12:46:03<01:50,  2.22s/it]100%|██████████████████████████████████████████████████████████████████████████████████▊| 20068/20117 [12:46:05<01:49,  2.24s/it]100%|██████████████████████████████████████████████████████████████████████████████████▊| 20069/20117 [12:46:08<01:47,  2.24s/it]100%|██████████████████████████████████████████████████████████████████████████████████▊| 20070/20117 [12:46:10<01:44,  2.22s/it]                                                                                                                                 {'loss': 0.1675, 'grad_norm': 0.6896146535873413, 'learning_rate': 2.837606643102397e-09, 'memory/max_active (GiB)': 20.55, 'memory/max_allocated (GiB)': 20.55, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 322.76, 'epoch': 2.0}
100%|██████████████████████████████████████████████████████████████████████████████████▊| 20070/20117 [12:46:10<01:44,  2.22s/it]100%|██████████████████████████████████████████████████████████████████████████████████▊| 20071/20117 [12:46:12<01:41,  2.21s/it]100%|██████████████████████████████████████████████████████████████████████████████████▊| 20072/20117 [12:46:14<01:39,  2.21s/it]100%|██████████████████████████████████████████████████████████████████████████████████▊| 20073/20117 [12:46:16<01:37,  2.22s/it]100%|██████████████████████████████████████████████████████████████████████████████████▊| 20074/20117 [12:46:19<01:35,  2.22s/it]100%|██████████████████████████████████████████████████████████████████████████████████▊| 20075/20117 [12:46:21<01:33,  2.22s/it]100%|██████████████████████████████████████████████████████████████████████████████████▊| 20076/20117 [12:46:23<01:31,  2.24s/it]100%|██████████████████████████████████████████████████████████████████████████████████▊| 20077/20117 [12:46:25<01:29,  2.23s/it]100%|██████████████████████████████████████████████████████████████████████████████████▊| 20078/20117 [12:46:28<01:27,  2.24s/it]100%|██████████████████████████████████████████████████████████████████████████████████▊| 20079/20117 [12:46:30<01:24,  2.23s/it]100%|██████████████████████████████████████████████████████████████████████████████████▊| 20080/20117 [12:46:32<01:22,  2.24s/it]                                                                                                                                 {'loss': 0.1681, 'grad_norm': 0.29641345143318176, 'learning_rate': 1.7784336918347244e-09, 'memory/max_active (GiB)': 21.49, 'memory/max_allocated (GiB)': 21.49, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 359.15, 'epoch': 2.0}
100%|██████████████████████████████████████████████████████████████████████████████████▊| 20080/20117 [12:46:32<01:22,  2.24s/it]100%|██████████████████████████████████████████████████████████████████████████████████▊| 20081/20117 [12:46:34<01:20,  2.23s/it]100%|██████████████████████████████████████████████████████████████████████████████████▊| 20082/20117 [12:46:36<01:18,  2.24s/it]100%|██████████████████████████████████████████████████████████████████████████████████▊| 20083/20117 [12:46:39<01:15,  2.23s/it]100%|██████████████████████████████████████████████████████████████████████████████████▊| 20084/20117 [12:46:41<01:13,  2.24s/it]100%|██████████████████████████████████████████████████████████████████████████████████▊| 20085/20117 [12:46:43<01:11,  2.23s/it]100%|██████████████████████████████████████████████████████████████████████████████████▊| 20086/20117 [12:46:45<01:09,  2.23s/it]100%|██████████████████████████████████████████████████████████████████████████████████▉| 20087/20117 [12:46:48<01:07,  2.24s/it]100%|██████████████████████████████████████████████████████████████████████████████████▉| 20088/20117 [12:46:50<01:04,  2.24s/it]100%|██████████████████████████████████████████████████████████████████████████████████▉| 20089/20117 [12:46:52<01:02,  2.22s/it]100%|██████████████████████████████████████████████████████████████████████████████████▉| 20090/20117 [12:46:54<00:59,  2.22s/it]                                                                                                                                 {'loss': 0.164, 'grad_norm': 0.6377694010734558, 'learning_rate': 9.655774953931662e-10, 'memory/max_active (GiB)': 20.73, 'memory/max_allocated (GiB)': 20.73, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 351.54, 'epoch': 2.0}
100%|██████████████████████████████████████████████████████████████████████████████████▉| 20090/20117 [12:46:54<00:59,  2.22s/it]100%|██████████████████████████████████████████████████████████████████████████████████▉| 20091/20117 [12:46:56<00:57,  2.20s/it]100%|██████████████████████████████████████████████████████████████████████████████████▉| 20092/20117 [12:46:59<00:55,  2.22s/it]100%|██████████████████████████████████████████████████████████████████████████████████▉| 20093/20117 [12:47:01<00:52,  2.21s/it]100%|██████████████████████████████████████████████████████████████████████████████████▉| 20094/20117 [12:47:03<00:50,  2.21s/it]100%|██████████████████████████████████████████████████████████████████████████████████▉| 20095/20117 [12:47:05<00:48,  2.22s/it]100%|██████████████████████████████████████████████████████████████████████████████████▉| 20096/20117 [12:47:08<00:46,  2.23s/it]100%|██████████████████████████████████████████████████████████████████████████████████▉| 20097/20117 [12:47:10<00:44,  2.22s/it]100%|██████████████████████████████████████████████████████████████████████████████████▉| 20098/20117 [12:47:12<00:42,  2.22s/it]100%|██████████████████████████████████████████████████████████████████████████████████▉| 20099/20117 [12:47:14<00:39,  2.20s/it]100%|██████████████████████████████████████████████████████████████████████████████████▉| 20100/20117 [12:47:16<00:37,  2.20s/it]                                                                                                                                 {'loss': 0.1053, 'grad_norm': 0.5432741641998291, 'learning_rate': 3.990400559983343e-10, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 346.46, 'epoch': 2.0}
100%|██████████████████████████████████████████████████████████████████████████████████▉| 20110/20117 [12:47:39<00:15,  2.22s/it]
{'loss': 0.2105, 'grad_norm': 0.35591670870780945, 'learning_rate': 7.882276917836606e-11, 'memory/max_active (GiB)': 20.56, 'memory/max_allocated (GiB)': 20.56, 'memory/device_reserved (GiB)': 22.49, 'tokens_per_second_per_gpu': 408.89, 'epoch': 2.0}
100%|███████████████████████████████████████████████████████████████████████████████████| 20117/20117 [12:47:54<00:00,  2.21s/it]
[2026-04-16 03:51:09,681] [INFO] [axolotl.core.trainers.base._save:671] [PID:2788] Saving model checkpoint to ./outputs/Qwen2.5-Coder-3B-Instruct-coding-agent/checkpoint-20117
{'train_runtime': 46075.9978, 'train_samples_per_second': 3.493, 'train_steps_per_second': 0.437, 'train_loss': 0.20420050134228362, 'memory/max_active (GiB)': 20.74, 'memory/max_allocated (GiB)': 20.74, 'memory/device_reserved (GiB)': 22.49, 'epoch': 2.0}
100%|███████████████████████████████████████████████████████████████████████████████████| 20117/20117 [12:47:56<00:00,  2.29s/it]
[2026-04-16 03:51:11,057] [INFO] [axolotl.train.save_trained_model:218] [PID:2788] Training completed! Saving trained model to ./outputs/Qwen2.5-Coder-3B-Instruct-coding-agent.
[2026-04-16 03:51:11,530] [INFO] [axolotl.train.save_trained_model:336] [PID:2788] Model successfully saved to ./outputs/Qwen2.5-Coder-3B-Instruct-coding-agent