QuangDuy committed
Commit ae7fe00 · verified · Parent: 2f920b7

Upload checkpoint-23100
checkpoints/checkpoint-23100/1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
{
    "word_embedding_dimension": 768,
    "pooling_mode_cls_token": false,
    "pooling_mode_mean_tokens": true,
    "pooling_mode_max_tokens": false,
    "pooling_mode_mean_sqrt_len_tokens": false,
    "pooling_mode_weightedmean_tokens": false,
    "pooling_mode_lasttoken": false,
    "include_prompt": true
}
checkpoints/checkpoint-23100/README.md ADDED
@@ -0,0 +1,906 @@
---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dense
- generated_from_trainer
- dataset_size:985664
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: QuangDuy/bert-base-stage2-hf
widget:
- source_sentence: gần là gì
  sentences:
  - Thủ thuật tạo dòng chảy gần động mạch là một phương pháp thay thế phù hợp cho kỹ thuật DRIL để điều trị bệnh nhân ARI. Kỹ thuật được mô tả thích hợp hơn cho các bác sĩ phẫu thuật không muốn nối động mạch trục.
  - Trang thảo luận có thể chứa các đề xuất. (Tháng 9 năm 2010). Polyurethane nhiệt dẻo (TPU) là bất kỳ loại nhựa polyurethane nào với nhiều đặc tính, bao gồm độ đàn hồi, độ trong suốt và khả năng chống dầu, mỡ và mài mòn. PU có nhiều ứng dụng bao gồm tấm thiết bị ô tô, bánh xe đẩy, dụng cụ điện, đồ thể thao, thiết bị y tế, dây đai truyền động, giày dép, bè bơm hơi và nhiều ứng dụng màng, tấm và hồ sơ ép đùn.
  - Bởi vì cách tiếp cận như vậy, về nguyên tắc, có vẻ không được mong muốn, chúng tôi đã phát triển và áp dụng một kỹ thuật thay thế được gọi là phương pháp tiếp cận gần dòng vào động mạch (PAI). Quy trình này chuyển đổi nguồn cung cấp động mạch của đường tiếp cận động mạch đến động mạch gần hơn có công suất cao hơn bằng cách sử dụng mảnh ghép polytetrafluoroethylen cỡ nhỏ làm bộ phận trung chuyển.
- source_sentence: npat là gì
  sentences:
  - 'là gì: zantac, dùng để làm gì; và cái gì ... 59. cái gì là: pepcid, dùng để; và là gì ... 60. là gì: bi-citra, dùng để làm gì; và cái gì ... 61. là gì: glucagon, dùng để làm gì; và cái gì ... 62. là gì: humulin, dùng để làm gì; và cái gì ...'
  - NMIMS-NPAT (Bài kiểm tra quốc gia cho các chương trình sau 12 tuổi) là bài kiểm tra đầu vào chính thức để tuyển sinh vào các Chương trình cấp bằng đại học và bằng cấp tích hợp được cung cấp bởi các trường thành viên của Viện Nghiên cứu Quản lý Narsee Monjee của SVKM (Được coi là Đại học) tại Cơ sở Mumbai & Shirpur. andidates có thể đăng ký trực tuyến trước ngày 30 tháng 4 năm 2015 cho NMIMS-NPAT (Các chương trình sau ngày 12) dự kiến vào ngày 9 và 10 tháng 5 năm 2015 tại các trung tâm trên toàn quốc.
  - Kali là một nguyên tố tuyệt vời được sử dụng trong vườn vì kali là chìa khóa để giúp cây ra trái trong khi nitơ n tăng tán lá và sự phát triển của cây và p phốt pho giúp cấu trúc rễ.
- source_sentence: mtr là gì
  sentences:
  - Cài đặt mtr trong hệ điều hành yêu thích của bạn. mtr có thể được cài đặt trong Ubuntu linux bằng cách cài đặt gói MTR. Một mtr cài đặt apt-get đơn giản sẽ làm được điều đó, hoặc nếu bạn thích sử dụng trình quản lý gói để cài đặt chương trình.
  - Cập nhật ngày 09 tháng 8 năm 2016. Phát triển Nguồn nhân lực (HRD) là khuôn khổ để giúp nhân viên phát triển các kỹ năng, kiến thức và khả năng cá nhân và tổ chức của họ.
  - Từ Wikipedia, bách khoa toàn thư miễn phí. Đường sắt vận chuyển hàng loạt hoặc MTR (bằng tiếng Trung, 香港鐵路有限公司, nghĩa đen là Công ty Đường sắt Hồng Kông; hoặc 港鐵) là hệ thống đường sắt vận chuyển nhanh chính ở Hồng Kông. Kể từ khi dịch vụ MTR lần đầu tiên được khai trương vào năm 1979, mạng lưới đã phát triển lên 150 trạm. Được xây dựng và điều hành bởi MTR Corporation Limited, hệ thống MTR là phương tiện giao thông công cộng rất phổ biến ở Hồng Kông, với khoảng 2,46 triệu hành khách đi lại mỗi ngày.
- source_sentence: snp là gì?
  sentences:
  - Grid là một phần lớn của hệ thống Tron, được lập trình bởi Kevin Flynn. Thường được Flynn gọi là biên giới kỹ thuật số của mình, Grid được tạo ra để cung cấp một nền tảng thử nghiệm nơi tất cả các dạng ...
  - Đa hình nucleotide đơn hay SNP (phát âm là đoạn cắt) là một biến thể trình tự DNA xảy ra khi một nucleotide đơn - A, T, C hoặc G - trong bộ gen (hoặc trình tự dùng chung khác) khác nhau giữa các thành viên của một loài (hoặc giữa các nhiễm sắc thể được ghép đôi trong một cá nhân).
  - 'Trước khi có thể mở tệp SNP, bạn sẽ cần tìm ra loại tệp mà phần mở rộng tệp SNP đề cập đến. Mẹo: Lỗi liên kết tệp SNP không chính xác có thể là triệu chứng của các sự cố cơ bản khác trong hệ điều hành Windows của bạn.'
- source_sentence: pcb là gì
  sentences:
  - thiết bị lưu trữ ảo (VSA) Thiết bị lưu trữ ảo (VSA) là một bộ điều khiển lưu trữ chạy trên máy ảo (VM) để tạo lưu trữ dùng chung mà không cần thêm phần cứng. Tải xuống hướng dẫn miễn phí này.
  - PCB là từ viết tắt của bảng mạch in. Nó là một bảng có các đường và miếng đệm kết nối các điểm khác nhau với nhau. Trong hình trên, có những dấu vết kết nối điện của các đầu nối và thành phần khác nhau với nhau. PCB cho phép chuyển tín hiệu và nguồn điện giữa các thiết bị vật lý.
  - Thuốc hàn là một dạng chất hàn được sử dụng trong lắp ráp PCB, và bao gồm cả lắp ráp PCB nguyên mẫu, đặc biệt là khi sử dụng kỹ thuật hàn nóng chảy lại. Thuốc hàn là hỗn hợp giữa các khối cầu hàn và một dạng chất trợ dung chuyên dụng. hướng dẫn, thông tin, bài viết về keo hàn là gì, cách sử dụng keo hàn và đạt được nhiều nhất từ quá trình lắp ráp PCB. Hướng dẫn Quy trình hàn Bao gồm.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

# SentenceTransformer based on QuangDuy/bert-base-stage2-hf

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [QuangDuy/bert-base-stage2-hf](https://huggingface.co/QuangDuy/bert-base-stage2-hf). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [QuangDuy/bert-base-stage2-hf](https://huggingface.co/QuangDuy/bert-base-stage2-hf) <!-- at revision 6a6ac1ff59259c4fe29b121488afa79d0bfe3e6a -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
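The Pooling and Normalize modules above have simple semantics. As a minimal NumPy sketch (an illustration of the math only, not the library's actual implementation), masked mean pooling followed by L2 normalization looks like this:

```python
import numpy as np

def mean_pool_and_normalize(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """token_embeddings: (seq_len, dim); attention_mask: (seq_len,) of 0/1."""
    mask = attention_mask[:, None].astype(float)
    # pooling_mode_mean_tokens: average only the non-padding token vectors
    pooled = (token_embeddings * mask).sum(axis=0) / np.clip(mask.sum(), 1e-9, None)
    # Normalize(): scale to unit L2 norm so a dot product equals cosine similarity
    return pooled / np.linalg.norm(pooled)
```

Because the final embeddings are unit-length, cosine similarity and dot product give the same ranking.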

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'pcb là gì',
    'PCB là từ viết tắt của bảng mạch in. Nó là một bảng có các đường và miếng đệm kết nối các điểm khác nhau với nhau. Trong hình trên, có những dấu vết kết nối điện của các đầu nối và thành phần khác nhau với nhau. PCB cho phép chuyển tín hiệu và nguồn điện giữa các thiết bị vật lý.',
    'Thuốc hàn là một dạng chất hàn được sử dụng trong lắp ráp PCB, và bao gồm cả lắp ráp PCB nguyên mẫu, đặc biệt là khi sử dụng kỹ thuật hàn nóng chảy lại. Thuốc hàn là hỗn hợp giữa các khối cầu hàn và một dạng chất trợ dung chuyên dụng. hướng dẫn, thông tin, bài viết về keo hàn là gì, cách sử dụng keo hàn và đạt được nhiều nhất từ quá trình lắp ráp PCB. Hướng dẫn Quy trình hàn Bao gồm.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.6234, 0.1118],
#         [0.6234, 1.0000, 0.3754],
#         [0.1118, 0.3754, 1.0000]])
```
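Because the model was trained with MatryoshkaLoss at dimensions 768/512/256/128 (see Training Details below), its embeddings can be truncated to one of those prefixes and re-normalized with relatively little quality loss. Newer sentence-transformers releases also expose a `truncate_dim` argument on `SentenceTransformer`, depending on your installed version. A hedged NumPy sketch of the truncation itself:

```python
import numpy as np

def truncate_embeddings(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` Matryoshka components and re-normalize each row."""
    assert dim in (768, 512, 256, 128), "dimensions this model was trained for"
    truncated = embeddings[:, :dim]
    return truncated / np.linalg.norm(truncated, axis=1, keepdims=True)
```

Smaller dimensions trade some retrieval quality for index memory and search speed.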

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 985,664 training samples
* Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>sentence_2</code>
* Approximate statistics based on the first 1000 samples:

| | sentence_0 | sentence_1 | sentence_2 |
|:--------|:-----------|:-----------|:-----------|
| type | string | string | string |
| details | <ul><li>min: 4 tokens</li><li>mean: 5.64 tokens</li><li>max: 8 tokens</li></ul> | <ul><li>min: 22 tokens</li><li>mean: 89.52 tokens</li><li>max: 252 tokens</li></ul> | <ul><li>min: 23 tokens</li><li>mean: 92.4 tokens</li><li>max: 314 tokens</li></ul> |

* Samples:

| sentence_0 | sentence_1 | sentence_2 |
|:-----------|:-----------|:-----------|
| <code>afm là gì</code> | <code>AFM là gì? Kính hiển vi lực nguyên tử (AFM) là một loại kính hiển vi thăm dò quét (SPM). SPM được thiết kế để đo các đặc tính cục bộ, chẳng hạn như chiều cao, ma sát, từ tính, bằng một đầu dò. Để thu được hình ảnh, SPM raster quét đầu dò trên một khu vực nhỏ của mẫu, đo đặc tính cục bộ đồng thời.</code> | <code>Mica trần là một chất nền tốt và đặc trưng có thể được sử dụng để cung cấp hình ảnh rõ ràng, sắc nét của DNA. DNA có thể được chụp bằng AFM trên mica khô trong điều kiện môi trường xung quanh hoặc trong bộ đệm sinh lý. Hình ảnh AFM Hình 3.FM AFM là kỹ thuật quét không phá hủy trong đó đầu của đầu dò AFM được sử dụng để quét bề mặt của mẫu. Các đầu dò AFM cực kỳ sắc nét (thường theo thứ tự hàng chục nanomet), vì vậy AFM thường xuyên có thể hình ảnh các tính năng cực kỳ nhỏ trên bề mặt mẫu.</code> |
| <code>xốp là gì</code> | <code>Thuật ngữ "Styrofoam" thực sự là một dạng cách nhiệt bọt polystyrene đã được đăng ký nhãn hiệu của Dow Chemical Co. Còn được gọi là xốp Polystrene mở rộng (EPS), về cơ bản, Styrofoam là một dạng của nhựa polystyrene. Đổi lại, nhựa polystyrene thường được mã hóa là nhựa # 6. Xốp được sử dụng rộng rãi trên toàn thế giới cho nhiều mục đích khác nhau bao gồm đóng gói, tách cà phê, đĩa, khay đựng thức ăn, chế tạo các bộ phận xe hơi, v.v.</code> | <code>Liên quan: tấm xốp máy cắt xốp tấm xốp xốp lớn xốp xốp khối xốp polystyrene khối xốp dây nóng máy cắt xốp quả bóng xốp polystyrene. Đây là giá trung bình dựa trên doanh số bán sản phẩm này trong cùng tình trạng từ tất cả các danh sách trên ebay.com trong 14 qua ngày hoặc nếu không có đủ số lượng danh sách cho một phép tính có ý nghĩa, trong 90 ngày qua.</code> |
| <code>ugc là gì?</code> | <code>Nội dung do người dùng tạo. Nội dung do người dùng tạo (UGC), còn được gọi là nội dung do người dùng tạo (UCC), là bất kỳ dạng nội dung nào được tạo bởi người dùng của một hệ thống hoặc dịch vụ và được cung cấp công khai trên hệ thống đó.</code> | <code>Tìm kiếm định nghĩa của UGC? Tìm hiểu ý nghĩa đầy đủ của UGC trên Abbreviations.com là gì! 'Ủy ban Tài trợ Đại học' là một lựa chọn - hãy truy cập để xem thêm @ Tài nguyên từ viết tắt và từ viết tắt lớn nhất và có thẩm quyền nhất của Web.</code> |

* Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:

```json
{
    "loss": "MultipleNegativesRankingLoss",
    "matryoshka_dims": [
        768,
        512,
        256,
        128
    ],
    "matryoshka_weights": [
        1,
        1,
        1,
        1
    ],
    "n_dims_per_step": -1
}
```
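Conceptually, MatryoshkaLoss applies the base loss to truncated prefixes of the embeddings and sums the weighted results, while MultipleNegativesRankingLoss treats every other in-batch document as a negative for each query. A simplified NumPy sketch of the math (illustrative only; the actual sentence-transformers implementations differ in details such as the similarity scale and reduction):

```python
import numpy as np

def mnrl(queries: np.ndarray, docs: np.ndarray, scale: float = 20.0) -> float:
    """In-batch cross-entropy loss: docs[i] is the positive for queries[i]."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    scores = scale * (q @ d.T)  # (batch, batch) scaled cosine similarities
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))  # maximize the diagonal entries

def matryoshka_mnrl(queries, docs, dims=(768, 512, 256, 128), weights=(1, 1, 1, 1)) -> float:
    # Apply the base loss to each truncated prefix and sum the weighted losses
    return sum(w * mnrl(queries[:, :k], docs[:, :k]) for k, w in zip(dims, weights))
```

Training on all four prefixes at once is what makes the truncated embeddings usable at inference time.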

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 64
- `per_device_eval_batch_size`: 64
- `weight_decay`: 0.01
- `num_train_epochs`: 5
- `warmup_steps`: 7701
- `bf16`: True
- `gradient_checkpointing`: True
- `gradient_checkpointing_kwargs`: {'use_reentrant': False}
- `batch_sampler`: no_duplicates

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 64
- `per_device_eval_batch_size`: 64
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.01
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 5
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: None
- `warmup_ratio`: 0.0
- `warmup_steps`: 7701
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: True
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `project`: huggingface
- `trackio_space_id`: trackio
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: True
- `gradient_checkpointing_kwargs`: {'use_reentrant': False}
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: no
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: True
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional
- `router_mapping`: {}
- `learning_rate_mapping`: {}

</details>
369
+ ### Training Logs
370
+ <details><summary>Click to expand</summary>
371
+
372
+ | Epoch | Step | Training Loss | validation loss |
373
+ |:------:|:-----:|:-------------:|:---------------:|
374
+ | 0.0065 | 50 | 15.9471 | - |
375
+ | 0.0130 | 100 | 15.1591 | - |
376
+ | 0.0195 | 150 | 14.0548 | - |
377
+ | 0.0260 | 200 | 12.6234 | - |
378
+ | 0.0325 | 250 | 11.0546 | - |
379
+ | 0.0390 | 300 | 9.1498 | - |
380
+ | 0.0455 | 350 | 7.5429 | - |
381
+ | 0.0519 | 400 | 6.4324 | - |
382
+ | 0.0584 | 450 | 5.9025 | - |
383
+ | 0.0649 | 500 | 5.4181 | - |
384
+ | 0.0714 | 550 | 5.0231 | - |
385
+ | 0.0779 | 600 | 4.7729 | - |
386
+ | 0.0844 | 650 | 4.6554 | - |
387
+ | 0.0909 | 700 | 4.4429 | - |
388
+ | 0.0974 | 750 | 4.2589 | - |
389
+ | 0.1039 | 800 | 3.9485 | - |
390
+ | 0.1104 | 850 | 3.8368 | - |
391
+ | 0.1169 | 900 | 3.8319 | - |
392
+ | 0.1234 | 950 | 3.6684 | - |
393
+ | 0.1299 | 1000 | 3.5635 | - |
394
+ | 0.1364 | 1050 | 3.518 | - |
395
+ | 0.1429 | 1100 | 3.4108 | - |
396
+ | 0.1494 | 1150 | 3.4189 | - |
397
+ | 0.1558 | 1200 | 3.3312 | - |
398
+ | 0.1623 | 1250 | 3.3368 | - |
399
+ | 0.1688 | 1300 | 3.1654 | - |
400
+ | 0.1753 | 1350 | 3.1133 | - |
401
+ | 0.1818 | 1400 | 2.9981 | - |
402
+ | 0.1883 | 1450 | 2.9279 | - |
403
+ | 0.1948 | 1500 | 3.0855 | - |
404
+ | 0.2013 | 1550 | 2.8648 | - |
405
+ | 0.2078 | 1600 | 2.933 | - |
406
+ | 0.2143 | 1650 | 2.9586 | - |
407
+ | 0.2208 | 1700 | 2.8735 | - |
408
+ | 0.2273 | 1750 | 2.8262 | - |
409
+ | 0.2338 | 1800 | 2.8138 | - |
410
+ | 0.2403 | 1850 | 2.7759 | - |
411
+ | 0.2468 | 1900 | 2.7806 | - |
412
+ | 0.2532 | 1950 | 2.6917 | - |
413
+ | 0.2597 | 2000 | 2.781 | - |
414
+ | 0.2662 | 2050 | 2.6924 | - |
415
+ | 0.2727 | 2100 | 2.6548 | - |
416
+ | 0.2792 | 2150 | 2.5892 | - |
417
+ | 0.2857 | 2200 | 2.4805 | - |
418
+ | 0.2922 | 2250 | 2.5601 | - |
419
+ | 0.2987 | 2300 | 2.4803 | - |
420
+ | 0.3052 | 2350 | 2.4615 | - |
421
+ | 0.3117 | 2400 | 2.497 | - |
422
+ | 0.3182 | 2450 | 2.5528 | - |
423
+ | 0.3247 | 2500 | 2.3179 | - |
424
+ | 0.3312 | 2550 | 2.4226 | - |
425
+ | 0.3377 | 2600 | 2.3794 | - |
426
+ | 0.3442 | 2650 | 2.4812 | - |
427
+ | 0.3506 | 2700 | 2.3969 | - |
428
+ | 0.3571 | 2750 | 2.3512 | - |
429
+ | 0.3636 | 2800 | 2.3615 | - |
430
+ | 0.3701 | 2850 | 2.3175 | - |
431
+ | 0.3766 | 2900 | 2.3293 | - |
432
+ | 0.3831 | 2950 | 2.353 | - |
433
+ | 0.3896 | 3000 | 2.2407 | - |
434
+ | 0.3961 | 3050 | 2.3554 | - |
435
+ | 0.4026 | 3100 | 2.2079 | - |
436
+ | 0.4091 | 3150 | 2.3148 | - |
437
+ | 0.4156 | 3200 | 2.3595 | - |
438
+ | 0.4221 | 3250 | 2.3108 | - |
439
+ | 0.4286 | 3300 | 2.3236 | - |
440
+ | 0.4351 | 3350 | 2.1848 | - |
441
+ | 0.4416 | 3400 | 2.2364 | - |
442
+ | 0.4481 | 3450 | 2.2219 | - |
443
+ | 0.4545 | 3500 | 2.1346 | - |
444
+ | 0.4610 | 3550 | 2.2422 | - |
445
+ | 0.4675 | 3600 | 2.2226 | - |
446
+ | 0.4740 | 3650 | 2.1422 | - |
447
+ | 0.4805 | 3700 | 2.0838 | - |
448
+ | 0.4870 | 3750 | 2.1076 | - |
449
+ | 0.4935 | 3800 | 2.1266 | - |
450
+ | 0.5 | 3850 | 2.1255 | 5.7896 |
| 0.5065 | 3900 | 2.1261 | - |
| 0.5130 | 3950 | 2.1352 | - |
| 0.5195 | 4000 | 2.0632 | - |
| 0.5260 | 4050 | 2.1564 | - |
| 0.5325 | 4100 | 2.0435 | - |
| 0.5390 | 4150 | 2.0099 | - |
| 0.5455 | 4200 | 2.0602 | - |
| 0.5519 | 4250 | 2.0414 | - |
| 0.5584 | 4300 | 2.0636 | - |
| 0.5649 | 4350 | 2.0575 | - |
| 0.5714 | 4400 | 2.0827 | - |
| 0.5779 | 4450 | 1.9545 | - |
| 0.5844 | 4500 | 2.0735 | - |
| 0.5909 | 4550 | 1.9039 | - |
| 0.5974 | 4600 | 1.9605 | - |
| 0.6039 | 4650 | 2.0449 | - |
| 0.6104 | 4700 | 1.9651 | - |
| 0.6169 | 4750 | 1.9632 | - |
| 0.6234 | 4800 | 2.0401 | - |
| 0.6299 | 4850 | 2.0056 | - |
| 0.6364 | 4900 | 1.9399 | - |
| 0.6429 | 4950 | 1.8899 | - |
| 0.6494 | 5000 | 2.0154 | - |
| 0.6558 | 5050 | 1.9857 | - |
| 0.6623 | 5100 | 1.9239 | - |
| 0.6688 | 5150 | 1.9741 | - |
| 0.6753 | 5200 | 1.9258 | - |
| 0.6818 | 5250 | 1.9545 | - |
| 0.6883 | 5300 | 1.9496 | - |
| 0.6948 | 5350 | 1.9479 | - |
| 0.7013 | 5400 | 1.9068 | - |
| 0.7078 | 5450 | 1.847 | - |
| 0.7143 | 5500 | 1.8001 | - |
| 0.7208 | 5550 | 1.9119 | - |
| 0.7273 | 5600 | 1.8347 | - |
| 0.7338 | 5650 | 1.944 | - |
| 0.7403 | 5700 | 1.8523 | - |
| 0.7468 | 5750 | 1.7925 | - |
| 0.7532 | 5800 | 1.8803 | - |
| 0.7597 | 5850 | 1.8662 | - |
| 0.7662 | 5900 | 1.8841 | - |
| 0.7727 | 5950 | 1.7907 | - |
| 0.7792 | 6000 | 1.7876 | - |
| 0.7857 | 6050 | 1.8248 | - |
| 0.7922 | 6100 | 1.7473 | - |
| 0.7987 | 6150 | 1.8196 | - |
| 0.8052 | 6200 | 1.7241 | - |
| 0.8117 | 6250 | 1.8118 | - |
| 0.8182 | 6300 | 1.8037 | - |
| 0.8247 | 6350 | 1.7531 | - |
| 0.8312 | 6400 | 1.8088 | - |
| 0.8377 | 6450 | 1.7525 | - |
| 0.8442 | 6500 | 1.7287 | - |
| 0.8506 | 6550 | 1.756 | - |
| 0.8571 | 6600 | 1.7888 | - |
| 0.8636 | 6650 | 1.709 | - |
| 0.8701 | 6700 | 1.722 | - |
| 0.8766 | 6750 | 1.6935 | - |
| 0.8831 | 6800 | 1.7679 | - |
| 0.8896 | 6850 | 1.7438 | - |
| 0.8961 | 6900 | 1.6815 | - |
| 0.9026 | 6950 | 1.7269 | - |
| 0.9091 | 7000 | 1.7285 | - |
| 0.9156 | 7050 | 1.711 | - |
| 0.9221 | 7100 | 1.6137 | - |
| 0.9286 | 7150 | 1.6785 | - |
| 0.9351 | 7200 | 1.7142 | - |
| 0.9416 | 7250 | 1.7585 | - |
| 0.9481 | 7300 | 1.7268 | - |
| 0.9545 | 7350 | 1.6739 | - |
| 0.9610 | 7400 | 1.6462 | - |
| 0.9675 | 7450 | 1.7591 | - |
| 0.9740 | 7500 | 1.6058 | - |
| 0.9805 | 7550 | 1.6984 | - |
| 0.9870 | 7600 | 1.6609 | - |
| 0.9935 | 7650 | 1.6488 | - |
| 1.0 | 7700 | 1.7626 | 5.6307 |
| 1.0065 | 7750 | 1.6603 | - |
| 1.0130 | 7800 | 1.6979 | - |
| 1.0195 | 7850 | 1.6138 | - |
| 1.0260 | 7900 | 1.7008 | - |
| 1.0325 | 7950 | 1.6281 | - |
| 1.0390 | 8000 | 1.6029 | - |
| 1.0455 | 8050 | 1.6984 | - |
| 1.0519 | 8100 | 1.5865 | - |
| 1.0584 | 8150 | 1.6686 | - |
| 1.0649 | 8200 | 1.5427 | - |
| 1.0714 | 8250 | 1.5441 | - |
| 1.0779 | 8300 | 1.5524 | - |
| 1.0844 | 8350 | 1.5644 | - |
| 1.0909 | 8400 | 1.5964 | - |
| 1.0974 | 8450 | 1.5653 | - |
| 1.1039 | 8500 | 1.4443 | - |
| 1.1104 | 8550 | 1.5609 | - |
| 1.1169 | 8600 | 1.5097 | - |
| 1.1234 | 8650 | 1.5169 | - |
| 1.1299 | 8700 | 1.4992 | - |
| 1.1364 | 8750 | 1.5337 | - |
| 1.1429 | 8800 | 1.4971 | - |
| 1.1494 | 8850 | 1.5218 | - |
| 1.1558 | 8900 | 1.526 | - |
| 1.1623 | 8950 | 1.5531 | - |
| 1.1688 | 9000 | 1.5353 | - |
| 1.1753 | 9050 | 1.4596 | - |
| 1.1818 | 9100 | 1.434 | - |
| 1.1883 | 9150 | 1.4644 | - |
| 1.1948 | 9200 | 1.5245 | - |
| 1.2013 | 9250 | 1.3869 | - |
| 1.2078 | 9300 | 1.4361 | - |
| 1.2143 | 9350 | 1.4296 | - |
| 1.2208 | 9400 | 1.4338 | - |
| 1.2273 | 9450 | 1.383 | - |
| 1.2338 | 9500 | 1.4061 | - |
| 1.2403 | 9550 | 1.3497 | - |
| 1.2468 | 9600 | 1.4223 | - |
| 1.2532 | 9650 | 1.3842 | - |
| 1.2597 | 9700 | 1.3321 | - |
| 1.2662 | 9750 | 1.3396 | - |
| 1.2727 | 9800 | 1.3623 | - |
| 1.2792 | 9850 | 1.2811 | - |
| 1.2857 | 9900 | 1.2448 | - |
| 1.2922 | 9950 | 1.297 | - |
| 1.2987 | 10000 | 1.2812 | - |
| 1.3052 | 10050 | 1.2745 | - |
| 1.3117 | 10100 | 1.3114 | - |
| 1.3182 | 10150 | 1.3039 | - |
| 1.3247 | 10200 | 1.2243 | - |
| 1.3312 | 10250 | 1.2523 | - |
| 1.3377 | 10300 | 1.2213 | - |
| 1.3442 | 10350 | 1.3041 | - |
| 1.3506 | 10400 | 1.2164 | - |
| 1.3571 | 10450 | 1.2241 | - |
| 1.3636 | 10500 | 1.211 | - |
| 1.3701 | 10550 | 1.1839 | - |
| 1.3766 | 10600 | 1.1748 | - |
| 1.3831 | 10650 | 1.1886 | - |
| 1.3896 | 10700 | 1.1752 | - |
| 1.3961 | 10750 | 1.1968 | - |
| 1.4026 | 10800 | 1.1345 | - |
| 1.4091 | 10850 | 1.1882 | - |
| 1.4156 | 10900 | 1.2257 | - |
| 1.4221 | 10950 | 1.2079 | - |
| 1.4286 | 11000 | 1.1455 | - |
| 1.4351 | 11050 | 1.0684 | - |
| 1.4416 | 11100 | 1.0872 | - |
| 1.4481 | 11150 | 1.0888 | - |
| 1.4545 | 11200 | 1.0119 | - |
| 1.4610 | 11250 | 1.0735 | - |
| 1.4675 | 11300 | 1.0829 | - |
| 1.4740 | 11350 | 1.0539 | - |
| 1.4805 | 11400 | 1.0158 | - |
| 1.4870 | 11450 | 1.0087 | - |
| 1.4935 | 11500 | 1.0371 | - |
| 1.5 | 11550 | 1.0171 | 5.6201 |
| 1.5065 | 11600 | 1.0076 | - |
| 1.5130 | 11650 | 1.0063 | - |
| 1.5195 | 11700 | 1.0176 | - |
| 1.5260 | 11750 | 1.0075 | - |
| 1.5325 | 11800 | 0.9626 | - |
| 1.5390 | 11850 | 0.92 | - |
| 1.5455 | 11900 | 0.9672 | - |
| 1.5519 | 11950 | 0.9586 | - |
| 1.5584 | 12000 | 0.9745 | - |
| 1.5649 | 12050 | 0.9643 | - |
| 1.5714 | 12100 | 0.9866 | - |
| 1.5779 | 12150 | 0.9212 | - |
| 1.5844 | 12200 | 0.9023 | - |
| 1.5909 | 12250 | 0.8912 | - |
| 1.5974 | 12300 | 0.8844 | - |
| 1.6039 | 12350 | 0.9322 | - |
| 1.6104 | 12400 | 0.9373 | - |
| 1.6169 | 12450 | 0.892 | - |
| 1.6234 | 12500 | 0.9218 | - |
| 1.6299 | 12550 | 0.8738 | - |
| 1.6364 | 12600 | 0.8698 | - |
| 1.6429 | 12650 | 0.8722 | - |
| 1.6494 | 12700 | 0.8918 | - |
| 1.6558 | 12750 | 0.9164 | - |
| 1.6623 | 12800 | 0.8194 | - |
| 1.6688 | 12850 | 0.8228 | - |
| 1.6753 | 12900 | 0.8091 | - |
| 1.6818 | 12950 | 0.8162 | - |
| 1.6883 | 13000 | 0.8185 | - |
| 1.6948 | 13050 | 0.8196 | - |
| 1.7013 | 13100 | 0.7892 | - |
| 1.7078 | 13150 | 0.7315 | - |
| 1.7143 | 13200 | 0.7381 | - |
| 1.7208 | 13250 | 0.7465 | - |
| 1.7273 | 13300 | 0.7168 | - |
| 1.7338 | 13350 | 0.7449 | - |
| 1.7403 | 13400 | 0.7447 | - |
| 1.7468 | 13450 | 0.6669 | - |
| 1.7532 | 13500 | 0.7496 | - |
| 1.7597 | 13550 | 0.6783 | - |
| 1.7662 | 13600 | 0.7175 | - |
| 1.7727 | 13650 | 0.6728 | - |
| 1.7792 | 13700 | 0.6659 | - |
| 1.7857 | 13750 | 0.6861 | - |
| 1.7922 | 13800 | 0.6372 | - |
| 1.7987 | 13850 | 0.6679 | - |
| 1.8052 | 13900 | 0.6307 | - |
| 1.8117 | 13950 | 0.6307 | - |
| 1.8182 | 14000 | 0.622 | - |
| 1.8247 | 14050 | 0.6362 | - |
| 1.8312 | 14100 | 0.62 | - |
| 1.8377 | 14150 | 0.617 | - |
| 1.8442 | 14200 | 0.6069 | - |
| 1.8506 | 14250 | 0.5839 | - |
| 1.8571 | 14300 | 0.5974 | - |
| 1.8636 | 14350 | 0.5727 | - |
| 1.8701 | 14400 | 0.5629 | - |
| 1.8766 | 14450 | 0.56 | - |
| 1.8831 | 14500 | 0.5891 | - |
| 1.8896 | 14550 | 0.5899 | - |
| 1.8961 | 14600 | 0.5348 | - |
| 1.9026 | 14650 | 0.5753 | - |
| 1.9091 | 14700 | 0.5668 | - |
| 1.9156 | 14750 | 0.5778 | - |
| 1.9221 | 14800 | 0.5127 | - |
| 1.9286 | 14850 | 0.5291 | - |
| 1.9351 | 14900 | 0.5512 | - |
| 1.9416 | 14950 | 0.533 | - |
| 1.9481 | 15000 | 0.5455 | - |
| 1.9545 | 15050 | 0.511 | - |
| 1.9610 | 15100 | 0.4827 | - |
| 1.9675 | 15150 | 0.5358 | - |
| 1.9740 | 15200 | 0.4733 | - |
| 1.9805 | 15250 | 0.4979 | - |
| 1.9870 | 15300 | 0.4809 | - |
| 1.9935 | 15350 | 0.4783 | - |
| 2.0 | 15400 | 0.5226 | 5.8563 |
682
+ | 2.0065 | 15450 | 0.4851 | - |
683
+ | 2.0130 | 15500 | 0.5236 | - |
684
+ | 2.0195 | 15550 | 0.487 | - |
685
+ | 2.0260 | 15600 | 0.4984 | - |
686
+ | 2.0325 | 15650 | 0.493 | - |
687
+ | 2.0390 | 15700 | 0.4937 | - |
688
+ | 2.0455 | 15750 | 0.5143 | - |
689
+ | 2.0519 | 15800 | 0.4471 | - |
690
+ | 2.0584 | 15850 | 0.5013 | - |
691
+ | 2.0649 | 15900 | 0.4686 | - |
692
+ | 2.0714 | 15950 | 0.4246 | - |
693
+ | 2.0779 | 16000 | 0.4193 | - |
694
+ | 2.0844 | 16050 | 0.4438 | - |
695
+ | 2.0909 | 16100 | 0.4426 | - |
696
+ | 2.0974 | 16150 | 0.4511 | - |
697
+ | 2.1039 | 16200 | 0.3932 | - |
698
+ | 2.1104 | 16250 | 0.4574 | - |
699
+ | 2.1169 | 16300 | 0.4363 | - |
700
+ | 2.1234 | 16350 | 0.4181 | - |
701
+ | 2.1299 | 16400 | 0.4237 | - |
702
+ | 2.1364 | 16450 | 0.4611 | - |
703
+ | 2.1429 | 16500 | 0.4072 | - |
704
+ | 2.1494 | 16550 | 0.4382 | - |
705
+ | 2.1558 | 16600 | 0.4325 | - |
706
+ | 2.1623 | 16650 | 0.4315 | - |
707
+ | 2.1688 | 16700 | 0.4194 | - |
708
+ | 2.1753 | 16750 | 0.41 | - |
709
+ | 2.1818 | 16800 | 0.395 | - |
710
+ | 2.1883 | 16850 | 0.4141 | - |
711
+ | 2.1948 | 16900 | 0.4234 | - |
712
+ | 2.2013 | 16950 | 0.3706 | - |
713
+ | 2.2078 | 17000 | 0.375 | - |
714
+ | 2.2143 | 17050 | 0.3856 | - |
715
+ | 2.2208 | 17100 | 0.4104 | - |
716
+ | 2.2273 | 17150 | 0.3682 | - |
717
+ | 2.2338 | 17200 | 0.3849 | - |
718
+ | 2.2403 | 17250 | 0.3607 | - |
719
+ | 2.2468 | 17300 | 0.3821 | - |
720
+ | 2.2532 | 17350 | 0.3749 | - |
721
+ | 2.2597 | 17400 | 0.3548 | - |
722
+ | 2.2662 | 17450 | 0.3684 | - |
723
+ | 2.2727 | 17500 | 0.3649 | - |
724
+ | 2.2792 | 17550 | 0.3547 | - |
725
+ | 2.2857 | 17600 | 0.3308 | - |
726
+ | 2.2922 | 17650 | 0.3417 | - |
727
+ | 2.2987 | 17700 | 0.3414 | - |
728
+ | 2.3052 | 17750 | 0.3372 | - |
729
+ | 2.3117 | 17800 | 0.348 | - |
730
+ | 2.3182 | 17850 | 0.3391 | - |
731
+ | 2.3247 | 17900 | 0.3172 | - |
732
+ | 2.3312 | 17950 | 0.3336 | - |
733
+ | 2.3377 | 18000 | 0.3228 | - |
734
+ | 2.3442 | 18050 | 0.3643 | - |
735
+ | 2.3506 | 18100 | 0.3257 | - |
736
+ | 2.3571 | 18150 | 0.328 | - |
737
+ | 2.3636 | 18200 | 0.3218 | - |
738
+ | 2.3701 | 18250 | 0.3208 | - |
739
+ | 2.3766 | 18300 | 0.3085 | - |
740
+ | 2.3831 | 18350 | 0.3118 | - |
741
+ | 2.3896 | 18400 | 0.3165 | - |
742
+ | 2.3961 | 18450 | 0.3058 | - |
743
+ | 2.4026 | 18500 | 0.3082 | - |
744
+ | 2.4091 | 18550 | 0.3181 | - |
745
+ | 2.4156 | 18600 | 0.3269 | - |
746
+ | 2.4221 | 18650 | 0.3197 | - |
747
+ | 2.4286 | 18700 | 0.305 | - |
748
+ | 2.4351 | 18750 | 0.2837 | - |
749
+ | 2.4416 | 18800 | 0.2694 | - |
750
+ | 2.4481 | 18850 | 0.281 | - |
751
+ | 2.4545 | 18900 | 0.2493 | - |
752
+ | 2.4610 | 18950 | 0.279 | - |
753
+ | 2.4675 | 19000 | 0.2775 | - |
754
+ | 2.4740 | 19050 | 0.2566 | - |
755
+ | 2.4805 | 19100 | 0.2637 | - |
756
+ | 2.4870 | 19150 | 0.2639 | - |
757
+ | 2.4935 | 19200 | 0.2666 | - |
758
+ | 2.5 | 19250 | 0.2672 | 6.0399 |
759
+ | 2.5065 | 19300 | 0.2524 | - |
760
+ | 2.5130 | 19350 | 0.2544 | - |
761
+ | 2.5195 | 19400 | 0.2549 | - |
762
+ | 2.5260 | 19450 | 0.2531 | - |
763
+ | 2.5325 | 19500 | 0.2528 | - |
764
+ | 2.5390 | 19550 | 0.2325 | - |
765
+ | 2.5455 | 19600 | 0.2595 | - |
766
+ | 2.5519 | 19650 | 0.2441 | - |
767
+ | 2.5584 | 19700 | 0.2348 | - |
768
+ | 2.5649 | 19750 | 0.2393 | - |
769
+ | 2.5714 | 19800 | 0.2482 | - |
770
+ | 2.5779 | 19850 | 0.2389 | - |
771
+ | 2.5844 | 19900 | 0.2222 | - |
772
+ | 2.5909 | 19950 | 0.2316 | - |
773
+ | 2.5974 | 20000 | 0.2314 | - |
774
+ | 2.6039 | 20050 | 0.242 | - |
775
+ | 2.6104 | 20100 | 0.2445 | - |
776
+ | 2.6169 | 20150 | 0.2217 | - |
777
+ | 2.6234 | 20200 | 0.2276 | - |
778
+ | 2.6299 | 20250 | 0.231 | - |
779
+ | 2.6364 | 20300 | 0.2195 | - |
780
+ | 2.6429 | 20350 | 0.224 | - |
781
+ | 2.6494 | 20400 | 0.2224 | - |
782
+ | 2.6558 | 20450 | 0.2338 | - |
783
+ | 2.6623 | 20500 | 0.2017 | - |
784
+ | 2.6688 | 20550 | 0.2067 | - |
785
+ | 2.6753 | 20600 | 0.2019 | - |
786
+ | 2.6818 | 20650 | 0.204 | - |
787
+ | 2.6883 | 20700 | 0.1931 | - |
788
+ | 2.6948 | 20750 | 0.1968 | - |
789
+ | 2.7013 | 20800 | 0.19 | - |
790
+ | 2.7078 | 20850 | 0.1826 | - |
791
+ | 2.7143 | 20900 | 0.1962 | - |
792
+ | 2.7208 | 20950 | 0.1868 | - |
793
+ | 2.7273 | 21000 | 0.1757 | - |
794
+ | 2.7338 | 21050 | 0.1958 | - |
795
+ | 2.7403 | 21100 | 0.1832 | - |
796
+ | 2.7468 | 21150 | 0.1618 | - |
797
+ | 2.7532 | 21200 | 0.1919 | - |
798
+ | 2.7597 | 21250 | 0.1709 | - |
799
+ | 2.7662 | 21300 | 0.1815 | - |
800
+ | 2.7727 | 21350 | 0.1738 | - |
801
+ | 2.7792 | 21400 | 0.1631 | - |
802
+ | 2.7857 | 21450 | 0.1725 | - |
803
+ | 2.7922 | 21500 | 0.1681 | - |
804
+ | 2.7987 | 21550 | 0.1797 | - |
805
+ | 2.8052 | 21600 | 0.1653 | - |
806
+ | 2.8117 | 21650 | 0.1599 | - |
807
+ | 2.8182 | 21700 | 0.1625 | - |
808
+ | 2.8247 | 21750 | 0.1662 | - |
809
+ | 2.8312 | 21800 | 0.1499 | - |
810
+ | 2.8377 | 21850 | 0.1609 | - |
811
+ | 2.8442 | 21900 | 0.158 | - |
812
+ | 2.8506 | 21950 | 0.1525 | - |
813
+ | 2.8571 | 22000 | 0.1458 | - |
814
+ | 2.8636 | 22050 | 0.154 | - |
815
+ | 2.8701 | 22100 | 0.1453 | - |
816
+ | 2.8766 | 22150 | 0.1412 | - |
817
+ | 2.8831 | 22200 | 0.1572 | - |
818
+ | 2.8896 | 22250 | 0.1451 | - |
819
+ | 2.8961 | 22300 | 0.1502 | - |
820
+ | 2.9026 | 22350 | 0.1422 | - |
821
+ | 2.9091 | 22400 | 0.1495 | - |
822
+ | 2.9156 | 22450 | 0.1446 | - |
823
+ | 2.9221 | 22500 | 0.1422 | - |
824
+ | 2.9286 | 22550 | 0.1416 | - |
825
+ | 2.9351 | 22600 | 0.1592 | - |
826
+ | 2.9416 | 22650 | 0.1379 | - |
827
+ | 2.9481 | 22700 | 0.1412 | - |
828
+ | 2.9545 | 22750 | 0.1422 | - |
829
+ | 2.9610 | 22800 | 0.1251 | - |
830
+ | 2.9675 | 22850 | 0.1481 | - |
831
+ | 2.9740 | 22900 | 0.1256 | - |
832
+ | 2.9805 | 22950 | 0.1343 | - |
833
+ | 2.9870 | 23000 | 0.1304 | - |
834
+ | 2.9935 | 23050 | 0.1278 | - |
835
+ | 3.0 | 23100 | 0.1453 | 6.1832 |
836
+
+ </details>
+
+ ### Framework Versions
+ - Python: 3.11.15
+ - Sentence Transformers: 5.3.0
+ - Transformers: 4.57.6
+ - PyTorch: 2.11.0+cu130
+ - Accelerate: 1.13.0
+ - Datasets: 3.6.0
+ - Tokenizers: 0.22.2
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+ author = "Reimers, Nils and Gurevych, Iryna",
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+ month = "11",
+ year = "2019",
+ publisher = "Association for Computational Linguistics",
+ url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MatryoshkaLoss
+ ```bibtex
+ @misc{kusupati2024matryoshka,
+ title={Matryoshka Representation Learning},
+ author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
+ year={2024},
+ eprint={2205.13147},
+ archivePrefix={arXiv},
+ primaryClass={cs.LG}
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{oord2019representationlearningcontrastivepredictive,
+ title={Representation Learning with Contrastive Predictive Coding},
+ author={Aaron van den Oord and Yazhe Li and Oriol Vinyals},
+ year={2019},
+ eprint={1807.03748},
+ archivePrefix={arXiv},
+ primaryClass={cs.LG},
+ url={https://arxiv.org/abs/1807.03748},
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
checkpoints/checkpoint-23100/config.json ADDED
@@ -0,0 +1,45 @@
+ {
+ "architectures": [
+ "ModernBertModel"
+ ],
+ "attention_bias": false,
+ "attention_dropout": 0.0,
+ "bos_token_id": 0,
+ "classifier_activation": "silu",
+ "classifier_bias": false,
+ "classifier_dropout": 0.0,
+ "classifier_pooling": "mean",
+ "cls_token_id": 0,
+ "decoder_bias": true,
+ "deterministic_flash_attn": false,
+ "dtype": "float32",
+ "embedding_dropout": 0.0,
+ "eos_token_id": 3,
+ "global_attn_every_n_layers": 3,
+ "global_rope_theta": 160000.0,
+ "gradient_checkpointing": false,
+ "hidden_activation": "gelu",
+ "hidden_size": 768,
+ "initializer_cutoff_factor": 2.0,
+ "initializer_range": 0.02,
+ "intermediate_size": 1152,
+ "layer_norm_eps": 1e-05,
+ "local_attention": 128,
+ "local_rope_theta": 160000.0,
+ "max_position_embeddings": 4096,
+ "mlp_bias": false,
+ "mlp_dropout": 0.0,
+ "model_type": "modernbert",
+ "norm_bias": false,
+ "norm_eps": 1e-05,
+ "num_attention_heads": 12,
+ "num_hidden_layers": 22,
+ "pad_token_id": 2,
+ "position_embedding_type": "absolute",
+ "repad_logits_with_grad": false,
+ "sep_token_id": 3,
+ "sparse_pred_ignore_index": -100,
+ "sparse_prediction": false,
+ "transformers_version": "4.57.6",
+ "vocab_size": 32064
+ }
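
The config above interleaves local sliding-window attention (`local_attention: 128`) with global attention via `global_attn_every_n_layers: 3` across the 22 layers. As an illustrative sketch only, assuming the common ModernBERT convention that layer indices divisible by `global_attn_every_n_layers` (starting from layer 0) use global attention:

```python
# Hypothetical helper (not part of the checkpoint): lists which layers would
# use global vs. local attention under the layer-index convention above.
def attention_layout(num_hidden_layers=22, global_attn_every_n_layers=3):
    return [
        "global" if i % global_attn_every_n_layers == 0 else "local"
        for i in range(num_hidden_layers)
    ]

layout = attention_layout()
print(layout.count("global"), layout.count("local"))  # 8 14
```

Under this assumption, 8 of the 22 layers attend globally and the remaining 14 attend within a 128-token window.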
checkpoints/checkpoint-23100/config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
+ {
+ "model_type": "SentenceTransformer",
+ "__version__": {
+ "sentence_transformers": "5.3.0",
+ "transformers": "4.57.6",
+ "pytorch": "2.11.0+cu130"
+ },
+ "prompts": {
+ "query": "",
+ "document": ""
+ },
+ "default_prompt_name": null,
+ "similarity_fn_name": "cosine"
+ }
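
`similarity_fn_name` is set to `cosine`, and because the model pipeline ends with a Normalize module (unit-length embeddings), cosine similarity reduces to a plain dot product. A minimal sketch with toy 3-d vectors (illustrative values, not real embeddings):

```python
import math

# Cosine similarity between two vectors; for unit-length vectors the
# denominator is 1 and this equals the dot product.
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

u, v = [1.0, 0.0, 0.0], [0.6, 0.8, 0.0]  # both already unit length
print(round(cosine(u, v), 4))  # 0.6
```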
checkpoints/checkpoint-23100/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:aec4a793896f7ca8d6d120922152718f3e06ca48a65fd8d117e023d54e002056
+ size 539840248
checkpoints/checkpoint-23100/modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+ {
+ "idx": 0,
+ "name": "0",
+ "path": "",
+ "type": "sentence_transformers.models.Transformer"
+ },
+ {
+ "idx": 1,
+ "name": "1",
+ "path": "1_Pooling",
+ "type": "sentence_transformers.models.Pooling"
+ },
+ {
+ "idx": 2,
+ "name": "2",
+ "path": "2_Normalize",
+ "type": "sentence_transformers.models.Normalize"
+ }
+ ]
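
These three modules form the encoding pipeline: the Transformer produces per-token embeddings, the Pooling module mean-pools them over non-padding tokens (per `1_Pooling/config.json`), and Normalize L2-normalizes the result. A minimal pure-Python sketch of the pooling and normalization steps, using toy 2-d token vectors rather than real model outputs:

```python
import math

# Mask-aware mean pooling over token embeddings, followed by L2 normalization,
# mirroring the Pooling (mean) and Normalize modules of the pipeline.
def mean_pool_normalize(token_embs, attention_mask):
    dim = len(token_embs[0])
    n = sum(attention_mask)  # number of real (non-padding) tokens
    pooled = [
        sum(tok[d] for tok, m in zip(token_embs, attention_mask) if m) / n
        for d in range(dim)
    ]
    norm = math.sqrt(sum(x * x for x in pooled))
    return [x / norm for x in pooled]

# Two real tokens plus one padding token (mask = 0) that must be ignored:
emb = mean_pool_normalize([[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]], [1, 1, 0])
print([round(x, 4) for x in emb])  # [0.7071, 0.7071]
```

Because of the final normalization, every sentence embedding has unit length, which is what makes the cosine similarity configured above equivalent to a dot product.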
checkpoints/checkpoint-23100/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7b140bcbeff7c754fe17ee35291815c5a0dde4da2c508535dd972bb64accc6b1
+ size 1079769611
checkpoints/checkpoint-23100/rng_state_0.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ef85c13e868e8864575de49cdfba530a4ca35426bfdd6ae4a1d85cfc67e18e24
+ size 14917
checkpoints/checkpoint-23100/rng_state_1.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7c40edc30c890d97da41dff1ecd681b9ce0d6a47d78562cb0e048c1fe568159d
+ size 14917
checkpoints/checkpoint-23100/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:524ab132d3dd67a62e91fef7ec4a89ad107a3c1376bd67c90825907407f79636
+ size 1465
checkpoints/checkpoint-23100/sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+ "max_seq_length": 512,
+ "do_lower_case": false
+ }
checkpoints/checkpoint-23100/special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+ "cls_token": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "mask_token": {
+ "content": "[MASK]",
+ "lstrip": true,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "sep_token": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
checkpoints/checkpoint-23100/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoints/checkpoint-23100/tokenizer_config.json ADDED
@@ -0,0 +1,569 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[CLS]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "1": {
12
+ "content": "[MASK]",
13
+ "lstrip": true,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "2": {
20
+ "content": "[PAD]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "3": {
28
+ "content": "[SEP]",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "4": {
36
+ "content": "[UNK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ },
43
+ "32000": {
44
+ "content": "[unused1]",
45
+ "lstrip": false,
46
+ "normalized": false,
47
+ "rstrip": false,
48
+ "single_word": false,
49
+ "special": true
50
+ },
51
+ "32001": {
52
+ "content": "[unused2]",
53
+ "lstrip": false,
54
+ "normalized": false,
55
+ "rstrip": false,
56
+ "single_word": false,
57
+ "special": true
58
+ },
59
+ "32002": {
60
+ "content": "[unused3]",
61
+ "lstrip": false,
62
+ "normalized": false,
63
+ "rstrip": false,
64
+ "single_word": false,
65
+ "special": true
66
+ },
67
+ "32003": {
68
+ "content": "[unused4]",
69
+ "lstrip": false,
70
+ "normalized": false,
71
+ "rstrip": false,
72
+ "single_word": false,
73
+ "special": true
74
+ },
75
+ "32004": {
76
+ "content": "[unused5]",
77
+ "lstrip": false,
78
+ "normalized": false,
79
+ "rstrip": false,
80
+ "single_word": false,
81
+ "special": true
82
+ },
83
+ "32005": {
84
+ "content": "[unused6]",
85
+ "lstrip": false,
86
+ "normalized": false,
87
+ "rstrip": false,
88
+ "single_word": false,
89
+ "special": true
90
+ },
91
+ "32006": {
92
+ "content": "[unused7]",
93
+ "lstrip": false,
94
+ "normalized": false,
95
+ "rstrip": false,
96
+ "single_word": false,
97
+ "special": true
98
+ },
99
+ "32007": {
100
+ "content": "[unused8]",
101
+ "lstrip": false,
102
+ "normalized": false,
103
+ "rstrip": false,
104
+ "single_word": false,
105
+ "special": true
106
+ },
107
+ "32008": {
108
+ "content": "[unused9]",
109
+ "lstrip": false,
110
+ "normalized": false,
111
+ "rstrip": false,
112
+ "single_word": false,
113
+ "special": true
114
+ },
115
+ "32009": {
116
+ "content": "[unused10]",
117
+ "lstrip": false,
118
+ "normalized": false,
119
+ "rstrip": false,
120
+ "single_word": false,
121
+ "special": true
122
+ },
123
+ "32010": {
124
+ "content": "[unused11]",
125
+ "lstrip": false,
126
+ "normalized": false,
127
+ "rstrip": false,
128
+ "single_word": false,
129
+ "special": true
130
+ },
131
+ "32011": {
132
+ "content": "[unused12]",
133
+ "lstrip": false,
134
+ "normalized": false,
135
+ "rstrip": false,
136
+ "single_word": false,
137
+ "special": true
138
+ },
139
+ "32012": {
140
+ "content": "[unused13]",
141
+ "lstrip": false,
142
+ "normalized": false,
143
+ "rstrip": false,
144
+ "single_word": false,
145
+ "special": true
146
+ },
147
+ "32013": {
148
+ "content": "[unused14]",
149
+ "lstrip": false,
150
+ "normalized": false,
151
+ "rstrip": false,
152
+ "single_word": false,
153
+ "special": true
154
+ },
155
+ "32014": {
156
+ "content": "[unused15]",
157
+ "lstrip": false,
158
+ "normalized": false,
159
+ "rstrip": false,
160
+ "single_word": false,
161
+ "special": true
162
+ },
163
+ "32015": {
164
+ "content": "[unused16]",
165
+ "lstrip": false,
166
+ "normalized": false,
167
+ "rstrip": false,
168
+ "single_word": false,
169
+ "special": true
170
+ },
171
+ "32016": {
172
+ "content": "[unused17]",
173
+ "lstrip": false,
174
+ "normalized": false,
175
+ "rstrip": false,
176
+ "single_word": false,
177
+ "special": true
178
+ },
179
+ "32017": {
180
+ "content": "[unused18]",
181
+ "lstrip": false,
182
+ "normalized": false,
183
+ "rstrip": false,
184
+ "single_word": false,
185
+ "special": true
186
+ },
187
+ "32018": {
188
+ "content": "[unused19]",
189
+ "lstrip": false,
190
+ "normalized": false,
191
+ "rstrip": false,
192
+ "single_word": false,
193
+ "special": true
194
+ },
195
+ "32019": {
196
+ "content": "[unused20]",
197
+ "lstrip": false,
198
+ "normalized": false,
199
+ "rstrip": false,
200
+ "single_word": false,
201
+ "special": true
202
+ },
203
+ "32020": {
204
+ "content": "[unused21]",
205
+ "lstrip": false,
206
+ "normalized": false,
207
+ "rstrip": false,
208
+ "single_word": false,
209
+ "special": true
210
+ },
211
+ "32021": {
212
+ "content": "[unused22]",
213
+ "lstrip": false,
214
+ "normalized": false,
215
+ "rstrip": false,
216
+ "single_word": false,
217
+ "special": true
218
+ },
219
+ "32022": {
220
+ "content": "[unused23]",
221
+ "lstrip": false,
222
+ "normalized": false,
223
+ "rstrip": false,
224
+ "single_word": false,
225
+ "special": true
226
+ },
227
+ "32023": {
228
+ "content": "[unused24]",
229
+ "lstrip": false,
230
+ "normalized": false,
231
+ "rstrip": false,
232
+ "single_word": false,
233
+ "special": true
234
+ },
235
+ "32024": {
236
+ "content": "[unused25]",
237
+ "lstrip": false,
238
+ "normalized": false,
239
+ "rstrip": false,
240
+ "single_word": false,
241
+ "special": true
242
+ },
243
+ "32025": {
244
+ "content": "[unused26]",
245
+ "lstrip": false,
246
+ "normalized": false,
247
+ "rstrip": false,
248
+ "single_word": false,
249
+ "special": true
250
+ },
251
+ "32026": {
252
+ "content": "[unused27]",
253
+ "lstrip": false,
254
+ "normalized": false,
255
+ "rstrip": false,
256
+ "single_word": false,
257
+ "special": true
258
+ },
259
+ "32027": {
260
+ "content": "[unused28]",
261
+ "lstrip": false,
262
+ "normalized": false,
263
+ "rstrip": false,
264
+ "single_word": false,
265
+ "special": true
266
+ },
267
+ "32028": {
268
+ "content": "[unused29]",
269
+ "lstrip": false,
270
+ "normalized": false,
271
+ "rstrip": false,
272
+ "single_word": false,
273
+ "special": true
274
+ },
275
+ "32029": {
276
+ "content": "[unused30]",
277
+ "lstrip": false,
278
+ "normalized": false,
279
+ "rstrip": false,
280
+ "single_word": false,
281
+ "special": true
282
+ },
283
+ "32030": {
284
+ "content": "[unused31]",
285
+ "lstrip": false,
286
+ "normalized": false,
287
+ "rstrip": false,
288
+ "single_word": false,
289
+ "special": true
290
+ },
291
+ "32031": {
292
+ "content": "[unused32]",
293
+ "lstrip": false,
294
+ "normalized": false,
295
+ "rstrip": false,
296
+ "single_word": false,
297
+ "special": true
298
+ },
299
+ "32032": {
300
+ "content": "[unused33]",
301
+ "lstrip": false,
302
+ "normalized": false,
303
+ "rstrip": false,
304
+ "single_word": false,
305
+ "special": true
306
+ },
307
+ "32033": {
308
+ "content": "[unused34]",
309
+ "lstrip": false,
310
+ "normalized": false,
311
+ "rstrip": false,
312
+ "single_word": false,
313
+ "special": true
314
+ },
315
+ "32034": {
316
+ "content": "[unused35]",
317
+ "lstrip": false,
318
+ "normalized": false,
319
+ "rstrip": false,
320
+ "single_word": false,
321
+ "special": true
322
+ },
323
+ "32035": {
324
+ "content": "[unused36]",
325
+ "lstrip": false,
326
+ "normalized": false,
327
+ "rstrip": false,
328
+ "single_word": false,
329
+ "special": true
330
+ },
331
+ "32036": {
332
+ "content": "[unused37]",
333
+ "lstrip": false,
334
+ "normalized": false,
335
+ "rstrip": false,
336
+ "single_word": false,
337
+ "special": true
338
+ },
339
+ "32037": {
340
+ "content": "[unused38]",
341
+ "lstrip": false,
342
+ "normalized": false,
343
+ "rstrip": false,
344
+ "single_word": false,
345
+ "special": true
346
+ },
347
+ "32038": {
348
+ "content": "[unused39]",
349
+ "lstrip": false,
350
+ "normalized": false,
351
+ "rstrip": false,
352
+ "single_word": false,
353
+ "special": true
354
+ },
355
+ "32039": {
356
+ "content": "[unused40]",
357
+ "lstrip": false,
358
+ "normalized": false,
359
+ "rstrip": false,
360
+ "single_word": false,
361
+ "special": true
362
+ },
363
+ "32040": {
364
+ "content": "[unused41]",
365
+ "lstrip": false,
366
+ "normalized": false,
367
+ "rstrip": false,
368
+ "single_word": false,
369
+ "special": true
370
+ },
371
+ "32041": {
372
+ "content": "[unused42]",
373
+ "lstrip": false,
374
+ "normalized": false,
375
+ "rstrip": false,
376
+ "single_word": false,
377
+ "special": true
378
+ },
379
+ "32042": {
380
+ "content": "[unused43]",
381
+ "lstrip": false,
382
+ "normalized": false,
383
+ "rstrip": false,
384
+ "single_word": false,
385
+ "special": true
386
+ },
387
+ "32043": {
388
+ "content": "[unused44]",
389
+ "lstrip": false,
390
+ "normalized": false,
391
+ "rstrip": false,
392
+ "single_word": false,
393
+ "special": true
394
+ },
395
+ "32044": {
396
+ "content": "[unused45]",
397
+ "lstrip": false,
398
+ "normalized": false,
399
+ "rstrip": false,
400
+ "single_word": false,
401
+ "special": true
402
+ },
403
+ "32045": {
404
+ "content": "[unused46]",
405
+ "lstrip": false,
406
+ "normalized": false,
407
+ "rstrip": false,
408
+ "single_word": false,
409
+ "special": true
410
+ },
411
+ "32046": {
412
+ "content": "[unused47]",
413
+ "lstrip": false,
414
+ "normalized": false,
415
+ "rstrip": false,
416
+ "single_word": false,
417
+ "special": true
418
+ },
419
+ "32047": {
420
+ "content": "[unused48]",
421
+ "lstrip": false,
422
+ "normalized": false,
423
+ "rstrip": false,
424
+ "single_word": false,
425
+ "special": true
426
+ },
427
+ "32048": {
428
+ "content": "[unused49]",
429
+ "lstrip": false,
430
+ "normalized": false,
431
+ "rstrip": false,
432
+ "single_word": false,
433
+ "special": true
434
+ },
435
+ "32049": {
436
+ "content": "[unused50]",
437
+ "lstrip": false,
438
+ "normalized": false,
439
+ "rstrip": false,
440
+ "single_word": false,
441
+ "special": true
442
+ },
443
+ "32050": {
444
+ "content": "[unused51]",
445
+ "lstrip": false,
446
+ "normalized": false,
447
+ "rstrip": false,
448
+ "single_word": false,
449
+ "special": true
450
+ },
451
+ "32051": {
452
+ "content": "[unused52]",
453
+ "lstrip": false,
454
+ "normalized": false,
455
+ "rstrip": false,
456
+ "single_word": false,
457
+ "special": true
458
+ },
459
+ "32052": {
460
+ "content": "[unused53]",
461
+ "lstrip": false,
462
+ "normalized": false,
463
+ "rstrip": false,
464
+ "single_word": false,
465
+ "special": true
466
+ },
467
+ "32053": {
468
+ "content": "[unused54]",
469
+ "lstrip": false,
470
+ "normalized": false,
471
+ "rstrip": false,
472
+ "single_word": false,
473
+ "special": true
474
+ },
475
+ "32054": {
476
+ "content": "[unused55]",
477
+ "lstrip": false,
478
+ "normalized": false,
479
+ "rstrip": false,
480
+ "single_word": false,
481
+ "special": true
482
+ },
483
+ "32055": {
484
+ "content": "[unused56]",
485
+ "lstrip": false,
486
+ "normalized": false,
487
+ "rstrip": false,
488
+ "single_word": false,
489
+ "special": true
490
+ },
491
+ "32056": {
492
+ "content": "[unused57]",
493
+ "lstrip": false,
494
+ "normalized": false,
495
+ "rstrip": false,
496
+ "single_word": false,
497
+ "special": true
498
+ },
499
+ "32057": {
500
+ "content": "[unused58]",
501
+ "lstrip": false,
502
+ "normalized": false,
503
+ "rstrip": false,
504
+ "single_word": false,
505
+ "special": true
506
+ },
507
+ "32058": {
508
+ "content": "[unused59]",
509
+ "lstrip": false,
510
+ "normalized": false,
511
+ "rstrip": false,
512
+ "single_word": false,
513
+ "special": true
514
+ },
515
+ "32059": {
516
+ "content": "[unused60]",
517
+ "lstrip": false,
518
+ "normalized": false,
519
+ "rstrip": false,
520
+ "single_word": false,
521
+ "special": true
522
+ },
523
+ "32060": {
524
+ "content": "[unused61]",
525
+ "lstrip": false,
526
+ "normalized": false,
527
+ "rstrip": false,
528
+ "single_word": false,
529
+ "special": true
530
+ },
531
+ "32061": {
532
+ "content": "[unused62]",
533
+ "lstrip": false,
534
+ "normalized": false,
535
+ "rstrip": false,
536
+ "single_word": false,
537
+ "special": true
538
+ },
539
+ "32062": {
540
+ "content": "[unused63]",
541
+ "lstrip": false,
542
+ "normalized": false,
543
+ "rstrip": false,
544
+ "single_word": false,
545
+ "special": true
546
+ },
547
+ "32063": {
548
+ "content": "[unused64]",
549
+ "lstrip": false,
550
+ "normalized": false,
551
+ "rstrip": false,
552
+ "single_word": false,
553
+ "special": true
554
+ }
555
+ },
556
+ "clean_up_tokenization_spaces": true,
557
+ "cls_token": "[CLS]",
558
+ "extra_special_tokens": {},
559
+ "mask_token": "[MASK]",
560
+ "model_input_names": [
561
+ "input_ids",
562
+ "attention_mask"
563
+ ],
564
+ "model_max_length": 512,
565
+ "pad_token": "[PAD]",
566
+ "sep_token": "[SEP]",
567
+ "tokenizer_class": "PreTrainedTokenizerFast",
568
+ "unk_token": "[UNK]"
569
+ }
checkpoints/checkpoint-23100/trainer_state.json ADDED
@@ -0,0 +1,3316 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
checkpoints/checkpoint-23100/trainer_state.json ADDED
{
  "best_global_step": 11550,
  "best_metric": 5.6200983687622905,
  "best_model_checkpoint": "outputs/bert-base-stage2-sbert/checkpoints/checkpoint-11550",
  "epoch": 3.0,
  "eval_steps": 3850,
  "global_step": 23100,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {"epoch": 0.006493506493506494, "grad_norm": 52.61107635498047, "learning_rate": 3.18140501233606e-07, "loss": 15.9471, "step": 50},
    {"epoch": 0.012987012987012988, "grad_norm": 33.66294479370117, "learning_rate": 6.427736657576938e-07, "loss": 15.1591, "step": 100},
    {"epoch": 0.01948051948051948, "grad_norm": 38.61445236206055, "learning_rate": 9.674068302817817e-07, "loss": 14.0548, "step": 150},
    {"epoch": 0.025974025974025976, "grad_norm": 29.082921981811523, "learning_rate": 1.2920399948058694e-06, "loss": 12.6234, "step": 200},
    {"epoch": 0.032467532467532464, "grad_norm": 31.957590103149414, "learning_rate": 1.6166731593299573e-06, "loss": 11.0546, "step": 250},
    {"epoch": 0.03896103896103896, "grad_norm": 48.1808967590332, "learning_rate": 1.941306323854045e-06, "loss": 9.1498, "step": 300},
    {"epoch": 0.045454545454545456, "grad_norm": 29.543249130249023, "learning_rate": 2.265939488378133e-06, "loss": 7.5429, "step": 350},
    {"epoch": 0.05194805194805195, "grad_norm": 25.65778160095215, "learning_rate": 2.5905726529022207e-06, "loss": 6.4324, "step": 400},
    {"epoch": 0.05844155844155844, "grad_norm": 26.71251106262207, "learning_rate": 2.915205817426308e-06, "loss": 5.9025, "step": 450},
    {"epoch": 0.06493506493506493, "grad_norm": 24.228899002075195, "learning_rate": 3.239838981950396e-06, "loss": 5.4181, "step": 500},
    {"epoch": 0.07142857142857142, "grad_norm": 22.82256507873535, "learning_rate": 3.564472146474484e-06, "loss": 5.0231, "step": 550},
    {"epoch": 0.07792207792207792, "grad_norm": 22.72514533996582, "learning_rate": 3.889105310998572e-06, "loss": 4.7729, "step": 600},
    {"epoch": 0.08441558441558442, "grad_norm": 29.81377410888672, "learning_rate": 4.21373847552266e-06, "loss": 4.6554, "step": 650},
    {"epoch": 0.09090909090909091, "grad_norm": 29.481916427612305, "learning_rate": 4.538371640046747e-06, "loss": 4.4429, "step": 700},
    {"epoch": 0.09740259740259741, "grad_norm": 25.465042114257812, "learning_rate": 4.863004804570835e-06, "loss": 4.2589, "step": 750},
    {"epoch": 0.1038961038961039, "grad_norm": 21.07049560546875, "learning_rate": 5.187637969094923e-06, "loss": 3.9485, "step": 800},
    {"epoch": 0.11038961038961038, "grad_norm": 22.1170597076416, "learning_rate": 5.512271133619011e-06, "loss": 3.8368, "step": 850},
    {"epoch": 0.11688311688311688, "grad_norm": 20.51886749267578, "learning_rate": 5.836904298143099e-06, "loss": 3.8319, "step": 900},
    {"epoch": 0.12337662337662338, "grad_norm": 15.242032051086426, "learning_rate": 6.161537462667187e-06, "loss": 3.6684, "step": 950},
    {"epoch": 0.12987012987012986, "grad_norm": 21.660362243652344, "learning_rate": 6.486170627191274e-06, "loss": 3.5635, "step": 1000},
    {"epoch": 0.13636363636363635, "grad_norm": 18.671344757080078, "learning_rate": 6.810803791715362e-06, "loss": 3.518, "step": 1050},
    {"epoch": 0.14285714285714285, "grad_norm": 18.749467849731445, "learning_rate": 7.1354369562394496e-06, "loss": 3.4108, "step": 1100},
    {"epoch": 0.14935064935064934, "grad_norm": 17.582345962524414, "learning_rate": 7.4600701207635375e-06, "loss": 3.4189, "step": 1150},
    {"epoch": 0.15584415584415584, "grad_norm": 19.34381675720215, "learning_rate": 7.784703285287625e-06, "loss": 3.3312, "step": 1200},
    {"epoch": 0.16233766233766234, "grad_norm": 15.694629669189453, "learning_rate": 8.109336449811714e-06, "loss": 3.3368, "step": 1250},
    {"epoch": 0.16883116883116883, "grad_norm": 15.755691528320312, "learning_rate": 8.433969614335801e-06, "loss": 3.1654, "step": 1300},
    {"epoch": 0.17532467532467533, "grad_norm": 16.623849868774414, "learning_rate": 8.758602778859888e-06, "loss": 3.1133, "step": 1350},
    {"epoch": 0.18181818181818182, "grad_norm": 19.204574584960938, "learning_rate": 9.083235943383975e-06, "loss": 2.9981, "step": 1400},
    {"epoch": 0.18831168831168832, "grad_norm": 18.820144653320312, "learning_rate": 9.407869107908064e-06, "loss": 2.9279, "step": 1450},
    {"epoch": 0.19480519480519481, "grad_norm": 29.148109436035156, "learning_rate": 9.732502272432151e-06, "loss": 3.0855, "step": 1500},
    {"epoch": 0.2012987012987013, "grad_norm": 16.728069305419922, "learning_rate": 1.005713543695624e-05, "loss": 2.8648, "step": 1550},
    {"epoch": 0.2077922077922078, "grad_norm": 20.17420768737793, "learning_rate": 1.0381768601480329e-05, "loss": 2.933, "step": 1600},
    {"epoch": 0.21428571428571427, "grad_norm": 20.421876907348633, "learning_rate": 1.0706401766004416e-05, "loss": 2.9586, "step": 1650},
    {"epoch": 0.22077922077922077, "grad_norm": 18.838062286376953, "learning_rate": 1.1031034930528503e-05, "loss": 2.8735, "step": 1700},
    {"epoch": 0.22727272727272727, "grad_norm": 15.983686447143555, "learning_rate": 1.135566809505259e-05, "loss": 2.8262, "step": 1750},
    {"epoch": 0.23376623376623376, "grad_norm": 16.44582748413086, "learning_rate": 1.1680301259576679e-05, "loss": 2.8138, "step": 1800},
    {"epoch": 0.24025974025974026, "grad_norm": 14.9281587600708, "learning_rate": 1.2004934424100766e-05, "loss": 2.7759, "step": 1850},
    {"epoch": 0.24675324675324675, "grad_norm": 16.259794235229492, "learning_rate": 1.2329567588624855e-05, "loss": 2.7806, "step": 1900},
    {"epoch": 0.2532467532467532, "grad_norm": 14.834933280944824, "learning_rate": 1.2654200753148942e-05, "loss": 2.6917, "step": 1950},
    {"epoch": 0.2597402597402597, "grad_norm": 11.276729583740234, "learning_rate": 1.297883391767303e-05, "loss": 2.781, "step": 2000},
    {"epoch": 0.2662337662337662, "grad_norm": 16.30820083618164, "learning_rate": 1.330346708219712e-05, "loss": 2.6924, "step": 2050},
    {"epoch": 0.2727272727272727, "grad_norm": 16.436046600341797, "learning_rate": 1.3628100246721206e-05, "loss": 2.6548, "step": 2100},
    {"epoch": 0.2792207792207792, "grad_norm": 16.203256607055664, "learning_rate": 1.3952733411245295e-05, "loss": 2.5892, "step": 2150},
    {"epoch": 0.2857142857142857, "grad_norm": 34.66154479980469, "learning_rate": 1.4277366575769382e-05, "loss": 2.4805, "step": 2200},
    {"epoch": 0.2922077922077922, "grad_norm": 59.331016540527344, "learning_rate": 1.4601999740293468e-05, "loss": 2.5601, "step": 2250},
    {"epoch": 0.2987012987012987, "grad_norm": 14.719599723815918, "learning_rate": 1.4926632904817556e-05, "loss": 2.4803, "step": 2300},
    {"epoch": 0.3051948051948052, "grad_norm": 17.608966827392578, "learning_rate": 1.5251266069341644e-05, "loss": 2.4615, "step": 2350},
    {"epoch": 0.3116883116883117, "grad_norm": 20.017019271850586, "learning_rate": 1.557589923386573e-05, "loss": 2.497, "step": 2400},
    {"epoch": 0.3181818181818182, "grad_norm": 24.029024124145508, "learning_rate": 1.590053239838982e-05, "loss": 2.5528, "step": 2450},
    {"epoch": 0.3246753246753247, "grad_norm": 15.325136184692383, "learning_rate": 1.6225165562913908e-05, "loss": 2.3179, "step": 2500},
    {"epoch": 0.33116883116883117, "grad_norm": 17.913986206054688, "learning_rate": 1.6549798727437997e-05, "loss": 2.4226, "step": 2550},
    {"epoch": 0.33766233766233766, "grad_norm": 12.388660430908203, "learning_rate": 1.6874431891962082e-05, "loss": 2.3794, "step": 2600},
    {"epoch": 0.34415584415584416, "grad_norm": 17.384532928466797, "learning_rate": 1.719906505648617e-05, "loss": 2.4812, "step": 2650},
    {"epoch": 0.35064935064935066, "grad_norm": 12.747121810913086, "learning_rate": 1.752369822101026e-05, "loss": 2.3969, "step": 2700},
    {"epoch": 0.35714285714285715, "grad_norm": 16.645206451416016, "learning_rate": 1.784833138553435e-05, "loss": 2.3512, "step": 2750},
    {"epoch": 0.36363636363636365, "grad_norm": 20.9782772064209, "learning_rate": 1.8172964550058434e-05, "loss": 2.3615, "step": 2800},
    {"epoch": 0.37012987012987014, "grad_norm": 14.216375350952148, "learning_rate": 1.8497597714582523e-05, "loss": 2.3175, "step": 2850},
    {"epoch": 0.37662337662337664, "grad_norm": 10.926807403564453, "learning_rate": 1.882223087910661e-05, "loss": 2.3293, "step": 2900},
    {"epoch": 0.38311688311688313, "grad_norm": 12.280684471130371, "learning_rate": 1.91468640436307e-05, "loss": 2.353, "step": 2950},
    {"epoch": 0.38961038961038963, "grad_norm": 11.477788925170898, "learning_rate": 1.9471497208154786e-05, "loss": 2.2407, "step": 3000},
    {"epoch": 0.3961038961038961, "grad_norm": 16.57078742980957, "learning_rate": 1.979613037267887e-05, "loss": 2.3554, "step": 3050},
    {"epoch": 0.4025974025974026, "grad_norm": 15.02242374420166, "learning_rate": 2.012076353720296e-05, "loss": 2.2079, "step": 3100},
    {"epoch": 0.4090909090909091, "grad_norm": 16.45734977722168, "learning_rate": 2.044539670172705e-05, "loss": 2.3148, "step": 3150},
    {"epoch": 0.4155844155844156, "grad_norm": 15.056404113769531, "learning_rate": 2.0770029866251138e-05, "loss": 2.3595, "step": 3200},
    {"epoch": 0.42207792207792205, "grad_norm": 15.48352336883545, "learning_rate": 2.1094663030775226e-05, "loss": 2.3108, "step": 3250},
    {"epoch": 0.42857142857142855, "grad_norm": 19.282623291015625, "learning_rate": 2.1419296195299312e-05, "loss": 2.3236, "step": 3300},
    {"epoch": 0.43506493506493504, "grad_norm": 12.568426132202148, "learning_rate": 2.17439293598234e-05, "loss": 2.1848, "step": 3350},
    {"epoch": 0.44155844155844154, "grad_norm": 13.575774192810059, "learning_rate": 2.206856252434749e-05, "loss": 2.2364, "step": 3400},
    {"epoch": 0.44805194805194803, "grad_norm": 13.187265396118164, "learning_rate": 2.2393195688871578e-05, "loss": 2.2219, "step": 3450},
    {"epoch": 0.45454545454545453, "grad_norm": 11.324552536010742, "learning_rate": 2.2717828853395663e-05, "loss": 2.1346, "step": 3500},
    {"epoch": 0.461038961038961, "grad_norm": 11.320844650268555, "learning_rate": 2.3042462017919752e-05, "loss": 2.2422, "step": 3550},
    {"epoch": 0.4675324675324675, "grad_norm": 13.834160804748535, "learning_rate": 2.336709518244384e-05, "loss": 2.2226, "step": 3600},
    {"epoch": 0.474025974025974, "grad_norm": 9.438053131103516, "learning_rate": 2.369172834696793e-05, "loss": 2.1422, "step": 3650},
    {"epoch": 0.4805194805194805, "grad_norm": 11.202554702758789, "learning_rate": 2.4016361511492015e-05, "loss": 2.0838, "step": 3700},
    {"epoch": 0.487012987012987, "grad_norm": 12.432455062866211, "learning_rate": 2.43409946760161e-05, "loss": 2.1076, "step": 3750},
    {"epoch": 0.4935064935064935, "grad_norm": 10.37647819519043, "learning_rate": 2.466562784054019e-05, "loss": 2.1266, "step": 3800},
    {"epoch": 0.5, "grad_norm": 11.41846752166748, "learning_rate": 2.4990261005064278e-05, "loss": 2.1255, "step": 3850},
    {"epoch": 0.5, "eval_runtime": 183.2913, "eval_samples_per_second": 0.0, "eval_steps_per_second": 0.0, "eval_validation_loss": 5.789649796218682, "step": 3850},
    {"epoch": 0.5064935064935064, "grad_norm": 11.20055866241455, "learning_rate": 2.5314894169588367e-05, "loss": 2.1261, "step": 3900},
    {"epoch": 0.512987012987013, "grad_norm": 9.65162467956543, "learning_rate": 2.5639527334112456e-05, "loss": 2.1352, "step": 3950},
    {"epoch": 0.5194805194805194, "grad_norm": 13.608521461486816, "learning_rate": 2.596416049863654e-05, "loss": 2.0632, "step": 4000},
    {"epoch": 0.525974025974026, "grad_norm": 12.579031944274902, "learning_rate": 2.6288793663160633e-05, "loss": 2.1564, "step": 4050},
    {"epoch": 0.5324675324675324, "grad_norm": 7.6413798332214355, "learning_rate": 2.661342682768472e-05, "loss": 2.0435, "step": 4100},
    {"epoch": 0.538961038961039, "grad_norm": 12.862696647644043, "learning_rate": 2.6938059992208804e-05, "loss": 2.0099, "step": 4150},
    {"epoch": 0.5454545454545454, "grad_norm": 13.818205833435059, "learning_rate": 2.7262693156732893e-05, "loss": 2.0602, "step": 4200},
    {"epoch": 0.551948051948052, "grad_norm": 9.98502254486084, "learning_rate": 2.7587326321256978e-05, "loss": 2.0414, "step": 4250},
    {"epoch": 0.5584415584415584, "grad_norm": 19.94890785217285, "learning_rate": 2.791195948578107e-05, "loss": 2.0636, "step": 4300},
    {"epoch": 0.564935064935065, "grad_norm": 13.229655265808105, "learning_rate": 2.8236592650305156e-05, "loss": 2.0575, "step": 4350},
    {"epoch": 0.5714285714285714, "grad_norm": 10.28844928741455, "learning_rate": 2.8561225814829244e-05, "loss": 2.0827, "step": 4400},
    {"epoch": 0.577922077922078, "grad_norm": 22.461875915527344, "learning_rate": 2.888585897935333e-05, "loss": 1.9545, "step": 4450},
    {"epoch": 0.5844155844155844, "grad_norm": 12.420343399047852, "learning_rate": 2.9210492143877422e-05, "loss": 2.0735, "step": 4500},
    {"epoch": 0.5909090909090909, "grad_norm": 9.457930564880371, "learning_rate": 2.9535125308401507e-05, "loss": 1.9039, "step": 4550},
    {"epoch": 0.5974025974025974, "grad_norm": 9.656899452209473, "learning_rate": 2.9859758472925596e-05, "loss": 1.9605, "step": 4600},
    {"epoch": 0.6038961038961039, "grad_norm": 12.893436431884766, "learning_rate": 3.018439163744968e-05, "loss": 2.0449, "step": 4650},
    {"epoch": 0.6103896103896104, "grad_norm": 14.826976776123047, "learning_rate": 3.0509024801973774e-05, "loss": 1.9651, "step": 4700},
    {"epoch": 0.6168831168831169, "grad_norm": 11.291337013244629, "learning_rate": 3.0833657966497856e-05, "loss": 1.9632, "step": 4750},
    {"epoch": 0.6233766233766234, "grad_norm": 10.019285202026367, "learning_rate": 3.115829113102195e-05, "loss": 2.0401, "step": 4800},
    {"epoch": 0.6298701298701299, "grad_norm": 12.127004623413086, "learning_rate": 3.148292429554603e-05, "loss": 2.0056, "step": 4850},
    {"epoch": 0.6363636363636364, "grad_norm": 11.649646759033203, "learning_rate": 3.180755746007012e-05, "loss": 1.9399, "step": 4900},
    {"epoch": 0.6428571428571429, "grad_norm": 20.142322540283203, "learning_rate": 3.213219062459421e-05, "loss": 1.8899, "step": 4950},
    {"epoch": 0.6493506493506493, "grad_norm": 11.562334060668945, "learning_rate": 3.2456823789118296e-05, "loss": 2.0154, "step": 5000},
    {"epoch": 0.6558441558441559, "grad_norm": 10.831686973571777, "learning_rate": 3.278145695364239e-05, "loss": 1.9857, "step": 5050},
    {"epoch": 0.6623376623376623, "grad_norm": 11.502790451049805, "learning_rate": 3.3106090118166474e-05, "loss": 1.9239, "step": 5100},
    {"epoch": 0.6688311688311688, "grad_norm": 10.263900756835938, "learning_rate": 3.343072328269056e-05, "loss": 1.9741, "step": 5150},
    {"epoch": 0.6753246753246753, "grad_norm": 10.325187683105469, "learning_rate": 3.3755356447214645e-05, "loss": 1.9258, "step": 5200},
    {"epoch": 0.6818181818181818, "grad_norm": 12.289694786071777, "learning_rate": 3.407998961173874e-05, "loss": 1.9545, "step": 5250},
    {"epoch": 0.6883116883116883, "grad_norm": 10.321048736572266, "learning_rate": 3.440462277626282e-05, "loss": 1.9496, "step": 5300},
    {"epoch": 0.6948051948051948, "grad_norm": 11.514996528625488, "learning_rate": 3.4729255940786914e-05, "loss": 1.9479, "step": 5350},
    {"epoch": 0.7012987012987013, "grad_norm": 11.209217071533203, "learning_rate": 3.5053889105311e-05, "loss": 1.9068, "step": 5400},
    {"epoch": 0.7077922077922078, "grad_norm": 11.679224014282227, "learning_rate": 3.537852226983509e-05, "loss": 1.847, "step": 5450},
    {"epoch": 0.7142857142857143, "grad_norm": 14.561708450317383, "learning_rate": 3.570315543435918e-05, "loss": 1.8001, "step": 5500},
    {"epoch": 0.7207792207792207, "grad_norm": 9.206267356872559, "learning_rate": 3.602778859888327e-05, "loss": 1.9119, "step": 5550},
    {"epoch": 0.7272727272727273, "grad_norm": 10.509795188903809, "learning_rate": 3.6352421763407355e-05, "loss": 1.8347, "step": 5600},
    {"epoch": 0.7337662337662337, "grad_norm": 10.149404525756836, "learning_rate": 3.667705492793144e-05, "loss": 1.944, "step": 5650},
    {"epoch": 0.7402597402597403, "grad_norm": 10.415277481079102, "learning_rate": 3.7001688092455526e-05, "loss": 1.8523, "step": 5700},
    {"epoch": 0.7467532467532467, "grad_norm": 11.224620819091797, "learning_rate": 3.732632125697961e-05, "loss": 1.7925, "step": 5750},
    {"epoch": 0.7532467532467533, "grad_norm": 10.016857147216797, "learning_rate": 3.76509544215037e-05, "loss": 1.8803, "step": 5800},
    {"epoch": 0.7597402597402597, "grad_norm": 10.387723922729492, "learning_rate": 3.797558758602779e-05, "loss": 1.8662, "step": 5850},
    {"epoch": 0.7662337662337663, "grad_norm": 9.941938400268555, "learning_rate": 3.830022075055188e-05, "loss": 1.8841, "step": 5900},
    {"epoch": 0.7727272727272727, "grad_norm": 9.691414833068848, "learning_rate": 3.8624853915075966e-05, "loss": 1.7907, "step": 5950},
    {"epoch": 0.7792207792207793, "grad_norm": 7.804042816162109, "learning_rate": 3.894948707960006e-05, "loss": 1.7876, "step": 6000},
    {"epoch": 0.7857142857142857, "grad_norm": 9.927968978881836, "learning_rate": 3.9274120244124144e-05, "loss": 1.8248, "step": 6050},
    {"epoch": 0.7922077922077922, "grad_norm": 10.503766059875488, "learning_rate": 3.959875340864823e-05, "loss": 1.7473, "step": 6100},
    {"epoch": 0.7987012987012987, "grad_norm": 8.839858055114746, "learning_rate": 3.9923386573172314e-05, "loss": 1.8196, "step": 6150},
    {"epoch": 0.8051948051948052, "grad_norm": 9.771683692932129, "learning_rate": 4.0248019737696407e-05, "loss": 1.7241, "step": 6200},
    {"epoch": 0.8116883116883117, "grad_norm": 9.278040885925293, "learning_rate": 4.057265290222049e-05, "loss": 1.8118, "step": 6250},
    {"epoch": 0.8181818181818182, "grad_norm": 10.858592987060547, "learning_rate": 4.089728606674458e-05, "loss": 1.8037, "step": 6300},
    {"epoch": 0.8246753246753247, "grad_norm": 10.266278266906738, "learning_rate": 4.122191923126867e-05, "loss": 1.7531, "step": 6350},
    {"epoch": 0.8311688311688312, "grad_norm": 13.651795387268066, "learning_rate": 4.1546552395792755e-05, "loss": 1.8088, "step": 6400},
    {"epoch": 0.8376623376623377, "grad_norm": 8.097331047058105, "learning_rate": 4.187118556031685e-05, "loss": 1.7525, "step": 6450},
    {"epoch": 0.8441558441558441, "grad_norm": 9.611719131469727, "learning_rate": 4.219581872484093e-05, "loss": 1.7287, "step": 6500},
    {"epoch": 0.8506493506493507, "grad_norm": 10.035614013671875, "learning_rate": 4.252045188936502e-05, "loss": 1.756, "step": 6550},
    {"epoch": 0.8571428571428571, "grad_norm": 9.979487419128418, "learning_rate": 4.28450850538891e-05, "loss": 1.7888, "step": 6600},
    {"epoch": 0.8636363636363636, "grad_norm": 11.118249893188477, "learning_rate": 4.3169718218413195e-05, "loss": 1.709, "step": 6650},
    {"epoch": 0.8701298701298701, "grad_norm": 16.395870208740234, "learning_rate": 4.349435138293728e-05, "loss": 1.722, "step": 6700},
    {"epoch": 0.8766233766233766, "grad_norm": 9.067273139953613, "learning_rate": 4.381898454746137e-05, "loss": 1.6935, "step": 6750},
    {"epoch": 0.8831168831168831, "grad_norm": 9.74413013458252, "learning_rate": 4.414361771198546e-05, "loss": 1.7679, "step": 6800},
    {"epoch": 0.8896103896103896, "grad_norm": 10.37792682647705, "learning_rate": 4.446825087650955e-05, "loss": 1.7438, "step": 6850},
    {"epoch": 0.8961038961038961, "grad_norm": 10.507184028625488, "learning_rate": 4.4792884041033636e-05, "loss": 1.6815, "step": 6900},
    {"epoch": 0.9025974025974026, "grad_norm": 9.988856315612793, "learning_rate": 4.511751720555772e-05, "loss": 1.7269, "step": 6950},
    {"epoch": 0.9090909090909091, "grad_norm": 8.919309616088867, "learning_rate": 4.544215037008181e-05, "loss": 1.7285, "step": 7000},
    {"epoch": 0.9155844155844156, "grad_norm": 9.638202667236328, "learning_rate": 4.576678353460589e-05, "loss": 1.711, "step": 7050},
    {"epoch": 0.922077922077922, "grad_norm": 11.221553802490234, "learning_rate": 4.6091416699129984e-05, "loss": 1.6137, "step": 7100},
    {"epoch": 0.9285714285714286, "grad_norm": 9.269332885742188, "learning_rate": 4.641604986365407e-05, "loss": 1.6785, "step": 7150},
    {"epoch": 0.935064935064935, "grad_norm": 9.344364166259766, "learning_rate": 4.674068302817816e-05, "loss": 1.7142, "step": 7200},
    {"epoch": 0.9415584415584416, "grad_norm": 9.738685607910156, "learning_rate": 4.706531619270225e-05, "loss": 1.7585, "step": 7250},
    {"epoch": 0.948051948051948, "grad_norm": 8.694171905517578, "learning_rate": 4.738994935722634e-05, "loss": 1.7268, "step": 7300},
    {"epoch": 0.9545454545454546, "grad_norm": 12.72326374053955, "learning_rate": 4.7714582521750425e-05, "loss": 1.6739, "step": 7350},
    {"epoch": 0.961038961038961, "grad_norm": 8.399530410766602, "learning_rate": 4.803921568627452e-05, "loss": 1.6462, "step": 7400},
    {"epoch": 0.9675324675324676, "grad_norm": 11.030467987060547, "learning_rate": 4.8363848850798596e-05, "loss": 1.7591, "step": 7450},
    {"epoch": 0.974025974025974, "grad_norm": 8.93454647064209, "learning_rate": 4.868848201532269e-05, "loss": 1.6058, "step": 7500},
    {"epoch": 0.9805194805194806, "grad_norm": 9.081995964050293, "learning_rate": 4.901311517984677e-05, "loss": 1.6984, "step": 7550},
    {"epoch": 0.987012987012987, "grad_norm": 9.361384391784668, "learning_rate": 4.9337748344370865e-05, "loss": 1.6609, "step": 7600},
    {"epoch": 0.9935064935064936, "grad_norm": 9.641518592834473, "learning_rate": 4.966238150889495e-05, "loss": 1.6488, "step": 7650},
    {"epoch": 1.0, "grad_norm": 9.288655281066895, "learning_rate": 4.998701467341904e-05, "loss": 1.7626, "step": 7700},
    {"epoch": 1.0, "eval_runtime": 181.9129, "eval_samples_per_second": 0.0, "eval_steps_per_second": 0.0, "eval_validation_loss": 5.630685541727772, "step": 7700},
    {"epoch": 1.0064935064935066, "grad_norm": 11.218660354614258, "learning_rate": 4.9922075392058184e-05, "loss": 1.6603, "step": 7750},
    {"epoch": 1.0129870129870129, "grad_norm": 9.883772850036621, "learning_rate": 4.9840903925452126e-05, "loss": 1.6979, "step": 7800},
    {"epoch": 1.0194805194805194, "grad_norm": 10.548580169677734, "learning_rate": 4.975973245884607e-05, "loss": 1.6138, "step": 7850},
    {"epoch": 1.025974025974026, "grad_norm": 9.996445655822754, "learning_rate": 4.967856099224001e-05, "loss": 1.7008, "step": 7900},
    {"epoch": 1.0324675324675325, "grad_norm": 9.410829544067383, "learning_rate": 4.959738952563395e-05, "loss": 1.6281, "step": 7950},
    {"epoch": 1.0389610389610389, "grad_norm": 8.215593338012695, "learning_rate": 4.951621805902789e-05, "loss": 1.6029, "step": 8000},
    {"epoch": 1.0454545454545454, "grad_norm": 8.68897819519043, "learning_rate": 4.943504659242183e-05, "loss": 1.6984, "step": 8050},
    {"epoch": 1.051948051948052, "grad_norm": 10.024593353271484, "learning_rate": 4.9353875125815774e-05, "loss": 1.5865, "step": 8100},
    {"epoch": 1.0584415584415585, "grad_norm": 7.77084493637085, "learning_rate": 4.9272703659209715e-05, "loss": 1.6686, "step": 8150},
    {"epoch": 1.0649350649350648, "grad_norm": 9.608016967773438, "learning_rate": 4.9191532192603656e-05, "loss": 1.5427, "step": 8200},
    {"epoch": 1.0714285714285714, "grad_norm": 8.278243064880371, "learning_rate": 4.91103607259976e-05, "loss": 1.5441, "step": 8250},
    {"epoch": 1.077922077922078, "grad_norm": 7.700085639953613, "learning_rate": 4.902918925939154e-05, "loss": 1.5524, "step": 8300},
    {"epoch": 1.0844155844155845, "grad_norm": 8.38094425201416, "learning_rate": 4.894801779278548e-05, "loss": 1.5644, "step": 8350},
    {"epoch": 1.0909090909090908, "grad_norm": 8.41283130645752, "learning_rate": 4.886684632617942e-05, "loss": 1.5964, "step": 8400},
    {"epoch": 1.0974025974025974, "grad_norm": 9.08784008026123, "learning_rate": 4.878567485957336e-05, "loss": 1.5653, "step": 8450},
    {"epoch": 1.103896103896104, "grad_norm": 8.475194931030273, "learning_rate": 4.8704503392967305e-05, "loss": 1.4443, "step": 8500},
    {"epoch": 1.1103896103896105, "grad_norm": 8.450396537780762, "learning_rate": 4.8623331926361246e-05,
1222
+ "loss": 1.5609,
1223
+ "step": 8550
1224
+ },
1225
+ {
1226
+ "epoch": 1.1168831168831168,
1227
+ "grad_norm": 8.839466094970703,
1228
+ "learning_rate": 4.854216045975519e-05,
1229
+ "loss": 1.5097,
1230
+ "step": 8600
1231
+ },
1232
+ {
1233
+ "epoch": 1.1233766233766234,
1234
+ "grad_norm": 7.753498554229736,
1235
+ "learning_rate": 4.846098899314913e-05,
1236
+ "loss": 1.5169,
1237
+ "step": 8650
1238
+ },
1239
+ {
1240
+ "epoch": 1.12987012987013,
1241
+ "grad_norm": 8.667826652526855,
1242
+ "learning_rate": 4.837981752654307e-05,
1243
+ "loss": 1.4992,
1244
+ "step": 8700
1245
+ },
1246
+ {
1247
+ "epoch": 1.1363636363636362,
1248
+ "grad_norm": 8.778205871582031,
1249
+ "learning_rate": 4.829864605993701e-05,
1250
+ "loss": 1.5337,
1251
+ "step": 8750
1252
+ },
1253
+ {
1254
+ "epoch": 1.1428571428571428,
1255
+ "grad_norm": 10.098799705505371,
1256
+ "learning_rate": 4.821747459333095e-05,
1257
+ "loss": 1.4971,
1258
+ "step": 8800
1259
+ },
1260
+ {
1261
+ "epoch": 1.1493506493506493,
1262
+ "grad_norm": 9.63693618774414,
1263
+ "learning_rate": 4.8136303126724894e-05,
1264
+ "loss": 1.5218,
1265
+ "step": 8850
1266
+ },
1267
+ {
1268
+ "epoch": 1.155844155844156,
1269
+ "grad_norm": 9.274500846862793,
1270
+ "learning_rate": 4.8055131660118835e-05,
1271
+ "loss": 1.526,
1272
+ "step": 8900
1273
+ },
1274
+ {
1275
+ "epoch": 1.1623376623376624,
1276
+ "grad_norm": 10.055129051208496,
1277
+ "learning_rate": 4.797396019351278e-05,
1278
+ "loss": 1.5531,
1279
+ "step": 8950
1280
+ },
1281
+ {
1282
+ "epoch": 1.1688311688311688,
1283
+ "grad_norm": 6.797532081604004,
1284
+ "learning_rate": 4.789278872690672e-05,
1285
+ "loss": 1.5353,
1286
+ "step": 9000
1287
+ },
1288
+ {
1289
+ "epoch": 1.1753246753246753,
1290
+ "grad_norm": 8.536033630371094,
1291
+ "learning_rate": 4.781161726030066e-05,
1292
+ "loss": 1.4596,
1293
+ "step": 9050
1294
+ },
1295
+ {
1296
+ "epoch": 1.1818181818181819,
1297
+ "grad_norm": 9.294974327087402,
1298
+ "learning_rate": 4.77304457936946e-05,
1299
+ "loss": 1.434,
1300
+ "step": 9100
1301
+ },
1302
+ {
1303
+ "epoch": 1.1883116883116882,
1304
+ "grad_norm": 9.374375343322754,
1305
+ "learning_rate": 4.764927432708854e-05,
1306
+ "loss": 1.4644,
1307
+ "step": 9150
1308
+ },
1309
+ {
1310
+ "epoch": 1.1948051948051948,
1311
+ "grad_norm": 7.527572154998779,
1312
+ "learning_rate": 4.7568102860482484e-05,
1313
+ "loss": 1.5245,
1314
+ "step": 9200
1315
+ },
1316
+ {
1317
+ "epoch": 1.2012987012987013,
1318
+ "grad_norm": 9.537101745605469,
1319
+ "learning_rate": 4.7486931393876425e-05,
1320
+ "loss": 1.3869,
1321
+ "step": 9250
1322
+ },
1323
+ {
1324
+ "epoch": 1.2077922077922079,
1325
+ "grad_norm": 8.587594985961914,
1326
+ "learning_rate": 4.7405759927270366e-05,
1327
+ "loss": 1.4361,
1328
+ "step": 9300
1329
+ },
1330
+ {
1331
+ "epoch": 1.2142857142857142,
1332
+ "grad_norm": 8.717963218688965,
1333
+ "learning_rate": 4.732458846066431e-05,
1334
+ "loss": 1.4296,
1335
+ "step": 9350
1336
+ },
1337
+ {
1338
+ "epoch": 1.2207792207792207,
1339
+ "grad_norm": 9.769675254821777,
1340
+ "learning_rate": 4.724341699405825e-05,
1341
+ "loss": 1.4338,
1342
+ "step": 9400
1343
+ },
1344
+ {
1345
+ "epoch": 1.2272727272727273,
1346
+ "grad_norm": 7.474400997161865,
1347
+ "learning_rate": 4.716224552745219e-05,
1348
+ "loss": 1.383,
1349
+ "step": 9450
1350
+ },
1351
+ {
1352
+ "epoch": 1.2337662337662338,
1353
+ "grad_norm": 8.226397514343262,
1354
+ "learning_rate": 4.708107406084613e-05,
1355
+ "loss": 1.4061,
1356
+ "step": 9500
1357
+ },
1358
+ {
1359
+ "epoch": 1.2402597402597402,
1360
+ "grad_norm": 8.012401580810547,
1361
+ "learning_rate": 4.699990259424007e-05,
1362
+ "loss": 1.3497,
1363
+ "step": 9550
1364
+ },
1365
+ {
1366
+ "epoch": 1.2467532467532467,
1367
+ "grad_norm": 8.172561645507812,
1368
+ "learning_rate": 4.6918731127634015e-05,
1369
+ "loss": 1.4223,
1370
+ "step": 9600
1371
+ },
1372
+ {
1373
+ "epoch": 1.2532467532467533,
1374
+ "grad_norm": 9.439875602722168,
1375
+ "learning_rate": 4.6837559661027956e-05,
1376
+ "loss": 1.3842,
1377
+ "step": 9650
1378
+ },
1379
+ {
1380
+ "epoch": 1.2597402597402598,
1381
+ "grad_norm": 7.020490646362305,
1382
+ "learning_rate": 4.6756388194421904e-05,
1383
+ "loss": 1.3321,
1384
+ "step": 9700
1385
+ },
1386
+ {
1387
+ "epoch": 1.2662337662337662,
1388
+ "grad_norm": 7.947005271911621,
1389
+ "learning_rate": 4.667521672781584e-05,
1390
+ "loss": 1.3396,
1391
+ "step": 9750
1392
+ },
1393
+ {
1394
+ "epoch": 1.2727272727272727,
1395
+ "grad_norm": 9.912154197692871,
1396
+ "learning_rate": 4.659404526120979e-05,
1397
+ "loss": 1.3623,
1398
+ "step": 9800
1399
+ },
1400
+ {
1401
+ "epoch": 1.2792207792207793,
1402
+ "grad_norm": 9.735318183898926,
1403
+ "learning_rate": 4.651287379460372e-05,
1404
+ "loss": 1.2811,
1405
+ "step": 9850
1406
+ },
1407
+ {
1408
+ "epoch": 1.2857142857142856,
1409
+ "grad_norm": 8.971993446350098,
1410
+ "learning_rate": 4.643170232799767e-05,
1411
+ "loss": 1.2448,
1412
+ "step": 9900
1413
+ },
1414
+ {
1415
+ "epoch": 1.2922077922077921,
1416
+ "grad_norm": 8.140421867370605,
1417
+ "learning_rate": 4.6350530861391604e-05,
1418
+ "loss": 1.297,
1419
+ "step": 9950
1420
+ },
1421
+ {
1422
+ "epoch": 1.2987012987012987,
1423
+ "grad_norm": 9.203341484069824,
1424
+ "learning_rate": 4.626935939478555e-05,
1425
+ "loss": 1.2812,
1426
+ "step": 10000
1427
+ },
1428
+ {
1429
+ "epoch": 1.3051948051948052,
1430
+ "grad_norm": 7.791167736053467,
1431
+ "learning_rate": 4.618818792817949e-05,
1432
+ "loss": 1.2745,
1433
+ "step": 10050
1434
+ },
1435
+ {
1436
+ "epoch": 1.3116883116883118,
1437
+ "grad_norm": 7.96024227142334,
1438
+ "learning_rate": 4.6107016461573435e-05,
1439
+ "loss": 1.3114,
1440
+ "step": 10100
1441
+ },
1442
+ {
1443
+ "epoch": 1.3181818181818181,
1444
+ "grad_norm": 9.202474594116211,
1445
+ "learning_rate": 4.602584499496737e-05,
1446
+ "loss": 1.3039,
1447
+ "step": 10150
1448
+ },
1449
+ {
1450
+ "epoch": 1.3246753246753247,
1451
+ "grad_norm": 8.700586318969727,
1452
+ "learning_rate": 4.594467352836132e-05,
1453
+ "loss": 1.2243,
1454
+ "step": 10200
1455
+ },
1456
+ {
1457
+ "epoch": 1.3311688311688312,
1458
+ "grad_norm": 10.369848251342773,
1459
+ "learning_rate": 4.586350206175525e-05,
1460
+ "loss": 1.2523,
1461
+ "step": 10250
1462
+ },
1463
+ {
1464
+ "epoch": 1.3376623376623376,
1465
+ "grad_norm": 8.433963775634766,
1466
+ "learning_rate": 4.57823305951492e-05,
1467
+ "loss": 1.2213,
1468
+ "step": 10300
1469
+ },
1470
+ {
1471
+ "epoch": 1.344155844155844,
1472
+ "grad_norm": 9.223182678222656,
1473
+ "learning_rate": 4.5701159128543135e-05,
1474
+ "loss": 1.3041,
1475
+ "step": 10350
1476
+ },
1477
+ {
1478
+ "epoch": 1.3506493506493507,
1479
+ "grad_norm": 7.952178955078125,
1480
+ "learning_rate": 4.561998766193708e-05,
1481
+ "loss": 1.2164,
1482
+ "step": 10400
1483
+ },
1484
+ {
1485
+ "epoch": 1.3571428571428572,
1486
+ "grad_norm": 7.513680934906006,
1487
+ "learning_rate": 4.553881619533102e-05,
1488
+ "loss": 1.2241,
1489
+ "step": 10450
1490
+ },
1491
+ {
1492
+ "epoch": 1.3636363636363638,
1493
+ "grad_norm": 9.510335922241211,
1494
+ "learning_rate": 4.5457644728724966e-05,
1495
+ "loss": 1.211,
1496
+ "step": 10500
1497
+ },
1498
+ {
1499
+ "epoch": 1.37012987012987,
1500
+ "grad_norm": 7.567286491394043,
1501
+ "learning_rate": 4.53764732621189e-05,
1502
+ "loss": 1.1839,
1503
+ "step": 10550
1504
+ },
1505
+ {
1506
+ "epoch": 1.3766233766233766,
1507
+ "grad_norm": 8.19821548461914,
1508
+ "learning_rate": 4.529530179551285e-05,
1509
+ "loss": 1.1748,
1510
+ "step": 10600
1511
+ },
1512
+ {
1513
+ "epoch": 1.3831168831168832,
1514
+ "grad_norm": 9.398670196533203,
1515
+ "learning_rate": 4.521413032890678e-05,
1516
+ "loss": 1.1886,
1517
+ "step": 10650
1518
+ },
1519
+ {
1520
+ "epoch": 1.3896103896103895,
1521
+ "grad_norm": 7.934854984283447,
1522
+ "learning_rate": 4.513295886230073e-05,
1523
+ "loss": 1.1752,
1524
+ "step": 10700
1525
+ },
1526
+ {
1527
+ "epoch": 1.396103896103896,
1528
+ "grad_norm": 8.2477445602417,
1529
+ "learning_rate": 4.5051787395694666e-05,
1530
+ "loss": 1.1968,
1531
+ "step": 10750
1532
+ },
1533
+ {
1534
+ "epoch": 1.4025974025974026,
1535
+ "grad_norm": 8.72326374053955,
1536
+ "learning_rate": 4.4970615929088614e-05,
1537
+ "loss": 1.1345,
1538
+ "step": 10800
1539
+ },
1540
+ {
1541
+ "epoch": 1.4090909090909092,
1542
+ "grad_norm": 8.260149002075195,
1543
+ "learning_rate": 4.488944446248255e-05,
1544
+ "loss": 1.1882,
1545
+ "step": 10850
1546
+ },
1547
+ {
1548
+ "epoch": 1.4155844155844157,
1549
+ "grad_norm": 11.032275199890137,
1550
+ "learning_rate": 4.48082729958765e-05,
1551
+ "loss": 1.2257,
1552
+ "step": 10900
1553
+ },
1554
+ {
1555
+ "epoch": 1.422077922077922,
1556
+ "grad_norm": 7.984837055206299,
1557
+ "learning_rate": 4.472710152927043e-05,
1558
+ "loss": 1.2079,
1559
+ "step": 10950
1560
+ },
1561
+ {
1562
+ "epoch": 1.4285714285714286,
1563
+ "grad_norm": 8.030401229858398,
1564
+ "learning_rate": 4.464593006266438e-05,
1565
+ "loss": 1.1455,
1566
+ "step": 11000
1567
+ },
1568
+ {
1569
+ "epoch": 1.435064935064935,
1570
+ "grad_norm": 6.707542896270752,
1571
+ "learning_rate": 4.4564758596058314e-05,
1572
+ "loss": 1.0684,
1573
+ "step": 11050
1574
+ },
1575
+ {
1576
+ "epoch": 1.4415584415584415,
1577
+ "grad_norm": 8.220330238342285,
1578
+ "learning_rate": 4.448358712945226e-05,
1579
+ "loss": 1.0872,
1580
+ "step": 11100
1581
+ },
1582
+ {
1583
+ "epoch": 1.448051948051948,
1584
+ "grad_norm": 8.067037582397461,
1585
+ "learning_rate": 4.44024156628462e-05,
1586
+ "loss": 1.0888,
1587
+ "step": 11150
1588
+ },
1589
+ {
1590
+ "epoch": 1.4545454545454546,
1591
+ "grad_norm": 7.191831588745117,
1592
+ "learning_rate": 4.4321244196240145e-05,
1593
+ "loss": 1.0119,
1594
+ "step": 11200
1595
+ },
1596
+ {
1597
+ "epoch": 1.4610389610389611,
1598
+ "grad_norm": 8.375226020812988,
1599
+ "learning_rate": 4.424007272963408e-05,
1600
+ "loss": 1.0735,
1601
+ "step": 11250
1602
+ },
1603
+ {
1604
+ "epoch": 1.4675324675324675,
1605
+ "grad_norm": 7.966062068939209,
1606
+ "learning_rate": 4.415890126302802e-05,
1607
+ "loss": 1.0829,
1608
+ "step": 11300
1609
+ },
1610
+ {
1611
+ "epoch": 1.474025974025974,
1612
+ "grad_norm": 7.540716171264648,
1613
+ "learning_rate": 4.407772979642196e-05,
1614
+ "loss": 1.0539,
1615
+ "step": 11350
1616
+ },
1617
+ {
1618
+ "epoch": 1.4805194805194806,
1619
+ "grad_norm": 6.845109462738037,
1620
+ "learning_rate": 4.3996558329815903e-05,
1621
+ "loss": 1.0158,
1622
+ "step": 11400
1623
+ },
1624
+ {
1625
+ "epoch": 1.487012987012987,
1626
+ "grad_norm": 8.59949779510498,
1627
+ "learning_rate": 4.3915386863209845e-05,
1628
+ "loss": 1.0087,
1629
+ "step": 11450
1630
+ },
1631
+ {
1632
+ "epoch": 1.4935064935064934,
1633
+ "grad_norm": 7.456052303314209,
1634
+ "learning_rate": 4.3834215396603786e-05,
1635
+ "loss": 1.0371,
1636
+ "step": 11500
1637
+ },
1638
+ {
1639
+ "epoch": 1.5,
1640
+ "grad_norm": 9.537942886352539,
1641
+ "learning_rate": 4.375304392999773e-05,
1642
+ "loss": 1.0171,
1643
+ "step": 11550
1644
+ },
1645
+ {
1646
+ "epoch": 1.5,
1647
+ "eval_runtime": 182.3239,
1648
+ "eval_samples_per_second": 0.0,
1649
+ "eval_steps_per_second": 0.0,
1650
+ "eval_validation_loss": 5.6200983687622905,
1651
+ "step": 11550
1652
+ },
1653
+ {
1654
+ "epoch": 1.5064935064935066,
1655
+ "grad_norm": 8.21316909790039,
1656
+ "learning_rate": 4.367187246339167e-05,
1657
+ "loss": 1.0076,
1658
+ "step": 11600
1659
+ },
1660
+ {
1661
+ "epoch": 1.512987012987013,
1662
+ "grad_norm": 7.417142868041992,
1663
+ "learning_rate": 4.359070099678561e-05,
1664
+ "loss": 1.0063,
1665
+ "step": 11650
1666
+ },
1667
+ {
1668
+ "epoch": 1.5194805194805194,
1669
+ "grad_norm": 9.416158676147461,
1670
+ "learning_rate": 4.350952953017955e-05,
1671
+ "loss": 1.0176,
1672
+ "step": 11700
1673
+ },
1674
+ {
1675
+ "epoch": 1.525974025974026,
1676
+ "grad_norm": 9.061604499816895,
1677
+ "learning_rate": 4.342835806357349e-05,
1678
+ "loss": 1.0075,
1679
+ "step": 11750
1680
+ },
1681
+ {
1682
+ "epoch": 1.5324675324675323,
1683
+ "grad_norm": 6.301494121551514,
1684
+ "learning_rate": 4.3347186596967434e-05,
1685
+ "loss": 0.9626,
1686
+ "step": 11800
1687
+ },
1688
+ {
1689
+ "epoch": 1.5389610389610389,
1690
+ "grad_norm": 9.032344818115234,
1691
+ "learning_rate": 4.3266015130361376e-05,
1692
+ "loss": 0.92,
1693
+ "step": 11850
1694
+ },
1695
+ {
1696
+ "epoch": 1.5454545454545454,
1697
+ "grad_norm": 8.425912857055664,
1698
+ "learning_rate": 4.318484366375532e-05,
1699
+ "loss": 0.9672,
1700
+ "step": 11900
1701
+ },
1702
+ {
1703
+ "epoch": 1.551948051948052,
1704
+ "grad_norm": 7.536787033081055,
1705
+ "learning_rate": 4.310367219714926e-05,
1706
+ "loss": 0.9586,
1707
+ "step": 11950
1708
+ },
1709
+ {
1710
+ "epoch": 1.5584415584415585,
1711
+ "grad_norm": 10.996522903442383,
1712
+ "learning_rate": 4.30225007305432e-05,
1713
+ "loss": 0.9745,
1714
+ "step": 12000
1715
+ },
1716
+ {
1717
+ "epoch": 1.564935064935065,
1718
+ "grad_norm": 8.759782791137695,
1719
+ "learning_rate": 4.294132926393714e-05,
1720
+ "loss": 0.9643,
1721
+ "step": 12050
1722
+ },
1723
+ {
1724
+ "epoch": 1.5714285714285714,
1725
+ "grad_norm": 7.13316011428833,
1726
+ "learning_rate": 4.286015779733108e-05,
1727
+ "loss": 0.9866,
1728
+ "step": 12100
1729
+ },
1730
+ {
1731
+ "epoch": 1.577922077922078,
1732
+ "grad_norm": 10.26307201385498,
1733
+ "learning_rate": 4.2778986330725024e-05,
1734
+ "loss": 0.9212,
1735
+ "step": 12150
1736
+ },
1737
+ {
1738
+ "epoch": 1.5844155844155843,
1739
+ "grad_norm": 8.138628005981445,
1740
+ "learning_rate": 4.2697814864118965e-05,
1741
+ "loss": 0.9023,
1742
+ "step": 12200
1743
+ },
1744
+ {
1745
+ "epoch": 1.5909090909090908,
1746
+ "grad_norm": 7.918432712554932,
1747
+ "learning_rate": 4.2616643397512907e-05,
1748
+ "loss": 0.8912,
1749
+ "step": 12250
1750
+ },
1751
+ {
1752
+ "epoch": 1.5974025974025974,
1753
+ "grad_norm": 7.5802903175354,
1754
+ "learning_rate": 4.253547193090685e-05,
1755
+ "loss": 0.8844,
1756
+ "step": 12300
1757
+ },
1758
+ {
1759
+ "epoch": 1.603896103896104,
1760
+ "grad_norm": 8.902767181396484,
1761
+ "learning_rate": 4.245430046430079e-05,
1762
+ "loss": 0.9322,
1763
+ "step": 12350
1764
+ },
1765
+ {
1766
+ "epoch": 1.6103896103896105,
1767
+ "grad_norm": 8.970231056213379,
1768
+ "learning_rate": 4.237312899769473e-05,
1769
+ "loss": 0.9373,
1770
+ "step": 12400
1771
+ },
1772
+ {
1773
+ "epoch": 1.616883116883117,
1774
+ "grad_norm": 7.878787040710449,
1775
+ "learning_rate": 4.229195753108867e-05,
1776
+ "loss": 0.892,
1777
+ "step": 12450
1778
+ },
1779
+ {
1780
+ "epoch": 1.6233766233766234,
1781
+ "grad_norm": 5.676933288574219,
1782
+ "learning_rate": 4.221078606448261e-05,
1783
+ "loss": 0.9218,
1784
+ "step": 12500
1785
+ },
1786
+ {
1787
+ "epoch": 1.62987012987013,
1788
+ "grad_norm": 8.642828941345215,
1789
+ "learning_rate": 4.2129614597876555e-05,
1790
+ "loss": 0.8738,
1791
+ "step": 12550
1792
+ },
1793
+ {
1794
+ "epoch": 1.6363636363636362,
1795
+ "grad_norm": 8.371003150939941,
1796
+ "learning_rate": 4.2048443131270496e-05,
1797
+ "loss": 0.8698,
1798
+ "step": 12600
1799
+ },
1800
+ {
1801
+ "epoch": 1.6428571428571428,
1802
+ "grad_norm": 9.852381706237793,
1803
+ "learning_rate": 4.196727166466444e-05,
1804
+ "loss": 0.8722,
1805
+ "step": 12650
1806
+ },
1807
+ {
1808
+ "epoch": 1.6493506493506493,
1809
+ "grad_norm": 8.32621955871582,
1810
+ "learning_rate": 4.188610019805838e-05,
1811
+ "loss": 0.8918,
1812
+ "step": 12700
1813
+ },
1814
+ {
1815
+ "epoch": 1.655844155844156,
1816
+ "grad_norm": 8.382570266723633,
1817
+ "learning_rate": 4.180492873145232e-05,
1818
+ "loss": 0.9164,
1819
+ "step": 12750
1820
+ },
1821
+ {
1822
+ "epoch": 1.6623376623376624,
1823
+ "grad_norm": 11.756057739257812,
1824
+ "learning_rate": 4.172375726484626e-05,
1825
+ "loss": 0.8194,
1826
+ "step": 12800
1827
+ },
1828
+ {
1829
+ "epoch": 1.6688311688311688,
1830
+ "grad_norm": 7.933291435241699,
1831
+ "learning_rate": 4.16425857982402e-05,
1832
+ "loss": 0.8228,
1833
+ "step": 12850
1834
+ },
1835
+ {
1836
+ "epoch": 1.6753246753246753,
1837
+ "grad_norm": 6.6486101150512695,
1838
+ "learning_rate": 4.1561414331634144e-05,
1839
+ "loss": 0.8091,
1840
+ "step": 12900
1841
+ },
1842
+ {
1843
+ "epoch": 1.6818181818181817,
1844
+ "grad_norm": 10.598429679870605,
1845
+ "learning_rate": 4.1480242865028086e-05,
1846
+ "loss": 0.8162,
1847
+ "step": 12950
1848
+ },
1849
+ {
1850
+ "epoch": 1.6883116883116882,
1851
+ "grad_norm": 6.9950785636901855,
1852
+ "learning_rate": 4.139907139842203e-05,
1853
+ "loss": 0.8185,
1854
+ "step": 13000
1855
+ },
1856
+ {
1857
+ "epoch": 1.6948051948051948,
1858
+ "grad_norm": 6.568070888519287,
1859
+ "learning_rate": 4.131789993181597e-05,
1860
+ "loss": 0.8196,
1861
+ "step": 13050
1862
+ },
1863
+ {
1864
+ "epoch": 1.7012987012987013,
1865
+ "grad_norm": 8.708261489868164,
1866
+ "learning_rate": 4.123672846520991e-05,
1867
+ "loss": 0.7892,
1868
+ "step": 13100
1869
+ },
1870
+ {
1871
+ "epoch": 1.7077922077922079,
1872
+ "grad_norm": 7.371927261352539,
1873
+ "learning_rate": 4.115555699860385e-05,
1874
+ "loss": 0.7315,
1875
+ "step": 13150
1876
+ },
1877
+ {
1878
+ "epoch": 1.7142857142857144,
1879
+ "grad_norm": 8.890507698059082,
1880
+ "learning_rate": 4.107438553199779e-05,
1881
+ "loss": 0.7381,
1882
+ "step": 13200
1883
+ },
1884
+ {
1885
+ "epoch": 1.7207792207792207,
1886
+ "grad_norm": 7.524810314178467,
1887
+ "learning_rate": 4.0993214065391734e-05,
1888
+ "loss": 0.7465,
1889
+ "step": 13250
1890
+ },
1891
+ {
1892
+ "epoch": 1.7272727272727273,
1893
+ "grad_norm": 7.928349494934082,
1894
+ "learning_rate": 4.0912042598785675e-05,
1895
+ "loss": 0.7168,
1896
+ "step": 13300
1897
+ },
1898
+ {
1899
+ "epoch": 1.7337662337662336,
1900
+ "grad_norm": 5.499800682067871,
1901
+ "learning_rate": 4.0830871132179616e-05,
1902
+ "loss": 0.7449,
1903
+ "step": 13350
1904
+ },
1905
+ {
1906
+ "epoch": 1.7402597402597402,
1907
+ "grad_norm": 5.12719202041626,
1908
+ "learning_rate": 4.074969966557356e-05,
1909
+ "loss": 0.7447,
1910
+ "step": 13400
1911
+ },
1912
+ {
1913
+ "epoch": 1.7467532467532467,
1914
+ "grad_norm": 8.132304191589355,
1915
+ "learning_rate": 4.06685281989675e-05,
1916
+ "loss": 0.6669,
1917
+ "step": 13450
1918
+ },
1919
+ {
1920
+ "epoch": 1.7532467532467533,
1921
+ "grad_norm": 9.351593017578125,
1922
+ "learning_rate": 4.058735673236144e-05,
1923
+ "loss": 0.7496,
1924
+ "step": 13500
1925
+ },
1926
+ {
1927
+ "epoch": 1.7597402597402598,
1928
+ "grad_norm": 8.41600513458252,
1929
+ "learning_rate": 4.050618526575538e-05,
1930
+ "loss": 0.6783,
1931
+ "step": 13550
1932
+ },
1933
+ {
1934
+ "epoch": 1.7662337662337664,
1935
+ "grad_norm": 8.17464828491211,
1936
+ "learning_rate": 4.042501379914932e-05,
1937
+ "loss": 0.7175,
1938
+ "step": 13600
1939
+ },
1940
+ {
1941
+ "epoch": 1.7727272727272727,
1942
+ "grad_norm": 6.9847612380981445,
1943
+ "learning_rate": 4.034384233254327e-05,
1944
+ "loss": 0.6728,
1945
+ "step": 13650
1946
+ },
1947
+ {
1948
+ "epoch": 1.7792207792207793,
1949
+ "grad_norm": 5.476688861846924,
1950
+ "learning_rate": 4.0262670865937206e-05,
1951
+ "loss": 0.6659,
1952
+ "step": 13700
1953
+ },
1954
+ {
1955
+ "epoch": 1.7857142857142856,
1956
+ "grad_norm": 9.166929244995117,
1957
+ "learning_rate": 4.0181499399331154e-05,
1958
+ "loss": 0.6861,
1959
+ "step": 13750
1960
+ },
1961
+ {
1962
+ "epoch": 1.7922077922077921,
1963
+ "grad_norm": 7.089775562286377,
1964
+ "learning_rate": 4.010032793272509e-05,
1965
+ "loss": 0.6372,
1966
+ "step": 13800
1967
+ },
1968
+ {
1969
+ "epoch": 1.7987012987012987,
1970
+ "grad_norm": 5.01121187210083,
1971
+ "learning_rate": 4.001915646611904e-05,
1972
+ "loss": 0.6679,
1973
+ "step": 13850
1974
+ },
1975
+ {
1976
+ "epoch": 1.8051948051948052,
1977
+ "grad_norm": 7.3362298011779785,
1978
+ "learning_rate": 3.993798499951297e-05,
1979
+ "loss": 0.6307,
1980
+ "step": 13900
1981
+ },
1982
+ {
1983
+ "epoch": 1.8116883116883118,
1984
+ "grad_norm": 6.582499980926514,
1985
+ "learning_rate": 3.985681353290692e-05,
1986
+ "loss": 0.6307,
1987
+ "step": 13950
1988
+ },
1989
+ {
1990
+ "epoch": 1.8181818181818183,
1991
+ "grad_norm": 5.2383575439453125,
1992
+ "learning_rate": 3.9775642066300854e-05,
1993
+ "loss": 0.622,
1994
+ "step": 14000
1995
+ },
1996
+ {
1997
+ "epoch": 1.8246753246753247,
1998
+ "grad_norm": 6.22122049331665,
1999
+ "learning_rate": 3.96944705996948e-05,
2000
+ "loss": 0.6362,
2001
+ "step": 14050
2002
+ },
2003
+ {
2004
+ "epoch": 1.8311688311688312,
2005
+ "grad_norm": 6.812269687652588,
2006
+ "learning_rate": 3.961329913308874e-05,
2007
+ "loss": 0.62,
2008
+ "step": 14100
2009
+ },
2010
+ {
2011
+ "epoch": 1.8376623376623376,
2012
+ "grad_norm": 6.444869041442871,
2013
+ "learning_rate": 3.9532127666482685e-05,
2014
+ "loss": 0.617,
2015
+ "step": 14150
2016
+ },
2017
+ {
2018
+ "epoch": 1.844155844155844,
2019
+ "grad_norm": 6.043448448181152,
2020
+ "learning_rate": 3.945095619987662e-05,
2021
+ "loss": 0.6069,
2022
+ "step": 14200
2023
+ },
2024
+ {
2025
+ "epoch": 1.8506493506493507,
2026
+ "grad_norm": 6.895667552947998,
2027
+ "learning_rate": 3.936978473327057e-05,
2028
+ "loss": 0.5839,
2029
+ "step": 14250
2030
+ },
2031
+ {
2032
+ "epoch": 1.8571428571428572,
2033
+ "grad_norm": 9.060486793518066,
2034
+ "learning_rate": 3.92886132666645e-05,
2035
+ "loss": 0.5974,
2036
+ "step": 14300
2037
+ },
2038
+ {
2039
+ "epoch": 1.8636363636363638,
2040
+ "grad_norm": 6.743723392486572,
2041
+ "learning_rate": 3.920744180005845e-05,
2042
+ "loss": 0.5727,
2043
+ "step": 14350
2044
+ },
2045
+ {
2046
+ "epoch": 1.87012987012987,
2047
+ "grad_norm": 9.108243942260742,
2048
+ "learning_rate": 3.9126270333452385e-05,
2049
+ "loss": 0.5629,
2050
+ "step": 14400
2051
+ },
2052
+ {
2053
+ "epoch": 1.8766233766233766,
2054
+ "grad_norm": 7.595779895782471,
2055
+ "learning_rate": 3.904509886684633e-05,
2056
+ "loss": 0.56,
2057
+ "step": 14450
2058
+ },
2059
+ {
2060
+ "epoch": 1.883116883116883,
2061
+ "grad_norm": 8.444317817687988,
2062
+ "learning_rate": 3.896392740024027e-05,
2063
+ "loss": 0.5891,
2064
+ "step": 14500
2065
+ },
2066
+ {
2067
+ "epoch": 1.8896103896103895,
2068
+ "grad_norm": 6.892787933349609,
2069
+ "learning_rate": 3.8882755933634216e-05,
2070
+ "loss": 0.5899,
2071
+ "step": 14550
2072
+ },
2073
+ {
2074
+ "epoch": 1.896103896103896,
2075
+ "grad_norm": 8.623988151550293,
2076
+ "learning_rate": 3.880158446702815e-05,
2077
+ "loss": 0.5348,
2078
+ "step": 14600
2079
+ },
2080
+ {
2081
+ "epoch": 1.9025974025974026,
2082
+ "grad_norm": 6.2621378898620605,
2083
+ "learning_rate": 3.87204130004221e-05,
2084
+ "loss": 0.5753,
2085
+ "step": 14650
2086
+ },
2087
+ {
2088
+ "epoch": 1.9090909090909092,
2089
+ "grad_norm": 6.479671001434326,
2090
+ "learning_rate": 3.863924153381603e-05,
2091
+ "loss": 0.5668,
2092
+ "step": 14700
2093
+ },
2094
+ {
2095
+ "epoch": 1.9155844155844157,
2096
+ "grad_norm": 6.0399603843688965,
2097
+ "learning_rate": 3.855807006720998e-05,
2098
+ "loss": 0.5778,
2099
+ "step": 14750
2100
+ },
2101
+ {
2102
+ "epoch": 1.922077922077922,
2103
+ "grad_norm": 8.45608139038086,
2104
+ "learning_rate": 3.8476898600603916e-05,
2105
+ "loss": 0.5127,
2106
+ "step": 14800
2107
+ },
2108
+ {
2109
+ "epoch": 1.9285714285714286,
2110
+ "grad_norm": 8.474074363708496,
2111
+ "learning_rate": 3.8395727133997864e-05,
2112
+ "loss": 0.5291,
2113
+ "step": 14850
2114
+ },
2115
+ {
2116
+ "epoch": 1.935064935064935,
2117
+ "grad_norm": 6.067905902862549,
2118
+ "learning_rate": 3.83145556673918e-05,
2119
+ "loss": 0.5512,
2120
+ "step": 14900
2121
+ },
2122
+ {
2123
+ "epoch": 1.9415584415584415,
2124
+ "grad_norm": 4.828126907348633,
2125
+ "learning_rate": 3.823338420078575e-05,
2126
+ "loss": 0.533,
2127
+ "step": 14950
2128
+ },
2129
+ {
2130
+ "epoch": 1.948051948051948,
2131
+ "grad_norm": 4.592970848083496,
2132
+ "learning_rate": 3.815221273417968e-05,
2133
+ "loss": 0.5455,
2134
+ "step": 15000
2135
+ },
2136
+ {
2137
+ "epoch": 1.9545454545454546,
2138
+ "grad_norm": 7.459578990936279,
2139
+ "learning_rate": 3.807104126757363e-05,
2140
+ "loss": 0.511,
2141
+ "step": 15050
2142
+ },
2143
+ {
2144
+ "epoch": 1.9610389610389611,
2145
+ "grad_norm": 7.107131004333496,
2146
+ "learning_rate": 3.7989869800967564e-05,
2147
+ "loss": 0.4827,
2148
+ "step": 15100
2149
+ },
2150
+ {
2151
+ "epoch": 1.9675324675324677,
2152
+ "grad_norm": 6.103736400604248,
2153
+ "learning_rate": 3.7908698334361505e-05,
2154
+ "loss": 0.5358,
2155
+ "step": 15150
2156
+ },
2157
+ {
2158
+ "epoch": 1.974025974025974,
2159
+ "grad_norm": 5.223392486572266,
2160
+ "learning_rate": 3.782752686775545e-05,
2161
+ "loss": 0.4733,
2162
+ "step": 15200
2163
+ },
2164
+ {
2165
+ "epoch": 1.9805194805194806,
2166
+ "grad_norm": 4.993494987487793,
2167
+ "learning_rate": 3.774635540114939e-05,
2168
+ "loss": 0.4979,
2169
+ "step": 15250
2170
+ },
2171
+ {
2172
+ "epoch": 1.987012987012987,
2173
+ "grad_norm": 6.17307186126709,
2174
+ "learning_rate": 3.766518393454333e-05,
2175
+ "loss": 0.4809,
2176
+ "step": 15300
2177
+ },
2178
+ {
2179
+ "epoch": 1.9935064935064934,
2180
+ "grad_norm": 5.775139808654785,
2181
+ "learning_rate": 3.758401246793727e-05,
2182
+ "loss": 0.4783,
2183
+ "step": 15350
2184
+ },
2185
+ {
2186
+ "epoch": 2.0,
2187
+ "grad_norm": 5.669449329376221,
2188
+ "learning_rate": 3.750284100133121e-05,
2189
+ "loss": 0.5226,
2190
+ "step": 15400
2191
+ },
2192
+ {
2193
+ "epoch": 2.0,
2194
+ "eval_runtime": 182.0328,
2195
+ "eval_samples_per_second": 0.0,
2196
+ "eval_steps_per_second": 0.0,
2197
+ "eval_validation_loss": 5.856312504741056,
2198
+ "step": 15400
2199
+ },
2200
+ {
2201
+ "epoch": 2.0064935064935066,
2202
+ "grad_norm": 8.055193901062012,
2203
+ "learning_rate": 3.7421669534725154e-05,
2204
+ "loss": 0.4851,
2205
+ "step": 15450
2206
+ },
2207
+ {
2208
+ "epoch": 2.012987012987013,
2209
+ "grad_norm": 7.40401554107666,
2210
+ "learning_rate": 3.7340498068119095e-05,
2211
+ "loss": 0.5236,
2212
+ "step": 15500
2213
+ },
2214
+ {
2215
+ "epoch": 2.0194805194805197,
2216
+ "grad_norm": 9.347941398620605,
2217
+ "learning_rate": 3.7259326601513036e-05,
2218
+ "loss": 0.487,
2219
+ "step": 15550
2220
+ },
2221
+ {
2222
+ "epoch": 2.0259740259740258,
2223
+ "grad_norm": 8.087934494018555,
2224
+ "learning_rate": 3.717815513490698e-05,
2225
+ "loss": 0.4984,
2226
+ "step": 15600
2227
+ },
2228
+ {
2229
+ "epoch": 2.0324675324675323,
2230
+ "grad_norm": 6.169537544250488,
2231
+ "learning_rate": 3.709698366830092e-05,
2232
+ "loss": 0.493,
2233
+ "step": 15650
2234
+ },
2235
+ {
2236
+ "epoch": 2.038961038961039,
2237
+ "grad_norm": 7.2742486000061035,
2238
+ "learning_rate": 3.701581220169486e-05,
2239
+ "loss": 0.4937,
2240
+ "step": 15700
2241
+ },
2242
+ {
2243
+ "epoch": 2.0454545454545454,
2244
+ "grad_norm": 5.88269567489624,
2245
+ "learning_rate": 3.69346407350888e-05,
2246
+ "loss": 0.5143,
2247
+ "step": 15750
2248
+ },
2249
+ {
2250
+ "epoch": 2.051948051948052,
2251
+ "grad_norm": 5.668929100036621,
2252
+ "learning_rate": 3.685346926848274e-05,
2253
+ "loss": 0.4471,
2254
+ "step": 15800
2255
+ },
2256
+ {
2257
+ "epoch": 2.0584415584415585,
2258
+ "grad_norm": 3.973904848098755,
2259
+ "learning_rate": 3.6772297801876684e-05,
2260
+ "loss": 0.5013,
2261
+ "step": 15850
2262
+ },
2263
+ {
2264
+ "epoch": 2.064935064935065,
2265
+ "grad_norm": 8.019608497619629,
2266
+ "learning_rate": 3.6691126335270626e-05,
2267
+ "loss": 0.4686,
2268
+ "step": 15900
2269
+ },
2270
+ {
2271
+ "epoch": 2.0714285714285716,
2272
+ "grad_norm": 4.772839546203613,
2273
+ "learning_rate": 3.660995486866457e-05,
2274
+ "loss": 0.4246,
2275
+ "step": 15950
2276
+ },
2277
+ {
2278
+ "epoch": 2.0779220779220777,
2279
+ "grad_norm": 5.810505390167236,
2280
+ "learning_rate": 3.652878340205851e-05,
2281
+ "loss": 0.4193,
2282
+ "step": 16000
2283
+ },
2284
+ {
+ "epoch": 2.0844155844155843,
+ "grad_norm": 5.574864387512207,
+ "learning_rate": 3.644761193545245e-05,
+ "loss": 0.4438,
+ "step": 16050
+ },
+ {
+ "epoch": 2.090909090909091,
+ "grad_norm": 4.754704475402832,
+ "learning_rate": 3.636644046884639e-05,
+ "loss": 0.4426,
+ "step": 16100
+ },
+ {
+ "epoch": 2.0974025974025974,
+ "grad_norm": 4.981619834899902,
+ "learning_rate": 3.628526900224033e-05,
+ "loss": 0.4511,
+ "step": 16150
+ },
+ {
+ "epoch": 2.103896103896104,
+ "grad_norm": 5.050504684448242,
+ "learning_rate": 3.6204097535634274e-05,
+ "loss": 0.3932,
+ "step": 16200
+ },
+ {
+ "epoch": 2.1103896103896105,
+ "grad_norm": 5.204791069030762,
+ "learning_rate": 3.6122926069028215e-05,
+ "loss": 0.4574,
+ "step": 16250
+ },
+ {
+ "epoch": 2.116883116883117,
+ "grad_norm": 3.777693748474121,
+ "learning_rate": 3.604175460242216e-05,
+ "loss": 0.4363,
+ "step": 16300
+ },
+ {
+ "epoch": 2.1233766233766236,
+ "grad_norm": 4.017716884613037,
+ "learning_rate": 3.59605831358161e-05,
+ "loss": 0.4181,
+ "step": 16350
+ },
+ {
+ "epoch": 2.1298701298701297,
+ "grad_norm": 6.477935314178467,
+ "learning_rate": 3.587941166921004e-05,
+ "loss": 0.4237,
+ "step": 16400
+ },
+ {
+ "epoch": 2.1363636363636362,
+ "grad_norm": 6.628262519836426,
+ "learning_rate": 3.579824020260398e-05,
+ "loss": 0.4611,
+ "step": 16450
+ },
+ {
+ "epoch": 2.142857142857143,
+ "grad_norm": 5.98927116394043,
+ "learning_rate": 3.571706873599792e-05,
+ "loss": 0.4072,
+ "step": 16500
+ },
+ {
+ "epoch": 2.1493506493506493,
+ "grad_norm": 9.416532516479492,
+ "learning_rate": 3.5635897269391863e-05,
+ "loss": 0.4382,
+ "step": 16550
+ },
+ {
+ "epoch": 2.155844155844156,
+ "grad_norm": 6.329733848571777,
+ "learning_rate": 3.5554725802785805e-05,
+ "loss": 0.4325,
+ "step": 16600
+ },
+ {
+ "epoch": 2.1623376623376624,
+ "grad_norm": 15.381741523742676,
+ "learning_rate": 3.5473554336179746e-05,
+ "loss": 0.4315,
+ "step": 16650
+ },
+ {
+ "epoch": 2.168831168831169,
+ "grad_norm": 6.046841621398926,
+ "learning_rate": 3.539238286957369e-05,
+ "loss": 0.4194,
+ "step": 16700
+ },
+ {
+ "epoch": 2.175324675324675,
+ "grad_norm": 6.270512104034424,
+ "learning_rate": 3.531121140296763e-05,
+ "loss": 0.41,
+ "step": 16750
+ },
+ {
+ "epoch": 2.1818181818181817,
+ "grad_norm": 5.776293754577637,
+ "learning_rate": 3.523003993636157e-05,
+ "loss": 0.395,
+ "step": 16800
+ },
+ {
+ "epoch": 2.188311688311688,
+ "grad_norm": 7.4210357666015625,
+ "learning_rate": 3.514886846975551e-05,
+ "loss": 0.4141,
+ "step": 16850
+ },
+ {
+ "epoch": 2.1948051948051948,
+ "grad_norm": 3.435021162033081,
+ "learning_rate": 3.506769700314945e-05,
+ "loss": 0.4234,
+ "step": 16900
+ },
+ {
+ "epoch": 2.2012987012987013,
+ "grad_norm": 6.888435363769531,
+ "learning_rate": 3.4986525536543394e-05,
+ "loss": 0.3706,
+ "step": 16950
+ },
+ {
+ "epoch": 2.207792207792208,
+ "grad_norm": 4.45454740524292,
+ "learning_rate": 3.4905354069937336e-05,
+ "loss": 0.375,
+ "step": 17000
+ },
+ {
+ "epoch": 2.2142857142857144,
+ "grad_norm": 7.119910717010498,
+ "learning_rate": 3.482418260333128e-05,
+ "loss": 0.3856,
+ "step": 17050
+ },
+ {
+ "epoch": 2.220779220779221,
+ "grad_norm": 6.561718940734863,
+ "learning_rate": 3.474301113672522e-05,
+ "loss": 0.4104,
+ "step": 17100
+ },
+ {
+ "epoch": 2.227272727272727,
+ "grad_norm": 5.923313140869141,
+ "learning_rate": 3.466183967011916e-05,
+ "loss": 0.3682,
+ "step": 17150
+ },
+ {
+ "epoch": 2.2337662337662336,
+ "grad_norm": 4.552441596984863,
+ "learning_rate": 3.45806682035131e-05,
+ "loss": 0.3849,
+ "step": 17200
+ },
+ {
+ "epoch": 2.24025974025974,
+ "grad_norm": 3.2609236240386963,
+ "learning_rate": 3.449949673690704e-05,
+ "loss": 0.3607,
+ "step": 17250
+ },
+ {
+ "epoch": 2.2467532467532467,
+ "grad_norm": 4.543074131011963,
+ "learning_rate": 3.4418325270300984e-05,
+ "loss": 0.3821,
+ "step": 17300
+ },
+ {
+ "epoch": 2.2532467532467533,
+ "grad_norm": 8.050787925720215,
+ "learning_rate": 3.4337153803694925e-05,
+ "loss": 0.3749,
+ "step": 17350
+ },
+ {
+ "epoch": 2.25974025974026,
+ "grad_norm": 5.056412696838379,
+ "learning_rate": 3.4255982337088867e-05,
+ "loss": 0.3548,
+ "step": 17400
+ },
+ {
+ "epoch": 2.2662337662337664,
+ "grad_norm": 4.244089126586914,
+ "learning_rate": 3.417481087048281e-05,
+ "loss": 0.3684,
+ "step": 17450
+ },
+ {
+ "epoch": 2.2727272727272725,
+ "grad_norm": 5.408504009246826,
+ "learning_rate": 3.409363940387675e-05,
+ "loss": 0.3649,
+ "step": 17500
+ },
+ {
+ "epoch": 2.279220779220779,
+ "grad_norm": 5.972458362579346,
+ "learning_rate": 3.401246793727069e-05,
+ "loss": 0.3547,
+ "step": 17550
+ },
+ {
+ "epoch": 2.2857142857142856,
+ "grad_norm": 4.5682268142700195,
+ "learning_rate": 3.393129647066464e-05,
+ "loss": 0.3308,
+ "step": 17600
+ },
+ {
+ "epoch": 2.292207792207792,
+ "grad_norm": 5.563425064086914,
+ "learning_rate": 3.385012500405857e-05,
+ "loss": 0.3417,
+ "step": 17650
+ },
+ {
+ "epoch": 2.2987012987012987,
+ "grad_norm": 5.86864709854126,
+ "learning_rate": 3.376895353745252e-05,
+ "loss": 0.3414,
+ "step": 17700
+ },
+ {
+ "epoch": 2.3051948051948052,
+ "grad_norm": 5.116090774536133,
+ "learning_rate": 3.3687782070846456e-05,
+ "loss": 0.3372,
+ "step": 17750
+ },
+ {
+ "epoch": 2.311688311688312,
+ "grad_norm": 5.31484842300415,
+ "learning_rate": 3.3606610604240404e-05,
+ "loss": 0.348,
+ "step": 17800
+ },
+ {
+ "epoch": 2.3181818181818183,
+ "grad_norm": 4.313574314117432,
+ "learning_rate": 3.352543913763434e-05,
+ "loss": 0.3391,
+ "step": 17850
+ },
+ {
+ "epoch": 2.324675324675325,
+ "grad_norm": 7.954375743865967,
+ "learning_rate": 3.344426767102829e-05,
+ "loss": 0.3172,
+ "step": 17900
+ },
+ {
+ "epoch": 2.331168831168831,
+ "grad_norm": 8.023120880126953,
+ "learning_rate": 3.336309620442222e-05,
+ "loss": 0.3336,
+ "step": 17950
+ },
+ {
+ "epoch": 2.3376623376623376,
+ "grad_norm": 4.240795135498047,
+ "learning_rate": 3.328192473781617e-05,
+ "loss": 0.3228,
+ "step": 18000
+ },
2564
+ {
+ "epoch": 2.344155844155844,
+ "grad_norm": 7.768456935882568,
+ "learning_rate": 3.3200753271210104e-05,
+ "loss": 0.3643,
+ "step": 18050
+ },
+ {
+ "epoch": 2.3506493506493507,
+ "grad_norm": 4.074265956878662,
+ "learning_rate": 3.311958180460405e-05,
+ "loss": 0.3257,
+ "step": 18100
+ },
+ {
+ "epoch": 2.357142857142857,
+ "grad_norm": 4.01935338973999,
+ "learning_rate": 3.303841033799799e-05,
+ "loss": 0.328,
+ "step": 18150
+ },
+ {
+ "epoch": 2.3636363636363638,
+ "grad_norm": 5.831508636474609,
+ "learning_rate": 3.2957238871391935e-05,
+ "loss": 0.3218,
+ "step": 18200
+ },
+ {
+ "epoch": 2.3701298701298703,
+ "grad_norm": 6.187150001525879,
+ "learning_rate": 3.287606740478587e-05,
+ "loss": 0.3208,
+ "step": 18250
+ },
+ {
+ "epoch": 2.3766233766233764,
+ "grad_norm": 3.0703017711639404,
+ "learning_rate": 3.279489593817982e-05,
+ "loss": 0.3085,
+ "step": 18300
+ },
+ {
+ "epoch": 2.383116883116883,
+ "grad_norm": 6.299106597900391,
+ "learning_rate": 3.271372447157375e-05,
+ "loss": 0.3118,
+ "step": 18350
+ },
+ {
+ "epoch": 2.3896103896103895,
+ "grad_norm": 3.640641927719116,
+ "learning_rate": 3.26325530049677e-05,
+ "loss": 0.3165,
+ "step": 18400
+ },
+ {
+ "epoch": 2.396103896103896,
+ "grad_norm": 3.131242275238037,
+ "learning_rate": 3.2551381538361635e-05,
+ "loss": 0.3058,
+ "step": 18450
+ },
+ {
+ "epoch": 2.4025974025974026,
+ "grad_norm": 7.147828102111816,
+ "learning_rate": 3.247021007175558e-05,
+ "loss": 0.3082,
+ "step": 18500
+ },
+ {
+ "epoch": 2.409090909090909,
+ "grad_norm": 6.766499042510986,
+ "learning_rate": 3.238903860514952e-05,
+ "loss": 0.3181,
+ "step": 18550
+ },
+ {
+ "epoch": 2.4155844155844157,
+ "grad_norm": 6.955957412719727,
+ "learning_rate": 3.2307867138543466e-05,
+ "loss": 0.3269,
+ "step": 18600
+ },
+ {
+ "epoch": 2.4220779220779223,
+ "grad_norm": 4.883868217468262,
+ "learning_rate": 3.22266956719374e-05,
+ "loss": 0.3197,
+ "step": 18650
+ },
+ {
+ "epoch": 2.4285714285714284,
+ "grad_norm": 6.362038612365723,
+ "learning_rate": 3.214552420533135e-05,
+ "loss": 0.305,
+ "step": 18700
+ },
+ {
+ "epoch": 2.435064935064935,
+ "grad_norm": 2.812976837158203,
+ "learning_rate": 3.206435273872528e-05,
+ "loss": 0.2837,
+ "step": 18750
+ },
+ {
+ "epoch": 2.4415584415584415,
+ "grad_norm": 5.885760307312012,
+ "learning_rate": 3.198318127211923e-05,
+ "loss": 0.2694,
+ "step": 18800
+ },
+ {
+ "epoch": 2.448051948051948,
+ "grad_norm": 5.411834716796875,
+ "learning_rate": 3.1902009805513166e-05,
+ "loss": 0.281,
+ "step": 18850
+ },
+ {
+ "epoch": 2.4545454545454546,
+ "grad_norm": 3.291440963745117,
+ "learning_rate": 3.1820838338907114e-05,
+ "loss": 0.2493,
+ "step": 18900
+ },
+ {
+ "epoch": 2.461038961038961,
+ "grad_norm": 4.711409568786621,
+ "learning_rate": 3.173966687230105e-05,
+ "loss": 0.279,
+ "step": 18950
+ },
+ {
+ "epoch": 2.4675324675324677,
+ "grad_norm": 3.8969242572784424,
+ "learning_rate": 3.165849540569499e-05,
+ "loss": 0.2775,
+ "step": 19000
+ },
+ {
+ "epoch": 2.474025974025974,
+ "grad_norm": 3.923271656036377,
+ "learning_rate": 3.157732393908893e-05,
+ "loss": 0.2566,
+ "step": 19050
+ },
+ {
+ "epoch": 2.4805194805194803,
+ "grad_norm": 3.8998281955718994,
+ "learning_rate": 3.149615247248287e-05,
+ "loss": 0.2637,
+ "step": 19100
+ },
+ {
+ "epoch": 2.487012987012987,
+ "grad_norm": 4.799726486206055,
+ "learning_rate": 3.1414981005876814e-05,
+ "loss": 0.2639,
+ "step": 19150
+ },
+ {
+ "epoch": 2.4935064935064934,
+ "grad_norm": 6.7094855308532715,
+ "learning_rate": 3.1333809539270756e-05,
+ "loss": 0.2666,
+ "step": 19200
+ },
+ {
+ "epoch": 2.5,
+ "grad_norm": 5.960465431213379,
+ "learning_rate": 3.12526380726647e-05,
+ "loss": 0.2672,
+ "step": 19250
+ },
+ {
+ "epoch": 2.5,
+ "eval_runtime": 182.934,
+ "eval_samples_per_second": 0.0,
+ "eval_steps_per_second": 0.0,
+ "eval_validation_loss": 6.039948660291147,
+ "step": 19250
+ },
2747
+ {
+ "epoch": 2.5064935064935066,
+ "grad_norm": 4.438632011413574,
+ "learning_rate": 3.117146660605864e-05,
+ "loss": 0.2524,
+ "step": 19300
+ },
+ {
+ "epoch": 2.512987012987013,
+ "grad_norm": 4.404284954071045,
+ "learning_rate": 3.109029513945258e-05,
+ "loss": 0.2544,
+ "step": 19350
+ },
+ {
+ "epoch": 2.5194805194805197,
+ "grad_norm": 5.944725513458252,
+ "learning_rate": 3.100912367284652e-05,
+ "loss": 0.2549,
+ "step": 19400
+ },
+ {
+ "epoch": 2.525974025974026,
+ "grad_norm": 5.0882086753845215,
+ "learning_rate": 3.092795220624046e-05,
+ "loss": 0.2531,
+ "step": 19450
+ },
+ {
+ "epoch": 2.5324675324675323,
+ "grad_norm": 1.8816249370574951,
+ "learning_rate": 3.0846780739634404e-05,
+ "loss": 0.2528,
+ "step": 19500
+ },
+ {
+ "epoch": 2.538961038961039,
+ "grad_norm": 4.036990165710449,
+ "learning_rate": 3.0765609273028345e-05,
+ "loss": 0.2325,
+ "step": 19550
+ },
+ {
+ "epoch": 2.5454545454545454,
+ "grad_norm": 5.138878345489502,
+ "learning_rate": 3.0684437806422286e-05,
+ "loss": 0.2595,
+ "step": 19600
+ },
+ {
+ "epoch": 2.551948051948052,
+ "grad_norm": 3.4231626987457275,
+ "learning_rate": 3.060326633981623e-05,
+ "loss": 0.2441,
+ "step": 19650
+ },
+ {
+ "epoch": 2.5584415584415585,
+ "grad_norm": 4.641517639160156,
+ "learning_rate": 3.052209487321017e-05,
+ "loss": 0.2348,
+ "step": 19700
+ },
+ {
+ "epoch": 2.564935064935065,
+ "grad_norm": 4.260504722595215,
+ "learning_rate": 3.044092340660411e-05,
+ "loss": 0.2393,
+ "step": 19750
+ },
+ {
+ "epoch": 2.571428571428571,
+ "grad_norm": 3.953183889389038,
+ "learning_rate": 3.0359751939998055e-05,
+ "loss": 0.2482,
+ "step": 19800
+ },
+ {
+ "epoch": 2.5779220779220777,
+ "grad_norm": 5.157094955444336,
+ "learning_rate": 3.0278580473391993e-05,
+ "loss": 0.2389,
+ "step": 19850
+ },
+ {
+ "epoch": 2.5844155844155843,
+ "grad_norm": 4.172763824462891,
+ "learning_rate": 3.0197409006785938e-05,
+ "loss": 0.2222,
+ "step": 19900
+ },
+ {
+ "epoch": 2.590909090909091,
+ "grad_norm": 3.977649450302124,
+ "learning_rate": 3.0116237540179876e-05,
+ "loss": 0.2316,
+ "step": 19950
+ },
+ {
+ "epoch": 2.5974025974025974,
+ "grad_norm": 4.546228408813477,
+ "learning_rate": 3.003506607357382e-05,
+ "loss": 0.2314,
+ "step": 20000
+ },
+ {
+ "epoch": 2.603896103896104,
+ "grad_norm": 4.047219753265381,
+ "learning_rate": 2.995389460696776e-05,
+ "loss": 0.242,
+ "step": 20050
+ },
+ {
+ "epoch": 2.6103896103896105,
+ "grad_norm": 6.16685152053833,
+ "learning_rate": 2.9872723140361703e-05,
+ "loss": 0.2445,
+ "step": 20100
+ },
+ {
+ "epoch": 2.616883116883117,
+ "grad_norm": 3.1169538497924805,
+ "learning_rate": 2.979155167375564e-05,
+ "loss": 0.2217,
+ "step": 20150
+ },
+ {
+ "epoch": 2.6233766233766236,
+ "grad_norm": 1.93070387840271,
+ "learning_rate": 2.9710380207149586e-05,
+ "loss": 0.2276,
+ "step": 20200
+ },
+ {
+ "epoch": 2.62987012987013,
+ "grad_norm": 4.680761814117432,
+ "learning_rate": 2.9629208740543524e-05,
+ "loss": 0.231,
+ "step": 20250
+ },
+ {
+ "epoch": 2.6363636363636362,
+ "grad_norm": 3.588186025619507,
+ "learning_rate": 2.954803727393747e-05,
+ "loss": 0.2195,
+ "step": 20300
+ },
+ {
+ "epoch": 2.642857142857143,
+ "grad_norm": 5.263854026794434,
+ "learning_rate": 2.9466865807331407e-05,
+ "loss": 0.224,
+ "step": 20350
+ },
+ {
+ "epoch": 2.6493506493506493,
+ "grad_norm": 4.57515811920166,
+ "learning_rate": 2.938569434072535e-05,
+ "loss": 0.2224,
+ "step": 20400
+ },
+ {
+ "epoch": 2.655844155844156,
+ "grad_norm": 4.807409763336182,
+ "learning_rate": 2.930452287411929e-05,
+ "loss": 0.2338,
+ "step": 20450
+ },
+ {
+ "epoch": 2.6623376623376624,
+ "grad_norm": 4.311650276184082,
+ "learning_rate": 2.9223351407513234e-05,
+ "loss": 0.2017,
+ "step": 20500
+ },
+ {
+ "epoch": 2.6688311688311686,
+ "grad_norm": 3.570688486099243,
+ "learning_rate": 2.9142179940907172e-05,
+ "loss": 0.2067,
+ "step": 20550
+ },
+ {
+ "epoch": 2.675324675324675,
+ "grad_norm": 2.987044334411621,
+ "learning_rate": 2.9061008474301117e-05,
+ "loss": 0.2019,
+ "step": 20600
+ },
+ {
+ "epoch": 2.6818181818181817,
+ "grad_norm": 6.706995487213135,
+ "learning_rate": 2.8979837007695055e-05,
+ "loss": 0.204,
+ "step": 20650
+ },
+ {
+ "epoch": 2.688311688311688,
+ "grad_norm": 2.4814531803131104,
+ "learning_rate": 2.8898665541089e-05,
+ "loss": 0.1931,
+ "step": 20700
+ },
+ {
+ "epoch": 2.6948051948051948,
+ "grad_norm": 2.977494478225708,
+ "learning_rate": 2.8817494074482938e-05,
+ "loss": 0.1968,
+ "step": 20750
+ },
+ {
+ "epoch": 2.7012987012987013,
+ "grad_norm": 4.158185958862305,
+ "learning_rate": 2.8736322607876882e-05,
+ "loss": 0.19,
+ "step": 20800
+ },
+ {
+ "epoch": 2.707792207792208,
+ "grad_norm": 2.229696750640869,
+ "learning_rate": 2.865515114127082e-05,
+ "loss": 0.1826,
+ "step": 20850
+ },
+ {
+ "epoch": 2.7142857142857144,
+ "grad_norm": 4.098623275756836,
+ "learning_rate": 2.8573979674664765e-05,
+ "loss": 0.1962,
+ "step": 20900
+ },
+ {
+ "epoch": 2.720779220779221,
+ "grad_norm": 3.391296148300171,
+ "learning_rate": 2.8492808208058703e-05,
+ "loss": 0.1868,
+ "step": 20950
+ },
+ {
+ "epoch": 2.7272727272727275,
+ "grad_norm": 3.235856771469116,
+ "learning_rate": 2.8411636741452648e-05,
+ "loss": 0.1757,
+ "step": 21000
+ },
2992
+ {
+ "epoch": 2.7337662337662336,
+ "grad_norm": 2.5943050384521484,
+ "learning_rate": 2.8330465274846586e-05,
+ "loss": 0.1958,
+ "step": 21050
+ },
+ {
+ "epoch": 2.74025974025974,
+ "grad_norm": 2.495835781097412,
+ "learning_rate": 2.824929380824053e-05,
+ "loss": 0.1832,
+ "step": 21100
+ },
+ {
+ "epoch": 2.7467532467532467,
+ "grad_norm": 3.121997833251953,
+ "learning_rate": 2.816812234163447e-05,
+ "loss": 0.1618,
+ "step": 21150
+ },
+ {
+ "epoch": 2.7532467532467533,
+ "grad_norm": 5.540942192077637,
+ "learning_rate": 2.8086950875028413e-05,
+ "loss": 0.1919,
+ "step": 21200
+ },
+ {
+ "epoch": 2.75974025974026,
+ "grad_norm": 2.7872164249420166,
+ "learning_rate": 2.800577940842235e-05,
+ "loss": 0.1709,
+ "step": 21250
+ },
+ {
+ "epoch": 2.7662337662337664,
+ "grad_norm": 5.461463451385498,
+ "learning_rate": 2.7924607941816293e-05,
+ "loss": 0.1815,
+ "step": 21300
+ },
+ {
+ "epoch": 2.7727272727272725,
+ "grad_norm": 2.966292381286621,
+ "learning_rate": 2.7843436475210234e-05,
+ "loss": 0.1738,
+ "step": 21350
+ },
+ {
+ "epoch": 2.779220779220779,
+ "grad_norm": 2.3425800800323486,
+ "learning_rate": 2.7762265008604175e-05,
+ "loss": 0.1631,
+ "step": 21400
+ },
+ {
+ "epoch": 2.7857142857142856,
+ "grad_norm": 5.251804351806641,
+ "learning_rate": 2.7681093541998117e-05,
+ "loss": 0.1725,
+ "step": 21450
+ },
+ {
+ "epoch": 2.792207792207792,
+ "grad_norm": 3.7362570762634277,
+ "learning_rate": 2.7599922075392058e-05,
+ "loss": 0.1681,
+ "step": 21500
+ },
+ {
+ "epoch": 2.7987012987012987,
+ "grad_norm": 3.0322623252868652,
+ "learning_rate": 2.7518750608786003e-05,
+ "loss": 0.1797,
+ "step": 21550
+ },
+ {
+ "epoch": 2.8051948051948052,
+ "grad_norm": 2.0847392082214355,
+ "learning_rate": 2.743757914217994e-05,
+ "loss": 0.1653,
+ "step": 21600
+ },
+ {
+ "epoch": 2.811688311688312,
+ "grad_norm": 4.219809055328369,
+ "learning_rate": 2.7356407675573886e-05,
+ "loss": 0.1599,
+ "step": 21650
+ },
+ {
+ "epoch": 2.8181818181818183,
+ "grad_norm": 1.8284157514572144,
+ "learning_rate": 2.7275236208967823e-05,
+ "loss": 0.1625,
+ "step": 21700
+ },
+ {
+ "epoch": 2.824675324675325,
+ "grad_norm": 2.727473258972168,
+ "learning_rate": 2.7194064742361768e-05,
+ "loss": 0.1662,
+ "step": 21750
+ },
+ {
+ "epoch": 2.8311688311688314,
+ "grad_norm": 2.5827713012695312,
+ "learning_rate": 2.7112893275755706e-05,
+ "loss": 0.1499,
+ "step": 21800
+ },
+ {
+ "epoch": 2.8376623376623376,
+ "grad_norm": 3.354733467102051,
+ "learning_rate": 2.703172180914965e-05,
+ "loss": 0.1609,
+ "step": 21850
+ },
+ {
+ "epoch": 2.844155844155844,
+ "grad_norm": 3.6326746940612793,
+ "learning_rate": 2.695055034254359e-05,
+ "loss": 0.158,
+ "step": 21900
+ },
+ {
+ "epoch": 2.8506493506493507,
+ "grad_norm": 2.440141439437866,
+ "learning_rate": 2.6869378875937534e-05,
+ "loss": 0.1525,
+ "step": 21950
+ },
+ {
+ "epoch": 2.857142857142857,
+ "grad_norm": 4.8725972175598145,
+ "learning_rate": 2.678820740933147e-05,
+ "loss": 0.1458,
+ "step": 22000
+ },
+ {
+ "epoch": 2.8636363636363638,
+ "grad_norm": 3.5552217960357666,
+ "learning_rate": 2.6707035942725416e-05,
+ "loss": 0.154,
+ "step": 22050
+ },
+ {
+ "epoch": 2.87012987012987,
+ "grad_norm": 3.2921078205108643,
+ "learning_rate": 2.6625864476119354e-05,
+ "loss": 0.1453,
+ "step": 22100
+ },
+ {
+ "epoch": 2.8766233766233764,
+ "grad_norm": 4.3363494873046875,
+ "learning_rate": 2.65446930095133e-05,
+ "loss": 0.1412,
+ "step": 22150
+ },
+ {
+ "epoch": 2.883116883116883,
+ "grad_norm": 2.5191147327423096,
+ "learning_rate": 2.6463521542907237e-05,
+ "loss": 0.1572,
+ "step": 22200
+ },
+ {
+ "epoch": 2.8896103896103895,
+ "grad_norm": 2.7854626178741455,
+ "learning_rate": 2.6382350076301182e-05,
+ "loss": 0.1451,
+ "step": 22250
+ },
+ {
+ "epoch": 2.896103896103896,
+ "grad_norm": 3.8455305099487305,
+ "learning_rate": 2.630117860969512e-05,
+ "loss": 0.1502,
+ "step": 22300
+ },
+ {
+ "epoch": 2.9025974025974026,
+ "grad_norm": 2.373847246170044,
+ "learning_rate": 2.6220007143089065e-05,
+ "loss": 0.1422,
+ "step": 22350
+ },
+ {
+ "epoch": 2.909090909090909,
+ "grad_norm": 2.7515058517456055,
+ "learning_rate": 2.6138835676483003e-05,
+ "loss": 0.1495,
+ "step": 22400
+ },
+ {
+ "epoch": 2.9155844155844157,
+ "grad_norm": 1.8583050966262817,
+ "learning_rate": 2.6057664209876947e-05,
+ "loss": 0.1446,
+ "step": 22450
+ },
+ {
+ "epoch": 2.9220779220779223,
+ "grad_norm": 5.156238555908203,
+ "learning_rate": 2.5976492743270885e-05,
+ "loss": 0.1422,
+ "step": 22500
+ },
+ {
+ "epoch": 2.928571428571429,
+ "grad_norm": 3.581411600112915,
+ "learning_rate": 2.589532127666483e-05,
+ "loss": 0.1416,
+ "step": 22550
+ },
+ {
+ "epoch": 2.935064935064935,
+ "grad_norm": 4.647747993469238,
+ "learning_rate": 2.5814149810058768e-05,
+ "loss": 0.1592,
+ "step": 22600
+ },
3216
+ {
+ "epoch": 2.9415584415584415,
+ "grad_norm": 3.7315235137939453,
+ "learning_rate": 2.5732978343452713e-05,
+ "loss": 0.1379,
+ "step": 22650
+ },
+ {
+ "epoch": 2.948051948051948,
+ "grad_norm": 2.825934648513794,
+ "learning_rate": 2.565180687684665e-05,
+ "loss": 0.1412,
+ "step": 22700
+ },
+ {
+ "epoch": 2.9545454545454546,
+ "grad_norm": 2.537278413772583,
+ "learning_rate": 2.5570635410240595e-05,
+ "loss": 0.1422,
+ "step": 22750
+ },
+ {
+ "epoch": 2.961038961038961,
+ "grad_norm": 2.3566558361053467,
+ "learning_rate": 2.5489463943634533e-05,
+ "loss": 0.1251,
+ "step": 22800
+ },
+ {
+ "epoch": 2.9675324675324677,
+ "grad_norm": 3.213268280029297,
+ "learning_rate": 2.5408292477028478e-05,
+ "loss": 0.1481,
+ "step": 22850
+ },
+ {
+ "epoch": 2.974025974025974,
+ "grad_norm": 2.007089614868164,
+ "learning_rate": 2.5327121010422416e-05,
+ "loss": 0.1256,
+ "step": 22900
+ },
+ {
+ "epoch": 2.9805194805194803,
+ "grad_norm": 3.599897623062134,
+ "learning_rate": 2.524594954381636e-05,
+ "loss": 0.1343,
+ "step": 22950
+ },
+ {
+ "epoch": 2.987012987012987,
+ "grad_norm": 2.638766050338745,
+ "learning_rate": 2.51647780772103e-05,
+ "loss": 0.1304,
+ "step": 23000
+ },
+ {
+ "epoch": 2.9935064935064934,
+ "grad_norm": 1.9638385772705078,
+ "learning_rate": 2.5083606610604244e-05,
+ "loss": 0.1278,
+ "step": 23050
+ },
+ {
+ "epoch": 3.0,
+ "grad_norm": 2.8112239837646484,
+ "learning_rate": 2.500243514399818e-05,
+ "loss": 0.1453,
+ "step": 23100
+ },
+ {
+ "epoch": 3.0,
+ "eval_runtime": 182.9419,
+ "eval_samples_per_second": 0.0,
+ "eval_steps_per_second": 0.0,
+ "eval_validation_loss": 6.183200098360758,
+ "step": 23100
+ }
+ ],
+ "logging_steps": 50,
+ "max_steps": 38500,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 5,
+ "save_steps": 3850,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": true
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 0.0,
+ "train_batch_size": 64,
+ "trial_name": null,
+ "trial_params": null
+ }
checkpoints/checkpoint-23100/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:39eedf3e35382ec77ee3bb432638337c7c6e53f79e026b0a2c955d5159b51c58
+ size 6225