CIRCL/vulnerability-severity-classification-chinese-macbert-base

@@ -1,52 +1,37 @@
 ---
-base_model: hfl/chinese-macbert-base
-datasets:
-- CIRCL/Vulnerability-CNVD
 library_name: transformers
 license: apache-2.0
-metrics:
-- accuracy
 tags:
 - generated_from_trainer
-- text-classification
-- classification
-- nlp
-- chinese
-- vulnerability
-pipeline_tag: text-classification
-language: zh
 model-index:
 - name: vulnerability-severity-classification-chinese-macbert-base
   results: []
 ---
-# VLAI: A RoBERTa-Based Model for Automated Vulnerability Severity Classification (Chinese Text)
-This model is a fine-tuned version of [hfl/chinese-macbert-base](https://huggingface.co/hfl/chinese-macbert-base) on the dataset [CIRCL/Vulnerability-CNVD](https://huggingface.co/datasets/CIRCL/Vulnerability-CNVD).
-For more information, visit the [this project page](https://www.vulnerability-lookup.org/user-manual/ai/) or the [ML-Gateway GitHub repository](https://github.com/vulnerability-lookup/ML-Gateway), which demonstrates its usage in a FastAPI server.
-## How to use
-You can use this model directly with the Hugging Face `transformers` library for text classification:
-```python
-from transformers import pipeline
-classifier = pipeline(
-    "text-classification",
-    model="CIRCL/vulnerability-severity-classification-chinese-macbert-base"
-)
-# Example usage for a Chinese vulnerability description
-description_chinese = "TOTOLINK A3600R是中国吉翁电子（TOTOLINK）公司的一款6天线1200M无线路由器。TOTOLINK A3600R存在缓冲区溢出漏洞，该漏洞源于/cgi-bin/cstecgi.cgi文件的UploadCustomModule函数中的File参数未能正确验证输入数据的长度大小，攻击者可利用该漏洞在系统上执行任意代码或者导致拒绝服务。"
-result_chinese = classifier(description_chinese)
-print(result_chinese)
-# Expected output example: [{'label': '高', 'score': 0.9802}]
-```
 ## Training procedure
@@ -61,24 +46,20 @@ The following hyperparameters were used during training:
 - lr_scheduler_type: linear
 - num_epochs: 5
-It achieves the following results on the evaluation set:
-- Loss: 0.5997
-- Accuracy: 0.7846
 ### Training results
 | Training Loss | Epoch | Step  | Validation Loss | Accuracy |
 |:-------------:|:-----:|:-----:|:---------------:|:--------:|
-| 0.6264        | 1.0   | 3548  | 0.5766          | 0.7565   |
-| 0.5523        | 2.0   | 7096  | 0.5536          | 0.7724   |
-| 0.4184        | 3.0   | 10644 | 0.5440          | 0.7836   |
-| 0.3236        | 4.0   | 14192 | 0.5629          | 0.7889   |
-| 0.2604        | 5.0   | 17740 | 0.5997          | 0.7846   |
 ### Framework versions
-- Transformers 4.57.3
-- Pytorch 2.9.1+cu128
-- Datasets 4.4.2
 - Tokenizers 0.22.2

 ---
 library_name: transformers
 license: apache-2.0
+base_model: hfl/chinese-macbert-base
 tags:
 - generated_from_trainer
+metrics:
+- accuracy
 model-index:
 - name: vulnerability-severity-classification-chinese-macbert-base
   results: []
 ---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# vulnerability-severity-classification-chinese-macbert-base
+This model is a fine-tuned version of [hfl/chinese-macbert-base](https://huggingface.co/hfl/chinese-macbert-base) on an unknown dataset.
+It achieves the following results on the evaluation set:
+- Loss: 1.2224
+- Accuracy: 0.7783
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
 ## Training procedure
 - lr_scheduler_type: linear
 - num_epochs: 5
 ### Training results
 | Training Loss | Epoch | Step  | Validation Loss | Accuracy |
 |:-------------:|:-----:|:-----:|:---------------:|:--------:|
+| 1.2400        | 1.0   | 3588  | 1.1658          | 0.7567   |
+| 1.1318        | 2.0   | 7176  | 1.1025          | 0.7711   |
+| 1.0106        | 3.0   | 10764 | 1.0848          | 0.7829   |
+| 0.6185        | 4.0   | 14352 | 1.1507          | 0.7807   |
+| 0.6463        | 5.0   | 17940 | 1.2224          | 0.7783   |
 ### Framework versions
+- Transformers 5.3.0
+- Pytorch 2.10.0+cu128
+- Datasets 4.8.3
 - Tokenizers 0.22.2
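
The two "Training results" tables in this diff can be compared epoch by epoch. The following is a minimal sketch with the numbers copied by hand from the removed and added rows; the helper names (`best_epoch_by_accuracy`, `best_epoch_by_loss`, `accuracy_deltas`) are hypothetical and not part of the repository.

```python
# Values copied from the two "Training results" tables in the diff.
# Each tuple is (epoch, validation_loss, accuracy).
old_run = [
    (1, 0.5766, 0.7565),
    (2, 0.5536, 0.7724),
    (3, 0.5440, 0.7836),
    (4, 0.5629, 0.7889),
    (5, 0.5997, 0.7846),
]
new_run = [
    (1, 1.1658, 0.7567),
    (2, 1.1025, 0.7711),
    (3, 1.0848, 0.7829),
    (4, 1.1507, 0.7807),
    (5, 1.2224, 0.7783),
]


def best_epoch_by_accuracy(run):
    """Return the epoch with the highest evaluation accuracy."""
    return max(run, key=lambda row: row[2])[0]


def best_epoch_by_loss(run):
    """Return the epoch with the lowest validation loss."""
    return min(run, key=lambda row: row[1])[0]


def accuracy_deltas(old, new):
    """Per-epoch accuracy change (new minus old), rounded to 4 decimals."""
    return [round(n[2] - o[2], 4) for o, n in zip(old, new)]


if __name__ == "__main__":
    print("best accuracy epoch (old, new):",
          best_epoch_by_accuracy(old_run), best_epoch_by_accuracy(new_run))
    print("lowest loss epoch (old, new):",
          best_epoch_by_loss(old_run), best_epoch_by_loss(new_run))
    print("accuracy deltas per epoch:", accuracy_deltas(old_run, new_run))
```

In both runs the validation loss is lowest at epoch 3 and rises afterwards, while the reported final numbers come from epoch 5. Accuracy stays within one percentage point between the runs even though the loss values roughly double; the diff itself does not say why the loss scale changed.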

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:89dffb10a65fc85ee1c8391870d3a1ebf3e131461f24bcc822e370b19481130f
 size 409103292

 version https://git-lfs.github.com/spec/v1
+oid sha256:a5a2edb509486b6ccab75444049c7f84444b1b704a9f1b45ba8568e22276362d
 size 409103292

tokenizer.json CHANGED

@@ -1,19 +1,7 @@
 {
   "version": "1.0",
-  "truncation": {
-    "direction": "Right",
-    "max_length": 512,
-    "strategy": "LongestFirst",
-    "stride": 0
-  },
-  "padding": {
-    "strategy": "BatchLongest",
-    "direction": "Right",
-    "pad_to_multiple_of": null,
-    "pad_id": 0,
-    "pad_type_id": 0,
-    "pad_token": "[PAD]"
-  },
   "added_tokens": [
     {
       "id": 0,

 {
   "version": "1.0",
+  "truncation": null,
+  "padding": null,
   "added_tokens": [
     {
       "id": 0,
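
This hunk replaces the stored `truncation` (with `max_length` 512) and `padding` settings with `null`. A caller that relied on those stored values may prefer to pass them explicitly at call time. The following is a minimal sketch, assuming a Hugging Face-style tokenizer callable that accepts `truncation`, `max_length` and `padding` keyword arguments; `encode_descriptions` and `MAX_LENGTH` are hypothetical names, and 512 is taken from the removed block rather than from any documented model limit.

```python
# Assumption: 512 mirrors the "max_length" in the removed truncation block.
MAX_LENGTH = 512


def encode_descriptions(tokenizer, texts, max_length=MAX_LENGTH):
    """Tokenize vulnerability descriptions with explicit truncation and padding.

    `tokenizer` is expected to behave like a Hugging Face tokenizer object,
    i.e. a callable taking a list of strings plus keyword arguments.
    """
    return tokenizer(
        list(texts),
        truncation=True,        # clip inputs longer than max_length
        max_length=max_length,
        padding="longest",      # pad each batch to its longest sequence
    )
```

With the real tokenizer this would be called on an object loaded via `AutoTokenizer.from_pretrained(...)`. Passing the limit explicitly seems the safer choice here because `model_max_length` in `tokenizer_config.json` is a very large sentinel value in both versions of the file.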

tokenizer_config.json CHANGED

@@ -1,50 +1,8 @@
 {
-  "added_tokens_decoder": {
-    "0": {
-      "content": "[PAD]",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "100": {
-      "content": "[UNK]",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "101": {
-      "content": "[CLS]",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "102": {
-      "content": "[SEP]",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "103": {
-      "content": "[MASK]",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    }
-  },
-  "clean_up_tokenization_spaces": false,
   "cls_token": "[CLS]",
   "do_lower_case": true,
-  "extra_special_tokens": {},
   "mask_token": "[MASK]",
   "model_max_length": 1000000000000000019884624838656,
   "pad_token": "[PAD]",

 {
+  "backend": "tokenizers",
   "cls_token": "[CLS]",
   "do_lower_case": true,
+  "is_local": false,
   "mask_token": "[MASK]",
   "model_max_length": 1000000000000000019884624838656,
   "pad_token": "[PAD]",