Chinese Financial Sentiment Analysis Model (Crypto Focus)

Model Description

This model is iteratively fine-tuned from yiyanghkust/finbert-tone-chinese for sentiment analysis of Chinese cryptocurrency-related news and social media content. It classifies text into three sentiment categories: Positive, Neutral, and Negative.

The training data was reviewed entry by entry by Claude AI, which corrected labeling errors, to ensure annotation quality.

Training Data

Size: 2,208 reviewed and annotated Chinese financial news items
Source: Cryptocurrency-related news and tweets
Annotation: Model prediction, followed by entry-by-entry review and correction by Claude AI
Distribution:
- Positive: 734 (33.2%)
- Neutral: 899 (40.7%)
- Negative: 575 (26.0%)
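The percentages in the distribution follow directly from the class counts. A minimal check in Python, using only the counts listed in this section (the variable names are illustrative):

```python
# Class counts as listed in the Training Data section.
counts = {"positive": 734, "neutral": 899, "negative": 575}

total = sum(counts.values())  # 2208

# Share of each class, as a percentage rounded to one decimal place.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{label}: {counts[label]} ({pct}%)")
```

The rounded shares add up to 99.9% rather than 100% because each one is rounded separately.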

Performance Metrics

Performance on the 442-sample test set (80/20 stratified split):

Metric	Value
Accuracy	84.84%
F1 Score (weighted)	84.88%
Precision (weighted)	85.36%
Recall (weighted)	84.84%

Per-class Metrics

Class	Precision	Recall	F1
negative	0.938	0.791	0.858
neutral	0.806	0.878	0.840
positive	0.846	0.857	0.851
weighted avg	0.854	0.848	0.849
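The weighted-average row can be reproduced from the per-class rows once the per-class test counts (supports) are known. The card does not report them, so the sketch below assumes each class contributes about 20% of its training-data count under the stratified split: 115 negative, 180 neutral, and 147 positive, totalling 442. These supports are an assumption, not reported figures.

```python
# Per-class (precision, recall) from the per-class table.
per_class = {
    "negative": (0.938, 0.791),
    "neutral": (0.806, 0.878),
    "positive": (0.846, 0.857),
}

# ASSUMPTION: test supports are not reported; these are 20% of the class
# counts (575, 899, 734), rounded, as a stratified 80/20 split would give.
support = {"negative": 115, "neutral": 180, "positive": 147}

total = sum(support.values())  # 442 under the assumption above


def weighted_avg(index: int) -> float:
    """Support-weighted mean of one metric column (0=precision, 1=recall)."""
    return sum(per_class[c][index] * support[c] for c in per_class) / total


weighted_precision = weighted_avg(0)
weighted_recall = weighted_avg(1)

print(f"weighted precision: {weighted_precision:.3f}")
print(f"weighted recall:    {weighted_recall:.3f}")
```

Under these assumed supports the results round to 0.854 and 0.848, in line with the table. Support-weighted recall is the same quantity as overall accuracy (total correct predictions divided by total samples), which is why both are reported as 84.84%.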

Performance History

Version	Training Samples	F1 Score	Accuracy
v1.0	500	61.65%	n/a
v2.0	1000	63.65%	64.50%
v3.5	1500	67.16%	68.33%
v4.0	1700	70.91%	72.06%
v5.0	2008	76.88%	77.36%
v6.0	2208	84.88%	84.84%

Usage

Quick Start

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "LocalOptimum/chinese-crypto-sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Text to analyze: "Bitcoin breaks $100,000, hitting an all-time high"
text = "比特币突破10万美元创历史新高"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

# Predict: softmax turns logits into probabilities, argmax picks the top class
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(predictions, dim=-1).item()

# Map the class index to a label.
# Label order as given in this model card; it can be cross-checked
# against model.config.id2label.
labels = ['positive', 'neutral', 'negative']
sentiment = labels[predicted_class]
confidence = predictions[0][predicted_class].item()

print(f"Sentiment: {sentiment}")
print(f"Confidence: {confidence:.4f}")
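To make the prediction step concrete: the model returns one raw score (logit) per class, softmax turns those scores into probabilities that sum to 1, and argmax picks the most likely class. The sketch below does this by hand with the standard library. The logit values are made up for illustration and are not real model output.

```python
import math

# Label order as used in the Quick Start snippet.
labels = ["positive", "neutral", "negative"]


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# HYPOTHETICAL logits, invented for illustration only.
example_logits = [2.0, 0.5, -1.0]

probs = softmax(example_logits)
predicted_class = max(range(len(probs)), key=probs.__getitem__)  # argmax

sentiment = labels[predicted_class]
confidence = probs[predicted_class]

print(f"sentiment: {sentiment}, confidence: {confidence:.4f}")
```

The reported confidence is simply the probability assigned to the winning class, so it says how strongly the model prefers that class over the other two, not how likely the prediction is to be correct.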

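Batch Processing

# Reuses `tokenizer`, `model`, and `torch` from the Quick Start snippet.
texts = [
    # "Binance receives regulatory authorization in Abu Dhabi"
    "币安获得阿布扎比监管授权",
    # "Ethereum completes the Fusaka upgrade"
    "以太坊完成Fusaka升级",
    # "An exchange is attacked and loses $1 million"
    "某交易所遭攻击损失100万美元"
]

# padding=True pads every text to the longest one in the batch
inputs = tokenizer(texts, return_tensors="pt", truncation=True,
                   max_length=128, padding=True)

with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_classes = torch.argmax(predictions, dim=-1)

labels = ['positive', 'neutral', 'negative']
# .tolist() converts the index tensor to plain Python ints
for text, pred in zip(texts, predicted_classes.tolist()):
    print(f"{text} -> {labels[pred]}")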
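For lists too large to tokenize in one call, one option is to split the input into fixed-size chunks and run the batch code on each chunk. A minimal sketch follows. `chunked` and `classify_all` are hypothetical helper names, and `predict_batch` stands in for any function that takes a list of texts and returns one label per text, for example a wrapper around the tokenizer and model calls shown in this section.

```python
from typing import Callable, Iterator, List


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def classify_all(
    texts: List[str],
    predict_batch: Callable[[List[str]], List[str]],
    batch_size: int = 16,
) -> List[str]:
    """Run `predict_batch` chunk by chunk and concatenate the labels."""
    results: List[str] = []
    for batch in chunked(texts, batch_size):
        results.extend(predict_batch(batch))
    return results


# Stand-in predictor so the sketch runs without downloading the model.
def dummy_predict(batch: List[str]) -> List[str]:
    return ["neutral"] * len(batch)


print(classify_all(["a", "b", "c", "d", "e"], dummy_predict, batch_size=2))
```

The default `batch_size` of 16 is an arbitrary choice here; a suitable value depends on available memory and text length.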

Training Configuration

Base Model: yiyanghkust/finbert-tone-chinese (fine-tuned over multiple iterations)
Epochs: 5 (early stopping with patience=3; best checkpoint reached at epoch 5)
Batch Size: 16
Learning Rate: 2e-5
Max Length: 128
Device: NVIDIA GeForce RTX 5080 Laptop GPU (16GB)
Mixed Precision: FP16
Best Model Selection: metric_for_best_model='f1'
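These settings can be collected into a plain configuration mapping. The sketch below is an assumption about how they might be organised, not the author's training script. The key names are modelled on Hugging Face `TrainingArguments` parameters, the patience value would be passed to `EarlyStoppingCallback` rather than `TrainingArguments`, and the step count assumes a single GPU, no gradient accumulation, and the 1,766 samples left after holding out the 442-sample test set.

```python
import math

# ASSUMPTION: key names follow Hugging Face TrainingArguments conventions;
# the card does not show the actual training script.
training_config = {
    "num_train_epochs": 5,
    "per_device_train_batch_size": 16,
    "learning_rate": 2e-5,
    "fp16": True,                    # mixed precision
    "metric_for_best_model": "f1",
    "load_best_model_at_end": True,  # assumed; implied by best-model selection
}
max_length = 128             # tokenizer truncation length
early_stopping_patience = 3  # would be passed to EarlyStoppingCallback

# ASSUMPTION: 2208 total samples minus the 442-sample test split.
train_samples = 2208 - 442

steps_per_epoch = math.ceil(
    train_samples / training_config["per_device_train_batch_size"]
)
total_steps = steps_per_epoch * training_config["num_train_epochs"]

print(f"steps per epoch: {steps_per_epoch}, total steps: {total_steps}")
```

Under these assumptions a full run is a few hundred optimizer steps, which is consistent with training on a single laptop GPU.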

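Use Cases

✅ Sentiment analysis of cryptocurrency news
✅ Social media sentiment monitoring
✅ Financial market sentiment indicators
✅ Real-time news sentiment tracking
✅ Supporting reference for investment decisions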

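Annotation Principles

Cryptocurrency is treated as a risk asset (like US equities), not a safe-haven asset (like gold)
War, geopolitical conflict, tariffs → negative (bearish for risk assets)
A platform listing a new coin or launching a feature → neutral (routine operations, not bullish news)
Personal opinions and analyst forecasts → neutral (subjective views)
Clearly bullish news (ETF approval, large purchases, policy support) → positive
Clearly bearish news (liquidations, crashes, scams, regulatory crackdowns) → negative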
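These principles amount to an ordered set of rules. As a rough illustration only, they can be written as a keyword lookup. This is not the annotation pipeline used for the model (labels came from model predictions reviewed by Claude AI); the keyword lists, the rule order, the neutral default, and the `label_by_principle` helper are all hypothetical.

```python
# HYPOTHETICAL keyword rules illustrating the principles. Rules are checked
# in order, so subjective opinions and routine listings are caught before
# the bullish/bearish keywords.
RULES = [
    # Opinions and forecasts: analyst, predict, believe
    ("neutral", ["分析师", "预测", "认为"]),
    # Routine operations: listing / launch
    ("neutral", ["上线"]),
    # Geopolitical risk: war, conflict, tariffs
    ("negative", ["战争", "冲突", "关税"]),
    # Clearly bearish: liquidation, crash, scam, regulatory crackdown
    ("negative", ["清算", "暴跌", "诈骗", "监管打压"]),
    # Clearly bullish: ETF approval, large purchase, policy support
    ("positive", ["ETF通过", "大额买入", "政策支持"]),
]


def label_by_principle(text: str) -> str:
    """Return the label of the first rule whose keyword appears in `text`."""
    for label, keywords in RULES:
        if any(keyword in text for keyword in keywords):
            return label
    return "neutral"  # ASSUMPTION: no rule matched, treat as routine news


# "War in the Middle East escalates"
print(label_by_principle("中东战争升级"))
# "Spot Bitcoin ETF approved"
print(label_by_principle("比特币现货ETF通过"))
# "Analyst predicts Bitcoin will crash"
print(label_by_principle("分析师预测比特币将暴跌"))
```

The third example shows why order matters: the text contains a bearish keyword, but the opinion rule fires first, matching the principle that forecasts are labeled neutral.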

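Limitations

⚠️ Targets financial news in the cryptocurrency domain; performance on other financial domains may be poor
⚠️ Accuracy may drop on short texts (fewer than 10 characters)
⚠️ Supports Simplified Chinese only
⚠️ The model does not replace human judgment; its output is for reference only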
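The short-text limitation can be checked before inference. The sketch below flags inputs shorter than 10 characters. `MIN_CHARS` and `is_short_text` are hypothetical names, and counting characters with `len` after stripping whitespace is an assumption about how length should be measured.

```python
MIN_CHARS = 10  # threshold taken from the short-text limitation


def is_short_text(text: str) -> bool:
    """True when the stripped text has fewer than MIN_CHARS characters."""
    return len(text.strip()) < MIN_CHARS


# "Bitcoin surges" and "Bitcoin breaks $100,000, hitting an all-time high"
for sample in ["比特币大涨", "比特币突破10万美元创历史新高"]:
    note = "short, treat result with caution" if is_short_text(sample) else "ok"
    print(f"{sample}: {note}")
```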

License

Apache-2.0

Citation

If you use this model, please cite:

@misc{watchtower-sentiment-2026,
  title={Chinese Financial Sentiment Analysis Model (Crypto Focus)},
  author={Onefly},
  year={2026},
  howpublished={\url{https://huggingface.co/LocalOptimum/chinese-crypto-sentiment}},
  note={Fine-tuned from yiyanghkust/finbert-tone-chinese, 2208 samples, F1=84.88\%}
}

Base Model

This model is fine-tuned from:

yiyanghkust/finbert-tone-chinese

Thanks to the original authors for their contribution!

Changelog

v6.0 (2026-02-28)

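✅ Expanded training data to 2,208 samples (+200 samples reviewed by Claude)
✅ F1 score improved substantially (76.88% → 84.88%, +8.00 pp)
✅ Large-scale relabeling of geopolitical and war news (97 samples changed positive→negative, fixing the systematic error on "US and Israel strike Iran" stories)
✅ Negative recall improved markedly (67.0% → 79.1%, +12.1 pp)
✅ Targeted geopolitical validation: 13 of 14 test cases correct (92.9%); 8 war-news items classified as negative with confidence 1.00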

v5.0 (2026-02-28)

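✅ Expanded training data to 2,008 samples (+308 samples reviewed by Claude)
✅ F1 score improved substantially (70.91% → 76.88%, +5.97 pp)
✅ Corrected systematic model errors (such as positive→neutral over-prediction)
✅ Improved class balance: negative samples increased from 362 to 431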

v4.0 (2026-02-28)

✅ Expanded training data to 1,700 samples
✅ F1 score improved (67.16% → 70.91%, +3.75 pp)
✅ Introduced the entry-by-entry Claude AI review workflow for annotation

v3.5 (2026-02-27)

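✅ Expanded training data to 1,500 samples
✅ F1 score improved (63.65% → 67.16%, +3.51 pp)
✅ Extensively corrected the systematic error of labeling war and geopolitical conflict news as positive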

v2.0 (2025-12-09)

✅ Expanded training data to 1,000 samples
✅ Corrected annotation errors to improve data quality
✅ F1 score improved (61.65% → 63.65%, +2.00 pp)

v1.0 (Initial Release)

Initial version based on 500 labeled samples

Contact

For questions or suggestions, feel free to open an issue or PR.

Maintainer: Onefly
Last Updated: 2026-02-28

Model size: 0.1B params (Safetensors, tensor type F32)