Remove emoji checkmarks and warning signs


README.md CHANGED

@@ -190,9 +190,9 @@ for label, names in sorted(grouped.items()):
 | Parameter | Value |
 |---|---|
 | Learning rate | 2e-05 |
-| Batch size | 16 (×2 gradient accumulation = 32 effective) |
 | Epochs | 3 |
-| Optimizer | AdamW (β₁=0.9, β₂=0.999, ε=1e-08) |
 | LR scheduler | Cosine with 10% warmup |
 | Seed | 42 |
@@ -206,19 +206,19 @@ for label, names in sorted(grouped.items()):
 **Note:** The best checkpoint (epoch ~2, lowest validation loss 0.0606) was selected as the final model, achieving **90.6% F1**.
-## Strengths & Limitations
 ### Strengths
-- ✅ **Cross-domain**: Works on patents, papers, news, and political documents with a single model
-- ✅ **Multilingual**: Handles both English and German text
-- ✅ **Rich entity types**: 15 entity types covering people, organizations, locations, biological entities, diseases, instruments, and more
-- ✅ **Fast**: ~5ms per document on CPU — suitable for processing millions of documents
-- ✅ **Long context**: Inherits ModernBERT's 8,192 token context window
 ### Limitations
-- ⚠️ **Conference/product names**: May fragment uncommon compound names (e.g., "NeurIPS" → split tokens) — use confidence thresholding (>0.5) to filter
-- ⚠️ **Languages**: Optimized for English and German; other languages may work but are untested
-- ⚠️ **Domain drift**: Performance is best on patent, scientific, political, and news text — may degrade on informal text (social media, chat)
 ## Recommended Post-Processing

 | Parameter | Value |
 |---|---|
 | Learning rate | 2e-05 |
+| Batch size | 16 (x2 gradient accumulation = 32 effective) |
 | Epochs | 3 |
+| Optimizer | AdamW |
 | LR scheduler | Cosine with 10% warmup |
 | Seed | 42 |
 **Note:** The best checkpoint (epoch ~2, lowest validation loss 0.0606) was selected as the final model, achieving **90.6% F1**.
+## Strengths and Limitations
 ### Strengths
+- **Cross-domain**: Works on patents, papers, news, and political documents with a single model
+- **Multilingual**: Handles both English and German text
+- **Rich entity types**: 15 entity types covering people, organizations, locations, biological entities, diseases, instruments, and more
+- **Fast**: ~5ms per document on CPU — suitable for processing millions of documents
+- **Long context**: Inherits ModernBERT's 8,192 token context window
 ### Limitations
+- **Conference/product names**: May fragment uncommon compound names (e.g., "NeurIPS" split into tokens) — use confidence thresholding (>0.5) to filter
+- **Languages**: Optimized for English and German; other languages may work but are untested
+- **Domain drift**: Performance is best on patent, scientific, political, and news text — may degrade on informal text (social media, chat)
 ## Recommended Post-Processing
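
The confidence thresholding mentioned under Limitations (keep predictions scoring above 0.5) could look roughly like the sketch below. It is only an illustration: the function name `filter_entities`, the sample predictions, and the `ORG`/`PER` labels are hypothetical, and the dictionary shape (`entity_group`, `word`, `score`) is assumed to resemble aggregated token-classification output rather than taken from this README.

```python
from typing import Any, Dict, List


def filter_entities(
    entities: List[Dict[str, Any]], threshold: float = 0.5
) -> List[Dict[str, Any]]:
    """Keep only predictions whose score is strictly above the threshold."""
    return [e for e in entities if e["score"] > threshold]


# Hypothetical predictions; labels and scores are made up for illustration.
predictions = [
    {"entity_group": "ORG", "word": "Neur", "score": 0.41},
    {"entity_group": "ORG", "word": "IPS", "score": 0.38},
    {"entity_group": "PER", "word": "Ada Lovelace", "score": 0.97},
]

kept = filter_entities(predictions)
for entity in kept:
    print(entity["entity_group"], entity["word"], entity["score"])
```

With these made-up scores, the two low-confidence fragments are dropped and only the high-confidence span remains. A threshold other than 0.5 may suit a given corpus better.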