# ettin-17m-nemotron-pii

Lightweight PII detection model | Open source | 17M parameters | 94.21 F1 score
## Overview

Ettin-17m-nemotron-pii is based on the jhu-clsp/ettin-encoder-17m model and fine-tuned on the nvidia/Nemotron-PII dataset. It detects 50+ PII entity types in both structured and unstructured text across domains such as healthcare, finance, legal, and cybersecurity. With just 17M parameters, the model achieves a strong F1 score of 94.21.
## Key Features

- Achieves a strong F1 score of 94.21 with just 17M parameters.
- Detects 50+ PII entity types in both structured and unstructured text.
- Handles text from various domains such as healthcare, finance, and legal.
## Supported PII Entity Types

This model can detect the following 55 PII entity types:
| Entity | Description |
|---|---|
| account_number | Account Number |
| age | Age |
| api_key | API Key |
| bank_routing_number | Bank Routing Number |
| biometric_identifier | Biometric Identifier |
| blood_type | Blood Type |
| certificate_license_number | Certificate or License Number |
| city | City |
| company_name | Company Name |
| coordinate | Geographic Coordinate |
| country | Country |
| county | County |
| credit_debit_card | Credit or Debit Card Number |
| customer_id | Customer ID |
| cvv | Card Verification Value (CVV) |
| date | Date |
| date_of_birth | Date of Birth |
| date_time | Date and Time |
| device_identifier | Device Identifier |
| education_level | Education Level |
| email_address | Email Address |
| employee_id | Employee ID |
| employment_status | Employment Status |
| fax_number | Fax Number |
| first_name | First Name |
| gender | Gender |
| health_plan_beneficiary_number | Health Plan Beneficiary Number |
| http_cookie | HTTP Cookie |
| ipv4 | IPv4 Address |
| ipv6 | IPv6 Address |
| language | Language |
| last_name | Last Name |
| license_plate | Vehicle License Plate |
| mac_address | MAC Address |
| medical_record_number | Medical Record Number |
| national_id | National Identification Number |
| occupation | Occupation |
| password | Password |
| phone_number | Phone Number |
| pin | Personal Identification Number (PIN) |
| political_view | Political View |
| postcode | Postcode / Zip Code |
| race_ethnicity | Race or Ethnicity |
| religious_belief | Religious Belief |
| sexuality | Sexuality / Sexual Orientation |
| ssn | Social Security Number |
| state | State |
| street_address | Street Address |
| swift_bic | SWIFT / BIC Code |
| tax_id | Tax Identification Number |
| time | Time |
| unique_id | Unique Identifier |
| url | URL / Web Address |
| user_name | Username |
| vehicle_identifier | Vehicle Identification Number (VIN) |
## Usage

First, install the Hugging Face `transformers` library:

```bash
pip install transformers
```

Then initialize and run the PII detection pipeline to extract PII entities:

```python
from transformers import pipeline

# Initialize the PII detection pipeline
ner = pipeline("ner", model="kalyan-ks/ettin-17m-nemotron-pii", aggregation_strategy="simple")

input_text = "Kalyan KS is from India. His email id is kalyan.ks@yahoo.com"

# Run the PII detection pipeline
pii_entities = ner(input_text)

# Display the extracted PII entities
for entity in pii_entities:
    print(f"{entity['entity_group']}: {entity['word']} (Score: {entity['score']:.2f})")
```
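With `aggregation_strategy="simple"`, each returned entity carries `start`/`end` character offsets, so detected spans can be redacted directly in the input text. Below is a minimal sketch of such post-processing; the `redact` helper is not part of the model or library, and the hard-coded `entities` list merely mimics the pipeline's output format with illustrative values rather than an actual model run:

```python
def redact(text, entities, mask="[{label}]"):
    """Replace each detected PII span with a placeholder label.

    Spans are processed right to left so that earlier character
    offsets remain valid as the text shrinks or grows.
    """
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = mask.format(label=ent["entity_group"])
        text = text[:ent["start"]] + placeholder + text[ent["end"]:]
    return text

# Entities in the pipeline's output format (illustrative values,
# not the output of a real model run)
sample = "Kalyan KS is from India. His email id is kalyan.ks@yahoo.com"
entities = [
    {"entity_group": "first_name", "score": 0.99, "word": "Kalyan", "start": 0, "end": 6},
    {"entity_group": "email_address", "score": 0.99, "word": "kalyan.ks@yahoo.com", "start": 41, "end": 60},
]

print(redact(sample, entities))
# → [first_name] KS is from India. His email id is [email_address]
```

In practice you would pass `ner(input_text)` directly as the `entities` argument.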
## Evaluation

This model was evaluated on a 10k-sample test set from the Nemotron PII dataset and achieved the following results:
| Metric | Score |
|---|---|
| F1 | 94.21 |
| Precision | 94.48 |
| Recall | 93.93 |
| Accuracy | 98.94 |
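As a quick sanity check, the reported F1 is the harmonic mean of the reported precision and recall, so the three numbers can be verified against each other (agreement is to within rounding of the published figures):

```python
# Reported aggregate metrics, in percent
precision, recall = 94.48, 93.93

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.2f}")
```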
### Top Performing PII Entity Types
| Entity | Precision | Recall | F1 |
|---|---|---|---|
| date_of_birth | 0.9915 | 0.9960 | 0.9938 |
| email_address | 0.9921 | 0.9926 | 0.9924 |
| biometric_identifier | 0.9896 | 0.9951 | 0.9924 |
| employee_id | 0.9873 | 0.9918 | 0.9895 |
| vehicle_identifier | 0.9864 | 0.9904 | 0.9884 |
| mac_address | 0.9825 | 0.9929 | 0.9877 |
| ipv6 | 0.9807 | 0.9946 | 0.9876 |
| health_plan_beneficiary_number | 0.9953 | 0.9788 | 0.9869 |
| coordinate | 0.9766 | 0.9943 | 0.9854 |
| medical_record_number | 0.9898 | 0.9799 | 0.9848 |
### Challenging PII Entity Types
| Entity | Precision | Recall | F1 |
|---|---|---|---|
| occupation | 0.6747 | 0.4643 | 0.5500 |
| time | 0.8499 | 0.7607 | 0.8028 |
| political_view | 0.8202 | 0.8047 | 0.8124 |
| race_ethnicity | 0.8170 | 0.8485 | 0.8324 |
| state | 0.8550 | 0.8135 | 0.8337 |
| age | 0.8307 | 0.8442 | 0.8374 |
| company_name | 0.8386 | 0.8392 | 0.8389 |
| city | 0.8514 | 0.8613 | 0.8563 |
| fax_number | 0.8752 | 0.8406 | 0.8576 |
| national_id | 0.8458 | 0.8716 | 0.8585 |
## Limitations

- **Language**: This model works well only for English-language text.
- **Challenging entity types**: Some entity types, such as `occupation`, have low F1 scores.
## Citation

```bibtex
@misc{ettin-17m-pii-2026,
  title     = {ettin-17m-nemotron-pii-2026: PII Detection Model},
  author    = {Kalyan KS},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/kalyan-ks/ettin-17m-nemotron-pii}
}
```