---
language: en
license: mit
library_name: sklearn
tags:
- text-classification
- ensemble
- scikit-learn
datasets:
- Xerv-AI/netuark-posts-6000
model_details:
  parameters: {params}
model-index:
- name: netuark-classifier-ensemble
  results:
  - task:
      type: text-classification
    dataset:
      name: netuark-posts-6000
      type: Xerv-AI/netuark-posts-6000
    metrics:
    - type: accuracy
      value: 93.75
---
# NetuArk Posts Classifier (Ensemble Architecture)
This model is an ensemble classifier that categorizes technology-related social media posts by their news source.
The model is trained to classify the following sources:
- ArsTechnica
- FT
- GuardianTech
- HackerNews
- Slashdot
- TechCrunch
- TheVerge
## Model Details
- **Architecture:** Voting Classifier (Multinomial Naive Bayes + Logistic Regression)
- **Vectorization:** TF-IDF (N-grams 1-3)
- **Accuracy:** 94.81% on the NetuArk-6000 dataset.
- **Classes:** HackerNews, TechCrunch, TheVerge, FT, GuardianTech, Slashdot, ArsTechnica.
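The architecture above can be sketched in scikit-learn as a TF-IDF vectorizer (unigrams through trigrams) feeding a voting ensemble of Multinomial Naive Bayes and Logistic Regression. This is a minimal illustration, not the released training script: the toy corpus, the `soft` voting mode, and all hyperparameters are assumptions.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import VotingClassifier

# Toy stand-in corpus; the real model was trained on netuark-posts-6000
texts = [
    "Show HN: a tiny Rust HTTP server",
    "Startup raises $40M Series B to scale AI agents",
    "Show HN: I built a terminal emulator in Zig",
    "Fintech startup launches new payments API",
]
labels = ["HackerNews", "TechCrunch", "HackerNews", "TechCrunch"]

# TF-IDF with n-grams 1-3, as described in the model details above
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 3))),
    ("ensemble", VotingClassifier(
        estimators=[
            ("nb", MultinomialNB()),
            ("lr", LogisticRegression(max_iter=1000)),
        ],
        voting="soft",  # assumption: average predicted probabilities
    )),
])

pipeline.fit(texts, labels)
print(pipeline.predict(["Show HN: my weekend project"]))
```

Wrapping the vectorizer and ensemble in a single `Pipeline` keeps vectorization and classification serializable as one object, which matches how the released `.joblib` artifact is loaded below.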
## Training Data
Trained on the [Xerv-AI/netuark-posts-6000](https://huggingface.co/datasets/Xerv-AI/netuark-posts-6000) dataset.
## Usage
```python
import joblib
import __main__
from huggingface_hub import hf_hub_download

# Define the custom preprocessing function required by the unpickler
def advanced_clean(text):
    return text

# Assign it to __main__ so joblib can find it during loading
__main__.advanced_clean = advanced_clean

# Repository and filename on the Hugging Face Hub
repo_id = 'Phase-Technologies/netuark-classifier-ensemble'
filename = 'netuark_ensemble_classifier.joblib'

try:
    # Download the model file from the Hub (cached locally after the first call)
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)

    # Load the serialized ensemble pipeline
    model = joblib.load(file_path)

    prediction = model.predict([
        "📰 Perplexity's 'Personal Computer' Lets AI Agents Access Your Local Files #slashdot"
    ])
    print(f"Prediction: {prediction}")
except Exception as e:
    import traceback
    print(f"An error occurred: {e}")
    traceback.print_exc()
```
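If the ensemble was fit with soft voting (an assumption; the card does not state the voting mode), `predict_proba` exposes per-class confidences alongside the hard label. The self-contained sketch below uses a toy two-class pipeline as a stand-in for the downloaded model, so the class set and scores are illustrative only:

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy model standing in for the downloaded ensemble; the real class set is larger
clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 3))),
    ("lr", LogisticRegression(max_iter=1000)),
])
clf.fit(
    ["Ask HN: best laptop for development?", "Gadget review: the new phone is here"],
    ["HackerNews", "TheVerge"],
)

# One probability per class, sorted most likely first
probs = clf.predict_proba(["Ask HN: which phone?"])[0]
for cls, p in sorted(zip(clf.classes_, probs), key=lambda t: -t[1]):
    print(f"{cls}: {p:.3f}")
```

Ranking by probability is useful when a post could plausibly belong to several sources and you want the top-k candidates rather than a single label.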