---
language: en
license: mit
library_name: sklearn
tags:
- text-classification
- ensemble
- scikit-learn
datasets:
- Xerv-AI/netuark-posts-6000
model_details:
  parameters: {params}
model-index:
- name: netuark-classifier-ensemble
  results:
  - task:
      type: text-classification
    dataset:
      name: netuark-posts-6000
      type: Xerv-AI/netuark-posts-6000
    metrics:
      - type: accuracy
        value: 93.75
---
# NetuArk Posts Classifier (Ensemble Architecture)

This model is an ensemble classifier designed to categorize technology-related social media posts by their news source.
The model is trained to classify the following sources:
- ArsTechnica
- FT
- GuardianTech
- HackerNews
- Slashdot
- TechCrunch
- TheVerge
## Model Details
- **Architecture:** Voting Classifier (Multinomial Naive Bayes + Logistic Regression)
- **Vectorization:** TF-IDF (N-grams 1-3)
- **Accuracy:** 94.81% on the NetuArk-6000 dataset.
- **Classes:** HackerNews, TechCrunch, TheVerge, FT, GuardianTech, Slashdot, ArsTechnica.
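A minimal sketch of how the described architecture could be assembled in scikit-learn. The hyperparameters and the toy data below are illustrative assumptions, not the actual training configuration:

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import VotingClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression

pipeline = Pipeline([
    # TF-IDF over unigrams, bigrams, and trigrams, as described above
    ("tfidf", TfidfVectorizer(ngram_range=(1, 3))),
    # Voting ensemble of Multinomial Naive Bayes and Logistic Regression
    ("clf", VotingClassifier(estimators=[
        ("nb", MultinomialNB()),
        ("lr", LogisticRegression(max_iter=1000)),
    ])),
])

# Toy data for illustration only; the real model was trained on netuark-posts-6000
texts = ["Show HN: my new CLI tool", "Startup raises $50M Series B"]
labels = ["HackerNews", "TechCrunch"]
pipeline.fit(texts, labels)
print(pipeline.predict(["Show HN: a tiny static site generator"]))
```

The default hard voting takes a majority vote over each estimator's predicted class; soft voting (`voting="soft"`) would average predicted probabilities instead.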

## Training Data
Trained on the [Xerv-AI/netuark-posts-6000](https://huggingface.co/datasets/Xerv-AI/netuark-posts-6000) dataset.

## Usage
```python
import joblib
from huggingface_hub import hf_hub_download

# The pickled pipeline references a custom preprocessing function by name,
# so define it before loading so the unpickler can resolve it
def advanced_clean(text):
    return text

# Register it on __main__ to ensure joblib can find it during loading
import __main__
__main__.advanced_clean = advanced_clean

# Repository and filename
repo_id = 'Phase-Technologies/netuark-classifier-ensemble'
filename = 'netuark_ensemble_classifier.joblib'

try:
    # Download the file from Hugging Face
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)

    # Load the model
    model = joblib.load(file_path)
    prediction = model.predict(["📰 Perplexity's 'Personal Computer' Lets AI Agents Access Your Local Files #slashdot"])
    print(f"Prediction: {prediction}")
except Exception as e:
    import traceback
    print(f"An error occurred: {e}")
    traceback.print_exc()
```