---
datasets:
- Xerv-AI/netuark-posts-6000
---

# NetuArk Posts Classifier (Ensemble Architecture)
|
This model is an ensemble classifier that categorizes technology-related social media posts by their originating news source.
It is trained to distinguish the following sources:
| - ArsTechnica |
| - FT |
| - GuardianTech |
| - HackerNews |
| - Slashdot |
| - TechCrunch |
| - TheVerge |
| - |
| ## Model Details |
| - **Architecture:** Voting Classifier (Multinomial Naive Bayes + Logistic Regression) |
| - **Vectorization:** TF-IDF (N-grams 1-3) |
| - **Accuracy:** 99.81% on the NetuArk-6000 dataset. |
| - **Classes:** HackerNews, TechCrunch, TheVerge, FT, GuardianTech, Slashdot, ArsTechnica. |
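
The pretrained pipeline is what you download in the Usage section below, but for reference, an ensemble of this shape can be assembled with scikit-learn roughly as follows. This is a minimal sketch: the hyperparameters, toy texts, and estimator names are illustrative assumptions, not the exact training configuration.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# TF-IDF over word 1-3-grams, as described in the model details
vectorizer = TfidfVectorizer(ngram_range=(1, 3))

# Voting ensemble of Multinomial Naive Bayes and Logistic Regression
ensemble = VotingClassifier(
    estimators=[
        ("nb", MultinomialNB()),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    voting="soft",  # assumption: soft voting, which averages predicted probabilities
)

pipeline = Pipeline([("tfidf", vectorizer), ("clf", ensemble)])

# Toy fit/predict to show the interface (real training uses the full dataset)
texts = ["Ask HN: best terminal editor?", "Startup raises $10M Series A"]
labels = ["HackerNews", "TechCrunch"]
pipeline.fit(texts, labels)
print(pipeline.predict(["Show HN: my new CLI tool"]))
```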
|
|
| ## Training Data |
| Trained on the [Xerv-AI/netuark-posts-6000](https://huggingface.co/datasets/Xerv-AI/netuark-posts-6000) dataset. |
|
|
| ## Usage |
| ```python |
import joblib
from huggingface_hub import hf_hub_download

# The pickled pipeline references a custom `advanced_clean` preprocessing
# function from the training script. A pass-through stub is enough to satisfy
# the unpickler, since the original cleaning logic is not bundled with the model.
def advanced_clean(text):
    return text

# Register the stub on __main__ so joblib can resolve it during loading
import __main__
__main__.advanced_clean = advanced_clean
| |
| # Repository and filename |
| repo_id = 'Phase-Technologies/netuark-classifier-ensemble' |
| filename = 'netuark_ensemble_classifier.joblib' |
| |
| try: |
| # Download the file from Hugging Face |
| file_path = hf_hub_download(repo_id=repo_id, filename=filename) |
| |
| # Load the model |
| model = joblib.load(file_path) |
| prediction = model.predict(["📰 Perplexity's 'Personal Computer' Lets AI Agents Access Your Local Files #slashdot"]) |
| print(f"Prediction: {prediction}") |
| except Exception as e: |
| import traceback |
| print(f"An error occurred: {e}") |
| traceback.print_exc() |
| ``` |