---
language: en
license: mit
library_name: sklearn
tags:
- text-classification
- ensemble
- scikit-learn
datasets:
- Xerv-AI/netuark-posts-6000
model_details:
  parameters: {params}
model-index:
- name: netuark-classifier-ensemble
  results:
  - task:
      type: text-classification
    dataset:
      name: netuark-posts-6000
      type: Xerv-AI/netuark-posts-6000
    metrics:
    - type: accuracy
      value: 93.75
---

# NetuArk Posts Classifier (Ensemble Architecture)

This model is an ensemble classifier that categorizes technology-related social media posts by their news source. It classifies posts into the following sources:

- ArsTechnica
- FT
- GuardianTech
- HackerNews
- Slashdot
- TechCrunch
- TheVerge

## Model Details

- **Architecture:** Voting Classifier (Multinomial Naive Bayes + Logistic Regression)
- **Vectorization:** TF-IDF (n-grams 1-3)
- **Accuracy:** 94.81% on the NetuArk-6000 dataset.
- **Classes:** HackerNews, TechCrunch, TheVerge, FT, GuardianTech, Slashdot, ArsTechnica.

## Training Data

Trained on the [Xerv-AI/netuark-posts-6000](https://huggingface.co/datasets/Xerv-AI/netuark-posts-6000) dataset.

## Usage

```python
import joblib
from huggingface_hub import hf_hub_download

# The pickled pipeline references a custom preprocessing function that was
# defined in __main__ at training time, so the unpickler must find it there.
def advanced_clean(text):
    return text

import __main__
__main__.advanced_clean = advanced_clean

# Repository and filename on the Hugging Face Hub
repo_id = 'Phase-Technologies/netuark-classifier-ensemble'
filename = 'netuark_ensemble_classifier.joblib'

try:
    # Download the serialized model from the Hub (cached locally)
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)

    # Load the scikit-learn pipeline
    model = joblib.load(file_path)

    prediction = model.predict(["📰 Perplexity's 'Personal Computer' Lets AI Agents Access Your Local Files #slashdot"])
    print(f"Prediction: {prediction}")
except Exception as e:
    import traceback
    print(f"An error occurred: {e}")
    traceback.print_exc()
```
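## Architecture Sketch

The architecture described above (a TF-IDF vectorizer with 1-3 grams feeding a voting ensemble of Multinomial Naive Bayes and Logistic Regression) can be sketched in scikit-learn as follows. This is a minimal illustration, not the released pipeline: the hyperparameters, the `soft` voting choice, and the toy training data are assumptions for demonstration only.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import VotingClassifier

# TF-IDF over 1-3 grams, then a voting ensemble of MNB + Logistic Regression.
# Hyperparameters here are illustrative, not the released model's settings.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 3))),
    ("clf", VotingClassifier(
        estimators=[
            ("nb", MultinomialNB()),
            ("lr", LogisticRegression(max_iter=1000)),
        ],
        voting="soft",  # average predicted class probabilities
    )),
])

# Toy stand-in data; the real model trains on netuark-posts-6000
texts = [
    "Show HN: a tiny Rust web server",
    "Startup raises $50M Series B to disrupt fintech",
    "Show HN: I built a terminal emulator in C",
    "Exclusive: startup's valuation doubles after funding round",
]
labels = ["HackerNews", "TechCrunch", "HackerNews", "TechCrunch"]

pipeline.fit(texts, labels)
print(pipeline.predict(["Show HN: my weekend compiler project"]))
```

Soft voting requires both estimators to implement `predict_proba`, which Multinomial Naive Bayes and Logistic Regression both do; hard (majority) voting would also work here.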