Online Gambling Promotion Detection (TF-IDF + Logistic Regression)

A text classification model for detecting Indonesian-language content that promotes online gambling ("judi online", commonly shortened to "judol"). It takes a classical machine learning approach and relies on aggressive text preprocessing to handle obfuscation: leetspeak, Unicode homoglyphs, emoji, and spelling variations.
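To give a feel for what this kind of normalization involves, here is a minimal sketch using only the Python standard library. It is not the repository's preprocessing.py: the function name `normalize_text`, the leet table, and the exact rules are assumptions made for illustration, and the real preprocessing (which uses unidecode) may differ in every detail.

```python
import re
import unicodedata

# Hypothetical leet table; the actual preprocessing.py may use a different one.
LEET = {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s"}
LEET_BETWEEN_LETTERS = re.compile(r"(?<=[a-z])[013457@$](?=[a-z])")


def normalize_text(text: str) -> str:
    """Illustrative normalization for obfuscated promo text (an assumption, not the author's code)."""
    # NFKC folds many stylized characters (e.g. fullwidth letters and digits) to plain ASCII.
    text = unicodedata.normalize("NFKC", text).lower()
    # Replace leet characters only when they sit between two letters,
    # so trailing digits in brand names such as "priasolo77" are left alone.
    text = LEET_BETWEEN_LETTERS.sub(lambda m: LEET[m.group()], text)
    # Turn emoji, punctuation, and other symbols into spaces.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Collapse runs of three or more identical letters ("maxwiiiin" -> "maxwin").
    text = re.sub(r"([a-z])\1{2,}", r"\1", text)
    # Collapse whitespace.
    return re.sub(r"\s+", " ", text).strip()
```

With these rules, an input such as "g4c0r maxwiiiin!!!" normalizes to "gacor maxwin", while "priasolo77" is returned unchanged.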

Architecture

  • TF-IDF word n-grams (1–3)
  • TF-IDF character n-grams (2–4)
  • Logistic Regression (solver: SAGA)
  • Custom preprocessing
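The components listed above could be wired together in scikit-learn roughly as follows. This is a sketch under assumptions: only the n-gram ranges and the SAGA solver come from this card, while the use of `FeatureUnion`, the `analyzer="char"` setting, `max_iter`, and the toy training data are illustrative choices and not the published model's configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline


def build_pipeline() -> Pipeline:
    """Word + character TF-IDF features feeding a logistic regression classifier."""
    features = FeatureUnion([
        # Word n-grams of length 1 to 3, as listed in the architecture.
        ("word", TfidfVectorizer(analyzer="word", ngram_range=(1, 3))),
        # Character n-grams of length 2 to 4; "char" vs. "char_wb" is an assumption.
        ("char", TfidfVectorizer(analyzer="char", ngram_range=(2, 4))),
    ])
    # max_iter is an illustrative value; SAGA often needs more than the default.
    clf = LogisticRegression(solver="saga", max_iter=1000)
    return Pipeline([("features", features), ("clf", clf)])


# Tiny made-up dataset, only to show that the pipeline fits and predicts.
texts = [
    "slot gacor maxwin terpercaya",
    "daftar slot gacor bonus deposit",
    "terima kasih videonya sangat membantu",
    "selamat pagi semuanya",
]
labels = [1, 1, 0, 0]

pipe = build_pipeline().fit(texts, labels)
proba = pipe.predict_proba(["slot gacor hari ini"])
```

Combining word and character n-grams is a common way to stay robust against spelling variations: character features still overlap when a word is partially obfuscated.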

Model Format

This repository contains:

  • judol-logreg-tfidf_v1.joblib: the scikit-learn pipeline model
  • preprocessing.py: preprocessing functions
  • loadModel.py: loader helper

Usage (Recommended)

1. Install dependencies

pip install scikit-learn==1.7.2 joblib huggingface_hub unidecode

2. Load the model directly from Hugging Face

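from huggingface_hub import hf_hub_download
import sys, os

# Download the loader helper from the model repository.
repo = "yusara/deteksiPromosiJudiOnline"
path = hf_hub_download(repo, "loadModel.py")
# Make the downloaded file importable as a module.
sys.path.append(os.path.dirname(path))

from loadModel import load_model
model = load_model()
# predict() expects a list of strings, one per text to classify.
model.predict(["slot gacor maxwin terpercaya"])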

Output

The output follows the default scikit-learn format:

array([1])

Label meanings:

  • 1 = Judol (online gambling promotion)
  • 0 = Non-judol

For more readable output:

text = "Slot gacor hanya di priasolo77"
pred = model.predict([text])[0]  # predicted label (0 or 1)
proba = model.predict_proba([text])[0]  # class probabilities [P(non-judol), P(judol)]
label = "Judol" if pred == 1 else "Non-judol"

print(f"Prediction: {label}, probability: {proba[1]:.3f}")

Use Cases

  • Comment moderation
  • Community spam filtering
  • Dataset labeling
  • Anti-judol-promotion systems
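For moderation or spam filtering, one option is to flag only texts whose predicted judol probability exceeds a threshold. The helper below is a hypothetical sketch: `flag_comments` and the 0.8 default are not part of this repository, and it assumes that column 1 of `predict_proba` corresponds to label 1, as in the output example above.

```python
from typing import List, Sequence, Tuple


def flag_comments(model, comments: Sequence[str], threshold: float = 0.8) -> List[Tuple[str, float]]:
    """Return (comment, judol_probability) pairs at or above the threshold.

    `model` is any object with a scikit-learn style predict_proba method.
    The threshold is an illustrative default and should be tuned on your own data.
    """
    probabilities = model.predict_proba(list(comments))
    flagged = []
    for comment, row in zip(comments, probabilities):
        judol_probability = float(row[1])  # assumes column 1 is label 1 (judol)
        if judol_probability >= threshold:
            flagged.append((comment, judol_probability))
    return flagged
```

Called as `flag_comments(model, comments)` with the loaded model, this returns only the comments to review or hide. A higher threshold reduces false positives at the cost of missing more promotions.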

Notes

  • The model is TF-IDF based (not deep learning)
  • It depends on the custom preprocessing
  • The unidecode library is required when loading the model
  • Performance depends heavily on the data domain

License

Apache-2.0
