Cyberbullying Detection System — Real-Time Toxic Comment Classifier
Classify comments as Harassment, Hate Speech, Threat, or Safe. Uses TF-IDF/Embeddings with Naïve Bayes/SVM/LSTM and a Flask dashboard for live analytics.
- ✓ Context-aware detection (slang, sarcasm patterns)
- ✓ Real-time API + web dashboard for moderators
- ✓ Retraining pipeline for continuous improvement
1. Introduction
The Cyberbullying Detection System Using Comment Analysis is a Python-based AI project that identifies abusive, hateful, or threatening comments on social platforms. It leverages NLP and Machine Learning to understand linguistic context (not just keywords), enabling real-time moderation support and safer online communities. A Flask dashboard shows live classifications and confidence, helping moderators act quickly and consistently.
2. Existing System vs Proposed System
Existing System:
- Manual moderation is slow and inconsistent
- Keyword filters miss context and sarcasm
- No adaptation to new slang or patterns

Proposed System:
- NLP + ML (NB/SVM/LSTM) for contextual detection
- Sentiment + semantic feature analysis
- Flags Harassment/Hate Speech/Threat vs Safe
- Admin dashboard with analytics & reports
- Retraining loop for continuous learning
3. Working
- Data Collection: Labeled comment datasets (Kaggle/Twitter/Reddit).
- Pre-processing: Clean text, normalize case, remove stop words/emojis.
- Feature Extraction: TF-IDF or embeddings (Word2Vec/GloVe).
- Model Training: Train NB/SVM/LSTM classifier on labeled data.
- Real-Time Prediction: API scores new comments instantly.
- Result Output: Category + confidence, with mod alerts.
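The pre-processing step above can be sketched as follows. This is a minimal illustration: the stop-word list here is a hypothetical stand-in, whereas the project itself uses NLTK's stopwords corpus (`nltk.corpus.stopwords.words("english")`).

```python
import re

# Illustrative stop-word list; the real system would use NLTK's corpus.
STOP_WORDS = {"a", "an", "the", "is", "are", "you", "to", "and"}

def preprocess(text: str) -> str:
    """Lowercase, strip punctuation/emoji, drop stop words."""
    text = text.lower()
    # Keep only ASCII letters, digits, and spaces (removes emoji/punctuation).
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)

print(preprocess("You ARE such a loser!!! 😡"))  # → "such loser"
```

The same function is applied to training data and to incoming comments, so the classifier always sees text in one consistent form.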
4. Technology Stack
- Language: Python
- Libraries: NLTK, Scikit-learn, TensorFlow/Keras, Pandas, NumPy, Flask
- Algorithms: Naïve Bayes, Logistic Regression, SVM, or LSTM
- Dataset: Kaggle Cyberbullying Tweets or similar comment datasets
- Interface: Flask dashboard for live analysis
- Storage: SQLite3/CSV for logs & results
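The SQLite3 log store mentioned above might look like the sketch below. Table and column names (`comment_log`, `category`, `confidence`) are illustrative, not fixed by the project; an in-memory database is used here so the snippet is self-contained.

```python
import sqlite3

# Use a file path such as "logs.db" in a real deployment.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IF NOT EXISTS comment_log (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        comment TEXT NOT NULL,
        category TEXT NOT NULL,
        confidence REAL NOT NULL,
        created_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")
conn.execute(
    "INSERT INTO comment_log (comment, category, confidence) VALUES (?, ?, ?)",
    ("you are pathetic", "Harassment", 0.91),
)
conn.commit()
rows = conn.execute("SELECT category, confidence FROM comment_log").fetchall()
print(rows)  # → [('Harassment', 0.91)]
```

Logging every prediction this way also feeds the retraining loop: flagged rows can be exported as CSV and re-labeled.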
5. Modules
- Text Preprocessing — clean & tokenize text
  - Stop-word removal
  - Emoji/special-character filtering
- Feature Extraction — TF-IDF / embeddings
  - N-grams
  - Sentiment/semantic cues
- Model Training — build & validate
  - NB/SVM/LSTM
  - Metrics & tuning
- Real-Time Prediction — real-time scoring
  - Category + confidence
  - Thresholds & alerts
- Moderation Dashboard — analytics & review
  - Flags & moderation queue
  - Charts & exports
- Reporting — summaries & logs
  - Period reports
  - CSV/JSON export
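The Feature Extraction and Model Training modules fit together roughly as below. The four toy comments and their labels are invented for illustration only; a real run would train on a labeled corpus such as the Kaggle Cyberbullying Tweets dataset, and an SVM or LSTM could replace the Naïve Bayes baseline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy labeled comments for illustration only.
texts = [
    "have a great day friend",
    "thanks for sharing this",
    "you are worthless and stupid",
    "nobody likes you loser",
]
labels = ["Safe", "Safe", "Harassment", "Harassment"]

# TF-IDF with unigrams + bigrams, matching the N-grams bullet above.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(texts)

model = MultinomialNB()
model.fit(X, labels)

# Score a new comment: predicted category plus a confidence score.
x_new = vectorizer.transform(["you are such a loser"])
print(model.predict(x_new)[0], model.predict_proba(x_new).max())
```

`predict_proba` supplies the confidence value that the dashboard and alert thresholds consume; note that an SVM would need `probability=True` (or a calibration wrapper) to offer the same interface.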
6. Advantages
- Automatic, real-time toxic content detection
- Understands context better than keyword filters
- Reduces moderator workload with higher accuracy than manual review
- Improves user safety across platforms
- Supports multilingual detection with proper training
7. Applications
- Social media and community platforms
- School/college forums and LMS portals
- NGO/government anti-cyberbullying initiatives
- Chat/gaming platforms and discussion boards
Python Integration Sketch (Flask + NLP + ML)
```python
# 1) Preprocess & vectorize
import re

from flask import Flask, request
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

app = Flask(__name__)
STOP_WORDS = set(stopwords.words("english"))

def preprocess(txt):
    txt = re.sub(r"[^a-z0-9\s]", " ", txt.lower())            # lowercase, strip punctuation/emoji
    tokens = [t for t in txt.split() if t not in STOP_WORDS]  # NLTK stopwords
    return " ".join(tokens)

# train_texts / y_train: labeled comments loaded from the dataset
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(preprocess(t) for t in train_texts)
model = MultinomialNB().fit(X_train, y_train)  # NB/SVM, or build an LSTM pipeline

# 2) Real-time API
@app.post("/predict")
def predict():
    txt = request.json["comment"]
    x = vectorizer.transform([preprocess(txt)])
    y = model.predict(x)[0]
    p = model.predict_proba(x).max()
    return {"category": y, "confidence": float(p)}

# 3) Dashboard stream
# Moderators see flagged comments, filter by category/confidence, export reports.
```
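The "thresholds & alerts" step that routes predictions to the moderation queue can be expressed as a small rule. The per-category threshold values below are hypothetical, chosen only to show the shape of the logic (threats alert at lower confidence than harassment).

```python
# Illustrative per-category confidence thresholds, not values from the project.
ALERT_THRESHOLDS = {"Threat": 0.50, "Hate Speech": 0.70, "Harassment": 0.80}

def needs_alert(category: str, confidence: float) -> bool:
    """Flag a comment for the moderation queue when its predicted
    category crosses that category's confidence threshold."""
    # Categories without a threshold ("Safe") never alert.
    return confidence >= ALERT_THRESHOLDS.get(category, 1.01)

print(needs_alert("Threat", 0.62))      # → True
print(needs_alert("Harassment", 0.75))  # → False
print(needs_alert("Safe", 0.99))        # → False
```

Keeping the thresholds in a dict means moderators can tune sensitivity per category without touching the model.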
What You Get
| Item | Included | Notes |
|---|---|---|
| Python Source Code | ✅ | NLP preprocessing, model, API |
| Flask Dashboard | ✅ | Live predictions & analytics |
| ML Models | ✅ | NB/SVM baseline + LSTM option |
| Training Notebook | ✅ | Tuning & evaluation |
| Demo Video | ✅ | Setup & working walkthrough |
| Report & PPT | ✅ | College-format templates |
| Support | ✅ | Installation + viva Q&A (1 month) |
Want a production-like NLP project?
Get the Cyberbullying Detection System with code, demo, docs, and support.
WhatsApp Us Now
