AI-Powered Cyber Threat Intelligence System

Overview

A Cyber Threat Intelligence (CTI) system that uses Natural Language Processing (NLP) and machine-learning classifiers to extract cyber threat indicators from unstructured text, categorize threat types, predict severity levels, and present the results through an interactive web interface.

The platform is designed for cybersecurity analysts and SOC teams, who can triage, investigate, and act on threat intelligence from a single dashboard.

Figure: The main dashboard interface showing real-time threat analysis (left) and detailed analysis results with entity extraction and classification (right).

Key Features

🔍 Named Entity Recognition (NER)

Extracts IOCs (IP addresses, malware names, CVEs, domains)
Uses BERT-based transformers for cybersecurity-specific entities
Identifies threat actors, organizations, and geographical locations
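
Structured indicators such as IP addresses, CVE IDs, and domains follow regular formats, so they are often matched with patterns alongside a model-based NER step. This README does not say how the project matches them; the sketch below is one possible approach using only the standard library. The pattern set and the `extract_iocs` helper are illustrative assumptions, not the project's code.

```python
import re

# Illustrative patterns only (assumption): production IOC extraction needs stricter
# validation, e.g. defanged indicators, IPv6, and TLD allow-lists.
IOC_PATTERNS = {
    "ipv4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE),
    "domain": re.compile(
        r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}\b", re.IGNORECASE
    ),
}


def extract_iocs(text):
    """Return unique matches per IOC type, in order of first appearance."""
    found = {}
    for name, pattern in IOC_PATTERNS.items():
        unique = []
        for match in pattern.findall(text):
            if match not in unique:
                unique.append(match)
        found[name] = unique
    return found


report = (
    "QakBot contacted 192.168.1.10 and evil-updates.example.com, "
    "exploiting CVE-2023-1234."
)
print(extract_iocs(report))
```

Only non-capturing groups are used, so `findall` returns whole matches rather than sub-groups.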

🎯 Threat Classification

Categorizes threats into: Phishing, Malware, APTs, Ransomware
Ensemble model combining XGBoost and Logistic Regression
Reported accuracy of 94.2%, with interpretable feature importance

⚠️ Severity Level Prediction

Automated risk assessment (Low, Medium, High)
Multi-feature analysis including IOC count and sentiment
Random Forest classifier with keyword-based features

📊 Interactive Dashboard

Real-time threat analysis interface
Expandable result cards with detailed breakdowns
Downloadable reports and visualizations

Technical Architecture

System architecture showing the NLP pipeline, ML models, and web interface components.

Technology Stack

Category	Technologies
Backend	FastAPI, Python, Uvicorn
NLP Models	spaCy, HuggingFace Transformers, BERT
ML Libraries	Scikit-learn, XGBoost, PyTorch
Frontend	HTML5, Bootstrap 5, JavaScript, Jinja2
Deployment	Docker, Docker Compose
Data Storage	CSV, JSON, Pickle

Machine Learning Pipeline

1. Ensemble Threat Classification

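# Candidate models combined for robust classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier

ensemble_models = {
    'xgboost': XGBClassifier(),
    'logistic': LogisticRegression(),
    'random_forest': RandomForestClassifier()
}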
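
The dictionary above lists the models but not how their outputs are merged. One common option is soft voting, which averages each model's class probabilities and picks the highest. The sketch below shows that idea in plain Python. It is an assumption about the combination step, and the probability values are made up to stand in for `predict_proba` output.

```python
# Categories taken from the Threat Classification feature list.
LABELS = ["Phishing", "Malware", "APT", "Ransomware"]


def soft_vote(probabilities_by_model, weights=None):
    """Average per-class probabilities across models; return (label, averaged_probs)."""
    names = list(probabilities_by_model)
    if weights is None:
        weights = {name: 1.0 for name in names}  # equal weighting (assumption)
    total = sum(weights[name] for name in names)
    averaged = [
        sum(weights[name] * probabilities_by_model[name][i] for name in names) / total
        for i in range(len(LABELS))
    ]
    best = max(range(len(LABELS)), key=averaged.__getitem__)
    return LABELS[best], averaged


# Hypothetical predict_proba outputs for one report (illustrative values only).
example = {
    "xgboost": [0.10, 0.70, 0.10, 0.10],
    "logistic": [0.30, 0.40, 0.20, 0.10],
    "random_forest": [0.20, 0.50, 0.10, 0.20],
}
label, probs = soft_vote(example)
print(label, [round(p, 3) for p in probs])
```

scikit-learn's `VotingClassifier(voting="soft")` implements the same averaging for estimators that expose `predict_proba`.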

2. Advanced NER with BERT

Model: dslim/bert-base-NER from HuggingFace
Fine-tuned on cybersecurity datasets
Extracts entities: ORG, LOC, PER, MISC with cybersecurity context
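
A minimal sketch of calling this model through the HuggingFace `pipeline` API and grouping its output by entity type. The `run_ner` and `group_entities` helpers and the `min_score` threshold are hypothetical names introduced for illustration. The public `dslim/bert-base-NER` checkpoint is loaded as published, without any project-specific fine-tuned weights.

```python
from collections import defaultdict


def run_ner(text):
    """Run dslim/bert-base-NER on text (downloads the model on first use)."""
    # Imported lazily so the grouping helper below works without transformers installed.
    from transformers import pipeline

    ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")
    return ner(text)


def group_entities(ner_output, min_score=0.5):
    """Group aggregated pipeline output by entity type, dropping low-confidence spans.

    Each item is expected to carry the keys 'entity_group', 'score', and 'word',
    which is the shape the pipeline returns when aggregation_strategy is set.
    The 0.5 threshold is an arbitrary illustrative default.
    """
    grouped = defaultdict(list)
    for item in ner_output:
        if item["score"] >= min_score:
            grouped[item["entity_group"]].append(item["word"])
    return dict(grouped)


# Hand-written sample in the pipeline's output shape (not real model output).
sample = [
    {"entity_group": "ORG", "score": 0.98, "word": "Microsoft"},
    {"entity_group": "LOC", "score": 0.40, "word": "Russia"},
    {"entity_group": "MISC", "score": 0.90, "word": "QakBot"},
]
print(group_entities(sample))
```

With a working `transformers` install, `group_entities(run_ner(text))` would chain the two steps.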

3. Severity Prediction Features

IOC count (IP addresses, domains, CVEs)
Named entity frequency
Sentiment analysis scores
Keyword matching with threat vocabulary
Text complexity metrics
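
Some of the features listed above can be computed directly from raw text. The sketch below covers IOC count, keyword matches, and two simple complexity metrics. It leaves out entity frequency and sentiment, which need separate models. The keyword set, the regex, and the feature names are assumptions for illustration; the project's actual feature definitions are not given in this document.

```python
import re

# Hypothetical threat vocabulary; the project's actual keyword list is not shown here.
THREAT_KEYWORDS = {
    "malware", "ransomware", "phishing", "exploit", "exploited", "backdoor", "zero-day",
}

# Loose IOC patterns: CVE IDs, IPv4-like strings, domain-like strings (no validation).
IOC_REGEX = re.compile(
    r"\bCVE-\d{4}-\d{4,}\b"
    r"|\b(?:\d{1,3}\.){3}\d{1,3}\b"
    r"|\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b",
    re.IGNORECASE,
)


def severity_features(text):
    """Build a small numeric feature dict from raw report text."""
    tokens = re.findall(r"[a-z0-9-]+", text.lower())
    token_count = len(tokens)
    return {
        "ioc_count": len(IOC_REGEX.findall(text)),
        "keyword_hits": sum(token in THREAT_KEYWORDS for token in tokens),
        "token_count": token_count,
        "avg_token_length": sum(map(len, tokens)) / token_count if token_count else 0.0,
    }


features = severity_features("QakBot malware exploited CVE-2023-1234 via phishing")
print(features)
```

Dictionaries like this can be stacked into a feature matrix for a tree-based classifier such as a Random Forest.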

Results & Performance

Reported results for each component:

Threat Classification: 94.2% accuracy with ensemble approach
Severity Prediction: 89.7% accuracy on test dataset
NER Performance: 92.1% F1-score for cybersecurity entities
Real-time Processing: <2 seconds average response time

Performance metrics showing classification accuracy, precision, and recall across different threat categories.
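
For readers who want to reproduce per-category figures, precision, recall, and F1 can be computed from paired true and predicted labels as sketched below. The labels in the example are toy values for illustration, not project data.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} from paired label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


# Toy labels for illustration only.
y_true = ["Phishing", "Phishing", "Malware", "Ransomware"]
y_pred = ["Phishing", "Malware", "Malware", "Ransomware"]
for label, scores in per_class_metrics(y_true, y_pred).items():
    print(label, [round(s, 3) for s in scores])
```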

Implementation Highlights

Real-time Threat Analysis API

curl -X POST http://localhost:8000/analyze \
    -H "Content-Type: application/json" \
    -d '{"text": "QakBot malware exploited CVE-2023-1234 via phishing"}'
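
The same request can be sent from Python with only the standard library. This is a minimal client sketch: the helper names are hypothetical, and because the response schema is not documented here, the body is returned as parsed JSON without assuming any fields.

```python
import json
import urllib.request


def build_analyze_request(text, base_url="http://localhost:8000"):
    """Build a POST request for the /analyze endpoint shown in the curl example."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/analyze",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(text, base_url="http://localhost:8000", timeout=10):
    """Send the request and return the decoded JSON body (schema not assumed)."""
    request = build_analyze_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Requires the server to be running locally:
# print(analyze("QakBot malware exploited CVE-2023-1234 via phishing"))
```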

Docker Deployment

version: '3.8'
services:
  cti-nlp:
    build: .
    ports:
      - "8000:8000"
    environment:
      - PYTHONPATH=/app

Future Enhancements

Planned Features

Real-time Data Ingestion: Integration with threat feeds and social media APIs
Knowledge Graph: Visualization of threat actor relationships
Automated Response: IOC blocking and SIEM integration
Multi-language Support: Analysis of threats in multiple languages
Advanced Visualization: Interactive threat maps and timeline analysis

Research Directions

Zero-shot threat classification using GPT models
Adversarial training for improved robustness
Time-series analysis for threat trend prediction
Integration with MITRE ATT&CK framework

Academic Impact

This project contributes to cybersecurity research by:

Demonstrating effective ensemble learning for threat classification
Providing open-source tools for CTI analysis
Establishing benchmarks for NLP in cybersecurity
Creating datasets for future research

Getting Started

Clone Repository

git clone https://github.com/sanjanb/cti-nlp-system.git
cd cti-nlp-system

Setup Environment

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Run Application
```
uvicorn backend.main:app --reload
```

See the GitHub repository (https://github.com/sanjanb/cti-nlp-system) for complete documentation, setup guides, and contribution guidelines.

Team & Collaboration

Development Team:

Sanjan B M - Lead Developer & ML Engineer
Kushal S M - Frontend & API Development
Ponnanna K V - Data Engineering & Testing
Vishnu S - Documentation & DevOps
Prof. Khateeja Ambreen - Project Guide

Institution: ATME College of Engineering, Mysuru
Department: Computer Science & Engineering (AI & ML)
Duration: 2024-2025 (Final Year Project)

This project demonstrates the practical application of AI and NLP in cybersecurity, contributing to the advancement of automated threat intelligence systems.