ML Infrastructure Lab
Self-hosted experimentation platform for model training, inference, and evaluation
โš™๏ธ Infrastructure
Compute: Threadripper + Multi-Node
Orchestration: Proxmox VE + LXC
Vector Database: Qdrant (753K vectors)
LLM Runtime: vLLM + AWQ Quantization
Networking: 25/100GbE + pfSense
Reverse Proxy: Caddy + Cloudflare CDN

Production-grade infrastructure with containerization, load balancing, and SSL/TLS automation.

🚀 Projects
💬 RAG Chat System

📹 Demo Available
Available for live demonstration upon request. Combines semantic search across 753K Wikipedia vectors with Qwen 2.5 72B inference. Demonstrates context retrieval, prompt engineering, and multi-modal I/O with efficient resource utilization.
Qdrant · vLLM · Qwen 2.5 72B · Streamlit · AWQ
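The RAG flow can be sketched in a few lines: embed the query, retrieve the nearest passages, and assemble a grounded prompt for the model. The production system uses Qdrant for search and vLLM for generation; in this runnable sketch retrieval is stubbed with brute-force cosine similarity, and the corpus entries, vectors, and function names are illustrative assumptions.

```python
# Hedged sketch of retrieval-augmented generation (RAG).
# Real system: Qdrant vector search + vLLM inference; here retrieval is
# stubbed with in-memory cosine similarity so the flow runs standalone.
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, corpus, top_k=2):
    """Return the top_k passages most similar to the query vector."""
    ranked = sorted(corpus, key=lambda p: cosine(query_vec, p["vec"]), reverse=True)
    return [p["text"] for p in ranked[:top_k]]

def build_prompt(question, passages):
    """Assemble a grounded prompt: retrieved context first, then the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Toy corpus with hypothetical 2-d embeddings (real ones are high-dimensional).
corpus = [
    {"text": "Qdrant is a vector database.", "vec": [1.0, 0.0]},
    {"text": "vLLM serves LLMs with paged attention.", "vec": [0.0, 1.0]},
    {"text": "Wikipedia has millions of articles.", "vec": [0.7, 0.7]},
]
passages = retrieve([1.0, 0.1], corpus)
prompt = build_prompt("What is Qdrant?", passages)
```

In the real deployment, `retrieve` would be a `qdrant_client` search call and the prompt would be streamed through vLLM to Qwen 2.5 72B.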
📊 Model Benchmarking

🔨 In Progress
Inference optimization framework that compares quantization strategies (AWQ, GPTQ, INT8) and profiles latency, throughput, and cost-per-inference across different models and hardware.
vLLM · AutoGPTQ · Profiling · Monitoring
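The core of such a framework is a profiling harness that records per-request latency percentiles and aggregate throughput. This is a minimal runnable sketch: the `infer` callable stands in for a real vLLM or AutoGPTQ generation call, and the warm-up count and percentile choices are assumptions.

```python
# Minimal latency/throughput profiling harness (sketch).
# `infer` is a stand-in for a real model call (e.g. vLLM generation).
import time
import statistics

def profile(infer, prompts, warmup=2):
    """Run `infer` over prompts; return p50/p95 latency (ms) and throughput."""
    for p in prompts[:warmup]:
        infer(p)                      # warm-up: exclude cold-start cost
    latencies = []
    start = time.perf_counter()
    for p in prompts:
        t0 = time.perf_counter()
        infer(p)
        latencies.append(time.perf_counter() - t0)
    wall = time.perf_counter() - start
    ranked = sorted(latencies)
    return {
        "p50_ms": statistics.median(latencies) * 1000,
        "p95_ms": ranked[int(0.95 * (len(ranked) - 1))] * 1000,
        "req_per_s": len(prompts) / wall,
    }

# Stand-in workload; swap in calls to the quantized runtimes under test.
stats = profile(lambda p: sum(range(10_000)), ["q"] * 20)
```

Cost-per-inference then falls out by dividing amortized hardware cost per second by `req_per_s` for each model/quantization pair.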
🔧 Fine-tuning Pipeline

🔨 Coming Soon
Distributed fine-tuning orchestration with LoRA/QLoRA support, multi-GPU training, experiment tracking, and automated hyperparameter sweeps for efficient model adaptation.
PyTorch · LoRA · Weights & Biases · Distributed
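The idea behind LoRA is that instead of updating a full weight matrix W, training learns two small low-rank factors A (r×k) and B (d×r), and the effective weight becomes W + (α/r)·B·A. A pure-Python sketch of that arithmetic (the real pipeline would use PyTorch with a library such as peft; matrix sizes and the α value here are illustrative):

```python
# LoRA weight-merge sketch: W_eff = W + (alpha / r) * (B @ A).
# Pure Python keeps it runnable; production code uses PyTorch tensors.
def matmul(X, Y):
    """Naive matrix multiply on nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_weight(W, A, B, alpha=16, r=2):
    """Merge a rank-r LoRA update into the frozen base weight W."""
    delta = matmul(B, A)              # d x k low-rank update
    scale = alpha / r                 # standard LoRA scaling factor
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]          # frozen base weight (d = k = 2)
A = [[0.1, 0.0], [0.0, 0.1]]          # trained factor, r x k
B = [[1.0, 0.0], [0.0, 1.0]]          # trained factor, d x r
W_eff = lora_weight(W, A, B)
```

Only A and B (plus optimizer state) need gradients, which is what makes multi-GPU fine-tuning of large models tractable on this hardware.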
📈 Training Orchestration

🔨 Coming Soon
Pipeline automation for data preprocessing, model training, evaluation, and deployment. Git-driven workflows with automated CI/CD for reproducibility and version control.
Ansible · Gitea · CI/CD · MLOps
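The preprocess → train → evaluate → deploy flow can be sketched as a tiny stage runner that threads an artifact record through each step, mirroring what Gitea-triggered CI jobs and Ansible plays would do. Stage names and bodies here are illustrative placeholders, not the actual pipeline definition.

```python
# Sketch of a git-driven training pipeline: stages run in order and each
# stage's output artifacts feed the next. Stage bodies are placeholders.
def run_pipeline(stages):
    """Execute (name, fn) stages in order, threading an artifact dict."""
    artifacts, log = {}, []
    for name, fn in stages:
        artifacts = fn(artifacts)     # each stage returns updated artifacts
        log.append(name)              # record execution order for auditing
    return artifacts, log

stages = [
    ("preprocess", lambda a: {**a, "dataset": "clean"}),
    ("train",      lambda a: {**a, "model": f"trained-on-{a['dataset']}"}),
    ("evaluate",   lambda a: {**a, "score": 0.9}),
    ("deploy",     lambda a: {**a, "deployed": a["score"] > 0.8}),
]
artifacts, log = run_pipeline(stages)
```

Gating deployment on the evaluation score, as the last stage does, is what keeps the CI/CD loop reproducible: a commit either produces a deployable, versioned model or fails visibly.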
🔗 Services