Projects
Combines semantic search across 753K Wikipedia vectors with Qwen 2.5 72B inference, demonstrating context retrieval, prompt engineering, and multimodal I/O with efficient resource use. Available for live demonstration on request.
Qdrant
vLLM
Qwen 2.5 72B
Streamlit
AWQ
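The retrieve-then-prompt flow behind this project can be sketched in plain Python. This is a stdlib-only toy, not the actual Qdrant/vLLM code: the three-dimensional "index", the helper names (`top_k`, `build_prompt`), and the sample documents are all illustrative stand-ins for the real 753K-vector store and model call.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query, store, k=2):
    # Rank (id, vector, text) entries by similarity to the query vector.
    scored = sorted(store, key=lambda e: cosine(query, e[1]), reverse=True)
    return scored[:k]

def build_prompt(question, passages):
    # Assemble retrieved passages into a grounded prompt for the LLM.
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Toy 3-dimensional "index" standing in for the Wikipedia vector store.
store = [
    ("doc1", [1.0, 0.0, 0.0], "Paris is the capital of France."),
    ("doc2", [0.0, 1.0, 0.0], "The Nile is a river in Africa."),
    ("doc3", [0.9, 0.1, 0.0], "France is a country in Europe."),
]
hits = top_k([1.0, 0.05, 0.0], store, k=2)
prompt = build_prompt("What is the capital of France?", [t for _, _, t in hits])
```

In the real system, `top_k` is a Qdrant similarity query and the prompt is sent to Qwen 2.5 72B served by vLLM; the scoring and prompt-assembly logic is the same shape.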
Inference optimization framework that compares quantization strategies (AWQ, GPTQ, INT8) via latency/throughput profiling and cost-per-inference metrics across models and hardware.
vLLM
AutoGPTQ
Profiling
Monitoring
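The core measurement loop for this kind of profiling can be sketched with the stdlib alone. The harness below is a minimal illustration (the `cost_per_hour` figure and the workload lambda are made-up placeholders, not the project's real models or pricing): it times repeated calls, then derives p50/p95 latency, throughput, and a cost-per-inference estimate.

```python
import statistics
import time

def profile(fn, runs=50, cost_per_hour=2.0):
    # Time repeated calls and derive latency / throughput / cost metrics.
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - start)
    latencies.sort()
    p50 = statistics.median(latencies)
    p95 = latencies[int(0.95 * (runs - 1))]
    throughput = runs / sum(latencies)          # requests per second
    cost = cost_per_hour / 3600 / throughput    # dollars per inference
    return {"p50_s": p50, "p95_s": p95, "rps": throughput, "usd_per_call": cost}

# Placeholder workload; the real framework would call a quantized model here.
metrics = profile(lambda: sum(range(10_000)))
```

Running the same harness against AWQ, GPTQ, and INT8 variants of one model, on each target GPU, yields directly comparable rows for the latency/cost comparison.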
Distributed fine-tuning orchestration with LoRA/QLoRA support, multi-GPU training, experiment tracking, and automated hyperparameter sweeps for efficient model adaptation.
PyTorch
LoRA
Weights & Biases
Distributed
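The parameter-efficiency idea behind LoRA can be shown with a few lines of plain Python. This is a numeric sketch of the math, not the project's PyTorch code: the 4x4 weight and rank-1 adapters are toy values chosen only to show that merging trains `d*r + r*d` parameters instead of `d*d`.

```python
def matmul(A, B):
    # Naive matrix multiply for small illustrative matrices.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def apply_lora(W, B, A, alpha, r):
    # Merged weight: W_eff = W + (alpha / r) * B @ A.
    # Only B (d x r) and A (r x d) are trained; W stays frozen.
    scale = alpha / r
    dW = [[scale * x for x in row] for row in matmul(B, A)]
    return [[w + d for w, d in zip(wr, dr)] for wr, dr in zip(W, dW)]

# 4x4 frozen identity weight; rank-1 adapters give 8 trainable values
# instead of the 16 a full fine-tune would touch.
W = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
B = [[1.0], [0.0], [0.0], [0.0]]   # d x r
A = [[0.0, 0.5, 0.0, 0.0]]        # r x d
W_eff = apply_lora(W, B, A, alpha=2, r=1)
```

QLoRA applies the same update on top of a 4-bit-quantized frozen `W`, which is what makes multi-GPU fine-tuning of large models tractable.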
Pipeline automation for data preprocessing, model training, evaluation, and deployment. Git-driven workflows with automated CI/CD for reproducibility and version control.
Ansible
Gitea
CI/CD
MLOps
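The preprocess-train-evaluate-deploy ordering that the CI/CD workflows enforce can be sketched as a small dependency-driven runner. This is an illustrative stdlib toy, not the project's Ansible/Gitea setup; the stage names and lambda bodies are placeholders.

```python
def run_pipeline(stages):
    # Execute stages in dependency order, memoizing each result so a
    # stage runs exactly once even if several stages depend on it.
    done, order = {}, []
    def run(name):
        if name in done:
            return done[name]
        fn, deps = stages[name]
        inputs = [run(d) for d in deps]   # recurse into dependencies first
        done[name] = fn(*inputs)
        order.append(name)
        return done[name]
    for name in stages:
        run(name)
    return done, order

# Placeholder stages standing in for real preprocessing/training jobs.
stages = {
    "preprocess": (lambda: [3, 1, 2], []),
    "train":      (lambda data: sorted(data), ["preprocess"]),
    "evaluate":   (lambda model: model == [1, 2, 3], ["train"]),
    "deploy":     (lambda ok: "deployed" if ok else "blocked", ["evaluate"]),
}
done, order = run_pipeline(stages)
```

In the real setup, a Git push triggers the pipeline and each stage is a versioned, reproducible job, but the gating logic (deploy only if evaluation passes) is the same.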