SGLang MLX backend
Lead contributor to the MLX backend. MoE performance optimization for Apple Silicon, with merged kernels for fused SwiGLU and quantized matvec.
Jae L.
Interested in building GPU kernels and ML systems.
Cross-substrate work: NVFP4 on NVIDIA's Blackwell, MXFP4 on AMD's MI355X, and a custom MoE Metal kernel on Apple silicon. Working across all three, I built mental models of how each architecture ticks and where they diverge and align.
Open source contributor to SGLang's MLX backend.
A fused Triton quant-and-shuffle kernel on MI355X, ranked on the AMD x GPU MODE leaderboard through per-shape tile selection and hardcoded dispatch.
A small inference engine built on consumer hardware, benchmarking the path from cold start to first token.
Notes and reading, kept as a Zettelkasten.