San Francisco
We’re building intelligent systems that push the boundaries of language and sequence modelling. As our new ML Engineer, you’ll own the full lifecycle of transformer-based models at scale — from research and experimentation through to production deployment.
What you’ll do
· Design, fine-tune, and evaluate large-scale transformer architectures (BERT, GPT, T5, and beyond)
· Lead end-to-end ML pipelines: data curation, training, optimisation, and serving
· Apply techniques such as LoRA, RLHF, and quantisation to improve model efficiency and alignment
· Collaborate with product and infrastructure teams to ship models that solve real-world problems
· Stay current with research; translate papers into practical, production-ready implementations
What we’re looking for
· 3+ years of hands-on experience with transformer-based models in NLP, computer vision, or multimodal settings
· Strong proficiency in Python; deep familiarity with PyTorch and the Hugging Face ecosystem
· Solid understanding of attention mechanisms, positional encodings, and training dynamics
· Experience deploying models to production (e.g. via ONNX, TorchServe, vLLM, or similar)
· Bonus: publications, open-source contributions, or experience with distributed training (e.g. DeepSpeed, FSDP)