🎓 CV
🤖 > Work experience
- October 2023 - Now | Co-founder and CEO, Adaptive ML.
We are building tooling to enable large language models to learn from production feedback. We apply methods inspired by RLHF, RLAIF, and RLEF to tune models for production, optimizing directly for business outcomes.
Our philosophy: if you can measure it, you can optimize it. Our product, Adaptive Engine, allows companies to better capture the performance of their production deployments (e.g., escalation rate, hallucination rate), and to improve that performance continuously with model tuning. Our approach to tuning is highly scalable, enabling personalization not just at the use-case level, but at the user level.
We have a strong focus not only on technical excellence, but on building a delightful product experience with real-world impact. We raised a $20M seed round from Index Ventures and ICONIQ at the end of 2023, and are now deploying to our first customers.
- June 2023 - September 2023 | Extreme-Scale Team Lead, 🤗 Hugging Face.
I led a team of 6 researchers and engineers working on pushing the boundaries of open models and tools. Some of that early work eventually resulted in the open-sourcing of datatrove and FineWeb. I also ran the Efficient Systems for Foundation Models workshop and helped with the AI Village CTF @ DEFCON31.
- 2020 - May 2023 | Research Lead, LightOn. LightOn funded my Ph.D.
I led a team of 5-7 researchers, engineers, and interns working on developing, understanding, and improving large language models.
Some highlights from these three years:
- The public release of Falcon-40B and Falcon-180B, state-of-the-art language models with an open license;
- Scaling web data to 5 trillion high-quality tokens with RefinedWeb, matching and outperforming curated corpora with web data alone;
- Developing a better understanding of the nature of zero-shot generalization.
We put a strong focus on engineering and tooling. We developed a pipeline to filter and deduplicate trillions of words; trained models with hundreds of billions of parameters using our own custom distributed training framework on supercomputers with up to 4,000 A100s; and our inference service processed billions of tokens every month for our customers. Our total annual compute budget was on the order of 1-2M A100-hours per team member. We received coverage in Reuters, The Batch, VentureBeat, and Import AI, and contributed to the BigScience project.
- 2019 - 2020 | Machine Learning Research Scientist, LightOn. LightOn funded my Ph.D.
I worked on expanding the applicability of beyond-backpropagation methods to modern deep learning tasks and architectures (see our NeurIPS 2020 paper). I helped with the development of optical computing prototypes, achieving scalable optical training of neural networks across varied architectures. This work led to applications of Direct Feedback Alignment to adversarial robustness and differential privacy.
🏫 > Education
2019 – Now | Industrial Ph.D. in Applied Mathematics.
École Normale Supérieure, Paris.
"Principled modeling methods and beyond backpropagation approaches for the large-scale era".
2018 – 2019 | M.Sc. in Climate Science.
École Polytechnique, Palaiseau.
2017 – 2018 | Visiting research student.
City University of Hong Kong, Kowloon.
"Machine learning for solar engineering".
2015 – 2019 | M.Sc. in Civil Engineering.
École Normale Supérieure, Paris-Saclay.
📘 > Publications
See my publications page or my Google Scholar profile.
My research has been featured in The Batch, Import AI, Yannic Kilcher videos, and news outlets such as Reuters and VentureBeat (here and there).
🤗 > Service
2024 | Workshop organizer. ES-FoMo II, ICML 2024.
Efficient Systems for Foundation Models.
2023 | Workshop organizer. ES-FoMo, ICML 2023.
Efficient Systems for Foundation Models.
2023 | Reviewer.
Conferences: ICML, NeurIPS, NeurIPS Datasets & Benchmarks.
Workshops: NeurIPS I Can't Believe It's Not Better.
2021 - 2022 | Chair of the Architecture & Scaling Group, 🌸 BigScience.
I chaired the architecture & scaling working group for the BigScience workshop. Our goal was to empirically explore and validate architectural choices for BLOOM, a 176B-parameter open-access multilingual model. We studied considerations around model architecture & training objectives (encoder-decoder vs decoder-only, denoising vs language modelling), positional embeddings (rotary vs ALiBi), as well as multilinguality.
2022 | Reviewer.
Conferences: NeurIPS, ICML.
Journals: ACM Computing Surveys.
Workshops: NeurIPS I Can't Believe It's Not Better, ACL BigScience.
2021 | Reviewer.
Conferences: NeurIPS (Outstanding Reviewer Award).
December 2019 | Workshop organizer. Future of Random Matrices #4, Paris.
June 2019 | Science crew member, MOOSE-GE scientific campaign.
Mediterranean Sea, Thalassa vessel, 2 weeks.
Double-Diffusive Processes in the Tyrrhenian Sea.
👨‍🏫 > Talks
September 2024 | Meta AI Startup Program Launch.
State of AI in the EU (panel).
September 2024 | Motier Ventures GenAI Show.
Fireside Chat w/ Baptiste Pannier & Marie Outier.
June 2024 | ES-FoMo II.
Data and Architecture Trends Across Industry and Open Communities (panel moderator).
June 2024 | NP-Hard Foundry.
Fireside Chat w/ Thom Wolf.
December 2023 | NeurIPS Scaling Laws Workshop, MILA.
Challenges in Training Frontier Language Models.
November 2023 | ai-PULSE.
High-Quality Data Need Not Apply (panel).
August 2023 | Applied Machine Learning Days @ EPFL.
Challenges in Training Frontier Large Language Models (talk+panel).
July 2023 | UberAI.
Challenges in Training Large Language Models.
July 2023 | UTTER User Day.
Multimodality is What's Next for Open-Source (talk+panel).
December 2022 | NeurIPS Scaling Laws Workshop, MILA.
High-Quality Data Need Not Apply.
August 2022 | Translate Theory Reading Group, Google Research.
What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?
June 2022 | Machine Learning College, G-Research.
Lessons from Training a Massively Multilingual 176B Model.
May 2022 | Neural Scaling Seminar, MILA.
What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?
May 2022 | Challenges & Perspectives in Creating Large Language Models, ACL.
Lessons from Training a Massively Multilingual 176B Model.
April 2022 | Sharing Session, Naver AI Labs.
Demystifying Extreme-Scale Training.
March 2022 | GTC, NVIDIA.
NLP Beyond English: Training Extreme-scale Language Models with Megatron for French, and More.
December 2021 | BigScience Episode #3, NeurIPS.
Identifying the Best Architecture for a >100B Model.
September 2021 | BigScience Episode #2, INLG.
You Only Train Once: Making Architectural Decisions for a >100B model.
July 2021 | BigScience Episode #1, ELLIS.
Architectural Decisions at the 100B scale.
July 2021 | Hong Kong ML Meetup S3E12.
Extreme-scale: Trends & Perspectives.
May 2021 | Paris NLP Meetup S5E5.
PAGnol: a French Extreme-Scale Model.
December 2020 | Sharing Session, Autodesk AI Lab.
Learning and Scaling Beyond Backpropagation and Beyond Silicon.
December 2020 | Les Déjeuners NeurIPS, Paris Machine Learning Meetup.
Direct Feedback Alignment: Scaling and Perspectives.
May 2019 | Future of Random Matrices #3.
Principled Training of Neural Networks with Direct Feedback Alignment.
December 2017 | TensorFlow Paris Meetup.
Lifelike Concrete Cracking Patterns using TensorFlow & GANs.
January 2017 | Paris Machine Learning Meetup S5E4.
Cracking Crack Mechanics with GANs.
🎫 > Beyond work
- 🤿 | Diving. I am a passionate DIR diver.
- 📸 | Photography. In particular underwater photography.
- 👨🏻‍🍳 | Cooking. In particular sous-vide cooking, new cookery, and holistic cuisine. During the 2020 lockdowns, I cooked my way through the Fat Duck Cookbook & the Eleven Madison Park cookbooks.