The Model She Built Alone

How a former research intern trained a 7-billion-parameter language model on a single rented GPU cluster, what it cost her, and what she would do differently.

Imani Brooks

Published

May 14, 2026

Issue

01 · May 2026

The Model She Built Alone — Photograph for Build With Her Magazine · Issue 01 · May 14, 2026

Editor’s note:Imani Brooks asked to write this in her own voice. We said yes without hesitation. The industry needs more first-person accounts of what training a model alone really costs.

I want to tell you the honest version of this story, not the version where I present the finished thing and imply the path was clear. The path was not clear. I spent four months training a 7-billion-parameter language model on a rented GPU cluster, mostly alone, mostly at night, mostly convinced I was doing something fundamentally wrong. I want to tell you what I actually learned.

The setup: I had left a research internship at a large AI lab with a specific frustration. The models I had been working on were excellent, but they were built for research metrics, not for the application I had in mind. A tool designed to help engineers write and review infrastructure documentation. The existing models were either too general or too expensive to serve at the throughput I needed. I thought I could do better with a smaller, domain-specific model. I was not wrong, but I dramatically underestimated what doing better would require.

I rented A100 capacity on a cloud provider I won't name because their customer service was genuinely bad. I curated a dataset of roughly 40 billion tokens drawn from technical documentation, GitHub READMEs, engineering blog posts, and internal wiki exports I had been collecting for two years with permission. I used a standard transformer architecture, nothing exotic. I had a plan. The plan was wrong in several specific ways.

The first failure was data quality. My dataset looked clean in sampling but had systematic issues at scale: duplicated content across sources that inflated certain patterns, formatting artifacts from HTML stripping, and a meaningful distribution mismatch between the kinds of writing I was training on and the kinds of writing I wanted the model to produce. I discovered this after two weeks of training, when evaluation showed the model had overfit to a particular documentation style that nobody actually uses in the real world. I scrapped the dataset and started over.

"The model doesn't care about your deadline," I wrote in my notes at week six. "It cares about your data." This was obvious in retrospect. It is the thing everyone will tell you. But there is a difference between knowing it and being four weeks into a burn rate of roughly $700 per week and needing to make a decision about whether to continue.

I continued. The second training run was better. Not perfect (there is no perfect) but better in the ways that mattered for my application. By month four I had a model that performed measurably better than GPT-3.5 on my specific task set, ran at a fraction of the inference cost, and could be fine-tuned further by engineers without a research background. Total cost: approximately $11,000, not including the months of my own time.

What I would do differently: start with a much smaller model for initial experiments, a 1B or even a 400M parameter model, before committing to the full scale. Use that smaller run to validate your dataset quality and your evaluation setup before you scale up. The temptation to go straight to the interesting size is real. Resist it.

The broader lesson is one I have been thinking about since: the infrastructure for training your own models is now accessible to individuals in a way it was not five years ago. The bottleneck is not the compute anymore. It is the judgment. Knowing what data to use, how to evaluate what you have built, and when to stop. That judgment takes time to develop, and the only way to develop it is to run things that fail.

The model doesn't care about your deadline. It cares about your data. That took me four months and about $11,000 to really internalize.

The Model She Built Alone

More from AI

The Model I Almost Shipped

The Quiet Revolution Rewiring the Cloud

She Made On-Call Boring

Every story we publish is a reminder that more women are building than the world often sees.

0 comments on “The Model She Built Alone”