
ML Pipeline Boilerplate That Actually Works

Why chaining open-source ML libraries into production apps is a nightmare, and the modular boilerplate approach that could fix it using Modal, Inngest, and Supabase.


The open-source machine learning ecosystem is exploding. Voice cloning libraries that match commercial quality. Image generators that rival Midjourney. Music synthesis tools that create full compositions from text prompts. But here's the problem nobody talks about: chaining these libraries together into actual production applications is an absolute nightmare.

I learned this the hard way building ChangeLyric, a tool that swaps lyrics in songs while keeping the original voice. The concept was simple enough. Take an MP3 file, run it through stem separation, process it through voice synthesis, and output a new file. In practice? Months of chaos.

And the chaos wasn't from the ML models themselves. The models worked. The chaos came from everything around them. The architecture decisions. The database structure. The file handling. The job status tracking. Every time I asked my AI coding assistant for guidance, it would flip-flop between completely different approaches. Redis or BullMQ or Inngest? Postgres or Supabase or Firebase? The answers changed CONSTANTLY.

The Core Problem With AI-Assisted ML Development

Here's what nobody tells you about using AI to build complex ML applications. The AI is great at writing individual functions. It can implement a specific model call beautifully. But the moment you need it to maintain architectural consistency across a multi-step pipeline? It falls apart.

The fundamental issue is that LLMs don't maintain persistent memory of design decisions across conversations. Every new chat starts fresh. And even within a single long conversation, the context window eventually drops earlier decisions. So you end up with code that uses three different patterns for the same problem because the AI forgot what it suggested two hours ago.

This matters enormously when building ML pipelines because every open-source library operates on the same basic principle: inputs and outputs. Audio goes in, separated stems come out. Text goes in, synthesized speech comes out. The logic is straightforward. But coordinating these inputs and outputs across multiple libraries, tracking job statuses, handling human checkpoints, and managing file storage? That's where projects collapse into spaghetti.

What ML Pipelines Actually Need

After struggling through the ChangeLyric build, I started thinking about what would make future projects easier. Not just for me, but for anyone trying to chain ML libraries together. The pattern repeats across nearly every ML application I can think of.

  • User uploads something - an image, audio file, video, document
  • Processing happens - one or more ML models do their thing
  • Sometimes humans intervene - reviewing output, making selections, providing additional input
  • More processing happens - based on human decisions or additional ML steps
  • User downloads result - the final transformed output

The number of steps varies. The specific libraries change. But this fundamental flow stays consistent. Which suggests something important: we shouldn't be reinventing this flow from scratch every time.
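
To make that concrete, here's the recurring flow written down as data instead of prose. Everything in this sketch is hypothetical and my own invention; the step names and statuses don't come from any existing library:

```python
from enum import Enum

class StepStatus(str, Enum):
    """The statuses a pipeline step can move through."""
    PENDING = "pending"
    RUNNING = "running"
    AWAITING_REVIEW = "awaiting_review"   # human checkpoint
    COMPLETED = "completed"
    FAILED = "failed"

# The recurring flow, expressed as ordered steps. All names are illustrative.
PIPELINE_STEPS = [
    {"id": "upload",    "kind": "user_input",  "produces": "raw_file"},
    {"id": "process_a", "kind": "ml_model",    "consumes": "raw_file",     "produces": "intermediate"},
    {"id": "review",    "kind": "human",       "consumes": "intermediate"},
    {"id": "process_b", "kind": "ml_model",    "consumes": "intermediate", "produces": "final_file"},
    {"id": "download",  "kind": "user_output", "consumes": "final_file"},
]
```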


The Boilerplate Concept: Defining Structure Upfront

The solution I'm working toward is essentially project management for ML pipelines. Define your checkpoints upfront. Specify what status a job can have at each stage. Lock in those definitions before any code gets written. Then the system operates within those predefined rules.

Think of it like this. Before building anything, you create a manifest that says: "Step 1 is file upload. Step 2 is audio separation. Step 3 is human review. Step 4 is voice synthesis. Step 5 is final download." Each step has defined inputs and outputs. Each step has possible statuses. The AI assistant can't deviate from this structure because the structure is codified before development begins.

This approach eliminates the architectural flip-flopping because you've already made those decisions. Should we use Redis for job queues? It's in the manifest. Should file storage use Supabase or S3? It's in the manifest. The AI assistant's job becomes filling in implementation details within a fixed structure rather than making structural decisions on the fly.
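
Here's roughly what I imagine a manifest looking like for a ChangeLyric-style project. To be clear, this format doesn't exist yet; it's a sketch of the idea:

```python
# A hypothetical manifest for a ChangeLyric-style project. The point is that
# infrastructure decisions live here, fixed, instead of being re-debated in
# every AI conversation.
MANIFEST = {
    "name": "changelyric",
    "infrastructure": {
        "compute": "modal",          # GPU jobs run on Modal
        "orchestration": "inngest",  # retries and step coordination via Inngest
        "database": "supabase",      # job tracking in Postgres
        "storage": "supabase",       # uploads and outputs in Supabase Storage
    },
    "steps": [
        {"id": "upload",          "statuses": ["pending", "completed"]},
        {"id": "stem_separation", "statuses": ["pending", "running", "completed", "failed"]},
        {"id": "human_review",    "statuses": ["awaiting_review", "approved", "rejected"]},
        {"id": "voice_synthesis", "statuses": ["pending", "running", "completed", "failed"]},
        {"id": "download",        "statuses": ["pending", "completed"]},
    ],
}
```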


The Tech Stack That Makes This Possible

Modal for GPU Compute

Modal handles the heavy lifting of GPU processing without requiring any DevOps knowledge. I wrote about this in detail when I built GPU-powered video tools with Modal. You define a function, specify what hardware it needs, and Modal provisions everything automatically. No managing servers. No configuring CUDA. Just Python code with decorators.
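
For a sense of scale, a Modal GPU function really is about this small. A minimal sketch, with a placeholder body standing in for the actual separation model:

```python
import modal

app = modal.App("stem-separation")

# Bake the ML dependency into the container image.
image = modal.Image.debian_slim().pip_install("demucs")

@app.function(gpu="A10G", image=image, timeout=600)
def separate_stems(audio_bytes: bytes) -> dict[str, bytes]:
    """Runs on a GPU container that exists only for the duration of the call."""
    # ... load the model and split the audio into stems (placeholder) ...
    return {"vocals": b"", "instrumental": b""}
```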

The serverless nature is particularly valuable for ML workloads because you're not paying for idle GPUs. A user uploads a file, Modal spins up a GPU container, processes the file, and shuts down. For applications with variable traffic, this approach can cut compute costs by 80% compared to running dedicated instances.

Inngest for Workflow Orchestration

Inngest is the orchestration layer that coordinates multi-step workflows. It handles the messy parts of running background jobs: retries when things fail, timeouts, step functions, and observability. Their newer AI-specific features like step.ai provide automatic retry logic for LLM calls, which is critical because inference requests fail more often than you'd expect.
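
In Python, an Inngest workflow looks roughly like this. A hedge: the Python SDK's signatures have shifted between versions, and the helper functions here are placeholders I made up, so treat this as the shape of the thing rather than copy-paste code:

```python
import inngest

client = inngest.Inngest(app_id="changelyric")

def run_separation(file_id: str) -> str:
    # Placeholder: would invoke the Modal stem-separation function.
    return f"stems-for-{file_id}"

def run_synthesis(stems_id: str) -> str:
    # Placeholder: would invoke the Modal voice-synthesis function.
    return f"synth-from-{stems_id}"

@client.create_function(
    fn_id="process-song",
    trigger=inngest.TriggerEvent(event="app/song.uploaded"),
)
async def process_song(ctx: inngest.Context, step: inngest.Step) -> None:
    # Each step.run is durable: retried on failure, not re-executed on replay.
    stems = await step.run("separate-stems", lambda: run_separation(ctx.event.data["file_id"]))
    await step.run("synthesize-voice", lambda: run_synthesis(stems))
```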

What makes Inngest particularly interesting is its AgentKit framework for multi-agent systems. If you're building something that coordinates multiple AI models making decisions, Inngest handles agent memory, tool coordination, and guardrails out of the box. This is infrastructure that would take months to build from scratch.

Supabase for Everything Else

Supabase covers database, file storage, and user authentication in one platform. The Postgres database handles job tracking and user data. The storage system manages uploaded files and processed outputs. Row Level Security policies control who can access what. Realtime subscriptions push status updates to users as jobs progress.
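
The job-tracking side might look like this with supabase-py. The jobs table, its columns, and the bucket name are all my invention for illustration:

```python
from supabase import create_client

supabase = create_client("https://YOUR-PROJECT.supabase.co", "YOUR-ANON-KEY")

user_id = "user-123"   # placeholder
file_bytes = b""       # the uploaded MP3, placeholder

# Create a job row when a user uploads a file.
job = supabase.table("jobs").insert({
    "user_id": user_id,
    "step": "stem_separation",
    "status": "pending",
}).execute()

# Store the upload; the path doubles as the job's input reference.
supabase.storage.from_("uploads").upload(f"{user_id}/song.mp3", file_bytes)

# Advance the status as the pipeline progresses; a Realtime subscription
# on the jobs table pushes this change straight to the user's browser.
supabase.table("jobs").update({"status": "running"}).eq("id", job.data[0]["id"]).execute()
```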

Next.js and Vercel for Frontend

The frontend stack is Next.js deployed on Vercel. Server-side rendering for SEO, API routes for backend functionality, and deployment automation through GitHub. Every component in this stack abstracts away infrastructure so you can focus on application logic.

The Vision: Node-Based Pipeline Building

The boilerplate approach is the starting point. But the larger vision is something more ambitious: a visual interface where people can connect ML libraries together like nodes in a graph.


Imagine something like ComfyUI but for general ML workflows. You drag in a "file upload" node. Connect it to a "stem separation" node. Add a "human review checkpoint" node. Connect that to a "voice synthesis" node. Finally connect to a "file download" node. The system generates the infrastructure to make it work.

Each node type would have predefined inputs and outputs. Some nodes would be purely automated. Others would pause for human intervention. The configuration would happen visually rather than through code. And because the node definitions are standardized, the AI assistant could help implement new node types without architectural confusion.
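
A node type doesn't need to be anything exotic; a dataclass with declared inputs and outputs would do. Another hypothetical sketch:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NodeType:
    """A pipeline node with declared inputs and outputs. Names are illustrative."""
    id: str
    consumes: list[str]            # artifact types this node takes in
    produces: list[str]            # artifact types it emits
    requires_human: bool = False   # True pauses the pipeline for review

STEM_SEPARATION = NodeType("stem_separation", consumes=["audio"], produces=["vocals", "instrumental"])
HUMAN_REVIEW = NodeType("human_review", consumes=["vocals"], produces=["vocals"], requires_human=True)
```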

This isn't entirely hypothetical. Platforms like ZenML and Metaflow already provide modular pipeline abstractions. But they're designed for data scientists building training pipelines, not creative professionals building end-user applications. The gap I'm trying to fill is for people who want to deploy ML-powered tools without becoming machine learning engineers.

Starting Simple: The Proof of Concept

First iteration: prove the concept with a minimal example. Two ML libraries chained together. User uploads, Library A processes, Library B processes the output, user downloads. No human checkpoints. No branching logic.
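
With Modal alone, that first iteration is essentially two remote calls in sequence. A sketch with passthrough placeholders where the real models would go (real functions would request GPUs, as in the Modal section above):

```python
import modal

app = modal.App("poc-chain")

@app.function()
def step_a(data: bytes) -> bytes:
    # Placeholder for Library A (say, stem separation); passes data through.
    return data

@app.function()
def step_b(data: bytes) -> bytes:
    # Placeholder for Library B (say, voice synthesis); passes data through.
    return data

@app.local_entrypoint()
def run(input_path: str):
    # Upload -> Library A -> Library B -> download. No checkpoints, no branching.
    result = step_b.remote(step_a.remote(open(input_path, "rb").read()))
    open("output.bin", "wb").write(result)
```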

Second iteration adds human checkpoints. User uploads, processing happens, job pauses for human review, human decides, processing continues. This pattern made ChangeLyric complex. Nailing it in a reusable way saves months on future projects.
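
That checkpoint pattern maps naturally onto Inngest's wait-for-event primitive: the workflow suspends until a matching event arrives or a timeout expires. Again a hedged sketch; the event names and step IDs are invented, and the exact signatures may differ by SDK version:

```python
import datetime
import inngest

client = inngest.Inngest(app_id="poc-checkpoint")

@client.create_function(
    fn_id="process-with-review",
    trigger=inngest.TriggerEvent(event="app/file.uploaded"),
)
async def process_with_review(ctx: inngest.Context, step: inngest.Step) -> None:
    draft = await step.run("first-pass", lambda: "draft-artifact-id")  # placeholder

    # The workflow suspends here until a human approves, or three days pass.
    approval = await step.wait_for_event(
        "await-review",
        event="app/review.approved",
        timeout=datetime.timedelta(days=3),
    )
    if approval is None:
        return  # reviewer never responded; let the job expire

    await step.run("second-pass", lambda: f"final-from-{draft}")
```

In a setup like this, the "approve" button in the UI would just send the app/review.approved event; the paused run picks it up and continues.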

Third iteration: rebuild ChangeLyric within this clean modular structure. If the framework makes that rebuild straightforward, it validates the entire approach.

Why This Matters for Creative Professionals

A musician with a creative vision for a new production tool shouldn't need to master distributed systems. A filmmaker with ideas for AI-assisted editing shouldn't need to configure Kubernetes. The barrier to building ML-powered applications should be creative vision and domain expertise, not infrastructure knowledge.

The Business Case

There's a commercial opportunity here too. People would pay real money for a system that lets them chain ML libraries together without dealing with infrastructure. Not a small amount either. The alternative is hiring specialized engineers or spending months learning yourself. If a platform could reliably solve this problem, the pricing power would be significant.

Whether this becomes a boilerplate that developers use for their own projects or a hosted platform that non-developers use directly is still an open question. The boilerplate approach is faster to build but requires technical users. The hosted platform approach serves a larger market but requires MORE infrastructure work upfront. I'm starting with the boilerplate because it proves the concept, but the platform version is where the larger opportunity sits.


What's Next

I'm documenting this process as I build it. The proof-of-concept pipeline is in development. When it's working, I'll share the architecture details, the manifest structure, and the patterns that emerged. If the concept proves out, open-sourcing the boilerplate is the plan.

The goal is simple: every future project involving ML pipelines should be easier than ChangeLyric was. The tools exist. We just need better frameworks for combining them.
