5 Key Insights from Building Multi-Agent AI Systems at Shopify


Imagine an AI ecosystem where specialized agents collaborate like a well-oiled machine—this is the vision Paulo Arruda brought to life at Shopify. In a recent presentation, Arruda shared hard-won lessons from building multi-agent systems from scratch, revealing a trajectory from clumsy single chatbots to a nimble swarm of micro-agents. The results: task completion times slashed from hours to minutes, plus a forward-thinking fix for context bloat, one of AI's biggest headaches, using an unexpected ally: the filesystem. Here are five essential takeaways from that journey, distilled for anyone building or scaling agentic AI.

1. From Simple Chatbots to Specialized Swarms

Shopify’s early AI efforts centered on a single, general-purpose chat interface—a tool that could answer questions but lacked depth. The problem? One bot trying to do everything inevitably did nothing well. The breakthrough came when the team shifted to a swarm of specialized agents, each designed for a specific domain (e.g., order management, inventory, customer support). Instead of one monolithic model, a collection of lean agents now collaborates. This mirrors microservices architecture: each agent handles a narrow task, communicates via well-defined APIs, and can be updated independently. The lesson: specialization beats generalization when speed and accuracy matter.
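The routing pattern described above can be sketched in a few lines. This is an illustrative example, not Shopify's actual implementation: the agent names, registry, and `route` function are assumptions standing in for whatever API layer the real system uses.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Agent:
    """A narrowly scoped agent with its own small prompt."""
    name: str
    system_prompt: str
    handle: Callable[[str], str]

# Stub handlers; a real system would call an LLM with the agent's prompt.
def orders_handler(request: str) -> str:
    return f"[orders] handled: {request}"

def inventory_handler(request: str) -> str:
    return f"[inventory] handled: {request}"

REGISTRY = {
    "orders": Agent("orders", "You manage order lookups only.", orders_handler),
    "inventory": Agent("inventory", "You answer stock questions only.", inventory_handler),
}

def route(domain: str, request: str) -> str:
    """Dispatch to the specialist; each agent can be updated independently."""
    return REGISTRY[domain].handle(request)
```

Because each agent sits behind the same narrow interface, swapping or retraining one specialist never touches the others, which is exactly the microservices property the section describes.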

Source: www.infoq.com

2. The Downside of 'All-in-One' Prompts

The initial approach relied on massive, single prompts crammed with every possible instruction—think thousands of lines covering every edge case. While this seemed efficient, it quickly hit a wall. Context bloat made each call expensive and slow, as the model had to process irrelevant information to find what it needed. Worse, updating one part of the prompt risked breaking another. Arruda compared it to a “junk drawer” where nothing was easy to find. The team realized that asking one system to remember everything was not scalable. The insight: monolithic prompts are brittle and fail under real-world complexity.

3. Lean Agent Microservices to the Rescue

The solution was to break that massive prompt into dozens of narrowly focused agent microservices. Each agent owns a tiny slice of knowledge and a single responsibility—like checking inventory, pricing a product, or routing a complaint. These agents are autonomous: they receive a request, process it with their focused context, and return a result. By keeping each agent’s prompt under a few hundred tokens, the system became lightning-fast. Updates are isolated: tweaking the inventory agent doesn't affect shipping. This design turns hours-long tasks into minutes-long workflows. The takeaway: small, focused agents are easier to build, test, and scale.
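One way to make the "few hundred tokens" discipline concrete is to enforce a prompt budget in the agent contract itself. This is a hypothetical sketch: the 300-token budget and the rough 4-characters-per-token estimate are assumptions, not figures from the talk.

```python
MAX_PROMPT_TOKENS = 300  # assumed budget, in the spirit of "a few hundred tokens"

def approx_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

class LeanAgent:
    """An agent that refuses to exist with an oversized prompt."""

    def __init__(self, name: str, prompt: str):
        if approx_tokens(prompt) > MAX_PROMPT_TOKENS:
            raise ValueError(f"{name}: prompt exceeds {MAX_PROMPT_TOKENS}-token budget")
        self.name = name
        self.prompt = prompt

    def run(self, request: dict) -> dict:
        # A real agent would call an LLM with self.prompt; this stub
        # keeps the sketch self-contained.
        return {"agent": self.name, "input": request, "status": "ok"}

inventory = LeanAgent("inventory", "Check stock levels for a given SKU. Nothing else.")
result = inventory.run({"sku": "ABC-123"})
```

Failing fast at construction time keeps prompt bloat from creeping back in silently as agents evolve.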


4. Real-World Impact: Hours to Minutes

With the shift to agent microservices, the performance gains were dramatic. A task that previously required a human to interact with a slow, all-knowing chatbot—often taking an hour or more—now completes in under five minutes. For example, resolving a complex order escalation involves a chain of three agents: one identifies the order, a second checks fulfillment status, and a third suggests compensation. Each step takes seconds. The cumulative effect on operational efficiency is enormous: reduced latency, lower cost, and happier end users. This is not just theory; it’s the new baseline at Shopify. The insight: decomposition is the key to speed.
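The three-step escalation chain can be modeled as a simple sequential pipeline. The agent behavior here is stubbed, and the function names, ticket fields, and compensation rule are assumptions for illustration only.

```python
def identify_order(ticket: dict) -> dict:
    # Stub lookup; a real agent would query an order service.
    ticket["order_id"] = f"ORD-{ticket['customer_id']}-001"
    return ticket

def check_fulfillment(ticket: dict) -> dict:
    # Stub status check against a fulfillment system.
    ticket["fulfillment"] = "delayed"
    return ticket

def suggest_compensation(ticket: dict) -> dict:
    # Stub policy: compensate only when fulfillment is delayed.
    ticket["compensation"] = "10% refund" if ticket["fulfillment"] == "delayed" else "none"
    return ticket

PIPELINE = [identify_order, check_fulfillment, suggest_compensation]

def resolve_escalation(ticket: dict) -> dict:
    for step in PIPELINE:
        ticket = step(ticket)  # each step takes seconds, not hours
    return ticket
```

Because each stage enriches the same ticket and hands it on, the chain stays easy to reorder, test stage-by-stage, or extend with a fourth specialist.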

5. A Future Fix for Context Bloat: Filesystem-Based Adapters

Even with lean agents, context bloat can creep back as agents grow in capability. Arruda floated an intriguing hypothesis: use the filesystem as a dynamic memory layer. Instead of loading all context into a prompt, agents read from and write to structured directories—like a virtual file system. Each agent holds a “current working directory” and pulls only the relevant files for the task. This decouples memory from the model, allowing agents to handle effectively unbounded context without ballooning prompts. It’s an early idea, but it points toward a more modular, scalable future where context is stored, not crammed. The lesson: think beyond the prompt for long-running, context-heavy tasks.
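A minimal sketch of the idea, assuming a plain-text directory layout and a keyword-based relevance filter (both assumptions; Arruda's proposal does not specify an API):

```python
from pathlib import Path
import tempfile

class FileContext:
    """Filesystem-backed agent memory: store context as files, load on demand."""

    def __init__(self, root: Path):
        self.cwd = root  # the agent's "current working directory"
        self.cwd.mkdir(parents=True, exist_ok=True)

    def write(self, name: str, content: str) -> None:
        (self.cwd / name).write_text(content)

    def load_relevant(self, keyword: str) -> str:
        # Pull only files mentioning the task keyword, instead of
        # cramming the agent's whole memory into the prompt.
        parts = [
            p.read_text()
            for p in sorted(self.cwd.glob("*.txt"))
            if keyword in p.read_text()
        ]
        return "\n".join(parts)

root = Path(tempfile.mkdtemp()) / "agent-memory"
ctx = FileContext(root)
ctx.write("orders.txt", "order ORD-1 shipped late")
ctx.write("pricing.txt", "SKU ABC costs $10")
prompt_context = ctx.load_relevant("order")
```

A production version would presumably use embeddings or metadata rather than substring matching, but the core property is the same: memory lives on disk, and the prompt only ever carries the slice the current task needs.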

Building multi-agent systems taught Paulo Arruda—and Shopify—that the path from simple chat to sophisticated swarms is paved with ruthless simplification. By embracing specialization, discarding monolithic prompts, and even reimagining context as files, the team cut task times from hours to minutes. The road ahead may involve filesystem adapters, but the core principle remains: keep agents lean, focused, and independent. For anyone building in this space, these insights are a blueprint for moving from experiment to production without drowning in complexity.
