What Breaks, Builds: Cedar's Kevin Buckley Reflects on the AI Engineer World's Fair
From internal tools to real-time evaluation, how engineering teams are turning setbacks into smarter systems, and what we’re bringing back to make healthcare payments easier for patients and providers
In June I had the opportunity to attend the AI Engineer World's Fair 2025. It was one of the most forward-looking, practical conferences I’ve been to in a while, filled with real, working implementations of AI systems, not just theoretical ideas. I came back with a list of takeaways that I’ve already started weaving into how we think about engineering at Cedar.
At Cedar, our goal is to make healthcare payments easier and more affordable for patients, while empowering providers to serve their patients better. That mission requires technical innovation, but also relentless iteration and clarity of focus. This conference helped sharpen both.
Here’s a look at the most useful ideas I brought back, how Cedar is working to apply them, and some questions I’m still sitting with.
Turning Fails Into Features: Learning From Zapier’s Feedback Loop
One of my favorite sessions was from Zapier, where the team shared how they turned failure signals into new features. Their strategy revolved around using tools like Braintrust (https://www.braintrust.dev/), a GenAI evaluation platform, to capture both explicit and implicit feedback from users.
What stood out was how they designed evaluation workflows that run continuously, before and after release. Their automated systems flag issues early, prioritize fixes, and even help identify where to focus development efforts next.
The talk pushed me to think differently about how we handle evaluation at Cedar. We’ve always valued quality and safety, but combining automated monitoring with manual assessments might help us catch issues sooner and evolve faster without compromising on reliability.
I’ve started exploring how we might integrate tools like Braintrust or Arize (another popular platform at the conference) into our workflow, alongside tools like Streamlit (via Snowflake) for visualization.
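To make that concrete, here’s a minimal sketch of the kind of offline eval Braintrust’s documented Eval() entry point supports, scored with its open-source autoevals library. The project name, dataset, and billing_assistant stand-in are placeholders I invented for illustration, not anything Cedar actually runs:

```python
# Minimal sketch of a Braintrust offline eval, following the documented
# Eval() quickstart. Requires BRAINTRUST_API_KEY (and an OpenAI key for
# the Factuality scorer). All names below are hypothetical.
from braintrust import Eval
from autoevals import Factuality


def billing_assistant(question: str) -> str:
    # Placeholder for the real model or agent call under test.
    return "You can set up a payment plan from your statement page."


Eval(
    "billing-assistant-evals",  # hypothetical project name
    data=lambda: [
        {
            "input": "How do I set up a payment plan?",
            "expected": "Payment plans can be started from the statement page.",
        }
    ],
    task=billing_assistant,
    scores=[Factuality],  # open-source LLM-based scorer from autoevals
)
```

Runs like this can be wired into CI and into post-release monitoring, which is the "before and after release" loop the Zapier talk described.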
Building Better, Faster: Why Internal Tools Might Be Your Best Investment
One of the clearest patterns from the conference was how many leading teams are investing in rapid development of internal tools rather than waiting for the market to solve their problems.
Zapier gave a compelling talk on how their internal tools evolved from scrappy prototypes into high-leverage infrastructure. Instead of buying generic solutions, they focused on fast, targeted development based on their teams' actual workflows. One example was a real-time monitoring tool, tailored specifically to Zapier’s workflow, that lets users generate evaluations directly from their runtime observability dashboard with a single click. That approach gave them flexibility, better fit, and shorter feedback cycles.
This really resonated with me because Cedar is also in a space where our needs are specialized. We’re supporting patients through complex billing experiences and working with provider systems that vary widely, so off-the-shelf solutions don’t always fit our context out of the box. At Cedar, we’ve built custom tools to analyze raw data, evaluate our runtime agents, and simulate agent interactions at scale; they were quick to stand up and have accelerated our work.
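For a flavor of the "one-click eval from observability" pattern, here’s a rough sketch of what the core of such an internal tool could look like. The AgentTrace shape and dataset path are hypothetical, not our actual implementation:

```python
# Hypothetical sketch: promote a flagged production trace into an eval
# dataset, so every failure becomes a future regression test.
import json
from dataclasses import dataclass
from pathlib import Path

EVAL_DATASET = Path("evals/flagged_cases.jsonl")  # invented location


@dataclass
class AgentTrace:
    input: str     # what the user asked
    output: str    # what the agent produced
    feedback: str  # why a reviewer flagged this run


def promote_to_eval(trace: AgentTrace) -> None:
    """Append a flagged runtime trace to the eval dataset as a new case."""
    EVAL_DATASET.parent.mkdir(parents=True, exist_ok=True)
    case = {
        "input": trace.input,
        "bad_output": trace.output,  # kept as a regression reference
        "expected": None,            # a reviewer fills this in later
        "notes": trace.feedback,
    }
    with EVAL_DATASET.open("a") as f:
        f.write(json.dumps(case) + "\n")
```

The "one click" in the dashboard just calls something like promote_to_eval on the trace the reviewer is looking at; the leverage comes from closing the loop, not from anything sophisticated.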
Since the conference, I’ve been thinking even more about where we can build tools internally to simplify workflows for our teams, shorten cycle time on decisions, or improve visibility into our systems. Sometimes, the fastest way to move forward is to build what you need, especially when time and fit are critical.
Structured Specs Are the New Prompts in Agentic Coding
At Cedar, we have strong adoption of agentic coding practices. We regularly check in with engineers on where Cursor is providing value and where it's falling short, and we have dedicated Slack channels where people share success and failure stories with Cursor. In short, we are continuously looking for ways to get better in this area!
Agentic coding was a large theme at the conference. One of the most insightful talks covered the move from one-off, prompt-based workflows to what’s being called a "spec-first" approach. Instead of crafting individual prompts or flows, engineering teams are writing structured specifications as the backbone of agentic development.
Specs as code, similar to how infrastructure as code changed DevOps, allow teams to version, test, and maintain their product logic much more predictably. Your AI coding agent can’t forget previous context when that context lives in a spec included with every prompt.
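Here’s a tiny sketch of that idea, assuming a spec file kept in version control and a hypothetical send_to_agent call standing in for whatever coding agent you use:

```python
# Illustrative sketch of spec-first prompting: the versioned spec is
# prepended to every prompt, so the agent never "forgets" requirements.
# The file path and send_to_agent() are hypothetical.
from pathlib import Path

SPEC = Path("specs/payment_plan_feature.md")  # versioned alongside the code


def build_prompt(task: str) -> str:
    """Compose every agent prompt from the canonical spec plus the task."""
    return (
        "Follow this specification exactly; it is the source of truth:\n\n"
        f"{SPEC.read_text()}\n\n"
        f"Current task: {task}"
    )


# prompt = build_prompt("Add retry handling to the payment-plan endpoint")
# send_to_agent(prompt)  # hypothetical; e.g. a Cursor rule or agent CLI
```

Because the spec rides along with every prompt, updating the file updates every future interaction, much the way changing a Terraform module changes every subsequent plan.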
I've already started applying this approach to some of our internal coding workflows with great success. Imagine working in an unfamiliar codebase: you have the agent build a spec from the existing code so that both you and the agent learn it, modify the spec to capture your new requirements, then feed it back into the agent. What a world!
It’s making our agentic iterations more predictable, and it gives us a clear way to define our requirements so they don’t drift or get lost over time. That kind of structure also helps us stay compliant and transparent, which is critical in healthcare.
Focused Agents
There was a lot of discussion about agent-based development, but the most grounded takeaway came from a talk on "context engineering." The message: don’t try to build agents that do everything. Instead, design them with a single, tightly focused goal.
This is relevant for any team building AI features, but especially for us at Cedar, where clarity and reliability are key. Whether it's a support tool for providers or a payment guidance feature for patients, the most effective systems tend to do one job well, with clear boundaries.
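As a sketch of what that tight scoping might look like, here’s a hypothetical single-purpose agent boundary: one narrow system prompt and exactly one tool. The names and plan math are invented for illustration:

```python
# Hypothetical single-purpose agent: one goal, one tool, and an explicit
# refusal outside its boundary. Nothing here is Cedar's actual system.
SYSTEM_PROMPT = (
    "You help patients with exactly one thing: understanding their "
    "payment plan options. For anything else, say you can't help."
)


def lookup_payment_plans(balance_cents: int) -> list[str]:
    """The agent's single tool: invented plan options for a given balance."""
    months = [3, 6, 12] if balance_cents > 50_000 else [3, 6]
    return [f"{m} payments of ${balance_cents / m / 100:.2f}/month" for m in months]


# One tool, not twenty: a small surface area is easier to test and trust.
TOOLS = {"lookup_payment_plans": lookup_payment_plans}
```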
Smaller Teams, Bigger Impact
Another theme that came up again and again was the power of small, focused engineering teams. The “Tiny Teams” track highlighted how empowering it feels as an engineer to have autonomy and direct ownership over your work, particularly when using cutting-edge agentic coding tools. Seeing how impactful and motivating this approach is at other companies was inspiring, especially in fast-moving spaces like AI tooling.
Personal Takeaways and What I’m Still Thinking About
When I haven’t attended a conference for a while, it’s easy to forget how valuable they are. The combination of face-to-face interactions with experts and dedicated, focused attention makes these events uniquely worthwhile.
Shipping GenAI projects without a robust evaluation system introduces unnecessary risk. Fortunately, setting up such a system, whether by building internal tooling or integrating an off-the-shelf solution, is straightforward and well worth the modest effort required.
Investing in mastering agentic coding workflows pays off. The amount of innovation happening in this area is enormous, and the tooling continues to improve rapidly, letting engineers operate at higher levels of abstraction while supplying less and less context by hand.
Every good conference leaves you with more questions than answers. Here’s what I’m still chewing on:
- How can we extend spec-first workflows across more of our development cycle?
- Where should we double down on internal tooling to get better leverage?
- What would it take to make continuous evaluation feel as standard as testing or code review?
If you’re curious about any of these or have thoughts from your own experience, I’d love to hear them. We grow fastest when we share what we’re learning and figure it out together.
Learn more about the AI Engineer World's Fair here
To learn more about Kevin, connect on LinkedIn
Interested in a career at Cedar? Check out our careers page