As AI takes a bigger role in operating rooms, the path to error-free surgeries and better patient outcomes comes with significant risks, says Gokce (Gilly) Yildirim ENG ’04, BUS ’19, CEO of Vent Creativity, which creates AI-powered 3D surgical planning tools. How do you earn surgeons’ trust? How do you reduce the likelihood of errors before tools are used in the operating room? Without a rigorous framework of accountability and transparency for powerful technology like his, Yildirim says, the industry risks undermining its credibility.
We talked to Yildirim about where the tech is going, the governance structures that must be in place, and how to build credibility with the surgeons who will be using AI in their operating rooms every day—but probably don’t trust it yet.
Q: You’ve noted that millions of patients remain in pain after surgery. How can the digital twin model change this outcome?
Gilly Yildirim: The problem is that we are currently stuck in a one-size-fits-all model based on 2D scans. To reach a level of error-free surgery, we need a governance framework that prioritizes individualized planning. Much like a map built from node-to-node connections, we use point clouds and geotagged landmarks to create a digital twin of the patient. That lets us animate the surgical plan like a ragdoll and simulate exactly how that specific patient will move and feel after the operation.
This isn't just about the technology; it’s about a ground-up approach to solving the patient’s problem—getting them standing up and moving without pain.
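To make the "digital twin" idea concrete, here is a toy illustration (not Vent's software; all landmarks, joint centers, and angles are made up): treat the patient's anatomy as a point cloud of tagged landmarks and "animate" it by rotating points about a joint axis to preview a range of motion.

```python
import math

def rotate_about_z(point, joint_center, angle_deg):
    """Rotate a 3D landmark about a vertical axis through the joint center."""
    a = math.radians(angle_deg)
    # Translate so the joint center is the origin, rotate in the x-y plane,
    # then translate back.
    x, y, z = (p - c for p, c in zip(point, joint_center))
    return (joint_center[0] + x * math.cos(a) - y * math.sin(a),
            joint_center[1] + x * math.sin(a) + y * math.cos(a),
            joint_center[2] + z)

# Sweep a (hypothetical) knee landmark through a flexion arc to preview
# where it travels under a planned alignment:
knee_center = (0.0, 0.0, 0.0)
landmark = (10.0, 0.0, 5.0)
arc = [rotate_about_z(landmark, knee_center, a) for a in (0, 45, 90)]
print([round(p[0], 1) for p in arc])  # x-coordinates: [10.0, 7.1, 0.0]
```

A real planning system would drive thousands of points with patient-specific soft-tissue constraints; this sketch only shows the kinematic core of the idea.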
Q: Surgeons are notoriously protective of their clinical autonomy. What specific structures are required to build their trust in AI?
Gilly Yildirim: You have to dismantle the black box. Trust is the primary barrier. Consider that 65% of surgeons still do not use robotic assistance, even when it may reduce mistakes. We have to build trust through a verifiable quality framework.
For example, our system allows clinicians to manually examine AI-generated slices, landmark locations, and cut planes to verify they are correct as they make cuts. We don’t want blind trust in technology; we want a system where the AI explains the “why.” If the software suggests a specific cut, the surgeon should be able to see exactly how that affects the patient’s ligaments and muscles.
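The kind of manual check described above can be sketched in a few lines. This is a hypothetical illustration, not Vent's actual interface: given three AI-proposed landmark points, derive the cut plane and report how far an anatomical point of interest (say, a ligament attachment) sits from it, so a clinician can sanity-check the proposal.

```python
import numpy as np

def cut_plane(p1, p2, p3):
    """Return (unit normal, origin point) for the plane through three landmarks."""
    n = np.cross(np.subtract(p2, p1), np.subtract(p3, p1))
    return n / np.linalg.norm(n), np.asarray(p1, dtype=float)

def signed_distance(point, normal, origin):
    """Signed distance of an anatomical point from the cut plane."""
    return float(np.dot(np.subtract(point, origin), normal))

# Verify a proposed cut against a ligament attachment landmark
# (coordinates are illustrative, in mm):
normal, origin = cut_plane((0, 0, 0), (10, 0, 0), (0, 10, 0))
print(round(signed_distance((2, 3, 4.5), normal, origin), 1))  # 4.5 mm off the plane
```

The point of such a readout is exactly the transparency Yildirim describes: the surgeon sees the geometric consequence of the suggestion rather than accepting it blindly.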
Q: Where do you see the biggest need in surgery right now?
Gilly Yildirim: There are a huge number of underserved patients. The number of surgeons is fairly stagnant, while the patient population is growing rapidly.
The old cliché is that we need AI to automate so we can have better healthcare. But I’m worried about what passes as acceptable. There are a lot of companies in the AI and digital-twin space, but when you look under the hood, it’s not all there. I’m worried that there are companies overpromising with technology that isn’t delivering. Is the market going to be patient enough to wait for the longer-runway, better product?
Q: You’ve been vocal about the need for a shift in how the government regulates these technologies. What is missing from the current FDA approach?
Gilly Yildirim: The current guidance is often jumbled and non-committal because regulators don’t want to draw a line in the sand for all submissions. What really needs to happen is a governance structure and body focused on AI—a regulatory body that says we need to fast-track these efforts, with mechanisms for funding and accelerating these projects while maintaining high safety standards.
In healthcare, you cannot follow the "fail fast" tech model; you cannot be debugging during surgery. We need a framework that moves beyond looking right to being right. Governance ensures that we aren't just feeding a model "garbage in" or prevalent biases, but using causal AI to understand how the world interacts with these surgical interventions.
Q: Beyond federal regulation, how should companies govern themselves to ensure safety?
Gilly Yildirim: Some of the solutions come from companies like mine getting together and, through round-robin testing and internal working groups, establishing clear expectations for software function and making sure we’re all getting the same, correct results. From a business perspective, this high regulatory bar actually creates a moat, so you can’t just build something in your garage and start using it.
We have to create our own internal loops of quality. At Vent, we’ve developed open-source standards and a quality framework that we’ve submitted for broader use. Every time we train a system with new data, it must be tested against this framework; it is only accepted if it is incrementally better.
One death is too high a risk in healthcare, and it is the reputational end of a company.
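The acceptance loop Yildirim describes—a retrained system is adopted only if it is incrementally better—can be sketched as a simple quality gate. The function name, metric, and threshold below are illustrative, not Vent's actual framework.

```python
def passes_quality_gate(candidate_error: float, baseline_error: float,
                        min_improvement: float = 0.01) -> bool:
    """Accept a newly trained model only if it beats the current baseline
    by at least `min_improvement` on the held-out quality benchmark."""
    return candidate_error <= baseline_error - min_improvement

# A retrained model must strictly improve on the benchmark before it
# replaces the production model (error values are illustrative):
baseline = 0.120   # current production model's benchmark error
candidate = 0.105  # retrained model's error on the same benchmark
print(passes_quality_gate(candidate, baseline))  # True
```

In practice such a gate would run over a suite of metrics and test cases, but the one-way ratchet—never ship a model that regresses—is the core of the framework described.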
Q: Much of what you’re describing requires close collaboration between scientific, regulatory, and business communities. What role do business leaders play in establishing these frameworks for safe human-AI collaboration?
Gilly Yildirim: Business leaders provide the guardrails that academic or PhD products often lack. We see many brilliant ideas that have no grounding in a real business plan or a clear path to market. Scientists and surgeons with an idea often don’t understand the amount of work it takes to get to market.
A mature model requires a three-to-five-year plan with solid timelines and a deep understanding of regulatory hurdles—a real business plan, not just an idea that could be a product. There need to be groups of people who understand the different parts of the business: the market pressures, when to build, and the right time to deliver the product.
We need business leaders who understand market pressures and can ensure that the technology isn't just an idea but a sustainable tool for saving lives.