Tom O’Neill (Host):
Hello everyone, and welcome to today’s webinar, “AI Inferencing: Don’t Break Your Architecture.”
I’m Tom O’Neill, and I lead the Product team here at Silk. I’m delighted to be joined by Eduardo Kasner, Chief Data and AI Officer for the High-Tech sector at Microsoft, joining us today from Seattle.
Today, we’re going to explore how real-time AI inferencing is reshaping enterprise infrastructure. Eduardo, welcome—great to have you here.
Eduardo Kasner:
Thank you, Tom. It’s a pleasure to be here.
What Is AI Inferencing?
Tom O’Neill:
Before we dive in, I want to level-set on what we mean by AI inferencing.
For me, inferencing is the “thinking” phase—what happens when AI models perform the tasks they’ve been trained to do. It applies across many AI use cases: agents, RAG, semantic search embedded in databases, and more. Inferencing happens after training. It may or may not require GPUs, but it always requires significant infrastructure—and, critically, access to context and data.
Eduardo, does that align with how you think about inferencing?
Eduardo Kasner:
Yes, largely. I’d add that many people also use the term grounding—they’re closely related. Inferencing is essentially running the model.
Whether you’re using a pre-trained model or a fine-tuned one, once you provide input data and context, you’re executing that model to generate outcomes. Inferencing plus grounding is how you guide the model to deliver the answer you actually want.
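[Editor’s note: To make that definition concrete, here is a minimal sketch of inferencing plus grounding. The retriever and model call (retrieve_context, run_model) are hypothetical stand-ins for a real search index and inference endpoint, not any specific product or API.]

```python
# A minimal sketch of "inferencing plus grounding": grounding supplies the
# context; inferencing is the act of running the model over it.
# retrieve_context and run_model are illustrative stand-ins only.

def retrieve_context(question: str) -> list[str]:
    # Hypothetical retriever: in practice this would query a database,
    # search index, or vector store for passages relevant to the question.
    knowledge_base = {
        "refund": "Refunds are processed within 5 business days.",
        "shipping": "Standard shipping takes 3 to 7 business days.",
    }
    return [text for key, text in knowledge_base.items()
            if key in question.lower()]

def run_model(prompt: str) -> str:
    # Hypothetical inference call: replace with your model endpoint,
    # whether a hosted LLM API or a locally served model.
    return f"<model output for a {len(prompt)}-character prompt>"

def grounded_inference(question: str) -> str:
    context = retrieve_context(question)      # grounding: gather context
    prompt = ("Answer using only the context below.\n"
              "Context:\n" + "\n".join(context) +
              "\n\nQuestion: " + question)
    return run_model(prompt)                  # inferencing: run the model

print(grounded_inference("How long do refunds take?"))
```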
Tom O’Neill:
I love that definition: inferencing is running the model.
Real-World AI Use Cases Creating Business Value
Tom O’Neill:
You spend a lot of time with customers innovating around AI inferencing. Can you share some examples of how organizations are using AI to create real business value or competitive advantage?
Eduardo Kasner:
Absolutely. One thing I like to point out is that AI isn’t happening in isolation. Focusing only on generative AI or chatbots is like saying the most exciting part of a car is one tire—it misses the bigger picture.
We’re seeing tremendous innovation across many types of AI: sentiment analysis, translation, video recognition, document intelligence, recommendation systems, and more. When you combine these capabilities with analytics, data processing, and software engineering, what you really get are intelligent applications.
A great example is banking applications. Five years ago, chatbots were basic and often inaccurate. Today, those same applications can correlate spending across accounts, categorize transactions, and provide meaningful insights in near real time—using the same underlying app, enhanced with AI.
Another example is healthcare. Medical organizations must retain patient data for decades. That data spans multiple database migrations, formats, PDFs, images, and audio files. AI now enables multimodal search and correlation across that historical data—surfacing insights that were previously inaccessible.
We also see this in e-commerce. Product search, recommendations, and personalization are dramatically more accurate than they were just a few years ago.
What’s exciting isn’t just brand-new AI applications—it’s that almost every existing application is being upgraded with AI capabilities. That’s where the real transformation is happening.
Performance, Accuracy, and the Data Tradeoff
Tom O’Neill:
We’re seeing a proliferation of AI agents and use cases, especially embedded into existing applications. One challenge customers raise is balancing performance and accuracy. If you limit what an agent can do, you get faster responses—but less accurate ones. If you broaden access to data, accuracy improves but performance can suffer.
Does that align with what you’re seeing?
Eduardo Kasner:
It absolutely does. I’ll add a few more examples.
We’re working with companies using AI to analyze network conditions for video conferencing quality, and others using AI to guide partners through complex certification processes. These aren’t brand-new ideas—they’re existing workflows enhanced by AI.
The challenge is that AI requires more data to be accurate and differentiated. But more data means more load, more queries, and more unpredictable growth against production systems.
That’s where things get serious—especially when you’re dealing with systems of record or mission-critical databases.
The Impact of AI Inferencing on Enterprise Infrastructure
Tom O’Neill:
As inferencing workloads ramp up, cloud environments are behaving in ways many teams haven’t seen before. From your perspective, how are AI workloads changing enterprise infrastructure demands?
Eduardo Kasner:
There are two common approaches people suggest—and both are flawed in isolation.
The first is consolidating everything into a lakehouse. Conceptually, it’s elegant. Practically, it’s slow, expensive, and often takes years—especially when dealing with multiple massive data sources.
The second is pushing everything into vector databases. That also adds architectural complexity, cost, and latency—and it’s not trivial to operationalize at scale.
So what do people do instead? They hit the production database directly with AI workloads—often in unplanned ways. If the workload is small, it’s fine. But if it’s successful, query volume grows rapidly, unpredictably, and without a clear capacity plan. That’s where performance risk emerges.
Isolate or Scale? Choosing the Right Architecture
Tom O’Neill:
At Silk, we see three ways customers try to manage this:
- Maximizing performance of the existing environment
- Optimizing for AI-style read-heavy workloads
- Creating instant, isolated copies of production data for AI use
From your perspective, how should teams think about choosing between these approaches?
Eduardo Kasner:
The honest answer is: it depends.
It depends on workload impact, architectural complexity, cost, security, and long-term maintainability.
If you fully understand the AI workload and its impact, scaling the existing environment may be the simplest option. If you don’t—or if the workload is unpredictable—isolating it is often safer.
What’s critical is having options. The worst choice is blindly hitting production. Another bad option is extracting massive amounts of data on a schedule, which introduces latency and additional load.
Having the ability to either elastically scale or instantly duplicate your environment—without introducing new technical debt—is incredibly powerful.
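[Editor’s note: As one concrete illustration of that isolation idea, here is a minimal sketch that routes read-heavy AI queries to an isolated copy of the data instead of the production primary. The hostnames, credentials, and the choice of psycopg2 as the driver are illustrative assumptions, not a prescribed setup.]

```python
# A minimal sketch of workload isolation: OLTP traffic keeps hitting the
# primary, while unpredictable AI/inferencing reads are routed to an
# isolated copy or replica. All connection details here are hypothetical.
import psycopg2  # assumed PostgreSQL driver; any client works the same way

PRIMARY_DSN = "host=prod-primary.example.com dbname=app user=app_rw"
AI_COPY_DSN = "host=ai-copy.example.com dbname=app user=ai_ro"

def get_connection(workload: str):
    """Route by workload type so AI query growth never lands on the
    system of record."""
    dsn = AI_COPY_DSN if workload == "ai" else PRIMARY_DSN
    return psycopg2.connect(dsn)

# Transactional traffic goes to the primary:
#   with get_connection("oltp") as conn: ...
# AI grounding/retrieval queries go to the isolated copy:
#   with get_connection("ai") as conn, conn.cursor() as cur:
#       cur.execute("SELECT doc_text FROM documents LIMIT 100")
```

Whether that copy is an elastically scaled replica or an instantly cloned environment, the effect is the same: the AI workload can grow, spike, or fail without touching production.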
Preparing for What’s Next in AI
Tom O’Neill:
Looking ahead, what mindset shifts or strategic principles can help organizations evolve with AI without increasing architectural risk?
Eduardo Kasner:
I’ll simplify it into six priorities:
- Accuracy – If the system gives wrong answers, users abandon it. Nothing else matters.
- Data Readiness – You must have the right data, context, and parameters to answer questions correctly.
- Performance – Users won’t wait. Even seconds matter.
- Privacy & Security – AI must not compromise sensitive data, compliance, or sovereignty.
- Scalability – Workloads vary wildly by geography, users, and usage patterns.
- Cost – Many AI projects stall because costs scale faster than expected.
Performance is often the breaking point. Many AI projects get close to production but fail due to cost or infrastructure constraints—especially during experimentation.
AI development is inherently iterative. You test hypotheses, break things, refine, and iterate. Without isolated environments for experimentation, you risk impacting production systems long before a use case proves viable.
Closing Thoughts
Tom O’Neill:
That’s a great perspective. Eduardo, thank you for an insightful and practical discussion.
If you’re experiencing challenges with AI inferencing, performance, or production data access, you’ll see contact information on the screen for Silk and for me.
Any final thoughts before we wrap up?
Eduardo Kasner:
Thank you for having me. This is a silent problem many organizations don’t address until they’re close to launch. I encourage teams to evaluate technologies that let them innovate with AI while keeping systems operable, scalable, and maintainable in production.
Tom O’Neill:
Thank you again, Eduardo—and thanks to everyone who joined us today.