Silk Live Demo: Eliminating Storage Overprovisioning
Introduction – Isabella
Welcome, everyone, to our live demo: “Why Are You Overprovisioning Storage Just to Get Performance?”
Today’s session is led by our Principal Solutions Architect, Skip Marsh, and we’ll also be joined by Damon Miller, VP of Field Engineering, who will help with Q&A at the end.
With that, I’ll hand it over to Skip.
Overview – Skip
Thank you. Hopefully everyone can see my screen.
What is Silk?
Silk is a software-defined virtual SAN that installs directly in your cloud subscription. We support Azure, AWS, and GCP—today’s demo focuses on Azure, but we’ll be covering the others in upcoming sessions.
We provision storage volumes to your applications—typically databases, but also high-performance file workloads. These volumes are mounted directly to your VMs and can coexist with native cloud storage.
Architecture Overview
We deploy what we call a Silk Data Pod via a marketplace deployment.
- Flex: Our orchestration engine (first VM deployed)
- C-nodes (Controllers): Performance layer
- M-nodes (Media nodes): Capacity layer
- D-nodes (Data nodes): Storage units (similar to SSDs)
Key points:
- Capacity per M-node: 15–120 TB
- Max per Data Pod: 360 TB usable (~1 PB addressable)
- Performance scales independently of capacity
- Scaling is non-disruptive and online
Performance Characteristics
- Minimum: 2 C-nodes
- ~500K IOPS
- 5–6 GB/s throughput
- Maximum: 8 C-nodes
- 2M+ IOPS
- 20+ GB/s throughput
Performance scales linearly as you add controllers.
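The linear scaling above can be sketched as simple arithmetic. This is a hypothetical back-of-the-envelope model: the per-controller figures are derived from the quoted endpoints (2 C-nodes ≈ 500K IOPS and 5–6 GB/s; 8 C-nodes ≈ 2M+ IOPS and 20+ GB/s) and are illustrative, not official Silk specifications.

```python
# Illustrative per-controller rates, derived from the quoted endpoints above.
IOPS_PER_CNODE = 250_000   # ~500K IOPS / 2 controllers
GBPS_PER_CNODE = 2.5       # ~20 GB/s / 8 controllers

def pod_performance(c_nodes: int) -> dict:
    """Estimate Data Pod performance for a given controller count (2-8)."""
    if not 2 <= c_nodes <= 8:
        raise ValueError("A Data Pod runs between 2 and 8 C-nodes")
    return {
        "iops": c_nodes * IOPS_PER_CNODE,
        "throughput_gbps": c_nodes * GBPS_PER_CNODE,
    }

for n in (2, 4, 8):
    perf = pod_performance(n)
    print(f"{n} C-nodes: ~{perf['iops']:,} IOPS, ~{perf['throughput_gbps']:.0f} GB/s")
```

The point of the model is simply that performance is a function of controller count alone, independent of how many M-nodes provide capacity.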
Real-World Example
A trading company running SQL Server achieved:
- 34 GB/s throughput (single VM)
- ~1 ms latency
This eliminated the need to re-architect their application as they scaled.
The Problem: Overprovisioning in Azure
In Azure, VM size often dictates storage limits. This forces customers to:
- Choose larger VMs than needed
- Pay more for compute and licensing
- Still hit performance ceilings
Demo Setup
We compare:
- Azure native storage (D64 VM)
- Silk on a smaller D16 VM
Azure Native (D64 VM):
- 4 Ultra disks, each provisioned at 100K IOPS (400K total provisioned)
- Effective limits (capped at the VM level, regardless of disk provisioning):
  - 80K IOPS
  - ~3 GB/s throughput
- Latency: ~5–6 ms
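The gap between what you provision and what you get can be shown with a few lines of arithmetic. This is an illustrative sketch using the figures quoted in the demo; treat the caps as example numbers, not an Azure reference (actual limits vary by VM series and size).

```python
# Provisioned disk performance vs. the VM-level cap (demo figures, illustrative).
provisioned_disks = [100_000] * 4          # 4 Ultra disks, 100K IOPS each
vm_iops_cap = 80_000                       # D64 VM-level IOPS limit (as quoted)

provisioned_iops = sum(provisioned_disks)  # 400,000 IOPS paid for
effective_iops = min(provisioned_iops, vm_iops_cap)
wasted = provisioned_iops - effective_iops

print(f"Provisioned: {provisioned_iops:,} IOPS")
print(f"Effective:   {effective_iops:,} IOPS (VM cap)")
print(f"Wasted:      {wasted:,} IOPS ({wasted / provisioned_iops:.0%})")
```

In this example 80% of the provisioned disk performance is unreachable, because the VM cap, not the disks, is the binding limit.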
Benchmark: Azure Native Results
Across multiple tests:
- Throughput capped at 3 GB/s
- IOPS capped at 80K
- Mixed workloads share limits (not independent)
- Latency consistently 5–6 ms
Benchmark: Silk on D16 VM
Same tests, smaller VM:
Results:
- 7+ GB/s throughput (vs the 1.2 GB/s native limit at this VM size)
- 400K IOPS (vs the 40K native limit)
- ~1 ms latency
Key takeaway:
With Silk, a 16-core VM outperforms a 64-core VM running on native storage.
Business Impact
- Reduce VM size requirements
- Lower SQL licensing costs
- Achieve:
- Higher IOPS
- Higher throughput
- Lower latency
Additional Capabilities
- Instant database clones (even 80+ TB)
- No additional storage cost
- Use cases:
- Analytics
- AI/ML workloads
- Dev/test environments
Q&A Session
Q1: How does Silk scale?
Skip:
Scaling is simple and non-disruptive:
- Performance: Add C-nodes → linear scaling
- Capacity: Add M-nodes via UI or API
- Changes take minutes and require no downtime
Everything is thin-provisioned and fully API-driven.
Q2: Why does latency matter so much?
Skip:
Latency compounds in real workloads:
Example:
- 10 ms write latency → replicated → 20 ms total
- Impacts:
- SQL transactions
- Always-on clusters
- ETL pipelines
With Silk:
- 1–2 ms latency
- Queries that take hours → minutes
Closing – Isabella
Thanks everyone for joining!
We run these demos monthly, covering Azure, AWS, and GCP. Keep an eye out for upcoming sessions.