Read the Transcript
Chris Buckel: Hi, my name is Chris Buckel, Vice President of Business Development at Silk, but I’m better known in the database community as @FlashDBA. It’s my great privilege today to be interviewing my friend, Tanel Poder, who I’ve known for many decades. In fact, he’s one of the world’s foremost experts on databases and performance. One of the things that I really appreciate about Tanel is his ability to embrace cool new stuff. He’s an author, presenter, troubleshooter, and entrepreneur, but I think he’s best known for his work as a blogger. It was one of Tanel’s blogs that really drew my attention this year when he started writing about his project to build a very fast, high-performance machine at home. Tanel, hi, how are you?
Tanel Poder: I have no complaints here.
Chris Buckel: Do you remember that particular blog?
Tanel Poder: Yes. Modern hardware is super fast if you use it right. I wanted to prove that by buying a little workstation at home, putting about ten SSDs in it, PCIe 4, and all these modern technologies. And I got millions of IOPS out of this single box, tens of gigabytes per second of scanning rate. So, modern hardware is super fast, as I said, but none of that matters if you cannot use it in an enterprise. Right? If a single box goes down, you will lose all the disks in it. So, you actually need magic on top of all of this modern hardware that would make it usable for databases like Oracle.
Chris Buckel: That’s right. I invited you to test Silk and spend time to see what you could do with the performance expectations that Silk can deliver. Maybe you can tell us a bit about that testing?
Tanel Poder: Yes. When we started talking about me testing your platform, before taking on this work, I wanted to make sure that it passed the “excitement test.” In other words, is there anything novel in this platform? And indeed, there is. I read through your architecture guide, and Silk uses the directly attached SSDs in the cloud compute instances, which are very fast, as I have demonstrated in the past. But it does it in a scale-out way, and the Silk software layer turns it into a big cloud data store that is resilient, compresses all your data, and has all the other enterprise features you would expect. So yes, it passed the excitement test and that led me to doing the work.
Chris Buckel: First of all, I sent you the architecture white paper. I think it’s 35 pages long and you’re the first person I’ve ever known that read it from cover to cover and apparently made notes on every single page. So, thank you for that.
Tanel Poder: Yes, it was a fun read! And when I read about the architectural features, I often found myself nodding that, yes, this is how I would have built it.
Chris Buckel: So, you’re familiar with the architecture. One of the things that we do is we have a two-tier system so that you can split out performance from capacity. The top tier allows for performance to be scaled up and down. And then the bottom tier makes all the data durable for systems in the cloud. Is there an architecture that this maybe reminds you of a little bit?
Tanel Poder: Yes, exactly. It reminded me of Exadata, ASM, and all these layers a little bit. You have your databases connecting to the Silk datastore. But the Silk datastore is not just one box, it’s actually two layers of cloud compute VMs, essentially. And this gives you horizontal scalability. But these two layers also give you “smartness.” What I found interesting was that all the data is compressed, because you can use so many compute nodes. Compute nodes have not only the SSDs in them, but also a lot of CPUs. Right? So, you can use “smartness” with the CPUs. And you can compress in the compute nodes, so when you then persist the data to the lower layer, it’s already compressed, which means you reduce traffic and you use less space.
Chris Buckel: That’s right. And the compute nodes actually do a lot more. They take care of things like space-efficient snapshots and deduplication, replication between different data pods in the cloud. Lots of different functionality there.
Tanel Poder: Yes. And given that you’re running in the cloud, you can choose exactly the right-sized nodes and you’re not going to run out of CPU time for doing this “smartness.”
Chris Buckel: The lower layer of the architecture, the layer where we persist the data and make it durable. Can you tell us a bit about that in comparison to, let’s say, ASM?
Tanel Poder: From the upper layer, from the compute layer, you persist already compressed chunks of data, for lack of a better term. But at the lower capacity layer, you are not using triple mirroring. You don’t need that because you have your own proprietary erasure coding algorithm. So, instead of 3x duplication of data, you only have 12.5% of overhead spread across many, many disks so you can suffer a loss of multiple nodes or multiple disks within nodes and still be able to serve the data.
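To put the 12.5% figure next to triple mirroring, here is a small arithmetic sketch. Silk's erasure coding algorithm is proprietary and its actual layout is not described in this interview, so the 16 data + 2 parity split below is only an assumed example of a scheme that yields 12.5% overhead; the function names are hypothetical.

```python
def parity_overhead(data_chunks: int, parity_chunks: int) -> float:
    """Extra space used by parity, as a fraction of the data itself."""
    return parity_chunks / data_chunks


def raw_capacity_needed(usable_tb: float, overhead_fraction: float) -> float:
    """Raw capacity required to store `usable_tb` of data."""
    return usable_tb * (1 + overhead_fraction)


# Assumed example layout: 16 data chunks + 2 parity chunks = 12.5% overhead.
ec_overhead = parity_overhead(16, 2)

# Triple mirroring keeps 3 full copies, i.e. 200% overhead on top of the data.
mirror_overhead = 2.0

usable_tb = 100
print("erasure coded raw TB:", raw_capacity_needed(usable_tb, ec_overhead))
print("triple mirrored raw TB:", raw_capacity_needed(usable_tb, mirror_overhead))
```

Under these assumptions, 100 TB of already compressed data needs 112.5 TB of raw capacity with erasure coding versus 300 TB with triple mirroring.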
Chris Buckel: We’ve talked about the features and the resilience, but really it’s the performance that Silk is all about. How did you go about testing the performance?
Tanel Poder: The performance usually is my main interest. The other features are necessary. You need them anyway. But then performance is like, you know, how much are you going to get out of your system? Right? Usually, my first approach is to do something simple, but heavy. So, I ran a very big parallel query scanning a big table in Oracle and even the smallest configuration of Silk on Azure cloud with only two compute nodes gave me 3.5 gigabytes per second scanning rate. And that was pretty impressive from the smallest configuration.
Chris Buckel: Test number one is always to try and break it. Right? So, you tried to break it?
Tanel Poder: Exactly. Yes.
Chris Buckel: And then you moved on to some kind of transactional test?
Tanel Poder: Yeah. Then you get more complex and introduce more variables and writes as well. I like to use Swingbench. Swingbench doesn’t only do IO, it also runs your joins and all kinds of operations that the database does. So, it gives you a little more complete picture, while still doing a lot of IO. I launched like a thousand concurrent users in the single database VM and ended up doing 30,000 IOPS during normal time. But it’s not only about read IOPS, right? You have commits as well. Your OLTP system has commits. So, if you do a lot of IOPS for reads and DB Writer is doing a lot of IOPS for writes, then still when you commit, you want it to finish really fast. I was pleasantly surprised to see that commits still took less than a millisecond even though I was doing tens of thousands of IOPS.
Chris Buckel: Exactly. I always think that real-world workloads are a mixture of every different type of IO. If you really want to recreate a normal workload, you need lots of different things happening at the same time. So, you tested throughput, you tested transactions. Did you try and push it any further?
Tanel Poder: Yes, I did. And I didn’t really even have to increase the workload. I just had to wait until DB Writer kicked in and started flushing hundreds of thousands of blocks to disk at once. And this is how Oracle does things. Many other databases work the same way. So, you have all these reads going on, all these commits going on, all the action, and suddenly, you have a burst of writes that need to complete really fast. And the database should not slow down at that time. Right? And Silk actually has a pretty good monitoring tool that gives you this per second, if not better. I think its update rate is even finer than per second. If you look into a one-hour AWR report, you’re not going to see any of these bursts or any of the slowdowns on some platforms. But with Silk, I saw that when DB Writer kicked in, your platform was doing over 100,000 write IOPS, or IOs per second, while the latency of IOs was still under a millisecond, so that was pretty impressive and that led me to research: How does it work? And as it turns out, as described in the architecture document, thanks to Silk’s two-layer architecture, these massive write bursts, or “write storms,” can be acknowledged as soon as the written block has been saved in the RAM of multiple compute nodes. Right? And then the write can be acknowledged and the compression and persistence can happen separately. So, you’re protected against a node failure, yet you get low latency for writes. And that’s important not only for DB Writer reasons, but also for temp IO. What if you’re running a big hash join, right? You’ve got to dump a lot of stuff to disk and get it back fast and never use it again. Being able to buffer these things in RAM makes a difference.
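The write path described here (acknowledge once the block sits in the RAM of multiple compute nodes, then compress and persist later) can be illustrated with a toy model. This is only a sketch of the general idea, not Silk's implementation: the class names, the choice of two in-memory copies, and the use of `zlib` for compression are all assumptions made for the example.

```python
import zlib


class ComputeNode:
    """Hypothetical compute node with an in-memory write buffer."""

    def __init__(self, name: str):
        self.name = name
        self.ram = {}  # block_id -> uncompressed bytes


class ToyDataStore:
    """Toy two-layer store: RAM buffers on top, compressed blocks below."""

    def __init__(self, nodes, copies: int = 2):
        if copies > len(nodes):
            raise ValueError("not enough compute nodes for the requested copies")
        self.nodes = nodes
        self.copies = copies
        self.capacity_layer = {}  # block_id -> compressed bytes

    def write(self, block_id: int, data: bytes) -> bool:
        # Keep the block in the RAM of several nodes, then acknowledge.
        for node in self.nodes[: self.copies]:
            node.ram[block_id] = data
        return True  # acknowledged before compression or persistence

    def flush(self) -> None:
        # Later, off the latency-critical path: compress and persist.
        pending = {}
        for node in self.nodes:
            pending.update(node.ram)
        for block_id, data in pending.items():
            self.capacity_layer[block_id] = zlib.compress(data)
            for node in self.nodes:
                node.ram.pop(block_id, None)

    def read(self, block_id: int) -> bytes:
        for node in self.nodes:
            if block_id in node.ram:
                return node.ram[block_id]
        return zlib.decompress(self.capacity_layer[block_id])


store = ToyDataStore([ComputeNode("c1"), ComputeNode("c2")])
acked = store.write(1, b"x" * 8192)  # returns as soon as both RAM copies exist
store.flush()                        # compression and persistence happen here
```

In this model the caller's latency depends only on `write`, which is why a burst of writes can be absorbed quickly while `flush` does the heavier work afterwards.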
Chris Buckel: One of Silk’s architectural features is the ability to scale up performance on-demand when you need it. And I understand that you were able to test that, too, right?
Tanel Poder: Yes. As I mentioned before, I got started with the minimum recommended configuration of only two compute nodes. And I understand that two compute nodes are needed for resilience, right? You could, in theory, run with one as well, but then you probably don’t want to do that in an enterprise. The minimum two-node system gave me 3.5 gigabytes per second of scanning rate for my parallel query. Then I went to the Silk UI. I dragged and dropped one more compute node from the free pool to my pool. Then I think it took, like, a minute to reconfigure. And my throughput went from 3.5 gigabytes per second to 5 gigabytes per second. And it was all in live operation. I deliberately ran parallel queries, which do a lot of asynchronous IOs, pre-fetching, and stuff like that. So, there were a lot of IOs in flight while this reconfiguration happened. And it was all done live without interruption. That’s pretty cool.
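As a quick check on those two data points, the sketch below computes how close the step from two to three compute nodes came to linear scaling. Only the 3.5 GB/s and 5 GB/s figures come from the interview; the function name is hypothetical and the result says nothing about how larger configurations scale.

```python
def scaling_efficiency(tput_before: float, nodes_before: int,
                       tput_after: float, nodes_after: int) -> float:
    """Throughput gain divided by node-count gain; 1.0 means perfectly linear."""
    throughput_gain = tput_after / tput_before
    node_gain = nodes_after / nodes_before
    return throughput_gain / node_gain


# Figures quoted above: 2 nodes at 3.5 GB/s, then 3 nodes at 5 GB/s.
efficiency = scaling_efficiency(3.5, 2, 5.0, 3)
print(f"scaling efficiency: {efficiency:.1%}")
```

Perfectly linear scaling would have given 5.25 GB/s with three nodes, so the quoted 5 GB/s is roughly 95% of that.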
Chris Buckel: And the great thing is that you can also do it in the other direction as well. As you said, you need a minimum of two compute nodes for resilience. But if you’re running a lot more and then you no longer need that performance, you can scale back down to reduce the cost.
Tanel Poder: Yes, I did not test that because… more is better for me!
Chris Buckel: Always. One of my personal bugbears in the cloud is the concept of overprovisioning. It’s the idea that in order to get a certain amount of one resource, you have to provision and pay for a whole load of another resource that you may not actually need. And the great example for that is always IO. You need so many IOPS for your database, or so much throughput, you probably have to provision a huge amount of capacity that you might not actually need. Even worse, you have to provision larger VM sizes, more vCPUs and that normally means more resources all because of one limit to one particular resource. Did you find anything similar when you were looking at your testing?
Tanel Poder: So, yes, I didn’t realize myself that in the public cloud environment there is a difference between compute networks, that is, the networks between compute nodes, and IO networks. Compute networks and IO networks are limited differently. Right? So, you might need to provision the biggest VM in Azure to run your big data warehouse that needs to scan a lot of data, whereas you could provision a smaller VM with fewer CPUs, if only you could use compute networks for your IO. And that’s exactly what Silk does, right? That explains why I got over five gigabytes per second of scanning rate with just three compute nodes and one Azure machine running the database.
Chris Buckel: That’s amazing, Tanel. Looking forward to reading about that on the blog. And I know there’s one other area that you were really interested in as well, which is snapshots.
Tanel Poder: Silk does thin snapshots, or thinly provisioned snapshots. And even though snapshot technology itself is not new, when you combine it with the elasticity of the cloud, this opens a bunch of architectural opportunities. Maybe you have burst workloads like end-of-month processing that need to finish really fast, right? So, you need a fast data store and the ability to do snapshots and launch temporary instances there on a temporary VM in the cloud. And the moment you’re done, you shut it all down. You’re not going to pay for that instance and your disk space usage goes down as well.
Chris Buckel: So, thanks very much. It’s always really exciting to talk to you and thanks everybody for listening. You can find all the materials we talked about on the link below.
Tanel Poder: And thank you for giving me an opportunity to play with cool new technology.