Data Reduction


What is Data Reduction?

Data reduction is the process of decreasing the amount of storage space your data takes up. With cloud computing, you create data every time you save a file or perform a transaction. The more data you generate, the more space you need in the cloud. With data reduction, you lower the amount of space you need while keeping your data intact.

However, if you are migrating data from on-prem, you may be surprised to find that data reduction is not a standard feature of the cloud. As a result, you may experience significant data inflation when migrating to the cloud. Data inflation occurs when your data takes up significantly more space on your new cloud platform than it did on your local database server. Data from Oracle Exadata, for example, is notorious for experiencing data inflation when moved to the cloud. The more space your data takes up, the more you need to invest in cloud resources. Once on the cloud, every byte counts.

The cost of maintaining your data in the public cloud can start to spiral out of control if you don’t manage those bytes. You’re paying for access to your data in the cloud. Taking up more space means higher cloud bills.

With the Silk Cloud Platform, you get enterprise data services, such as data reduction, zero-footprint instantaneous snapshots, thin provisioning, and data deduplication, that allow you to keep your cloud resources to a minimum. In turn, this helps to cut your cloud bill by up to 30%, making your cloud much more efficient.

Data Reduction FAQs

What are Data Reduction Methods?

There are four main data reduction methods, or techniques. Each achieves data reduction in a slightly different way. Which one you select depends on how accurately you need to recover the original data once it is restored. The four methods are:

Dimensionality Reduction removes redundant or irrelevant attributes from a data set. Each data point in a data set is described by attributes. As an example, say you want to build a dataset of the population of a certain city for a grocery marketing campaign. You could collect information such as the name, age, eye color and occupation of the people living in that city. Each piece of information is an attribute. In this example, eye color can be eliminated, since it does little to predict an individual’s purchasing habits compared with occupation. In this way, dimensionality reduction creates a reduced dataset that takes up less space.
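The eye color example can be sketched in a few lines of Python. This is only an illustration: the records, the field names, and the `drop_attributes` helper are hypothetical, and deciding which attributes are irrelevant is assumed to have been done beforehand.

```python
# Hypothetical sample records; names and values are made up for illustration.
people = [
    {"name": "Ana", "age": 34, "eye_color": "brown", "occupation": "nurse"},
    {"name": "Ben", "age": 51, "eye_color": "blue", "occupation": "chef"},
    {"name": "Cleo", "age": 27, "eye_color": "green", "occupation": "teacher"},
]


def drop_attributes(records, irrelevant):
    """Return copies of the records without the attributes named in `irrelevant`."""
    return [
        {key: value for key, value in record.items() if key not in irrelevant}
        for record in records
    ]


# Eye color is assumed to add no value for the marketing campaign, so drop it.
reduced = drop_attributes(people, {"eye_color"})
print(reduced[0])
```

Each reduced record keeps name, age and occupation only, so every row stores one field fewer than before.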

Numerosity Reduction replaces your data with a smaller representation, typically a mathematical model. A mathematical model describes the original data using numbers and equations. It is the model that is then stored in the cloud, not the original data itself. The model can use parameters to describe how the data objects relate to each other within the data set. Alternatively, it can group similar data objects together into clusters, with dissimilar objects placed in separate clusters, and store a description of each cluster.
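As a rough sketch of the parameter-based approach, the Python below fits a straight line to a handful of points by ordinary least squares and keeps only the two fitted parameters. The sample values and the `fit_line` helper are assumptions made for illustration. A straight line is just one possible model, and the values rebuilt from it are approximations rather than the exact originals.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical measurements that follow a roughly linear trend.
xs = [0, 1, 2, 3, 4]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

# Only these two numbers need to be stored instead of all five points.
slope, intercept = fit_line(xs, ys)

# Rebuilding the data from the model gives approximate values.
approx = [slope * x + intercept for x in xs]
max_error = max(abs(a - y) for a, y in zip(approx, ys))
print(slope, intercept, max_error)
```

The saving grows with the size of the data set: the model stays at two parameters whether it summarizes five points or five million.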

Data Cube Aggregation stores your data in summarized form. For example, consider a dataset with rainfall amounts for every month for the last five years. You could store a value for every month of every year, which is 60 values. A simpler way to store this data is one total per year, which is five values. This simplification technique is important for very large data sets that span many years (e.g. sports scores dating back to the 1960s) or contain numerous data objects (e.g. census demographic data). Data cube aggregation reduces the amount of space needed to store your data, at the cost of the finer-grained detail, which cannot be recovered from the totals alone.
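The rainfall example can be sketched as a roll-up from monthly values to yearly totals. The figures and the `aggregate_by_year` helper below are made up for illustration, and only three months per year are shown to keep the sample short.

```python
from collections import defaultdict

# Hypothetical rainfall in millimetres, keyed by (year, month).
monthly_rainfall_mm = {
    (2022, 1): 50.0, (2022, 2): 40.0, (2022, 3): 30.0,
    (2023, 1): 60.0, (2023, 2): 20.0, (2023, 3): 10.0,
}


def aggregate_by_year(monthly):
    """Roll monthly values up into one total per year."""
    yearly = defaultdict(float)
    for (year, _month), amount in monthly.items():
        yearly[year] += amount
    return dict(yearly)


yearly_rainfall_mm = aggregate_by_year(monthly_rainfall_mm)
print(yearly_rainfall_mm)
```

Six stored values become two. The monthly breakdown is gone after the roll-up, so this approach fits best when only the yearly view is needed.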

Data Compression is like zipping a file. It re-encodes your data so that it takes up less space. The type of compression you choose depends on how you ultimately need to retrieve the data. There are two main kinds: lossless compression, which restores the data exactly to its original form, and lossy compression, which restores data that is very similar, but not identical, to the original.
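A minimal sketch of the lossless case, using the `zlib` module from Python's standard library. The sample data is made up and deliberately repetitive, so the space saved here is much larger than you should expect from typical real-world data.

```python
import zlib

# Hypothetical, highly repetitive sample data.
original = b"rainfall,2023,01,50.0\n" * 1000

# Compress at the highest compression level (9), then restore.
compressed = zlib.compress(original, 9)
restored = zlib.decompress(compressed)

print(len(original), len(compressed))
print(restored == original)  # lossless: the round trip is byte-for-byte exact
```

A lossy scheme would trade that exact round trip for a smaller size, which is why it suits media such as images or audio better than database records.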

What are the Benefits of Data Reduction?

With data reduction, you can significantly reduce your dataset footprint in the cloud. A smaller dataset footprint means a lower cloud bill.

When you use a lossless technique, data reduction does not affect the quality of your original data: the original data is restored from its reduced state without any loss.

Data reduction also improves the efficiency of your data mining process. With less data to scan, you can sift through vast datasets much faster and obtain insights sooner. This translates to increased productivity for your company.

What are the Disadvantages of Data Reduction?

In some cases, however, data reduction discards detail that cannot be recovered, for example when you use a lossy technique. You need to carefully select the data reduction technique that best suits the needs of your business to minimize data loss.

Although there are many benefits to data reduction, the native public cloud does not offer it as a standard feature. With Silk, you get the flexibility of cloud computing plus the cost savings of data reduction.
