With the rise of Artificial Intelligence (AI) and the proliferation of data lakes and object stores, some speculate that the era of large relational databases like Oracle and SQL Server is coming to an end. Yet despite years of predictions that Relational Database Management Systems (RDBMS) are in decline, both Microsoft and Oracle reported record revenues in 2023. So how much weight should these predictions really carry?
Back to the Future
Relational databases remain crucial to AI’s future. Even Microsoft stated in its revenue report that it is not confident it can protect data access. AI needs data: not just any data, but the critical information often housed in relational systems. While developers and data scientists might find this evident, many have never worked in the role of a Database Administrator (DBA). DBAs are the trusted guardians of databases, ensuring data integrity, accessibility, and security. As the RDBMS era evolves into the AI age, DBAs are transitioning into roles such as “Site Reliability Engineer” or “Data Engineer.” These roles emphasize data governance and analytics, ensuring the right people have access to accurate data.
As AI and data science advance, data sprawl (the spread of data copies across various formats and locations) becomes a growing concern. Key questions arise:
- Who oversees critical data outside the secure environment of a relational database?
- Who ensures data security once it exits the relational domain?
- How can we identify the definitive source of vital data?
- How is data managed and monitored outside its original environment? Can AI unintentionally alter it?
- Are decisions being made based on unreliable data sources?
- Is crucial data shared with public AI systems?
Data lakes will remain popular, but these concerns highlight the unique benefits that keep relational databases relevant. Expect to see policies requiring that critical data be stored in systems that prioritize robust access controls and security.
Some of the most common security features in relational database platforms such as Oracle and SQL Server are:
- Authentication and Authorization:
  - User Authentication: Ensures that only authorized users can access the database. Authentication mechanisms may include username/password, multi-factor authentication, or integration with enterprise identity solutions.
  - Role-Based Access Control (RBAC): Allows administrators to define roles and assign specific privileges to these roles. Users assigned to a role inherit its privileges.
- Data Encryption:
  - At-rest Encryption: Encrypts data stored on disk to protect against unauthorized access or data breaches.
  - In-transit Encryption: Encrypts data while it’s being transferred over networks using protocols like SSL/TLS.
- Auditing and Monitoring:
  - Track and log database activities to monitor who did what and when. This helps in forensics and in identifying malicious or unintended activities.
- Data Masking and Redaction:
  - Allows sensitive data to be obscured or masked for users, so they can’t view the actual data but can still perform their jobs (e.g., customer support might see only the last four digits of a credit card number).
- Object-level Security:
  - Granular permissions can be set on tables, views, procedures, and other database objects. This ensures users can access only what they need.
- Network Security:
  - Database servers can be configured to accept connections only from specific IP addresses or networks.
  - Firewalls and intrusion detection/prevention systems can further protect the database from external threats.
- Data Integrity:
  - Features like constraints, triggers, and stored procedures ensure that data remains consistent and adheres to defined business rules.
- Backup and Recovery:
  - Regular backups protect against data loss, and features like point-in-time recovery can help recover data after unintended deletions or changes.
- SQL Injection Prevention:
  - Prepared statements or parameterized queries help prevent SQL injection attacks by separating SQL logic from the data being passed.
- Database Patches and Updates:
  - Keeping the RDBMS software updated ensures vulnerabilities are patched and the database is protected against known exploits.
- Replication and Failover:
  - Replicating the database to secondary servers ensures data availability in case the primary server fails or is compromised.
- Anomaly Detection:
  - Some advanced RDBMS platforms and third-party tools can identify and alert on unusual or suspicious database activities.
- Data Classification and Governance:
  - Classifying data based on its sensitivity (e.g., public, confidential, secret) can help in implementing appropriate security controls for different data sets.
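Three of the features above (masking, object-level security, and SQL injection prevention) are easy to see side by side in a small example. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for Oracle or SQL Server; the customers table, the customers_masked view, and the helper functions are hypothetical. SQLite has no users, roles, or GRANT statements, so RBAC is only approximated by querying the masked view instead of the base table; SQL Server's Dynamic Data Masking and Oracle Data Redaction provide masking natively.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """In-memory SQLite stand-in for a production RDBMS (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " card_number TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO customers (name, card_number) VALUES (?, ?)",
        [("Alice", "4111111111111111"), ("Bob", "5500005555555559")],
    )
    # Masked view: callers see only the last four digits of each card number.
    # In Oracle or SQL Server you would GRANT SELECT on a view like this to a
    # support role and withhold access to the base table.
    conn.execute(
        "CREATE VIEW customers_masked AS "
        "SELECT id, name, 'XXXX-XXXX-XXXX-' || substr(card_number, -4) AS card_number "
        "FROM customers"
    )
    return conn


def find_customer_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Vulnerable: user input is spliced directly into the SQL text.
    sql = f"SELECT name, card_number FROM customers_masked WHERE name = '{name}'"
    return conn.execute(sql).fetchall()


def find_customer_safe(conn: sqlite3.Connection, name: str) -> list:
    # Parameterized: the driver binds `name` as a value, never as SQL syntax.
    return conn.execute(
        "SELECT name, card_number FROM customers_masked WHERE name = ?", (name,)
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    attack = "x' OR '1'='1"
    print(find_customer_safe(conn, "Alice"))   # [('Alice', 'XXXX-XXXX-XXXX-1111')]
    print(find_customer_unsafe(conn, attack))  # injection succeeds: both rows come back
    print(find_customer_safe(conn, attack))    # []
```

The same hostile input that returns every row through string formatting matches nothing once it is bound as a parameter, and even the leaked rows expose only masked card numbers because the query targets the view.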
Remove the Risk
All of these features will play a central role in protecting company data assets from modern, open AI/ML solutions that are easily accessible to people who may not understand the risk AI poses to critical data. Although cloud data lakes may remain the target for some formats, the need for security will often drive critical data back INSIDE the database because of the concerns I’ve listed here. The best part is that relational database vendors have already built the capabilities to handle this data into their newer releases.
- Oracle has just announced vector database capabilities in Oracle Database 23c, specifically targeting machine learning and AI workloads. Oracle introduced native JSON support back in version 12c.
- SQL Server 2016 introduced PolyBase, which can query Parquet and other modern machine learning file formats, along with native JSON support. Model files (.onnx, .h5, .pkl) can be stored as varbinary(max) data in SQL Server today.
- Both database platforms are adding more ways to participate in data lakes and broader data strategies, recognizing that data must be at the center of AI while remaining protected.
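Storing a model file as binary data in the database, as described for SQL Server’s varbinary(max), can be sketched with any RDBMS binary column. The example below is a minimal sketch that uses SQLite’s BLOB type as a stand-in; the ml_models table, its columns, and the SHA-256 checksum column are hypothetical additions meant to show how the database can act as the definitive, integrity-checked source for a model artifact.

```python
import hashlib
import sqlite3


def create_model_table(conn: sqlite3.Connection) -> None:
    # On SQL Server, `payload` would be VARBINARY(MAX); SQLite's BLOB plays that role here.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ml_models ("
        " name TEXT NOT NULL,"
        " version INTEGER NOT NULL,"
        " sha256 TEXT NOT NULL,"
        " payload BLOB NOT NULL,"
        " PRIMARY KEY (name, version))"
    )


def store_model(conn: sqlite3.Connection, name: str, version: int, payload: bytes) -> None:
    """Insert serialized model bytes (e.g. the contents of an .onnx file) with a checksum."""
    digest = hashlib.sha256(payload).hexdigest()
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT INTO ml_models (name, version, sha256, payload) VALUES (?, ?, ?, ?)",
            (name, version, digest, payload),
        )


def load_model(conn: sqlite3.Connection, name: str, version: int) -> bytes:
    """Fetch a model and check the payload still matches the checksum recorded at insert."""
    row = conn.execute(
        "SELECT sha256, payload FROM ml_models WHERE name = ? AND version = ?",
        (name, version),
    ).fetchone()
    if row is None:
        raise KeyError(f"no model {name!r} version {version}")
    digest, payload = row
    if hashlib.sha256(payload).hexdigest() != digest:
        raise ValueError("model payload failed its integrity check")
    return payload


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_model_table(conn)
    # Placeholder bytes; in practice you would read them from a file such as model.onnx.
    store_model(conn, "churn_classifier", 1, b"\x08\x07placeholder-model-bytes")
    print(len(load_model(conn, "churn_classifier", 1)))
```

The composite primary key prevents a second, conflicting copy of the same model version from being stored, which speaks directly to the “definitive source” question raised earlier.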
Digging down one more layer brings us to infrastructure. When infrastructure services can’t handle the sheer relational workload plus these additional AI demands, the storage solution has to be addressed, too.
Storage Becomes More Intelligent
Intelligent storage platforms such as Silk are becoming increasingly vital for handling the block storage layer in the AI era. Traditional cloud block storage struggles to meet the demands AI imposes on relational database workloads, especially on top of the requirements already set by applications and analytics. By residing on native infrastructure, Silk offers an optimal solution: it isn’t reliant on third-party infrastructure in regional cloud data centers, yet it can handle the high I/O demands that native block storage options like Premium SSD, Premium SSD v2, and Ultra Disk often fall short of.
Machine learning and LLM workloads can run against zero-footprint, thinly provisioned snapshots, reducing strain on the primary relational database. When the task completes, the results can be saved (for example, in Parquet files) and the thin snapshot deleted, preventing data sprawl. For machine learning processes that run inside the relational system, Silk provides high-speed performance beyond typical cloud storage limits. The data remains within the RDBMS, ensuring proper data handling and security.
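The snapshot lifecycle described above (provision a snapshot, run the ML job against it, persist the results, delete the snapshot) can be sketched as a context manager. This is an analogy under stated assumptions: SQLite’s Connection.backup makes a full page-by-page copy, whereas the thin snapshots described here share unchanged blocks with the source, and no Silk API is shown. The function names and the orders table are hypothetical, and CSV stands in for Parquet so the sketch needs only the standard library.

```python
import csv
import os
import sqlite3
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def ephemeral_snapshot(source: sqlite3.Connection) -> Iterator[sqlite3.Connection]:
    """Yield a read-only, point-in-time copy of `source`, then delete it."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        copy = sqlite3.connect(path)
        try:
            # Full copy here; a storage-level thin snapshot would be copy-on-write.
            source.backup(copy)
        finally:
            copy.close()
        snapshot = sqlite3.connect(Path(path).as_uri() + "?mode=ro", uri=True)
        try:
            yield snapshot
        finally:
            snapshot.close()
    finally:
        os.remove(path)  # tear down the copy so it doesn't become data sprawl


def run_feature_job(source: sqlite3.Connection, out_csv: str) -> list:
    """Hypothetical ML prep job: aggregate on the snapshot, persist only the results."""
    with ephemeral_snapshot(source) as snap:
        rows = snap.execute(
            "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
        ).fetchall()
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["customer", "total_amount"])
        writer.writerows(rows)
    return rows


if __name__ == "__main__":
    prod = sqlite3.connect(":memory:")
    prod.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    prod.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("acme", 10.0), ("acme", 5.5), ("globex", 7.0)],
    )
    prod.commit()  # snapshot committed data only
    out = os.path.join(tempfile.gettempdir(), "customer_totals.csv")
    print(run_feature_job(prod, out))  # [('acme', 15.5), ('globex', 7.0)]
```

The job never touches the source connection after the copy is taken, and the only artifact that outlives it is the small results file.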
So, we now understand better ways to manage and secure data in the AI era. But how can we develop a private generative AI model, especially when public options like ChatGPT or Bard aren’t suitable because the data is sensitive? Our next post will dive into this, offering tools to develop AI models that ensure data privacy.
Discover How AI Is Going to Affect the Way You Operate!
Join us on Nov 29th for a webinar presentation with Silk’s own Kellyn Gorman.