Conversations about data security tend to diverge into three main threads:
- How can we protect the data we store on our on-premises or cloud infrastructure?
- What strategies and tools or platforms can reliably back up and restore data?
- What would losing all this data cost us, and how quickly could we get it back?
All are valid and necessary conversations for technology organizations of all shapes and sizes. Still, one industry report found that the average company uses 400+ SaaS applications, and that 56% of IT professionals aren’t aware of their data backup responsibilities. This is alarming, given that 84% of survey respondents said at least 30% of their business-critical data lives inside SaaS applications.
SaaS data isn’t like on-premises or cloud data because you have no ownership over the operating environment and far less ownership of the data itself. Due to those restrictions, creating automated backups, storing them in secure environments, and owning the restoration process is a far more complicated engineering task.
That inflexibility leads organizations to develop workarounds and manual processes to back up SaaS data, leaving those backups in far less secure environments. That is a shame, because your backups are almost as valuable to attackers as your production data. Organizations that treat SaaS data with less care, even as SaaS usage grows by double digits, are handing over the keys to their kingdom in more obvious ways than they might expect. With the threat of data loss looming, what will it cost your business if you don’t move quickly to build a SaaS data recovery plan?
The valuable secrets hiding in plain sight
Let’s illustrate a common scenario: Your team has a single GitHub organization where your entire engineering team collaborates on development and deployment projects on several private repositories.
Now, let’s tweak that illustration with a less common addition: You have backups for all of your GitHub data, which includes not only the code in each of those repositories but also metadata like pull request reviews, issues, project management, and more.
In this case, your GitHub backup data won’t contain passwords or personally identifiable information (PII) about your employees beyond what they’ve already made public on their GitHub profiles. On its own, it also wouldn’t let an attacker move laterally to your production servers or services, because the backup alone doesn’t give them an attack vector or point of intrusion. You’re still not out of the woods, however: backup data of all kinds contains information attackers can learn from, letting them infer how your production environment operates.
Every insecure backup and clone of your private code is remarkably valuable to an attacker whose only aim is to steal intellectual property (IP) or to leak confidential information about upcoming features, partnerships, or mergers and acquisitions activity, whether to competitors or for financial fraud.
Your Infrastructure as Code (IaC) and CI/CD configuration files would also be of particular interest, as they identify the topology of your infrastructure, expose your testing infrastructure and deployment stages, and reveal all the cloud providers or third-party services your production services rely on. These configuration files also depend on secrets such as passwords or authentication tokens. Even if you use a secret management tool to keep the actual values of those secrets out of version control on GitHub, the references to that tool tell an attacker exactly where to look next, be that HashiCorp Vault, AWS Secrets Manager, Cloud KMS, or one of the many alternatives.
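One way to gauge how much a leaked repository backup gives away is to run the same kind of reconnaissance against your own backups. The Python sketch below is a hypothetical self-audit, not a feature of GitHub or of any backup product. It assumes you already have the backed-up files as a mapping of path to text, and the `SECRET_STORE_PATTERNS` regular expressions are illustrative heuristics that will miss many real-world references.

```python
import re
from typing import Dict, Set

# Hypothetical, deliberately simple patterns. Real IaC and CI/CD files reference
# secret stores in many more ways than these regular expressions capture.
SECRET_STORE_PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "HashiCorp Vault": re.compile(r"\bvault\b|VAULT_ADDR", re.IGNORECASE),
    "AWS Secrets Manager": re.compile(r"secretsmanager", re.IGNORECASE),
    "Cloud KMS": re.compile(r"cloudkms|google_kms_", re.IGNORECASE),
}


def find_secret_store_references(files: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each file path to the secret stores its contents appear to reference.

    `files` maps a path (as it would appear in a repository backup) to the
    file's text. Files with no matches are left out of the result.
    """
    findings: Dict[str, Set[str]] = {}
    for path, text in files.items():
        hits = {name for name, pattern in SECRET_STORE_PATTERNS.items() if pattern.search(text)}
        if hits:
            findings[path] = hits
    return findings


# Example: three files from a hypothetical repository backup.
backup = {
    "infra/main.tf": 'data "aws_secretsmanager_secret_version" "db" { secret_id = "prod/db" }',
    ".github/workflows/deploy.yml": "env:\n  VAULT_ADDR: https://vault.internal.example:8200\n",
    "README.md": "Run make test before opening a pull request.",
}
print(find_secret_store_references(backup))
# → {'infra/main.tf': {'AWS Secrets Manager'}, '.github/workflows/deploy.yml': {'HashiCorp Vault'}}
```

Even this crude pass points straight at the systems an attacker would probe next, without a single secret value ever appearing in the repository.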
Because you’re also backing up your metadata in this illustration, an insecure implementation leaves your pull requests and issue comments, which you have otherwise hidden inside your private GitHub repositories, available for an attacker to explore. They’ll quickly learn who has privileges to approve and merge code into each repository and explore checklists for deployment or remediation to identify weaknesses.
With this information, they can craft a far more targeted attack, either directly against your infrastructure or using social engineering methods, like pretexting, on employees they now understand to have admin-level privileges.
Why are secure backups—especially of SaaS data—more critical than ever?
In short, SaaS data has never been more critical to your organization’s hour-by-hour operations. Whether you’re using a code collaboration platform like GitHub, productivity tools like Jira, or even leveraging Confluence as the core provider (and dependency) of an entire brand, you’re beholden to environments you don’t own, with data management practices you can’t fully control, just to keep the lights on.
SaaS data is uniquely vulnerable because, unlike on-premises data, it has two stakeholders: your provider and you. Your provider could experience data loss, as GitLab did when an engineer accidentally deleted around 300GB of production database data in just a few seconds. You could make an honest mistake, like accidentally deleting your instance or uploading a CSV that instantly corrupts every facet of your data.
Awareness is a major concern. In a 2023 report from AppOmni, 85% of the IT and cybersecurity experts they surveyed claimed there is no security problem around SaaS. Yet, 79% of those same folks admitted their organization had identified at least one SaaS-based cybersecurity threat in the last 12 months. The most common incidents were vulnerabilities in user permissions, data exposure, a specific cyber attack, and human error.
At the same time, a report by Oracle and analyst firm ESG found that only 7% of chief information security officers (CISOs) said they fully understand the Shared Responsibility Model, under which securing the data itself falls to the customer rather than the SaaS provider. Nearly half (49%) of respondents also stated that confusion around that model has resulted in data loss, unauthorized access to data, and even compromised systems.
The answer to any fears about the security of backed-up data is not to ignore backups altogether, but to hold them to the same security standard as your production data.
What to look for in a secure SaaS data backup provider
As you explore the landscape of platforms that let you back up and restore data from those mission-critical SaaS apps, carefully validate these must-haves:
- Automation: No surefire backup involves manual processes. The backup process should automatically create incremental daily backups using a delta or diffing algorithm. Every manual process, whether it’s leveraging an open-source backup script that hasn’t been updated in years or even a simple task like writing a cron job to run a backup script every Tuesday at 11:59pm, creates potential points of failure.
- Comprehensiveness: The GitHub example is uniquely good at illustrating the difference between data (your code) and metadata (the conversations your engineers have around your code), but many SaaS apps have similar data hierarchies. If a backup solution can’t protect all your data, then in the case of a data loss disaster, you’ll have only a partial recovery plan and a lot of manual work to get back up to speed.
- Encryption: Insist on AES-256 encryption for all your SaaS data backups, both at rest and in transit. The provider should also support single sign-on (SSO) so you can manage users and their privileges through a centralized identity provider.
- Data compliance: SOC 2 Type 2 reports, which describe a backup platform’s security controls, can give you assurance about how seriously the provider takes protecting the sensitive data in your backups. Even if you don’t need it today, a feature like data residency shows that the provider has designed a sophisticated infrastructure with the correct policies for multiple regions.
- Observability: You can’t fully control what happens to your organization’s data. The next best thing is knowing exactly who accessed or changed what in your backup data, and when, as soon as it happens. A real-time audit log helps you catch intrusions quickly and take the right remediation steps before an attacker has time to breach your data.
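To make the automation and observability criteria more concrete, here is a toy Python model of both ideas. It is a sketch under stated assumptions, not how any particular provider works: `BackupStore` and its methods are hypothetical names, the "delta" is computed by comparing SHA-256 content hashes per item (a simpler technique than byte-level diffing), and the audit log is hash-chained so that editing or removing an earlier entry is detectable. Scheduling, encryption, and durable storage are deliberately left out.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BackupStore:
    """Hypothetical in-memory store: incremental snapshots plus a hash-chained audit log."""

    blobs: Dict[str, bytes] = field(default_factory=dict)          # content hash -> data
    snapshots: List[Dict[str, str]] = field(default_factory=list)  # item id -> content hash
    audit_log: List[dict] = field(default_factory=list)

    def _audit(self, actor: str, action: str, detail: str) -> None:
        # Each entry embeds the previous entry's hash, forming a chain.
        prev_hash = self.audit_log[-1]["hash"] if self.audit_log else "0" * 64
        entry = {"ts": time.time(), "actor": actor, "action": action, "detail": detail, "prev": prev_hash}
        entry["hash"] = hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()
        self.audit_log.append(entry)

    def backup(self, actor: str, items: Dict[str, bytes]) -> List[str]:
        """Record a full snapshot manifest but store only items whose content changed."""
        previous = self.snapshots[-1] if self.snapshots else {}
        snapshot: Dict[str, str] = {}
        changed: List[str] = []
        for item_id, data in items.items():
            digest = hashlib.sha256(data).hexdigest()
            snapshot[item_id] = digest
            if previous.get(item_id) != digest:
                self.blobs.setdefault(digest, data)  # identical content is stored once
                changed.append(item_id)
        self.snapshots.append(snapshot)
        self._audit(actor, "backup", f"snapshot {len(self.snapshots) - 1}: {len(changed)} changed")
        return changed

    def restore(self, actor: str, index: int = -1) -> Dict[str, bytes]:
        """Rebuild the full item set from a snapshot manifest and log the access."""
        snapshot = self.snapshots[index]  # raises IndexError if no such snapshot
        resolved = index if index >= 0 else len(self.snapshots) + index
        self._audit(actor, "restore", f"snapshot {resolved}")
        return {item_id: self.blobs[digest] for item_id, digest in snapshot.items()}

    def verify_audit_log(self) -> bool:
        """Recompute the chain. Editing or deleting an earlier entry breaks it;
        truncating the newest entries is NOT detected without an external anchor."""
        prev_hash = "0" * 64
        for entry in self.audit_log:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev_hash:
                return False
            if hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest() != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True


store = BackupStore()
store.backup("backup-bot", {"repo/README.md": b"v1", "issues/42.json": b'{"state": "open"}'})
changed = store.backup("backup-bot", {"repo/README.md": b"v2", "issues/42.json": b'{"state": "open"}'})
print(changed)  # → ['repo/README.md']
restored = store.restore("alice")
print(restored["repo/README.md"])  # → b'v2'
print(store.verify_audit_log())  # → True
store.audit_log[0]["actor"] = "someone-else"  # simulate tampering with history
print(store.verify_audit_log())  # → False
```

Because each audit entry embeds the hash of the one before it, anyone who alters history has to rewrite every later entry too. Copying the latest hash to a separate system the attacker can't reach would also catch truncation, which this sketch alone cannot.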
The unique threats to SaaS data are rapidly expanding. Even tools designed to uncover inefficiencies or automate work we’d rather not do, like third-party AI agents, could be massive data loss incidents in disguise, and we’ll likely hear about them in the months and years to come.
When you give an AI agent write access to your SaaS platforms, it might innocently corrupt all your mission-critical data at GPU-accelerated speed. When reports of these situations start popping up en masse, you’ll be glad you tucked your SaaS data away where neither an attacker nor a misbehaving AI can read or alter it. You’ll be doubly glad it’s also safe and sound when you need it most.