Backup

A data backup is the result of copying or archiving files and folders so that they can be restored after data loss.

Data loss has many causes, ranging from computer viruses and hardware failures to file corruption, fire, flood, or theft. If you are responsible for business data, a loss may involve critical financial, customer, and company data. If the data is on a personal computer, you could lose financial data and other key files, such as pictures and music, that would be hard to replace.

Backup refers to the copying of physical or virtual files or databases to a secondary location for preservation in case of equipment failure or catastrophe. The process of backing up data is pivotal to a successful disaster recovery plan.

Enterprises back up data they deem to be vulnerable in the event of buggy software, data corruption, hardware failure, malicious hacking, user error or other unforeseen events. Backups capture and synchronize a point-in-time (PIT) snapshot that is then used to return data to its previous state.

What data should be backed up and how frequently?

A backup process is applied to critical databases or related line-of-business (LOB) applications. The process is governed by predefined backup policies that specify how frequently the data is backed up and how many duplicate copies (known as replicas) are required, as well as by service-level agreements (SLAs) that stipulate how quickly data must be restored.
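
As an illustration only, a backup policy of this kind can be modeled as a small record plus a compliance check. The class name `BackupPolicy`, its fields, and the function `policy_violations` are hypothetical names chosen for this sketch, not part of any particular backup product.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class BackupPolicy:
    """Hypothetical policy record; field names are assumptions for this sketch."""
    interval: timedelta        # maximum allowed time between backups
    replicas: int              # number of duplicate copies required
    restore_within: timedelta  # SLA target for completing a restore


def policy_violations(policy, since_last_backup, copies, restore_time):
    """Return a list of human-readable problems; an empty list means compliant."""
    problems = []
    if since_last_backup > policy.interval:
        problems.append("backup overdue")
    if copies < policy.replicas:
        problems.append("too few replicas")
    if restore_time > policy.restore_within:
        problems.append("restore slower than SLA")
    return problems


# Example: daily backups, three copies, restore within four hours
daily = BackupPolicy(timedelta(days=1), 3, timedelta(hours=4))
print(policy_violations(daily, timedelta(days=2), 2, timedelta(hours=5)))
```

Keeping the policy separate from the check means the same check can be run against many data sets with different requirements.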

Best practices suggest a full data backup should be scheduled to occur at least once a week, often during weekends or off-business hours. To supplement weekly full backups, enterprises typically schedule a series of differential or incremental data backup jobs. A differential job backs up only the data that has changed since the last full backup, while an incremental job backs up only the data that has changed since the most recent backup of any type.
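
A weekly schedule of this kind might be sketched as follows. The function name `backup_type_for`, the choice of Sunday for the full backup, and the use of incrementals on the remaining days are all assumptions made for illustration.

```python
from datetime import date


def backup_type_for(day: date, full_weekday: int = 6) -> str:
    """Pick the job type for a given day.

    full_weekday follows date.weekday(): Monday is 0 and Sunday is 6.
    Assumption: every day other than the full-backup day runs an incremental.
    """
    return "full" if day.weekday() == full_weekday else "incremental"


# Monday 1 January 2024 through Sunday 7 January 2024
week = [date(2024, 1, d) for d in range(1, 8)]
plan = {d.isoformat(): backup_type_for(d) for d in week}
for day, kind in plan.items():
    print(day, kind)
```

Swapping "incremental" for "differential" in the return value gives the other common weekly pattern.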

Backup storage media

Enterprises typically back up key data to dedicated backup disk appliances. Backup software — either integrated in the appliances or running on a separate server — manages the process of copying data to the disk appliances. Backup software handles features such as data deduplication that shrinks the amount of data that must be backed up. Backup software also enforces policies that govern how often specific data is backed up, how many copies are made and where backups are stored.
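
To show how deduplication shrinks stored data, here is a minimal sketch that splits data into fixed-size chunks and keeps each distinct chunk once, keyed by its SHA-256 digest. The function names and the tiny chunk size are assumptions for readability; commercial products often use variable-size chunking and much larger chunks.

```python
import hashlib


def dedup_store(blobs, chunk_size=4):
    """Store each distinct chunk once and keep a per-blob list of chunk digests."""
    store = {}    # digest -> chunk bytes, written only the first time it is seen
    recipes = []  # one list of digests per input blob, in order
    for blob in blobs:
        recipe = []
        for start in range(0, len(blob), chunk_size):
            chunk = blob[start:start + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)
            recipe.append(digest)
        recipes.append(recipe)
    return store, recipes


def rebuild(store, recipe):
    """Reassemble a blob from its list of chunk digests."""
    return b"".join(store[digest] for digest in recipe)


blobs = [b"AAAABBBBCCCC", b"AAAABBBBDDDD"]
store, recipes = dedup_store(blobs)
stored_bytes = sum(len(chunk) for chunk in store.values())
print(stored_bytes, "bytes stored for", sum(len(b) for b in blobs), "bytes of input")
```

The two sample blobs share their first two chunks, so only four of the six chunks need to be stored.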

Before disk became the main backup medium in the early 2000s, most organizations used magnetic tape drive libraries to store backups. Tape is still used today but mainly for archived data that does not need to be quickly restored.

In the early days of disk backup, the software continued to run on separate servers and moved data to disk instead of tape. As file sizes have increased, backup vendors have brought integrated data protection appliances to market to simplify the backup process. An integrated data appliance is essentially a file server outfitted with hard disk drives (HDDs) and backup software. These plug-and-play data storage devices often include automated features for monitoring disk capacity, expandable storage and preconfigured tape libraries.

Most disk-based backup appliances allow copies to be moved from spinning media to magnetic tape for long-term retention. Magnetic tape systems are still used because of increasing tape densities and the rise of the Linear Tape File System (LTFS).

Early disk backup systems were known as virtual tape libraries (VTLs) because they presented disk storage to backup software as if it were a set of tape drives. That way, backup software applications developed to write data to tape could treat the disk as a physical tape library. VTLs faded from popular use after backup software vendors optimized their products for disk instead of tape.

Solid-state drives (SSDs) are rarely used for data backup because of price and endurance concerns. Some storage vendors include SSDs as a caching or tiering tool for managing writes with disk-based arrays. Data is initially cached in flash storage and then written to disk. As vendors release SSDs with larger capacity than disk drives, flash drives may gain some use for backup.

Local backup vs. offline backup for primary storage

Modern primary storage systems have evolved to feature stronger native capabilities for data backup. These features include advanced RAID protection schemes, unlimited snapshots and tools for replicating snapshots to secondary backup or even tertiary off-site backup. Despite these advances, primary storage-based backup tends to be more expensive and lacks the indexing capabilities found in traditional backup products.

Local backups place data copies on external HDDs or magnetic tape systems, typically housed in or near an on-premises data center. The data is transmitted over a secure high-bandwidth network connection or corporate intranet.

One advantage of local backup is the ability to back up data behind a network firewall. Local backup is also typically much quicker than backup to a remote site and provides greater control over who can access the data.

Offline or cold backup is similar to local backup, although it is most often associated with backing up a database. An offline backup incurs downtime since the backup process occurs while the database is disconnected from its network.

Backup and cloud storage

Off-site backup transmits data copies to a remote location, which can include a company’s secondary data center or leased colocation facility. Increasingly, off-site data backup equates to subscription-based cloud storage as a service, which provides low-cost, scalable capacity and eliminates a customer’s need to purchase and maintain backup hardware. Despite its growing popularity, electing backup as a service (BaaS) requires users to encrypt data and take other steps to safeguard data integrity.

Cloud backup is divided into the following:

Public cloud storage: Users ship data to a cloud services provider, which charges them a monthly subscription fee based on consumed storage. Depending on the provider, there may be additional fees for transferring data, particularly for egress (retrieving data from the cloud). Amazon Web Services (AWS), Google Cloud and Microsoft Azure are the largest public cloud providers. Smaller managed service providers (MSPs) also host backups on their clouds or manage customer backups on the large public clouds.
Private cloud storage: Data is backed up to different servers within a company’s firewall, typically between an on-premises data center and a secondary DR site. For this reason, private cloud storage is sometimes referred to as internal cloud storage.
Hybrid cloud storage: A company uses both local and off-site storage. Enterprises customarily use public cloud storage selectively for data archiving and long-term retention. They use private storage for local backup and for faster access to their most critical data.
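
The consumption-based pricing described for public cloud storage can be sketched as a simple formula. The function name `monthly_cost` and the per-gigabyte rates below are hypothetical placeholders, not any provider's actual prices.

```python
def monthly_cost(stored_gb, egress_gb, storage_rate, egress_rate):
    """Estimate a monthly bill: a storage charge plus a data-retrieval charge.

    Rates are per gigabyte. All figures passed in are assumptions supplied
    by the caller; real price lists vary by provider, region and tier.
    """
    return stored_gb * storage_rate + egress_gb * egress_rate


# Hypothetical example: 1,000 GB stored and 50 GB restored in one month
estimate = monthly_cost(stored_gb=1000, egress_gb=50,
                        storage_rate=0.02, egress_rate=0.10)
print(f"Estimated monthly cost: {estimate:.2f}")
```

A formula like this makes it clear that a large restore can cost noticeably more than a quiet month of storage alone.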

Backup types defined

Full backup captures a copy of an entire data set. Although considered to be the most reliable backup method, performing a full backup is time-consuming and requires a large number of disks or tapes. Most organizations run full backups only periodically.

Incremental backup offers an alternative to full backups by backing up only the data that has changed since the most recent backup of any type, whether full or incremental. The drawback is that a full restore takes longer, because the last full backup and every incremental backup taken after it must be applied in sequence.

Differential backup copies data changed since the last full backup. This enables a full restore to occur more quickly by requiring only the last full backup and the last differential backup. The downside is that each differential backup grows as changes accumulate after the last full backup, which tends to lengthen the backup window.

Synthetic full backup spawns a file by combining an earlier complete copy of it with one or more incremental copies created at a later time. The assembled file is not a direct copy of any single current or previously created file, but rather synthesized from the original file and any subsequent modifications to that file.
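
The difference between incremental and differential backups can be sketched in a few lines, assuming the common definitions: an incremental captures changes since the most recent backup of any type, and a differential captures changes since the last full backup. The file names, day numbers, and function names are hypothetical.

```python
def select_changed(mtimes, since):
    """Return the names of files modified after the reference time."""
    return sorted(name for name, t in mtimes.items() if t > since)


def restore_set(backups):
    """Return the indexes of the backups needed to restore the latest state.

    backups is a chronological list of "full", "incremental" or
    "differential". Assumption: a chain uses one supplementary type only.
    """
    last_full = max(i for i, kind in enumerate(backups) if kind == "full")
    later = range(last_full + 1, len(backups))
    differentials = [i for i in later if backups[i] == "differential"]
    if differentials:
        return [last_full, differentials[-1]]  # full plus newest differential
    return [last_full] + [i for i in later if backups[i] == "incremental"]


# Modification times in days; the full backup ran on day 0
mtimes = {"a.db": 1, "b.log": 2, "c.cfg": 0, "d.csv": 3}
incremental = select_changed(mtimes, since=2)   # previous backup ran on day 2
differential = select_changed(mtimes, since=0)  # last full backup ran on day 0
print(incremental, differential)
print(restore_set(["full", "incremental", "incremental", "incremental"]))
print(restore_set(["full", "differential", "differential", "differential"]))
```

The incremental chain needs every link to restore, while the differential chain needs only two backups, at the price of larger daily jobs.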

Incremental-forever backups minimize the backup window, while providing faster recovery access to data. An incremental-forever backup captures the full data set and then supplements it with incremental backups from that point forward. Backing up only changed blocks is also known as delta differencing. Full backups of data sets are typically stored on the backup server, which automates the restoration.
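
Changed-block detection can be sketched by hashing fixed-size blocks of the old and new data and keeping only the blocks whose hashes differ. This is a simplified illustration with hypothetical function names and a tiny block size; real products typically track changed blocks at the storage layer rather than rehashing everything.

```python
import hashlib

BLOCK = 4  # illustrative block size in bytes


def block_hashes(data, block_size=BLOCK):
    """Hash each fixed-size block of the data."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def changed_blocks(old, new, block_size=BLOCK):
    """Return {block index: block bytes} for blocks that differ or are new."""
    old_h = block_hashes(old, block_size)
    delta = {}
    for idx, digest in enumerate(block_hashes(new, block_size)):
        if idx >= len(old_h) or old_h[idx] != digest:
            start = idx * block_size
            delta[idx] = new[start:start + block_size]
    return delta


def apply_delta(base, delta, new_len, block_size=BLOCK):
    """Rebuild the new data from the base copy plus the changed blocks."""
    buf = bytearray(base[:new_len].ljust(new_len, b"\0"))
    for idx, block in delta.items():
        start = idx * block_size
        buf[start:start + len(block)] = block
    return bytes(buf)


old = b"AAAABBBBCCCC"
new = b"AAAAXXXXCCCCDD"
delta = changed_blocks(old, new)
print(sorted(delta), apply_delta(old, delta, len(new)) == new)
```

Only two of the four blocks are transferred here, which is the effect that keeps the backup window short.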

Reverse-incremental backups record the changes made between two instances of a mirror. Once an initial full backup is taken, each successive incremental backup applies any changes to the existing full backup. This essentially generates a new synthetic full backup copy each time an incremental change is applied, while retaining the ability to revert to previous full backups.
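
A reverse-incremental scheme can be sketched as a store that always holds the latest full copy plus a list of "undo" records for stepping backward. The class `ReverseIncrementalStore` and its methods are hypothetical, and files are modeled as a simple name-to-content dictionary for brevity.

```python
_ABSENT = object()  # marks a file that did not exist in the older state


class ReverseIncrementalStore:
    def __init__(self, initial):
        self.full = dict(initial)  # always the most recent synthetic full
        self.reverse_deltas = []   # oldest first; each one undoes one backup

    def backup(self, current):
        """Fold the current state into the full copy, saving how to undo it."""
        undo = {}
        for name in set(self.full) | set(current):
            old = self.full.get(name, _ABSENT)
            new = current.get(name, _ABSENT)
            if old != new:
                undo[name] = old
        self.reverse_deltas.append(undo)
        self.full = dict(current)

    def restore(self, steps_back=0):
        """Return the state as of steps_back backups before the latest one."""
        if not 0 <= steps_back <= len(self.reverse_deltas):
            raise ValueError("no backup that far back")
        state = dict(self.full)
        start = len(self.reverse_deltas) - steps_back
        for undo in reversed(self.reverse_deltas[start:]):
            for name, old in undo.items():
                if old is _ABSENT:
                    state.pop(name, None)
                else:
                    state[name] = old
        return state


store = ReverseIncrementalStore({"a": "1", "b": "1"})
store.backup({"a": "2", "b": "1", "c": "1"})
store.backup({"a": "2", "c": "2"})
print(store.restore(), store.restore(1), store.restore(2))
```

Restoring the latest state needs no delta processing at all, which is the main appeal of this design; older states cost one undo step per backup.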