Data Durability and Availability in Stellar Cyber

Your data is important to you, and it's also important to us. This topic describes the different tools and techniques available to you in Stellar Cyber to ensure that your data is preserved and available whatever the situation. This includes the following:

Data Backup and Restore
High Availability Using Warm Standby
Data Replication
High Availability Using Cold Standby

This topic links extensively to the Knowledge Base pages describing in detail how to implement these features. In general, data backup, standby, and restore features can all be found in the Data Management, Standby Management, and Data Sinks options under the System | Data Processor menu. You'll need Root scope to use these features. Data replication is configured in the CLI.

Note: Refer to Capacity Planning for Data Replication and Clustering for help provisioning your deployment for data replication and clustering in physical, AWS, and Azure environments.

Stellar Cyber Architecture

The Stellar Cyber Open XDR platform is built on a multi-node, multi-cluster architecture:

The Data Analyzer (DA) cluster ingests data from sensors and connectors and enriches it.
The Data Lake (DL) cluster provides data storage and is also where you connect to the user interface.

Both the DA and DL clusters can span multiple nodes for scalability and reliability. Stellar Cyber's clustered architecture lends itself to easy configuration of standby servers for both DL and DA nodes, as well as replication of data across nodes in the cluster to ensure high data availability.

Data Backup and Restore

Data backup and restore is an easy first defense for data durability and is recommended for any deployment with a single DP and no replication:

Because any node can experience failures at any time, Stellar Cyber strongly recommends that you schedule regular data backups in environments where data replication is not configured. This ensures that existing data is not lost during a failure event.
You can configure a data backup and a configuration backup, or a single combined data/configuration backup. However, because you can configure different frequencies for each, Stellar Cyber recommends that you schedule separate data and configuration backups. A separate configuration backup also completes more quickly, which helps when you need to back up the configuration before an upgrade.

The performance of the Data Lake is reduced by up to 30% while the backup is in progress. Always schedule your backups during periods of low traffic.
Backed-up data is written to external storage such as an NFS filesystem or AWS S3.
Backups can be restored when recovering from a severe system failure.
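Because a backup can reduce Data Lake performance by up to 30%, it helps to pick the backup start time from observed traffic rather than by guesswork. The following Python sketch is a minimal illustration under stated assumptions: the hourly ingest volumes are made-up example values you would collect yourself, and `quietest_window` is a hypothetical helper, not part of Stellar Cyber. It returns the start hour of the contiguous window (wrapping past midnight) with the lowest total volume.

```python
# Hypothetical helper: not a Stellar Cyber function.
def quietest_window(hourly_volume: list[int], duration_hours: int) -> int:
    """Return the start hour (0-23) of the lowest-traffic backup window.

    Windows wrap past midnight, so a backup starting at 23:00 that runs
    for three hours covers 23:00, 00:00, and 01:00.
    """
    if len(hourly_volume) != 24:
        raise ValueError("expected 24 hourly values")
    if not 1 <= duration_hours <= 24:
        raise ValueError("duration must be between 1 and 24 hours")

    def window_total(start: int) -> int:
        # Sum the volume of every hour the backup would overlap.
        return sum(hourly_volume[(start + i) % 24] for i in range(duration_hours))

    # min() returns the earliest start hour among ties.
    return min(range(24), key=window_total)


# Example hourly ingest volumes (illustrative values only).
volume = [120, 90, 60, 40, 35, 50, 200, 600, 900, 950, 940, 930,
          900, 920, 940, 910, 880, 700, 500, 400, 300, 250, 200, 150]
start = quietest_window(volume, duration_hours=3)
print(f"Schedule the data backup at {start:02d}:00")
```

With these sample volumes, the three-hour window starting at 03:00 has the lowest total, so the data backup would go there, while the configuration backup keeps its own, separate schedule.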

In the event of an emergency, Stellar Cyber also maintains automatic daily configuration backups for the last four days on the local DL-Master. Refer to Automatic Local Configuration Backups for details.
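The rolling retention of automatic configuration backups can be pictured as keeping only the newest four daily snapshots and pruning the rest. This is a hypothetical Python model for illustration only: `split_retention` is not a Stellar Cyber function, and the example dates are invented.

```python
from datetime import date, timedelta


# Hypothetical model of "keep the last N daily backups"; not Stellar Cyber code.
def split_retention(backups: list[date], keep: int = 4) -> tuple[list[date], list[date]]:
    """Split backup dates into (kept, pruned), keeping the newest `keep`."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    ordered = sorted(backups)
    return ordered[-keep:], ordered[:-keep]


# Seven consecutive daily backups ending on an example date.
today = date(2024, 6, 7)
history = [today - timedelta(days=n) for n in range(7)]
kept, pruned = split_retention(history)
print("kept:", [d.isoformat() for d in kept])
print("pruned:", [d.isoformat() for d in pruned])
```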

High Availability Using Warm Standby

Configuring a warm standby node in the same cluster as your master node gives you an easy way to resume operations with all data intact in cases where the master node fails.

The standby node runs in warm standby mode in the same cluster as the master node.
The standby node automatically maintains a backup of the configuration from the master node.
If the master node fails (for example, you are unable to connect to its user interface), you can access the CLI of the warm standby node and switch it over to the primary role. Although the switchover to the warm standby node requires manual intervention, it takes place quickly because it does not require lengthy data restoration.
You can also configure the warm standby node to operate as a contributor to your cluster as a worker node in the Data Lake, Data Analyzer, or both. This way, the warm standby node improves cluster performance while also providing a failsafe for switchover in the event of a failure on the master node.

Data Replication

For cluster deployments with multiple data lake worker nodes, you can enable data replication so that multiple copies of your data are stored on different appliances in real time. This way, your data remains available if one of the worker nodes in your cluster goes down:

Data stored on a worker node is not accessible if the node goes down.
If you have multiple data lake worker nodes, you can enable data replication to provide fault tolerance.
Stellar Cyber supports replication of one extra copy of your data. The copy is fully distributed in the cluster.
Although replication improves data availability, it affects both performance and storage capacity for the entire cluster. Enabling replication reduces performance by roughly 33%, and 1-1 replication reduces total data storage capacity by at least 50%.
You enable data replication from the CLI using the set mode replica <data type> command, where <data type> indicates the data indices to replicate. You can specify security, traffic, syslog, windows, linux, scan, monitor, ids, snmp, aws, none, or full.
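The storage and performance trade-off above is simple arithmetic, and the CLI command accepts a fixed set of data types. The Python sketch below models both under stated assumptions: it treats one extra copy (1-1 replication) as exactly halving usable capacity, which is the best case given the "at least 50%" figure, and it applies the roughly 33% performance reduction as a flat factor. All helper names are hypothetical, `replica_command` only builds the command string shown in this topic (one data type per command, an assumption) and does not connect to a Stellar Cyber CLI.

```python
# Hypothetical helpers; names are illustrative, not a Stellar Cyber API.
VALID_DATA_TYPES = frozenset({
    "security", "traffic", "syslog", "windows", "linux", "scan",
    "monitor", "ids", "snmp", "aws", "none", "full",
})


def usable_capacity_tb(raw_tb: float, extra_copies: int = 1) -> float:
    """Best-case usable storage when each document is stored 1 + extra_copies times."""
    return raw_tb / (1 + extra_copies)


def throughput_with_replication(baseline_eps: float, reduction: float = 0.33) -> float:
    """Approximate ingest rate (events/s) after a flat performance reduction."""
    return baseline_eps * (1 - reduction)


def replica_command(data_type: str) -> str:
    """Build (but do not run) the replication command for one data type."""
    if data_type not in VALID_DATA_TYPES:
        raise ValueError(f"unsupported data type: {data_type!r}")
    return f"set mode replica {data_type}"


# Example: three DL-worker nodes with 10 TB each, one extra copy of the data.
print(usable_capacity_tb(3 * 10.0))
print(f"{throughput_with_replication(10_000):.0f}")
print(replica_command("security"))
```

In this example, 30 TB of raw worker storage yields at most 15 TB of usable capacity, and a baseline of 10,000 events per second drops to about 6,700.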

High Availability Using Cold Standby

You can also ensure high availability by configuring cold standby on a second appliance. A cold standby is a separate DP cluster that shares external storage with the primary DP cluster for data and configuration backup. If the primary DP cluster goes down, traffic can be directed to the cold standby for continued operation.

Cold standby addresses situations where the entire DP cluster is down. However, it does require external load-balancing configuration to work correctly.
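The external load-balancing requirement for cold standby comes down to a health-based routing decision. This hypothetical Python sketch shows that decision in isolation: the cluster names and the `healthy` flag are placeholders for whatever health checks your load balancer performs, and it assumes the cold standby is only marked healthy once it has been brought up from the shared external storage.

```python
from dataclasses import dataclass


@dataclass
class DPCluster:
    name: str      # placeholder identifier, e.g. a load balancer pool member
    healthy: bool  # result of whatever health check the load balancer runs


def select_target(primary: DPCluster, cold_standby: DPCluster) -> str:
    """Route to the primary DP cluster; fail over only when it is down."""
    if primary.healthy:
        return primary.name
    if cold_standby.healthy:
        return cold_standby.name
    raise RuntimeError("no healthy DP cluster available")


primary = DPCluster("dp-primary", healthy=False)
standby = DPCluster("dp-cold-standby", healthy=True)
print(select_target(primary, standby))
```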

Best Practices for High Availability

Stellar Cyber recommends you use the following best practices to ensure high availability for your data:

Schedule a regular configuration backup. You can back up your configuration using NFS or object storage from AWS, Azure, or Oracle Cloud.
Schedule a regular configuration and data backup if there is only one DP appliance in the deployment.

The performance of the Data Lake is reduced by up to 30% while the backup is in progress. Always schedule your backups during periods of low traffic.
- Schedule a daily data backup if you have a cluster without a data replica. If you do have a data replica but want additional disaster recovery, schedule a daily or weekly data backup.
- Back up data manually, as needed. For example, before an upgrade.
For improved performance and reliability, Stellar Cyber recommends building clusters of at least three nodes. Once the cluster scales to two or more DL-worker nodes, use a dedicated DL-master node without Maximized Data Storage (MDS) enabled to ensure high availability.

The MDS option specifies whether the node stores data itself (enabled) or only manages storage and Elasticsearch operations (disabled). As you scale up to a cluster deployment with two or more DL-worker nodes, disable MDS on the DL-master and provision it with less disk space. The DL-worker nodes handle the actual storage while the DL-master provides storage management and search.
Warm standby nodes for both the DA-master and DL-master are strongly recommended. The cluster cannot operate without these master nodes, so having a warm standby readily available for each is crucial to designing a high-availability deployment.

The exception to this rule is the DA-master node in cloud deployments, because a cloud DA-master can be restarted more quickly than the warm standby can take over operations.
Data replication and multi-node clustering are recommended to allow continued operation without data loss or downtime if one node in the cluster fails. Note that enabling replication brings a 50% storage reduction and 30% computation overhead.
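Taken together, the best practices above read like a checklist. The following Python function is a hypothetical encoding of those rules for illustration only: the `Deployment` fields are assumptions about how you might describe a deployment, not Stellar Cyber settings, and, as described above, cloud deployments are exempt from the DA-master warm standby recommendation.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    # Illustrative description of a deployment; not Stellar Cyber settings.
    nodes: int                 # DP appliances (nodes) in the deployment
    dl_workers: int            # DL-worker nodes in the Data Lake cluster
    replication: bool          # data replication enabled
    dl_master_mds: bool        # MDS enabled on the DL-master
    dl_master_standby: bool    # warm standby configured for the DL-master
    da_master_standby: bool    # warm standby configured for the DA-master
    cloud: bool                # deployed in a cloud environment
    upgrade_planned: bool = False


def review(d: Deployment) -> list[str]:
    """Return recommendations derived from the best practices above."""
    recs = ["schedule a regular configuration backup"]
    # Backup recommendations.
    if d.nodes == 1:
        recs.append("schedule a regular configuration and data backup")
    elif not d.replication:
        recs.append("schedule a daily data backup")
    else:
        recs.append("optionally schedule a daily or weekly data backup for extra disaster recovery")
    if d.upgrade_planned:
        recs.append("back up data manually before the upgrade")
    # Cluster design recommendations.
    if d.nodes < 3:
        recs.append("build the cluster with at least three nodes")
    if d.dl_workers >= 2 and d.dl_master_mds:
        recs.append("disable MDS on the DL-master")
    if not d.dl_master_standby:
        recs.append("add a warm standby for the DL-master")
    if not d.da_master_standby and not d.cloud:
        recs.append("add a warm standby for the DA-master")
    if not d.replication:
        recs.append("enable data replication")
    return recs


small = Deployment(nodes=1, dl_workers=0, replication=False, dl_master_mds=True,
                   dl_master_standby=False, da_master_standby=False, cloud=True)
for rec in review(small):
    print("-", rec)
```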