Adjusting Baseline Provisioning for Actual Configuration
This topic explains how to fine-tune the provisioning of the Stellar Cyber Platform based on the specifics of your deployment's configuration, from data retention needs to the number of sensors managed.
Use this topic as follows:
- Start with the baseline sizing guidance in Baseline Cluster Node VM System Requirements to size your deployment.
- Adjust your deployment's provisioning based on your platform's configuration and load using the instructions in the sections below.
These sizing adjustments are cumulative unless stated otherwise. If you enable multiple features, add the resource requirements from each relevant section.
If you are deploying the Stellar Cyber Platform on a dedicated server running different virtual machines, make sure you guarantee platform performance by observing the rules in Preparing a Server for Stellar Cyber Cluster Deployment.
Platform Sizing Summary
The table below summarizes the considerations you must apply when provisioning Stellar Cyber Platform components. In all cases, start with the baseline sizing rules. Then, adjust your provisioning as summarized in the table below. See the sections in this topic for details.
| Platform Component | Sizing Calculation |
|---|---|
| DL-Master Memory and CPU | Start with the baseline. Then, increase provisioning based on the adjustments described in the sections below. |
| DL-Worker Memory and CPU | Start with the baseline. Then, increase provisioning based on retention time for disk space considerations. |
| DL Cluster Size (Overall Cluster Memory/CPU) | Start with the baseline. Then, increase provisioning based on the adjustments described in the sections below. |
| DA Cluster Size (Overall Cluster Memory/CPU) | Start with the baseline. Then, increase provisioning based on the adjustments described in the sections below. |
Applying Additional Capacity Margin
The guidance in this topic assumes reservation of 30% additional capacity to account for product growth, feature changes, and real-world workload variation. If your deployment includes heavy query activity, many optional features, or unpredictable usage patterns, reserve additional capacity.
Adjusting for Data Retention Needs
The baseline configuration assumes 30 days of raw retention with three retention groups configured. If you increase retention time or configure more retention groups, you must either add more DL-Worker nodes or increase the storage of your existing DL-Worker nodes.
Data Retention Calculator
Use this calculator to estimate the number of DL workers required for your deployment. The result is based on daily ingestion, retention time, and the number of retention groups. The calculator automatically accounts for both retention overhead and storage growth and provides sizing using both baseline DL-Workers and maximum-capacity DL-Workers.
You can increase storage for a DL-Worker up to 16 TB. Each additional 250 GB of storage provides one additional day of retention beyond 30 days, assuming standard ingestion. Each increment also increases memory and CPU requirements. At maximum capacity, a DL-Worker supports approximately 50 days of retention with 16 TB storage, 160 GB memory, and 44 CPU cores (the maximum-capacity DL-Worker build shown in the calculator below).
The calculator uses a maximum daily ingestion of 1500 GB and a maximum retention time of 365 days. If you need additional ingestion or retention time, contact Stellar Cyber for assistance or use the equations in Data Retention Adjustments: The Math Behind the Calculator.
Enter your daily ingestion, retention time, and number of retention groups. The calculator shows both supported sizing models so you can compare standard workers against maximum-capacity workers.
Inputs
- Daily Ingestion (GB/day)
- Retention Time (days)
- Retention Groups
Valid ranges: 1–1500 GB/day, 1–365 days, and 1–5 retention groups.
Data Retention Adjustments: The Math Behind the Calculator
You must adjust Data Lake provisioning based on retention groups and retention time. Both increase storage demand and system overhead, which affects the number of DL-Workers and the resources each worker requires.
Retention groups increase compute overhead. Retention time increases storage requirements. Always size your DL-Workers based on the larger of the compute-bound and storage-bound results.
Retention Group Correction
The system supports up to five retention groups. Each additional group increases overhead. The sizing baseline assumes three retention groups.
RG = configured retention groups / 3
Example:
- If you configure five retention groups, RG = 5 / 3 = 1.67.
Retention Time Correction
The sizing baseline assumes 30 days of raw data retention. Longer retention requires additional storage, and in some cases additional compute resources.
RT = configured retention days / 30
Example:
- If you configure 90 days of retention, RT = 90 / 30 = 3.
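The two correction factors above can be sketched as small helper functions. This is an illustrative sketch, not a product API; the function names are our own.

```python
def retention_group_factor(groups: int) -> float:
    """RG = configured retention groups / 3 (the baseline assumes 3 groups)."""
    return groups / 3

def retention_time_factor(days: int) -> float:
    """RT = configured retention days / 30 (the baseline assumes 30 days)."""
    return days / 30

print(round(retention_group_factor(5), 2))  # 1.67, matching the example above
print(retention_time_factor(90))            # 3.0
```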
Baseline DL-Worker Capacity
A standard DL-Worker node provides the following baseline resources:
- 128 GB memory
- 32 CPU cores
- 9.5 TB storage
This baseline configuration is designed to support:
- 250 GB daily ingestion
- 30 days retention
Option 1: Add More DL-Workers at Baseline Spec
If you keep each DL-Worker at the baseline specification, calculate the number of workers as follows:
DL-Workers = ((x / 250) + 1) × RG × RT
- x = daily ingestion in GB/day
- RG = retention group factor
- RT = retention time factor
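A minimal sketch of the Option 1 calculation (the function name is illustrative). Passing the configured group and day counts directly, rather than pre-computed RG and RT factors, avoids floating-point rounding pushing the result past a whole node:

```python
import math

def dl_workers_baseline(daily_ingestion_gb: float, retention_groups: int,
                        retention_days: int) -> int:
    """DL-Workers = ((x / 250) + 1) * RG * RT, rounded up to whole nodes."""
    raw = ((daily_ingestion_gb / 250) + 1) * retention_groups * retention_days
    return math.ceil(raw / (3 * 30))  # divide out the RG and RT baselines

# 500 GB/day, five retention groups, 90 days retention
print(dl_workers_baseline(500, 5, 90))  # 15, matching the example below
```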
Option 2: Increase DL-Worker Resources
You can increase storage for a DL-Worker up to 16 TB. Each additional 250 GB of storage provides one additional day of retention beyond 30 days, assuming standard ingestion. Each increment also increases memory and CPU requirements.
DL-worker memory = 128 GB + ((Storage(TB) - 9.5) / 0.312) × 1.5 GB
DL-worker CPU = 32 + ((Storage(TB) - 9.5) / 0.312) × 0.5
At maximum capacity, a DL-Worker supports approximately 50 days of retention with 16 TB storage, 160 GB memory, and 44 CPU cores.
When you increase DL-Worker size, you must evaluate both compute capacity and storage capacity. The larger requirement determines the final sizing.
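The memory and CPU formulas above can be checked with a short sketch (function names are illustrative). Note that the raw formula results for a 16 TB worker land slightly below the provisioned figures, which are rounded up:

```python
def dl_worker_memory_gb(storage_tb: float) -> float:
    """Memory = 128 GB + ((Storage(TB) - 9.5) / 0.312) * 1.5 GB."""
    return 128 + ((storage_tb - 9.5) / 0.312) * 1.5

def dl_worker_cpu(storage_tb: float) -> float:
    """CPU = 32 + ((Storage(TB) - 9.5) / 0.312) * 0.5."""
    return 32 + ((storage_tb - 9.5) / 0.312) * 0.5

# Maximum-capacity worker with 16 TB storage
print(round(dl_worker_memory_gb(16)))  # ~159 GB; the guide provisions 160 GB
print(round(dl_worker_cpu(16)))        # ~42 cores; the guide provisions 44
```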
Maximum DL-Worker Calculation
Compute-bound = ((x / 250) + 1) × RG
Storage-bound = (x × 30 × RT) / (1000 × 12.8)
DL-Workers = max[compute-bound, storage-bound]
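The compute-bound versus storage-bound comparison can be sketched as follows (illustrative names, not a product API):

```python
import math

def dl_workers_max_spec(daily_ingestion_gb: float, rg: float, rt: float) -> int:
    """Size maximum-capacity DL-Workers by the larger of compute and storage needs."""
    compute_bound = ((daily_ingestion_gb / 250) + 1) * rg
    storage_bound = (daily_ingestion_gb * 30 * rt) / (1000 * 12.8)
    return math.ceil(max(compute_bound, storage_bound))

# 500 GB/day, three retention groups (RG = 1), 90 days retention (RT = 3)
print(dl_workers_max_spec(500, 1, 3))  # 4 (storage-bound, ~3.5, dominates)
```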
Examples
Baseline Case
- Ingestion: 500 GB/day
- RT = 1
- RG = 1
DL-Workers = ((500 / 250) + 1) = 3
Result: 3 DL-Workers
High Retention and Groups
- Ingestion: 500 GB/day
- RT = 3 (90 days)
- RG = 1.67 (five retention groups)
DL-Workers = ((500 / 250) + 1) × 1.67 × 3 ≈ 15
Result: 15 DL-Workers
Max DL-Worker Size
Compute-bound = ((500 / 250) + 1) × 1 = 3
Storage-bound = (500 × 30 × 3) / (1000 × 12.8) ≈ 4
Max of Compute-bound vs. Storage-bound = 4
Result: 4 DL-Workers
If you configure retention longer than 60 days and have more than two retention groups, disable the Maximized Data Storage (MDS) option on the DL-Master node, as described in Best Practices for High Availability.
For additional system-wide sizing considerations, see Platform Sizing Summary.
Adjusting DA Provisioning for Data Sinks
Each enabled data sink increases resource usage on the DA-Master and DA-Worker nodes. Apply these increases in addition to any other sizing changes.
CPU Requirement
For each 250 GB/day of ingestion, add 4 CPU cores to each DA node for each data sink. This requirement applies to both the DA-Master and DA-Worker nodes.
Memory Requirement
For each 250 GB/day of ingestion, add 16 GB memory to each DA node for each data sink.
| Ingestion | Base Memory without Data Sinks | Memory with x Data Sinks | Base CPU without Data Sinks | CPU with x Data Sinks |
|---|---|---|---|---|
| 250 GB/day | 64 GB | 64 + 16×x GB | 32 | 32 + 4×x |
| 500 GB/day | 128 GB | 128 + 32×x GB | 64 | 64 + 8×x |
| 1250 GB/day | 320 GB | 320 + 80×x GB | 160 | 160 + 20×x |
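The rows in the table above follow a single pattern: base memory and CPU scale with ingestion in 250 GB/day units, and each data sink adds 16 GB memory and 4 CPU cores per unit. A minimal sketch (illustrative function name):

```python
def da_node_with_sinks(daily_ingestion_gb: float, sinks: int):
    """Return (memory GB, CPU cores) per DA node, including data-sink overhead."""
    units = daily_ingestion_gb / 250      # ingestion in 250 GB/day units
    memory_gb = 64 * units + 16 * units * sinks
    cpu = 32 * units + 4 * units * sinks
    return memory_gb, cpu

print(da_node_with_sinks(500, 2))  # (192.0, 80.0), matching the 500 GB/day row
```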
You can meet these requirements by scaling out with more nodes or by increasing the size of existing nodes.
For other DA adjustments, see Adjusting for Connectors and Adjusting for Additional Features.
Adjusting for Connectors
The baseline model assumes 25 connector units for every 250 GB/day of ingestion. One large connector can consume multiple connector units.
Each connector uses an average of 500 MB memory. For every 25 additional connectors beyond the baseline, reserve 12 GB more memory and four more CPU cores.
For example, if ingestion is 250 GB or less, you can configure up to 25 connectors with the baseline provisioning. For each additional set of 25 connectors, you must provision an additional 12 GB of memory and four CPU cores.
| Ingestion | Included Connectors | Additional Memory | Additional CPU |
|---|---|---|---|
| 250 GB/day | 25 | 12 GB per additional 25 connectors | 4 CPU per additional 25 connectors |
| 500 GB/day | 50 | 12 GB per additional 25 connectors | 4 CPU per additional 25 connectors |
| 1250 GB/day | 125 | 12 GB per additional 25 connectors | 4 CPU per additional 25 connectors |
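The connector rule above can be sketched as a small helper (illustrative name, not a product API): subtract the connectors included with your ingestion level, then charge 12 GB and 4 cores per additional set of 25.

```python
import math

def connector_overhead(daily_ingestion_gb: float, connectors: int):
    """Return (extra memory GB, extra CPU cores) beyond the included connectors."""
    included = 25 * (daily_ingestion_gb / 250)   # 25 connectors per 250 GB/day
    extra_sets = math.ceil(max(0, connectors - included) / 25)
    return 12 * extra_sets, 4 * extra_sets

print(connector_overhead(250, 60))  # (24, 8): 35 extra connectors -> 2 sets of 25
```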
If your deployment uses many connectors, review connector count together with data sink and feature overhead. See Adjusting DA Provisioning for Data Sinks and Adjusting for Additional Features.
Adjusting for External API Activity
If the Stellar Cyber Platform is expected to handle a high volume of external API traffic (for example from scripts or other automation), you must increase system resources accordingly. The required adjustment depends on the API type and usage pattern.
ElasticSearch Query APIs
ElasticSearch query performance depends heavily on query characteristics, including the data being queried, aggregation depth, and overall query complexity.
Because these variables can differ significantly across environments, a fixed sizing model is not practical. Use average query behavior as a baseline and adjust resources further based on observed workload and performance.
Alert Index (SER) Queries
Queries to the Alerts (aella-ser-*) index typically have relatively low overhead and scale predictably across query rate and data volume.
Query Rate
- Baseline: 1 SER query per second
- For each additional SER query per second, increase system capacity by 5%
Data Volume
- Baseline: 1 day of SER data across all tenants queried per second
- For each additional unit of SER data queried per second, increase system capacity by 5%
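Under the stated 5%-per-unit guidance, the extra capacity for SER query load can be estimated as follows. This is a sketch of the rule of thumb above, with an illustrative function name:

```python
def ser_capacity_increase_pct(queries_per_sec: float,
                              data_units_per_sec: float) -> float:
    """Percent extra capacity over baseline (1 query/s and 1 day-unit/s): 5% each."""
    extra_rate = max(0.0, queries_per_sec - 1)
    extra_data = max(0.0, data_units_per_sec - 1)
    return 5 * extra_rate + 5 * extra_data

print(ser_capacity_increase_pct(3, 2))  # 15.0 -> provision 15% more capacity
```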
Raw Data Queries
Raw data queries are resource-intensive and require more aggressive scaling.
A query that retrieves one month of raw data without aggregation can occupy system resources for approximately 60 seconds.
External raw data query capacity is measured by data retrieval rate:
- One unit equals 15 minutes of data retrieved per second
- For example, querying one month of data across all tenants requires a minimum interval of 2,880 seconds under baseline conditions.
For each additional unit of raw data query throughput, increase system capacity by 20%.
Raw data queries that include aggregation require additional resources beyond this baseline.
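The retrieval-rate unit above implies a minimum spacing between raw data queries: a query spanning N days covers N × 24 × 4 fifteen-minute chunks, and one capacity unit retrieves one chunk per second. A sketch (illustrative name):

```python
def min_query_interval_seconds(days_of_data: float,
                               capacity_units: float = 1.0) -> float:
    """Minimum seconds between raw-data queries to stay within retrieval capacity."""
    fifteen_min_chunks = days_of_data * 24 * 60 / 15
    return fifteen_min_chunks / capacity_units

print(min_query_interval_seconds(30))  # 2880.0 seconds, matching the example
```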
Case APIs
Case API overhead is primarily driven by the total number of cases stored in the system. As case volume increases, allocate additional resources to maintain performance and responsiveness.
Size the Stellar Cyber Platform based on the highest expected external API demand across these categories.
Adjusting for Concurrent User Sessions
Concurrent user sessions primarily affect the CPU and memory required to support interactive access to the Stellar Cyber Platform.
The first 25 concurrent UI sessions do not require additional capacity from the Platform's baseline configuration. For every additional set of 25 concurrent UI sessions, add:
- 8 GB memory
- 4 CPU cores
Examples
- Up to 25 concurrent sessions – no additional resources required
- 26 to 50 concurrent sessions – add 8 GB memory and 4 CPU cores
- 51 to 75 concurrent sessions – add 16 GB memory and 8 CPU cores
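The session rule above can be sketched as a helper (illustrative name): the first 25 sessions are free, and each further set of 25, rounded up, adds 8 GB and 4 cores.

```python
import math

def session_overhead(concurrent_sessions: int):
    """Return (extra memory GB, extra CPU cores) beyond the 25-session baseline."""
    extra_sets = math.ceil(max(0, concurrent_sessions - 25) / 25)
    return 8 * extra_sets, 4 * extra_sets

print(session_overhead(51))  # (16, 8), matching the 51-75 session example
```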
In addition to the session count, you must also consider user-created queries. Query traffic generated by users in the user interface requires the same resource adjustments as external API query traffic.
When session concurrency and query volume are both high, size the system based on the combined resource requirements.
Adjusting for ATH Playbooks
Automated Threat Hunting (ATH) Playbooks consume system resources based on the complexity and frequency of their queries. Resource usage varies significantly depending on how each playbook is defined and how often it runs.
To estimate the impact of your playbooks, evaluate each playbook based on its execution time relative to its run interval.
Playbook Weight
The relative impact of an ATH Playbook is expressed as its weight:
Weight = ES query execution time / playbook run interval
For example, if a playbook runs every hour and requires 36 seconds to complete:
Weight = 36 / 3600 = 0.01
This value represents the proportion of system query capacity consumed by that playbook.
Capacity Guidelines
The Stellar Cyber Platform can process multiple queries in parallel. To maintain stable performance, the combined weight of all ATH Playbooks must remain within a defined limit.
- The total weight of all ATH Playbooks should not exceed 0.4.
- If the total weight exceeds this threshold, increase system capacity.
You can reduce playbook weight by optimizing queries or increasing available compute resources.
Resource Scaling
If ATH Playbook demand exceeds system capacity, adjust provisioning as follows:
- Increase Data Lake capacity to support additional query load.
- Optimize or reduce playbook execution time where possible.
Memory Requirements
ATH Playbooks also increase memory requirements based on the total number of configured playbooks:
- For every 300 ATH Playbooks, add 10 GB of memory.
- If the total exceeds 1000 ATH Playbooks, you must add a Data Lake Coordinating node to your cluster to run the playbooks.
Example
Assume you have the following ATH Playbooks configured:
- Playbook A runs every 60 minutes and requires 36 seconds to complete.
- Playbook B runs every 30 minutes and requires 18 seconds to complete.
- Playbook C runs every 15 minutes and requires 9 seconds to complete.
Calculate the weight of each playbook:
Playbook A = 36 / 3600 = 0.01
Playbook B = 18 / 1800 = 0.01
Playbook C = 9 / 900 = 0.01
Total playbook weight:
Total weight = 0.01 + 0.01 + 0.01 = 0.03
In this example, the combined playbook weight is 0.03, which is below the recommended limit of 0.4. No additional query-related capacity is required.
Now assume the system has 650 ATH Playbooks configured in total.
Memory requirement:
650 / 300 = 2.17
Round up to the next full increment and add memory for 3 groups of 300 playbooks:
3 × 10 GB = 30 GB additional memory
In this example, the system remains within the recommended playbook weight limit, but still requires 30 GB of additional memory based on total ATH Playbook count.
When both playbook count and playbook complexity are high, size the system based on the combined impact of total weight and memory requirements.
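The worked example above can be reproduced with a short sketch (illustrative function name): sum each playbook's execution-time-to-interval ratio, and round the total playbook count up to full groups of 300 for memory.

```python
import math

def ath_sizing(playbooks, total_playbook_count: int):
    """Return (total weight, extra memory GB, within 0.4 weight limit).

    `playbooks` is a list of (execution_seconds, interval_seconds) pairs.
    """
    total_weight = sum(exec_s / interval_s for exec_s, interval_s in playbooks)
    extra_memory_gb = math.ceil(total_playbook_count / 300) * 10
    return total_weight, extra_memory_gb, total_weight <= 0.4

# Playbooks A, B, and C from the example, in a system with 650 playbooks total
weight, mem, ok = ath_sizing([(36, 3600), (18, 1800), (9, 900)], 650)
print(round(weight, 2), mem, ok)  # 0.03 30 True
```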
Adjusting for Managed Sensor Volume
The number of managed sensors has a direct effect on system performance, particularly for Data Lake processing. As the quantity of managed sensors increases, you must provision additional CPU and memory to maintain consistent ingestion and coordination performance.
Capacity Guidelines for Sensors
The baseline configuration supports up to 1000 sensors without additional resource requirements. Beyond this baseline, you must scale system resources incrementally based on the total number of sensors using the following rule of thumb:
- For every additional 1000 sensors, add:
- 20 GB of memory
- 4 CPU cores
Example
For example, a deployment with 2500 sensors has 1500 sensors more than the baseline value of 1000 and would require an additional 30 GB of memory and 6 CPU cores.
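As in the example above, the rule scales linearly with the sensor count beyond the baseline. A minimal sketch (illustrative name):

```python
def sensor_overhead(sensor_count: int):
    """Return (extra memory GB, extra CPU cores) beyond the 1000-sensor baseline."""
    extra = max(0, sensor_count - 1000)
    return extra / 1000 * 20, extra / 1000 * 4

print(sensor_overhead(2500))  # (30.0, 6.0), matching the example above
```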
Adjusting for Additional Features
Some optional features add resource overhead beyond baseline sizing. Adjust your Platform's provisioning based on your usage of the features listed below.
Entity-Based Asset Licensing
If your deployment uses the new entity-based asset counting model (as opposed to the classic asset counting model), adjust your provisioning as follows:
For every 250 GB/day of ingestion, add the following to each DA-Master and DA-Worker node:
- 2 CPU cores
- 1 GB memory
InSyncs
If your deployment is integrated with ServiceNow using InSyncs, you must adjust your Data Lake provisioning by adding the following to all DL-Master and DL-Worker nodes:
- 2 GB memory
- 2 CPU cores
Actual overhead depends on the ServiceNow table update rate.
Webhook Ingestion
If your deployment ingests webhooks using the XDR connector, webhook ingestion adds overhead to the DA cluster. Adjust the provisioning of all DA-Master and DA-Worker nodes as follows:
- Each webhook-ingest-fluentd replica requires 500 MB memory
- Adding memory does not improve performance
| Workers per Replica | CPU (millicores) | Approximate EPS |
|---|---|---|
| 1 | 1000 | 1000 |
| 2 | 2000 | 1950 |
| 6 | 6000 | 5300 |
Horizontal scaling usually provides the best efficiency.
TIPv2
TIPv2 increases load on the DL cluster and Elasticsearch. You might need a dedicated coordinating node or a separate Elasticsearch service design.
Understanding Factors That Are Difficult to Model
Some workloads cannot be predicted precisely and can increase resource usage significantly:
- Complex dashboard queries
- Machine learning job memory usage
If you expect heavy dashboard use or large ML workloads, plan additional capacity beyond the baseline sizing model.
Guidance for Existing Appliance Deployments
If you use an existing appliance deployment, older specifications can still be supported in some cases. However, older appliance sizing leaves little or no room for additional features such as TIPv2 or Webhook Ingest.
As deployments grow, the DL-Master and DA-Master can require separate larger systems. In larger environments, a combined role on the same appliance may no longer be sufficient.
