Adding a GCP Data Sink

You must have Root scope to use this feature.

You can add a GCP Data Sink from the System | Data Processor | Data Sinks page using the instructions in this topic. Adding a GCP Data Sink consists of the following major steps:

Get the storage bucket name in GCP.
Get the Service Account key in GCP.
Add the GCP Data Sink in Stellar Cyber.

Use our example as a guideline, as you might be using a different software version.

Getting the Bucket Name in GCP

To get the GCP bucket name:

Log in to your GCP Console at https://console.cloud.google.com/.
Type Cloud Storage into the search field at the top of the console.
Click the Cloud Storage entry in the list of Services that appears.

The Cloud Storage browser appears listing your existing buckets.
Locate the bucket you want to use as the destination for the data sink in Stellar Cyber and copy its name. You will need this value when configuring the Data Sink in Stellar Cyber.

Getting the Service Account Key in GCP

You must supply a Service Account Key with access to the GCP project that includes the destination bucket when you add the new GCP data sink in Stellar Cyber. The following procedure explains how to create a Service Account Key in GCP and download it as a .JSON file. You can also refer to the GCP online help for service account keys for details.

To create service account keys, your account must either be assigned the roles/iam.serviceAccountKeyAdmin role or have the iam.serviceAccountKeys.* permissions.

Log in to your GCP Console at https://console.cloud.google.com/.
Navigate to the IAM & Admin | Service Accounts page.
Select the project from the list that includes the destination bucket for the data sink.
Click the service account you want to use, navigate to the Keys tab, and select Add key | Create new key.
Set Key type to JSON and click the CREATE button.

The private key file is saved to your computer in JSON format. Take note of the location – you will need to upload this file in Stellar Cyber when you add the GCP data sink.
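Before uploading the key file, it can help to confirm that the download is intact. The following is a minimal sketch in Python, not part of Stellar Cyber or GCP tooling. The function name check_service_account_key and the choice of required fields are assumptions made for illustration; they reflect fields that service account key files commonly contain.

```python
import json

# Assumed minimum set of fields for a usable key file (illustrative only).
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(text):
    """Return a list of problems found in key file contents (empty if none)."""
    try:
        key = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(key, dict):
        return ["top-level JSON value is not an object"]
    # Report every required field that is absent or empty.
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not key.get(name)]
    # A service account key is expected to declare itself as such.
    if key.get("type") not in (None, "service_account"):
        problems.append(f"unexpected type: {key['type']!r}")
    return problems


# Hypothetical contents standing in for a downloaded key file.
sample = json.dumps({
    "type": "service_account",
    "project_id": "example-project",
    "private_key": "-----BEGIN PRIVATE KEY-----...",
    "client_email": "sink@example-project.iam.gserviceaccount.com",
})
print(check_service_account_key(sample))
```

To check a real download, read the file contents and pass them to the function in place of the sample string.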

Adding the GCP Data Sink in Stellar Cyber

To add a GCP Data Sink:

Click System | Data Processor | Data Sinks. The Data Sink list appears.
Click Create. The Setup Data Sink screen appears.
Enter the Name of your new Data Sink. This field does not support multibyte characters.
Choose GCP for the Type.

Additional fields appear in the Setup Data Sink screen:
Supply the name of the bucket in the Bucket field. You identified this name in Getting the Bucket Name in GCP.
Supply the Service Account Key you downloaded in Getting the Service Account Key in GCP in the corresponding field.
Select the types of data to send to the Data Sink by toggling the following checkboxes:
- Raw Data – Raw data received from sensors, log analysis, and connectors after normalization and enrichment has occurred and before the data is stored in the Data Lake.
- Alerts – Security anomalies identified by Stellar Cyber using machine learning and third-party threat-intelligence feeds, reported in the Alerts interface, and stored in the aella-ser-* index.
- Assets – MAC addresses, IP addresses, and routers identified by Stellar Cyber based on network traffic, log analysis, and imported asset feeds and stored in the aella-assets-* index.
- Users – Users identified by Stellar Cyber based on network traffic and log analysis and stored in the aella-users-* index.
Alerts, assets, and users are also known as derived data because Stellar Cyber extrapolates them from raw data.
Click Next.

The Advanced (Optional) page appears.
Specify whether to partition records into files based on their write_time (the default) or timestamp.

Every Interflow record includes both of these fields:
- write_time indicates the time at which the Interflow record was actually created.
- timestamp indicates the time at which the action documented by the Interflow record took place (for example, the start of a session, the time of an update, and so on).
When files are written to the Data Sink they are stored at a path like the following, with separate files for each minute:

In this example, we see the path for November 9, 2021 at 00:23. The records that appear in this file depend on the Partition time by setting, as follows:
- If write_time is enabled, then all records stored under this path would have a write_time value falling into the minute of UTC 2021.11.09 - 00:23.
- If timestamp is enabled, then all records stored under this path would have a timestamp value falling into the minute of UTC 2021.11.09 - 00:23.
In most cases, you will want to use the default of write_time. It tends to result in a more cost-efficient use of resources and is also compatible with future use cases of data backups and cold storage using a data sink as a target.
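The difference between the two settings can be sketched in a few lines of Python. This is an illustration only, not Stellar Cyber's implementation. It assumes that write_time and timestamp hold epoch milliseconds, and the names partition_minute and to_millis are hypothetical.

```python
from datetime import datetime, timezone


def partition_minute(record, partition_by="write_time"):
    """Return the UTC minute that a record falls into for the chosen field."""
    if partition_by not in ("write_time", "timestamp"):
        raise ValueError("partition_by must be 'write_time' or 'timestamp'")
    millis = record[partition_by]  # assumption: epoch milliseconds
    moment = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
    # Truncate to the start of the minute.
    return moment.replace(second=0, microsecond=0)


def to_millis(moment):
    """Convert an aware datetime to epoch milliseconds."""
    return int(moment.timestamp() * 1000)


# A record whose action happened at 00:22:58 but which was created at 00:23:04.
record = {
    "timestamp": to_millis(datetime(2021, 11, 9, 0, 22, 58, tzinfo=timezone.utc)),
    "write_time": to_millis(datetime(2021, 11, 9, 0, 23, 4, tzinfo=timezone.utc)),
}
print(partition_minute(record, "write_time"))
print(partition_minute(record, "timestamp"))
```

In this sketch the same record lands in the 00:23 file when partitioning by write_time and in the 00:22 file when partitioning by timestamp.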
Enable the Compression option to specify that records be written to the Data Sink in compressed (gzip) format.

For most use cases, Stellar Cyber recommends enabling the compression option to save on storage costs. Compression results in file sizes roughly one-tenth the size of uncompressed files.
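To get a feel for the effect of gzip on record data, you can compress a sample yourself. The sketch below uses Python's standard gzip module on made-up, newline-delimited JSON records. The record fields and the helper name compress_records are assumptions for illustration, and the ratio you see depends entirely on how repetitive your own data is.

```python
import gzip
import json


def compress_records(records):
    """Serialize records as newline-delimited JSON and gzip the result."""
    raw = "\n".join(json.dumps(r) for r in records).encode("utf-8")
    return raw, gzip.compress(raw)


# Hypothetical records; real Interflow records have many more fields.
records = [{"msg_class": "example", "seq": i, "status": "ok"} for i in range(1000)]
raw, packed = compress_records(records)
print(f"uncompressed: {len(raw)} bytes, compressed: {len(packed)} bytes")
print(f"ratio: {len(packed) / len(raw):.2f}")
```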
Use the Batch Window (seconds) and Batch Size fields to specify how data is written to the sink.
- The Batch Window specifies the maximum amount of time that can elapse before data is written to the Data Sink.
- The Batch Size specifies the maximum number of records that can accumulate before they are sent to the Data Sink. You can specify either 0 (disabled) or a number of records between 100 and 10,000.
Stellar Cyber batches data to the Data Sink depending on whichever of these parameters is reached first.

So, for example, consider a Data Sink with a Batch Window of 30 seconds and a Batch Size of 300 records:
- If at the end of the Batch Window of 30 seconds, Stellar Cyber has 125 records, it sends them to the data sink. The Batch Window was reached before the Batch Size.
- If at the end of 10 seconds, Stellar Cyber has 300 records, it sends the 300 records to the Data Sink. The Batch Size was reached before the Batch Window.
These options are primarily useful for data sink types that charge you by the API call (for example, AWS S3 and Azure). Instead of sending records as they are received, you can use these options to batch the records, minimizing both API calls and their associated costs for Data Sinks in the public cloud.

By default, these options are set to 30 seconds and 1000 records for S3 data sinks.
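The "whichever is reached first" behavior can be modeled with a short Python sketch. This is a simplified illustration rather than Stellar Cyber's actual batching code. It assumes the window starts when the first record of a batch arrives, and the function name batch_records is hypothetical. The size check mirrors the limits described above (0 to disable, or 100 to 10,000).

```python
def batch_records(arrivals, window_seconds=30, batch_size=300):
    """Group (arrival_seconds, record) pairs into batches.

    A batch is flushed when it reaches batch_size records or when
    window_seconds have passed since it was opened, whichever is first.
    """
    if batch_size != 0 and not 100 <= batch_size <= 10000:
        raise ValueError("batch_size must be 0 or between 100 and 10,000")
    batches, current, window_start = [], [], None
    for arrival, record in arrivals:
        # Window expired: flush what has accumulated so far.
        if current and arrival - window_start >= window_seconds:
            batches.append(current)
            current = []
        if not current:
            window_start = arrival  # assumption: window opens with first record
        current.append(record)
        # Size limit reached (skipped when batch_size is 0, meaning disabled).
        if batch_size and len(current) >= batch_size:
            batches.append(current)
            current = []
    if current:
        batches.append(current)  # final flush at end of input
    return batches


# 125 records spread over about 25 seconds: flushed together by the window.
slow = [(i * 0.2, i) for i in range(125)]
print([len(b) for b in batch_records(slow)])  # [125]

# 300 records within 10 seconds: the size limit triggers the flush.
fast = [(i / 30, i) for i in range(300)]
print([len(b) for b in batch_records(fast)])  # [300]
```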
You can use the Retrieve starting from field to specify a date and time from which Stellar Cyber should attempt to write alert, asset, and user records to a newly created Data Sink. You can click in the field to use a calendar to set the time/date.

Note the following:
- If you do not set this option, Stellar Cyber simply writes data from the time at which the sink is created.
- This option only affects alert, asset, and user records. Raw data is written from the time at which the sink is created regardless of the time/date specified here.
- If you set a time/date earlier than the oldest available data, Stellar Cyber silently skips the period for which no records are available.
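The rules in these notes can be summarized in a small Python sketch. It is an illustration of the described behavior only. The function name first_written_time and the data type labels are hypothetical, and the sketch does not model a start time later than the sink's creation time, which the notes above do not cover.

```python
from datetime import datetime, timezone

# Derived data types affected by the Retrieve starting from option.
DERIVED_TYPES = {"alerts", "assets", "users"}


def first_written_time(data_type, sink_created, retrieve_from=None):
    """Return the earliest record time the sink attempts to write."""
    if data_type in DERIVED_TYPES and retrieve_from is not None:
        return retrieve_from
    # Raw data, or no option set: start from the sink's creation time.
    return sink_created


created = datetime(2024, 1, 10, tzinfo=timezone.utc)
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(first_written_time("alerts", created, start))
print(first_written_time("raw", created, start))
```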
You can use the Filter options to Exclude or Include specific Message Classes for the Data Sink. By default, Filter is set to None. If you check either Exclude or Include, an additional Message Class field appears where you can specify the message classes to use as part of the filter. For example:

You can find the available message classes to use as filters by searching your Interflow in the Investigate | Threat Hunting | Interflow Search page. Search for the msg_class field in any index to see the prominent message classes in your data.
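As a rough mental model, the three Filter settings behave like the following Python sketch. It assumes an exact match on the record's msg_class field, which may not reflect how Stellar Cyber matches message classes internally. The function name passes_filter and the class names class_a and class_b are hypothetical.

```python
def passes_filter(record, mode="none", message_classes=()):
    """Decide whether a record is sent to the sink under a given filter."""
    msg_class = record.get("msg_class")
    if mode == "none":
        return True  # no filtering: every record is sent
    if mode == "include":
        return msg_class in message_classes  # only listed classes are sent
    if mode == "exclude":
        return msg_class not in message_classes  # listed classes are dropped
    raise ValueError("mode must be 'none', 'include', or 'exclude'")


records = [{"msg_class": "class_a"}, {"msg_class": "class_b"}]
kept = [r for r in records if passes_filter(r, "exclude", {"class_b"})]
print(kept)
```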
Click Next to review the Data Sink configuration. Use the Back button to correct any errors you notice.
When you are satisfied with the sink's configuration, click Submit to add it to the DP.

The new Data Sink is added to the list.