Archiving and Restoring Data in Azure

Microsoft Azure offers different storage tiers based on how frequently data is accessed and how quickly it must be retrieved. For infrequently accessed data, you may want to take advantage of the lower-cost Archive tier for Stellar Cyber data in cold storage. This topic covers the following tasks for external data management in Microsoft Azure.

Obtaining the archive-cli.py Script

The archive-cli.py script is available on Stellar Cyber's open GitHub repository at the following link:

System Requirements

The Stellar Cyber cold storage archive script has the following prerequisites for use in Azure:

  • Python 3 (3.7 or greater)

  • Azure CLI installed on the machine used to run the script.

  • Azure CLI Storage Preview Extension. Install with the following command in the Azure CLI:

    az extension add --name storage-blob-preview

  • Prior to running the script, run az login in the Azure CLI to establish a connection to Azure. A quick way to verify these prerequisites is shown after this list.
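
As a quick sanity check, you can confirm these prerequisites from a shell on the machine that will run the script (exact output varies by environment; the extension list should include storage-blob-preview):

    python3 --version
    az --version
    az extension list --output table
    az login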

Configuring Lifecycle Management to Archive Cold Storage Data

The following procedure describes how to configure lifecycle management so that data stored in Azure Blob Storage is automatically moved to the Archive Tier after a specified number of days (an equivalent Azure CLI configuration is sketched after the procedure):

  1. Create an Azure blob and configure it as cold storage in Stellar Cyber.

  2. Configure a lifecycle rule in Azure that automatically moves data with the archive tag from regular Azure Blob Storage to the Archive tier:

    1. Log in to the Azure Console, navigate to Data management | Lifecycle management, and click the Add a rule option in the toolbar, as illustrated below.

    2. The Lifecycle rule works by moving data with the StellarBlobTier key set to Archive to the Archive tier in Azure. Set the options in the Details tab of the Add a rule wizard as illustrated below and click Next:

      • Rule name: archive-cold in this example

      • Rule scope: Limit blobs with filters

      • Blob type: Block blobs

      • Blob subtype: Base blobs

    3. Set the following options in the Base blobs tab and click Next:

      • Set If to base blobs haven't been modified in x days. Specify a value for x that makes sense for your organization. This is the number of days that data matching the archive tag remains in Azure Blob Storage before it is transitioned to the Archive tier. In this example, we set this option to 1 day.

      • Set Then to Move to archive storage.

    4. Set the following options in the Filter set tab and click Add:

      • Set Blob prefix to <container name>/stellar_data_backup/indices/

      • Set the Blob index match Key/Value pair to StellarBlobTier == Archive
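
If you prefer to configure the same rule from the Azure CLI rather than the portal wizard, the following sketch creates an equivalent lifecycle management policy. It reuses the example values above (rule name archive-cold, a one-day threshold, and the StellarBlobTier == Archive index match); the policy.json file name and the <resource group> placeholder are assumptions to adapt to your environment. Save the following as policy.json:

    {
      "rules": [
        {
          "enabled": true,
          "name": "archive-cold",
          "type": "Lifecycle",
          "definition": {
            "actions": {
              "baseBlob": { "tierToArchive": { "daysAfterModificationGreaterThan": 1 } }
            },
            "filters": {
              "blobTypes": [ "blockBlob" ],
              "prefixMatch": [ "<container name>/stellar_data_backup/indices/" ],
              "blobIndexMatch": [
                { "name": "StellarBlobTier", "op": "==", "value": "Archive" }
              ]
            }
          }
        }
      ]
    }

Then apply the policy to your storage account:

    az storage account management-policy create --account-name <account> --resource-group <resource group> --policy @policy.json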

Add External Cold Storage and Disable Cold Storage Deletion

  1. Add your Azure blob as cold storage in Stellar Cyber.

  2. In Stellar Cyber, disable the deletion of data in cold storage by navigating to the System | Data Management | Retention Group page and setting the cold retention times artificially high (for example, 100 years, as in the example below). You must do this to ensure that data can be moved back and forth between archival storage and regular storage and still be imported to Stellar Cyber.

Running archive-cli.py Daily to Tag Blobs for Archiving

Now that you have a lifecycle rule that transfers data matching the archive tag to the Archive tier after a specified number of days, you need to use the archive-cli.py script to apply the archive tag to data in the Stellar Cyber cold storage backup in Azure Blob Storage.

  1. Log in to the Azure CLI with the az login command.

  2. Use the device code and URL returned by the az login command to verify your login.

  3. If you have multiple Azure subscriptions, select the correct one with the following commands:

    1. View your subscriptions with the az account list --output table command.

    2. Select the correct subscription from the returned list with the az account set --subscription "<subscription-id>" command.

  4. Verify you can see your Azure blob with the following command:

    az storage blob list --container-name <container> --account-name <account>

    Substitute the name of your Azure storage container and account for the <container> and <account> placeholders.

  5. Type the following command to see the arguments for the archive-cli.py script:

    $ python3 archive-cli.py -h

  6. Set up a cron job that runs the archive-cli.py script on a daily basis with the arguments below:

    > python archive-cli.py azure --account-name <account> --container-name <container> tag --included-prefix 'stellar_data_backup/indices/' --src-tier hot --dst-tier archive
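
    The following is a minimal crontab sketch that runs the tagging command every day at 02:00. The script path (/opt/stellar/archive-cli.py), log file location, and schedule are assumptions; substitute the paths, account, and container used in your environment:

    0 2 * * * /usr/bin/python3 /opt/stellar/archive-cli.py azure --account-name <account> --container-name <container> tag --included-prefix 'stellar_data_backup/indices/' --src-tier hot --dst-tier archive >> /var/log/archive-cli.log 2>&1

    Because the script relies on the Azure CLI session established with az login, schedule the cron job under the same user account that performed the login.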

Verifying Tagged Data

Once you run the archive-cli.py script using the syntax above, you can verify that data has been tagged with StellarBlobTier=Archive by using the Azure Console to look at the properties of any of the files in the /indices folder, as illustrated below:
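
If you prefer the CLI to the Azure Console, you can also read the tags directly. This is a minimal sketch that assumes the storage-blob-preview extension listed in the system requirements; replace <blob path> with the full path of any blob under stellar_data_backup/indices/ in your container:

    az storage blob tag list --account-name <account> --container-name <container> --name '<blob path>'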

Verifying Data in Archive Tier

Once data has been tagged with StellarBlobTier=Archive by the archive-cli.py script, the lifecycle management rule you configured at the start of this procedure begins to move it to the Archive Tier in accordance with the threshold you specified. In our example, we set a value of 1 day as the length of time data can remain in the container unchanged before transitioning to the Archive Tier. The illustration below shows data moved to the Archive Tier automatically by our lifecycle management rule:
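
You can also confirm the transition from the CLI by listing the access tier of blobs under the backup prefix; blobs that lifecycle management has already moved report a tier of Archive. Substitute your own account and container for the placeholders:

    az storage blob list --container-name <container> --account-name <account> --prefix 'stellar_data_backup/indices/' --query "[].{name:name, tier:properties.blobTier}" --output table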

Restoring Data from Archive Tier to Hot Tier

You can use the archive-cli.py script to restore data stored in the Archive Tier to the Hot Tier, after which it can be imported to Stellar Cyber.

Depending on your analysis needs and the volume of data to be moved, you can either restore selected indices or restore all data from the Archive Tier. Both procedures are described below, along with a description of how restored data appears in the Azure Console.

Restoring Selected Indices from the Archive Tier

Use the following procedure to restore selected indices from the Archive Tier to the Hot Tier so they can be imported from cold storage in Stellar Cyber:

  1. If you have a regular cron job running to tag data using archive-cli.py, stop it now.

  2. Start a restore of all indices from the Archive Tier to the Hot Tier using the following command:

    python3 archive-cli.py azure --account-name <account> --container-name '<container>' restore --included-prefix 'stellar_data_backup/indices/'

    Substitute the name of your Azure account and storage container for the <account> and <container> placeholders.

  3. Before the restore completes, navigate to System | Data Management | Cold Storage Imports and start an import from your Azure cold storage.

  4. The import fails because the data has not yet been fully restored from the Archive Tier. However, the import should still show the index names it attempted to retrieve in the System | Data Management | Cold Storage Imports tab. You can use these external index names to retrieve the internal index IDs that the archive-cli.py script needs to restore specific data:

    1. Navigate to the System | Data Management | Cold Storage Imports tab in Stellar Cyber.

    2. Click the Change Columns button and add the Index column to the display if it is not already enabled.

    3. Copy the index name to be restored from the archive tier. This is the external index name.

    4. Run the archive-cli.py script with the following syntax to retrieve the internal index IDs associated with the external index names you copied in the previous step:

      > python archive-cli.py azure --account-name <account> --container-name <container> get-prefix "aella-syslog-1624488492158-,aella-syslog-1627512494132-"

      Substitute the name of your Azure account and storage container for the <account> and <container> placeholders. In addition, substitute the external names of the indices you are querying for the two aella-syslog-... entries shown. You can query for multiple indices by separating them with commas and surrounding the list with double quotation marks. The following example illustrates the internal index IDs returned by the script for the provided external index names:

  5. Once you have the internal index IDs, you can restore them from the Archive tier to the Hot Tier:

    1. Restore blobs matching the specified prefix from the Archive tier with the following command:

      > python archive-cli.py azure --account-name <account> --container-name <container> restore --included-prefix 'stellar_data_backup/indices/<internal_index_id>/'

      Substitute your Azure account, container name, and internal index ID for the <account>, <container>, and <internal_index_id> placeholders in the command above.

    You can also run the previous command in a loop to restore multiple indices. For example:

    for i in <internal_index1> <internal_index2>; do
        python archive-cli.py azure --account-name <account> --container-name <container> \
            restore --included-prefix "stellar_data_backup/indices/$i/"
    done
  6. Return to the System | Data Management | Cold Storage Imports tab in Stellar Cyber and delete the failed imports (those listed with a red icon in the Status column).

  7. When the selected indices have been restored successfully to the Hot Tier, re-import them from the System | Data Management | Cold Storage Imports tab.

Restoring All Data from the Archive Tier

Use the following procedure to restore all data from the Archive tier to the Hot Tier so it can be imported from cold storage in Stellar Cyber:

  1. If you have a regular cron job running to tag data using archive-cli.py, stop it now.
  2. Restore all indices from the Archive Tier to the Hot Tier using the following command:

    python3 archive-cli.py azure --account-name <account> --container-name '<container>' restore --included-prefix 'stellar_data_backup/indices/'

    Substitute the name of your Azure account and storage container for the <account> and <container> placeholders.

Appearance of Restored Data in Azure Console

Restoring data from the Archive Tier to the Hot Tier takes several days, depending on the volume of data to be restored. The following illustrations show how the transition from the Archive Tier to the Hot Tier appears in the Azure Console.

Initially, restored files appear with an Archive Status of Rehydrate Pending to Hot:

When rehydration completes, all files in the /indices folder show an entry of Hot in the Access Tier and no entry in the Archive Status column. At this point, the files can be imported into Stellar Cyber.
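
To track progress from the CLI rather than the Azure Console, one approach is to poll the access tier of blobs under the restored prefix until none remain outside the Hot tier. This is a sketch only; the account, container, and prefix are placeholders, and the one-hour polling interval is arbitrary:

    # Wait until every blob under the prefix reports an access tier of Hot
    until [ "$(az storage blob list --container-name <container> --account-name <account> \
          --prefix 'stellar_data_backup/indices/' \
          --query "length([?properties.blobTier != 'Hot'])" --output tsv)" = "0" ]; do
        echo "Rehydration still in progress; checking again in an hour..."
        sleep 3600
    done
    echo "All blobs under the prefix are in the Hot tier and can be imported."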

Returning Data to Archival Storage

Once you have finished your analysis of data restored from the Archive tier, you can use the steps below to delete the data from within Stellar Cyber and return it to the Archive tier in Azure:

Tagging Specific Indices as Archive

If you restored specific indices from the Archive Tier to the Hot Tier, you can use the following procedure to retag them as archive so that they are returned to the Archive Tier by lifecycle management:

  1. Use the archive-cli.py script to retag the data imported to regular storage as archive so that it is returned to the Archive tier by lifecycle management:

    1. Run the archive-cli.py script to retrieve the internal index IDs of the data to be returned to the Archive tier, substituting your Azure account, container name and the external index names from the user interface:

      > python archive-cli.py azure --account-name <account> --container-name <container> get-prefix "aella-syslog-1624488492158-,aella-syslog-1627512494132-"

    2. Run the archive-cli.py script to archive the identified indices with StellarBlobTier=Archive:

      Syntax for a single index:

      > python archive-cli.py azure --account-name <account> --container-name <container> archive --included-prefix 'stellar_data_backup/indices/<internal_index_id>/'

      Syntax using a shell script for multiple indices:

      for i in <internal_index1> <internal_index2>; do
       python archive-cli.py azure --account-name <account> --container-name <container> archive --included-prefix "stellar_data_backup/indices/$i/"
      done

Tagging All Data in a Specific Container as Archive

If you restored all data from the Archive Tier to the Hot Tier, you can use archive-cli.py with the following syntax to retag it as archive so that it is returned to the Archive Tier by lifecycle management:

> python archive-cli.py azure --account-name <account> --container-name <container> tag --included-prefix 'stellar_data_backup/indices/' --src-tier hot --dst-tier archive

Deleting Imports from Stellar Cyber

Once you have completed your analysis in Stellar Cyber, you can remove the imports:

  1. Navigate to the System | Data Management | Cold Storage Imports tab in Stellar Cyber and delete the imports.