Scaling Up the Data Lake with Coordinating Nodes
You must have Root scope to use this feature.
You can scale a DP deployment by adding Data Lake and Data Analyzer worker nodes to achieve the higher data ingestion rates, greater storage capacity, and longer data retention in warm storage described in Capacity Planning for Data Replication and Clustering.
Once you've built out your cluster to the point where it includes six or more data nodes, Stellar Cyber recommends that you enhance cluster performance by adding a Data Lake Coordinating Node.
-
A Data Lake Coordinating Node (DL-CN) is a special type of Data Lake worker that does not store any ElasticSearch data itself, but instead focuses on coordinating data searches and requests from different DP services to the data nodes in the cluster. The DL-CN offloads that responsibility from the data nodes themselves, improving overall Data Lake performance.
-
You provision the node to be used as the DL-CN with less memory and disk space than a normal Data Lake worker. When you enable the use of Coordinating Nodes in the Data Lake Master's CLI, the master selects the coordinating node from the available pool based on this provisioning, choosing the node with less memory and disk space.
The figure below shows a cluster with six Data Nodes that has been provisioned with an additional Data Lake Worker operating as a Coordinating Node.
Data Nodes Defined
A data node is any Data Lake component that stores ElasticSearch data. The following node types qualify:
-
Data Lake Master nodes with the Maximized Data Storage (MDS) option enabled.
You can use the show mode command to see the current setting for the MDS option. The MDS option specifies whether the DL node stores data itself (enabled) or only manages storage and ElasticSearch operations (disabled). As you scale up to a cluster deployment with three or more DL-worker nodes, you disable MDS on the DL-Master and provision it with less disk space. The DL-worker nodes have MDS enabled and handle the actual storage while the DL-master provides storage management and search.
-
Data Lake Worker nodes that are not operating as Coordinating Nodes. Coordinating Nodes are not data nodes because they do not store ElasticSearch data.
Once your cluster includes six or more of the node types listed above, Stellar Cyber recommends adding a Data Lake Coordinating Node to improve performance.
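The counting rule above can be sketched in a few lines of Python. This is only an illustration of the definition in this section, not product code: the Node class, its field names, and the helper functions are hypothetical, and the six-node threshold is taken from the recommendation above.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical description of a Data Lake node (not a product API)."""
    name: str
    role: str                    # "master" or "worker"
    mds_enabled: bool = False    # Maximized Data Storage; relevant for masters
    coordinating: bool = False   # True if a worker runs as a Coordinating Node


def is_data_node(node: Node) -> bool:
    """A data node is any node that stores ElasticSearch data."""
    if node.role == "master":
        # A DL-Master only stores data when MDS is enabled.
        return node.mds_enabled
    if node.role == "worker":
        # Workers store data unless they operate as Coordinating Nodes.
        return not node.coordinating
    return False


def should_add_coordinating_node(nodes: list[Node], threshold: int = 6) -> bool:
    """Return True once the cluster has `threshold` or more data nodes."""
    return sum(is_data_node(n) for n in nodes) >= threshold


# Example: a DL-Master with MDS disabled plus six ordinary DL-workers.
cluster = [Node("dl-master", "master", mds_enabled=False)]
cluster += [Node(f"dl-w{i}", "worker") for i in range(1, 7)]
print(should_add_coordinating_node(cluster))
```

In this example the master does not count because MDS is disabled, so the six workers alone bring the cluster to the recommended threshold.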
Provisioning a Data Lake Coordinating Node
You provision a Data Lake Coordinating Node with less memory and disk space than a normal Data Lake worker. There are two main reasons for this:
-
The DL-CN doesn't require the same disk space as a typical DL-w because it doesn't store any ElasticSearch data itself.
-
A DL-w is only considered as a candidate for Coordinating Node status if it is provisioned with less memory and disk space than a standard DL-w.
With this in mind, Stellar Cyber recommends that you provision a Data Lake Coordinating Node using the same minimum specifications as for a Data Analyzer node:
-
vCPUs – 16 (same as DL specification)
-
RAM – 64 GB (half of DL specification)
-
OS Disk – 500 GB (same as DL specification)
-
DL Disk Space – Not required
Eligibility for Coordinating Node Status – The Details
A DL-w's eligibility for Coordinating Node status is determined both by its provisioning and the value of the set mode coordinate option in the DL-Master's CLI:
-
A DL-w is only considered a candidate to run as a Coordinating Node if its provisioned disk space and memory are both less than 80% of those of the largest existing DL-w in the cluster. This is because the cluster always gives priority to creating new Data Lake Workers as data nodes rather than coordinating nodes. Stellar Cyber recommends provisioning a Coordinating Node using the DA specifications listed above, which use less memory and do not provision secondary DL storage.
-
The DL-Master only looks for Coordinating Node candidates when the set mode coordinate option is set to either dynamic or an integer value in the DL-Master CLI. You can see the current setting of the mode option using show mode in the DL-Master CLI. Refer to Adding a Data Lake Coordinating Node for details on the different set mode coordinate options.
Once you've used set mode coordinate to specify either dynamic or an integer value in the DL-Master CLI, the System | Data Lake | Node List identifies DL-CN candidates with an entry of true in the Coordinate Candidate column, as shown in the example below:
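Separately from the Node List view, the provisioning half of the eligibility rule in this section can be sketched as a small Python check. This is a rough illustration under stated assumptions, not the DL-Master's actual logic: the Worker class and function name are hypothetical, "largest existing DL-w" is interpreted as the largest memory and largest disk values found among existing workers, and the 2000 GB disk figure is an arbitrary example value. The 128 GB memory figure follows from the 64 GB recommendation being half of the DL specification.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    """Hypothetical provisioning record for a DL-w (not a product API)."""
    name: str
    memory_gb: int
    dl_disk_gb: int


def is_coordinate_candidate(worker, existing_workers, ratio=0.8):
    """Return True if `worker` has less than `ratio` of the memory AND
    disk space of the largest existing worker.

    Assumption: "largest" is taken per resource (max memory, max disk).
    """
    if not existing_workers:
        return False  # nothing to compare against
    max_memory = max(w.memory_gb for w in existing_workers)
    max_disk = max(w.dl_disk_gb for w in existing_workers)
    return (worker.memory_gb < ratio * max_memory
            and worker.dl_disk_gb < ratio * max_disk)


# Existing data workers; the disk size here is an example value only.
existing = [Worker(f"dl-w{i}", memory_gb=128, dl_disk_gb=2000) for i in range(1, 7)]

# A node provisioned to the DA-style specification: 64 GB RAM, no DL disk.
small = Worker("dl-cn1", memory_gb=64, dl_disk_gb=0)
# A node provisioned like a standard worker.
full = Worker("dl-w7", memory_gb=128, dl_disk_gb=2000)

print(is_coordinate_candidate(small, existing))
print(is_coordinate_candidate(full, existing))
```

Under these assumptions the smaller node qualifies as a candidate and the fully provisioned node does not, which mirrors the priority the cluster gives to creating data nodes.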
Adding a Data Lake Coordinating Node
You add a Data Lake Coordinating Node using the same general procedure you use to add any new worker node:
-
Launch and configure the VM for the DL-w, ensuring that it is provisioned using the specifications described in Provisioning a Data Lake Coordinating Node.
-
Configure the node in the CLI as a resource.
-
Add the node in the user interface, converting its resource role to Data Lake Worker.
-
Enable the use of Coordinating Nodes in the DL-Master's CLI. Connect to the DL-Master's CLI and use the set mode coordinate option to specify either dynamic or an integer value (refer to About Coordinating Mode Options).
You can find detailed examples of each of these steps for a cloud-based deployment starting with the procedures in this section. You can do the same with a physical appliance by changing its role to resource, adding it to a cluster as a resource, and then reconfiguring it as a DL-w in the user interface.
About Coordinating Mode Options
You use the set mode coordinate [dynamic | <1..x>] command on the DL-Master to enable the use of coordinating nodes in the cluster:
-
Dynamic – If you select this option, the system creates one Coordinating Node for every three data nodes in the cluster. If there are not sufficient nodes available for selection based on their provisioning, the system creates as many as it can up to the maximum of one for every three data nodes.
-
Integer <1..x> – If you state a specific number of Coordinating Nodes to create, the system attempts to create the number specified. If there are not sufficient nodes available for selection based on their provisioning, the system creates as many as it can up to the number specified.
Note the following:
-
The system always gives priority to creating data nodes. If a node is provisioned with sufficient resources to run as a data node, the DL-Master will not select it as a coordinating node even when set mode coordinate is enabled on the DL-Master. Re-provision the node with the resources listed in Provisioning a Data Lake Coordinating Node and try again.
-
If you have added more coordinating node candidates to the cluster than are requested by the set mode coordinate option on the DL-Master, the excess DL-w candidates will be idle. For example, if you have added three DL Coordinating Node candidates to the cluster but set mode coordinate is set to 2, one of the candidates will be idle.
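The arithmetic behind the two mode options and the idle-candidate note can be sketched as follows. This is a minimal illustration, not the DL-Master's implementation: the function names are hypothetical, and rounding "one Coordinating Node for every three data nodes" down to a whole number is an assumption, since this section does not state how partial groups of three are handled.

```python
def requested_coordinating_nodes(mode, data_nodes):
    """Number of Coordinating Nodes the mode setting asks for.

    mode: the string "dynamic" or a positive integer.
    Assumption: dynamic mode rounds down (one per full group of three).
    """
    if mode == "dynamic":
        return data_nodes // 3
    if isinstance(mode, int) and not isinstance(mode, bool) and mode >= 1:
        return mode
    raise ValueError("mode must be 'dynamic' or a positive integer")


def plan_coordinating_nodes(mode, data_nodes, candidates):
    """Return (running, idle): how many eligible candidates would run as
    Coordinating Nodes and how many would be left idle."""
    requested = requested_coordinating_nodes(mode, data_nodes)
    running = min(requested, candidates)  # cannot exceed available candidates
    idle = candidates - running           # excess candidates stay idle
    return running, idle


# Three candidates, mode set to 2: one candidate is idle.
print(plan_coordinating_nodes(2, data_nodes=6, candidates=3))

# Dynamic mode with six data nodes but only one eligible candidate.
print(plan_coordinating_nodes("dynamic", data_nodes=6, candidates=1))
```

The first call reproduces the example in the note above: with three candidates and a setting of 2, two run as Coordinating Nodes and one is idle.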
Viewing Data Lake Coordinating Nodes in the User Interface
The System | Data Lake page lists Data Lake Coordinating Nodes in the following locations:
-
The Data Lake Configuration table includes a Running Coordinating Nodes column that indicates whether the Data Lake is running coordinating nodes.
-
You can click the entry for the Data Lake in the Node List column to see the actual nodes in the lake, including columns indicating whether each node is a candidate for Coordinating Node or is currently running as a Coordinating Node (see the illustration in Eligibility for Coordinating Node Status – The Details).