Create a Cluster on Microsoft Azure
POST /api/v2/clusters/
Use this API to create a new cluster when you are using Qubole on the Azure cloud. You might create a cluster for a workload that must run in parallel with your existing workloads, to run workloads in a different geographical location, or for other reasons.
Required Role
The following users can make this API call:
- Users who belong to the system-user or system-admin group.
- Users who belong to a group associated with a role that allows creating a cluster. See Managing Groups and Managing Roles for more information.
Parameters
Note
Parameters marked in bold below are mandatory. Others are optional and have default values.
| Parameter | Description |
|---|---|
| cloud_config | It contains the cloud provider settings for the cluster: the provider, compute credentials, location, network, and storage configuration. |
| cluster_info | It contains the cluster configuration, such as labels, node types, and node counts. |
| engine_config | It contains the engine configuration: the cluster type (flavour) and its engine-specific settings. |
| monitoring | It contains the cluster monitoring configuration. |
| security_settings | It contains the security settings for the cluster. |
cloud_config
| Parameter | Description |
|---|---|
| provider | It defines the cloud provider. Set it to azure for a cluster on QDS-on-Azure. |
| compute_config | It defines the Azure account compute credentials for the cluster. |
| location | It is used to set the geographical Azure location. eastus is the default location. The other locations are centralus, southcentralus, southeastasia, and westus. |
| network_config | It defines the network configuration for the cluster. |
| storage_config | It defines the Azure account storage credentials for the cluster. |
compute_config
| Parameter | Description |
|---|---|
| compute_validated | It denotes whether the credentials are validated. |
| use_account_compute_creds | Set it to true to use the account compute credentials. The default is false. When it is true, the four compute credential parameters below (compute_client_id, compute_client_secret, compute_tenant_id, and compute_subscription_id) are not required. |
| compute_client_id | The client ID of the Azure active directory application which has the permissions over the subscription. It is required when use_account_compute_creds is set to false. |
| compute_client_secret | The client secret of the Azure active directory application. It is required when use_account_compute_creds is set to false. |
| compute_tenant_id | The tenant_id of the Azure Active Directory. It is required when use_account_compute_creds is set to false. |
| compute_subscription_id | The ID of the Azure subscription in which you want to create the compute resources. It is required when use_account_compute_creds is set to false. |
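
To illustrate how use_account_compute_creds changes the compute_config you send, here is a minimal bash sketch that prints the fragment for either setting. The function compute_config_json and the environment variables AZ_CLIENT_ID, AZ_CLIENT_SECRET, AZ_TENANT_ID, and AZ_SUBSCRIPTION_ID are hypothetical names chosen for this example, and the values are inserted without JSON escaping.

```shell
# Hypothetical helper: print a compute_config JSON fragment.
# Usage: compute_config_json true|false
# With "false", the four credentials come from the hypothetical environment
# variables AZ_CLIENT_ID, AZ_CLIENT_SECRET, AZ_TENANT_ID, AZ_SUBSCRIPTION_ID.
# Values are inserted as-is (not JSON-escaped).
compute_config_json() {
  if [ "$1" = "true" ]; then
    # Account compute credentials: the four compute_* fields are not needed.
    printf '{"use_account_compute_creds": true}\n'
    return 0
  fi
  # Cluster-specific credentials: all four fields are required.
  if [ -z "$AZ_CLIENT_ID" ] || [ -z "$AZ_CLIENT_SECRET" ] ||
     [ -z "$AZ_TENANT_ID" ] || [ -z "$AZ_SUBSCRIPTION_ID" ]; then
    echo "all four AZ_* credential variables must be set" >&2
    return 1
  fi
  printf '{"use_account_compute_creds": false, "compute_client_id": "%s", "compute_client_secret": "%s", "compute_tenant_id": "%s", "compute_subscription_id": "%s"}\n' \
    "$AZ_CLIENT_ID" "$AZ_CLIENT_SECRET" "$AZ_TENANT_ID" "$AZ_SUBSCRIPTION_ID"
}
```

For example, compute_config_json true prints {"use_account_compute_creds": true}. Failing early when a credential variable is missing avoids sending an incomplete request.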
network_config
| Parameter | Description |
|---|---|
| vnet_name | The name of the virtual network. |
| subnet_name | The name of the subnet. |
| vnet_resource_group_name | The resource group of your virtual network. |
| bastion_node | The public IP address of the bastion node used to access private subnets, if required. |
| persistent_security_group_name | The name of the network security group in the Azure account. |
| persistent_security_group_resource_group_name | The resource group of the network security group in the Azure account. |
storage_config
| Parameter | Description |
|---|---|
| disk_storage_account_name | Your Azure storage account. Configure either this parameter or managed_disk_account_type, not both. |
| disk_storage_account_resource_group_name | The resource group of your Azure disk storage account. |
| managed_disk_account_type | Set it if you do not want to configure disk storage account details. Accepted values are standard_lrs and premium_lrs. Configure either this parameter or disk_storage_account_name, not both. |
| data_disk_count | The number of reserved data disks attached to each cluster node. For example, a Data Disk Count of 2 in a four-node cluster provisions eight disks in all. |
| data_disk_size | The size of each data disk in gigabytes (GB). The default size is 256 GB. |
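
The data_disk_count example above is simple multiplication, and the two disk storage options are mutually exclusive. The following bash sketch captures both. disk_totals and check_disk_storage are hypothetical helpers, and the assumption that every node you count receives data_disk_count disks follows the four-node example in the table.

```shell
# Sketch: total data disks and data-disk capacity for a cluster, assuming
# every node gets data_disk_count disks of data_disk_size GB each.
# Usage: disk_totals <nodes> <data_disk_count> [data_disk_size_gb]
disk_totals() {
  local nodes=$1 count=$2 size=${3:-256}   # 256 GB is the documented default
  local disks=$(( nodes * count ))
  printf '%s disks, %s GB\n' "$disks" "$(( disks * size ))"
}

# Sketch: reject a storage_config that sets both disk_storage_account_name and
# managed_disk_account_type, or that uses an unsupported managed disk type.
# Usage: check_disk_storage <disk_storage_account_name> <managed_disk_account_type>
check_disk_storage() {
  if [ -n "$1" ] && [ -n "$2" ]; then
    echo "set disk_storage_account_name or managed_disk_account_type, not both" >&2
    return 1
  fi
  case "$2" in
    "" | standard_lrs | premium_lrs) return 0 ;;
    *) echo "unsupported managed_disk_account_type: $2" >&2; return 1 ;;
  esac
}

disk_totals 4 2   # the table's example: prints "8 disks, 2048 GB"
```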
cluster_info
| Parameter | Description |
|---|---|
| label | A cluster can have one or more labels, separated by commas. To make a cluster the default cluster, include the label “default”. |
| master_instance_type | The instance type of the master node. The default is Standard_A5. |
| slave_instance_type | The instance type of the worker nodes. The default is Standard_A5. |
| min_nodes | The minimum number of worker nodes. The default is 1. |
| max_nodes | The maximum number of worker nodes. The default is 1. |
| node_bootstrap | The name of a node bootstrap script, appended to the default path. |
| disallow_cluster_termination | Set it to true if you do not want QDS to terminate idle clusters automatically. Qubole recommends that you set this parameter to false. |
| custom_tags | An optional parameter whose value contains a <tag> and a <value>. |
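
The table describes label as a comma-separated list, while the request body sends it as a JSON array (for example "label": ["azure1"]). Below is a minimal bash sketch that converts one form into the other, plus a client-side check that max_nodes is not smaller than min_nodes. Both helper names are hypothetical, and the range check is an assumed sanity check rather than a documented API rule.

```shell
# Sketch: turn a comma-separated label list into the JSON array used for
# "label" in the request body. Surrounding spaces are trimmed; labels are
# assumed not to contain double quotes.
labels_to_json() {
  local IFS=',' labels label out=""
  read -r -a labels <<< "$1"
  for label in "${labels[@]}"; do
    label="${label#"${label%%[! ]*}"}"   # trim leading spaces
    label="${label%"${label##*[! ]}"}"   # trim trailing spaces
    [ -n "$label" ] || continue
    out="${out:+$out,}\"$label\""
  done
  printf '[%s]\n' "$out"
}

# Sketch (assumed rule): max_nodes must not be below min_nodes.
# Usage: check_node_range <min_nodes> <max_nodes>
check_node_range() {
  if [ "$2" -lt "$1" ]; then
    echo "invalid node range: min_nodes=$1 max_nodes=$2" >&2
    return 1
  fi
}
```

Including default in the list, as in labels_to_json 'azure1, default', produces ["azure1","default"], which makes the cluster the default cluster.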
engine_config
| Parameter | Description |
|---|---|
| flavour | It denotes the type of cluster. The supported values are: hadoop2, presto, and spark. |
| hadoop_settings | The Hadoop settings, used when flavour is hadoop2. See hadoop_settings. |
| presto_settings | The Presto settings, used when flavour is presto. See presto_settings. |
| spark_settings | The Spark settings, used when flavour is spark. See spark_settings. |
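
Each flavour is paired with its own settings object. The bash sketch below assumes the pairing hadoop2 to hadoop_settings, presto to presto_settings, and spark to spark_settings, as the table suggests, and builds a minimal engine_config body. The function names are hypothetical.

```shell
# Sketch: map an engine_config "flavour" to the settings object that holds
# its engine-specific configuration (assumed pairing from the table).
settings_key_for_flavour() {
  case "$1" in
    hadoop2) echo hadoop_settings ;;
    presto)  echo presto_settings ;;
    spark)   echo spark_settings ;;
    *) echo "unsupported flavour: $1" >&2; return 1 ;;
  esac
}

# Sketch: build a minimal engine_config object.
# Usage: engine_config_json <flavour> [settings-json]
engine_config_json() {
  local key settings=${2:-}
  key=$(settings_key_for_flavour "$1") || return 1
  [ -n "$settings" ] || settings='{}'   # empty settings object by default
  printf '{"flavour": "%s", "%s": %s}\n' "$1" "$key" "$settings"
}
```

For example, engine_config_json spark '{"spark_version": "2.1-latest"}' prints {"flavour": "spark", "spark_settings": {"spark_version": "2.1-latest"}}.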
hadoop_settings
| Parameter | Description |
|---|---|
| custom_hadoop_config | The custom Hadoop configuration overrides. The default value is blank. |
| fairscheduler_settings | The fair scheduler configuration options. |
fairscheduler_settings
| Parameter | Description |
|---|---|
| fairscheduler_config_xml | The XML string, with custom configuration parameters, for the fair scheduler. The default value is blank. |
| default_pool | The default pool for the fair scheduler. The default value is blank. |
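
Values such as custom_hadoop_config and fairscheduler_config_xml are strings inside the JSON body, so double quotes, backslashes, and line breaks in them must be escaped. Here is a minimal bash sketch, assuming the value is plain text. json_escape is a hypothetical helper, and the allocation XML is only an illustrative YARN fair scheduler snippet, not a recommended configuration.

```shell
# Sketch: escape a value such as custom_hadoop_config or
# fairscheduler_config_xml so it can be embedded in the JSON request body.
# Handles backslashes, double quotes, tabs, carriage returns, and newlines
# only; other control characters are not escaped.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}      # backslashes first, so later escapes are not doubled
  s=${s//\"/\\\"}
  s=${s//$'\t'/\\t}
  s=${s//$'\r'/\\r}
  s=${s//$'\n'/\\n}
  printf '%s' "$s"
}

# Illustrative allocation XML with a placeholder queue name.
xml='<allocations><queue name="etl"><weight>2.0</weight></queue></allocations>'
printf '"fairscheduler_config_xml": "%s"\n' "$(json_escape "$xml")"
```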
presto_settings
| Parameter | Description |
|---|---|
| presto_version | The Presto version to use on the cluster. The default version is 0.142. The stable supported version is 0.157. |
| custom_presto_config | The custom Presto configuration overrides. The default value is blank. |
spark_settings
| Parameter | Description |
|---|---|
| zeppelin_interpreter_mode | The default mode is legacy. Set it to user if you want user-level cluster-resource management on notebooks. See Configuring a Spark Notebook for more information. |
| custom_spark_config | The custom Spark configuration overrides. The default value is blank. |
| spark_version | The Spark version used on the cluster. The default version is 2.0-latest. The other supported version is 2.1-latest. |
monitoring
| Parameter | Description |
|---|---|
| enable_ganglia_monitoring | Enables Ganglia monitoring for the cluster. The default value is false. |
security_settings
| Parameter | Description |
|---|---|
| ssh_public_key | The SSH public key used to log in to the instances. The default value is none. (Note: This parameter is not visible to non-admin users.) The key must be in OpenSSH format, not PEM/PKCS format. |
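
Because ssh_public_key must be in OpenSSH format, a quick client-side check can catch a PEM key before the request is sent. This bash sketch is a rough prefix check only. is_openssh_pubkey is a hypothetical helper, and the set of key types QDS accepts is an assumption to verify.

```shell
# Sketch: a rough check that a public key line is in OpenSSH format
# ("<type> <base64> [comment]") rather than PEM, which starts with
# "-----BEGIN". Which key types QDS accepts is an assumption.
is_openssh_pubkey() {
  case "$1" in
    "ssh-rsa "* | "ssh-ed25519 "* | "ecdsa-sha2-nistp256 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# If you only have a PEM private key, ssh-keygen can print the matching
# OpenSSH public key (file names are placeholders):
#   ssh-keygen -y -f my_key.pem > my_key.pub
# A PKCS#8 PEM public key can be converted with:
#   ssh-keygen -i -m PKCS8 -f my_pubkey.pem > my_key.pub
```

To check a key file, pass its contents, for example is_openssh_pubkey "$(cat my_key.pub)".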
Request API Syntax
If use_account_compute_creds is set to true, you do not need to set the compute credentials (compute_client_id, compute_client_secret, compute_tenant_id, and compute_subscription_id).
```shell
curl -X POST -H "X-AUTH-TOKEN:$X_AUTH_TOKEN" -H "Content-Type:application/json" -H "Accept: application/json" \
-d '{
  "cloud_config": {
    "provider": "azure",
    "compute_config": {
      "compute_validated": <default is false; set it to true>,
      "use_account_compute_creds": false,
      "compute_client_id": "<your client ID>",
      "compute_client_secret": "<your client secret key>",
      "compute_tenant_id": "<your tenant ID>",
      "compute_subscription_id": "<your subscription ID>"
    },
    "location": {
      "location": "centralus"
    },
    "network_config": {
      "vnet_name": "<vnet name>",
      "subnet_name": "<subnet name>",
      "vnet_resource_group_name": "<vnet resource group name>",
      "bastion_node_public_dns": "<bastion node public dns>",
      "persistent_security_groups": "<persistent security group>",
      "master_elastic_ip": ""
    },
    "storage_config": {
      "disk_storage_account_name": "<disk storage account name>",
      "disk_storage_account_resource_group_name": "<disk storage account resource group name>",
      "managed_disk_account_type": "<standard_lrs or premium_lrs; set this or disk_storage_account_name, not both>",
      "data_disk_count": <count>,
      "data_disk_size": <disk size in GB>
    }
  },
  "cluster_info": {
    "master_instance_type": "Standard_A6",
    "slave_instance_type": "Standard_A6",
    "label": ["azure1"],
    "min_nodes": 1,
    "max_nodes": 2,
    "cluster_name": "Azure1",
    "node_bootstrap": "node_bootstrap.sh"
  },
  "engine_config": {
    "flavour": "hadoop2",
    "hadoop_settings": {
      "custom_hadoop_config": <default is null>,
      "fairscheduler_settings": {
        "default_pool": <default is null>
      }
    }
  },
  "monitoring": {
    "ganglia": <default is false; set it to true>
  }
}' "https://azure.qubole.com/api/v2/clusters"
```
Sample API Request
```shell
curl -X POST -H "X-AUTH-TOKEN:$X_AUTH_TOKEN" -H "Content-Type:application/json" -H "Accept: application/json" \
-d '{
  "cloud_config": {
    "provider": "azure",
    "compute_config": {
      "compute_validated": false,
      "use_account_compute_creds": false,
      "compute_client_id": "<your client ID>",
      "compute_client_secret": "<your client secret key>",
      "compute_tenant_id": "<your tenant ID>",
      "compute_subscription_id": "<your subscription ID>"
    },
    "location": {
      "location": "centralus"
    },
    "network_config": {
      "vnet_name": "<vnet name>",
      "subnet_name": "<subnet name>",
      "vnet_resource_group_name": "<vnet resource group name>",
      "persistent_security_groups": "<persistent security group>"
    },
    "storage_config": {
      "storage_access_key": "<your storage access key>",
      "storage_account_name": "<your storage account name>",
      "disk_storage_account_name": "<your disk storage account name>",
      "disk_storage_account_resource_group_name": "<your disk storage account resource group name>",
      "data_disk_count": 4,
      "data_disk_size": 300
    }
  },
  "cluster_info": {
    "master_instance_type": "Standard_A6",
    "slave_instance_type": "Standard_A6",
    "label": ["azure1"],
    "min_nodes": 1,
    "max_nodes": 2,
    "cluster_name": "Azure1",
    "node_bootstrap": "node_bootstrap.sh"
  },
  "engine_config": {
    "flavour": "hadoop2",
    "hadoop_settings": {
      "custom_hadoop_config": "mapred.tasktracker.map.tasks.maximum=3"
    }
  },
  "monitoring": {
    "ganglia": true
  }
}' "https://azure.qubole.com/api/v2/clusters"
```
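
Long inline -d '...' bodies are easy to break with a stray quote or comma. Here is a minimal bash sketch that instead reads the body from a file with curl's --data-binary @file option and checks that X_AUTH_TOKEN is set first. create_azure_cluster and cluster.json are placeholder names.

```shell
# Sketch: send a request body kept in a file instead of an inline -d '...'
# string, which avoids shell-quoting mistakes in long JSON bodies.
# The function name and file name are placeholders.
create_azure_cluster() {
  local payload_file=$1
  if [ -z "$X_AUTH_TOKEN" ]; then
    echo "X_AUTH_TOKEN is not set" >&2
    return 1
  fi
  if [ ! -r "$payload_file" ]; then
    echo "cannot read $payload_file" >&2
    return 1
  fi
  # --data-binary sends the file unchanged (plain -d @file strips newlines).
  curl -X POST \
    -H "X-AUTH-TOKEN: $X_AUTH_TOKEN" \
    -H "Content-Type: application/json" \
    -H "Accept: application/json" \
    --data-binary @"$payload_file" \
    "https://azure.qubole.com/api/v2/clusters"
}

# Usage (cluster.json holds the JSON body from the sample request):
#   create_azure_cluster cluster.json
```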