Create a Cluster on Microsoft Azure

POST /api/v2/clusters/

Use this API to create a new cluster when you use Qubole on the Azure cloud. A new cluster is typically created for a workload that must run in parallel with your existing workloads.

You might also create a new cluster to run workloads in a different geographical location, or for other reasons.

Required Role

The following roles can make this API call:

  • A user who is part of the system-user/system-admin group.
  • A user who is part of a group associated with a role that allows creating a cluster. See Managing Groups and Managing Roles for more information.

Parameters

Note

Parameters marked in bold below are mandatory. Others are optional and have default values.

Parameter Description
cloud_config It contains the cloud provider settings of the cluster: the provider, compute, location, network, and storage configuration.
cluster_info It contains the configurations of a cluster.
engine_config It contains the configurations specific to the type of cluster.
monitoring It contains the cluster monitoring configuration.
security_settings It contains the security settings for the cluster.

cloud_config

Parameter Description
provider It defines the cloud provider. Set it to azure when the cluster is created on QDS-on-Azure.
compute_config It defines the Azure account compute credentials for the cluster.
location It is used to set the geographical Azure location. eastus is the default location. The other locations are centralus, southcentralus, southeastasia, and westus.
network_config It defines the network configuration for the cluster.
storage_config It defines the Azure account storage credentials for the cluster.

compute_config

Parameter Description
compute_validated It denotes if the credentials are validated or not.
use_account_compute_creds It specifies whether to use the account compute credentials. By default, it is set to false. When it is set to true, the following four parameters are not required.
compute_client_id The client ID of the Azure Active Directory application that has permissions over the subscription. It is required when use_account_compute_creds is set to false.
compute_client_secret The client secret of the Azure Active Directory application. It is required when use_account_compute_creds is set to false.
compute_tenant_id The tenant ID of the Azure Active Directory. It is required when use_account_compute_creds is set to false.
compute_subscription_id The subscription ID of the Azure account in which you want to create the compute resources. It is required when use_account_compute_creds is set to false.
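
The rule described in the table (the four credential fields are needed only when use_account_compute_creds is false) can be checked on the client before the request is sent. The following is a minimal sketch, not part of any Qubole SDK; the function name build_compute_config and its keyword arguments are hypothetical.

```python
def build_compute_config(use_account_compute_creds=False, client_id=None,
                         client_secret=None, tenant_id=None,
                         subscription_id=None):
    """Return a compute_config dict (hypothetical helper for illustration)."""
    config = {"use_account_compute_creds": use_account_compute_creds}
    if use_account_compute_creds:
        # Account-level credentials are used, so the four fields are omitted.
        return config

    creds = {
        "compute_client_id": client_id,
        "compute_client_secret": client_secret,
        "compute_tenant_id": tenant_id,
        "compute_subscription_id": subscription_id,
    }
    # All four fields are required when account credentials are not used.
    missing = [name for name, value in creds.items() if not value]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    config.update(creds)
    return config
```

Calling build_compute_config(use_account_compute_creds=True) returns a dict with only that one key, while omitting any credential with the default setting raises ValueError.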

network_config

Parameter Description
vnet_name Set the virtual network.
subnet_name Set the subnet.
vnet_resource_group_name Set the resource group of your virtual network.
bastion_node It is the public IP address of the bastion node that is used to access private subnets, if required.
persistent_security_group_name It is the network security group name on the Azure account.
persistent_security_group_resource_group_name It is the resource group of the network security group of the Azure account.

storage_config

Parameter Description
disk_storage_account_name Set your Azure disk storage account. Configure either this parameter or managed_disk_account_type, not both.
disk_storage_account_resource_group_name Set your Azure disk storage account resource group name.
managed_disk_account_type Set it if you do not want to configure the disk storage account details. Its accepted values are standard_lrs and premium_lrs. Configure either this parameter or disk_storage_account_name, not both.
data_disk_count It is the number of reserved disks to be attached to each cluster node. For example, a data disk count of 2 in a four-node cluster provisions eight disks in all.
data_disk_size It is used to set the Data Disk Size in gigabytes (GB). The default size is 256 GB.
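
The two storage_config rules above (choose between a disk storage account and a managed disk type, and disks are provisioned per node) can be sketched as follows. This is an illustrative client-side check under stated assumptions: the helper names build_storage_config and total_data_disks are hypothetical, and the sketch rejects only the case where both alternatives are set, since the table does not say whether setting neither is allowed.

```python
ALLOWED_MANAGED_DISK_TYPES = ("standard_lrs", "premium_lrs")


def build_storage_config(disk_storage_account_name=None,
                         disk_storage_account_resource_group_name=None,
                         managed_disk_account_type=None,
                         data_disk_count=None,
                         data_disk_size=None):
    """Return a storage_config dict (hypothetical helper for illustration)."""
    if disk_storage_account_name and managed_disk_account_type:
        raise ValueError("configure disk_storage_account_name or "
                         "managed_disk_account_type, not both")
    if (managed_disk_account_type is not None
            and managed_disk_account_type not in ALLOWED_MANAGED_DISK_TYPES):
        raise ValueError("managed_disk_account_type must be one of "
                         + ", ".join(ALLOWED_MANAGED_DISK_TYPES))
    fields = {
        "disk_storage_account_name": disk_storage_account_name,
        "disk_storage_account_resource_group_name":
            disk_storage_account_resource_group_name,
        "managed_disk_account_type": managed_disk_account_type,
        "data_disk_count": data_disk_count,
        "data_disk_size": data_disk_size,
    }
    # Unset fields are left out so that the service defaults apply.
    return {name: value for name, value in fields.items() if value is not None}


def total_data_disks(data_disk_count, node_count):
    """Disks are attached per node, so the total is count times nodes."""
    return data_disk_count * node_count
```

With the example from the table, total_data_disks(2, 4) evaluates to 8.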

cluster_info

Parameter Description
label A cluster can have one or more labels separated by commas. At least one label must be provided when creating a cluster. You can make a cluster the default cluster by including the label “default”.
master_instance_type The master node type. The default is Standard_A5; specify a different type to change it.
slave_instance_type The worker node type. The default is Standard_A5; specify a different type to change it.
min_nodes The minimum number of worker nodes. The default is 1.
max_nodes The maximum number of worker nodes. The default is 1.
node_bootstrap The name of a node bootstrap script. It is appended to the default path.
disallow_cluster_termination Set it to true if you do not want QDS to terminate idle clusters automatically. Qubole recommends that you set this parameter to false.
custom_tags It is an optional parameter. Its value contains a <tag> and a <value>.
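
As an illustration of the cluster_info fields, the sketch below assembles the label list and node counts with the documented defaults. The helper name build_cluster_info and the make_default flag are hypothetical, and the check that min_nodes does not exceed max_nodes is an assumption of this sketch rather than a rule stated above.

```python
def build_cluster_info(labels, min_nodes=1, max_nodes=1,
                       master_instance_type="Standard_A5",
                       slave_instance_type="Standard_A5",
                       make_default=False):
    """Return a cluster_info dict (hypothetical helper for illustration)."""
    label_list = list(labels)
    if make_default and "default" not in label_list:
        # Including the label "default" makes this the default cluster.
        label_list.append("default")
    if not label_list:
        raise ValueError("at least one label is required")
    if min_nodes > max_nodes:
        # Assumption: the minimum should not exceed the maximum.
        raise ValueError("min_nodes must not exceed max_nodes")
    return {
        "label": label_list,
        "master_instance_type": master_instance_type,
        "slave_instance_type": slave_instance_type,
        "min_nodes": min_nodes,
        "max_nodes": max_nodes,
    }
```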

engine_config

Parameter Description
flavour It denotes the type of cluster. The supported values are: hadoop2, presto, and spark.
hadoop_settings It contains the Hadoop-specific settings. See hadoop_settings below.
presto_settings It contains the Presto-specific settings. See presto_settings below.
spark_settings It contains the Spark-specific settings. See spark_settings below.
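
A minimal sketch of assembling engine_config on the client follows. It only checks the flavour against the supported values listed above and rejects settings blocks with unknown names; the helper name build_engine_config is hypothetical, and the sketch does not assume which settings blocks the service accepts for each flavour.

```python
SUPPORTED_FLAVOURS = ("hadoop2", "presto", "spark")
SETTINGS_KEYS = ("hadoop_settings", "presto_settings", "spark_settings")


def build_engine_config(flavour, **settings):
    """Return an engine_config dict (hypothetical helper for illustration)."""
    if flavour not in SUPPORTED_FLAVOURS:
        raise ValueError("flavour must be one of " + ", ".join(SUPPORTED_FLAVOURS))
    unknown = sorted(set(settings) - set(SETTINGS_KEYS))
    if unknown:
        raise ValueError("unknown settings: " + ", ".join(unknown))
    config = {"flavour": flavour}
    config.update(settings)
    return config
```

For example, build_engine_config("hadoop2", hadoop_settings={"custom_hadoop_config": "mapred.tasktracker.map.tasks.maximum=3"}) produces the engine_config block used in the sample request.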

hadoop_settings

Parameter Description
custom_hadoop_config The custom Hadoop configuration overrides. The default value is blank.
fairscheduler_settings The fair scheduler configuration options.

fairscheduler_settings

Parameter Description
fairscheduler_config_xml The XML string, with custom configuration parameters, for the fair scheduler. The default value is blank.
default_pool The default pool for the fair scheduler. The default value is blank.

presto_settings

Parameter Description
presto_version Specify the Presto version to be used on the cluster. The default version is 0.142. The stable version that is supported is 0.157.
custom_presto_config Specify the custom Presto configuration overrides. The default value is blank.

spark_settings

Parameter Description
zeppelin_interpreter_mode The default mode is legacy. Set it to user mode if you want user-level cluster-resource management on notebooks. See Configuring a Spark Notebook for more information.
custom_spark_config Specify the custom Spark configuration overrides. The default value is blank.
spark_version It is the Spark version used on the cluster. The default version is 2.0-latest. The other supported version is 2.1-latest.

monitoring

Parameter Description
enable_ganglia_monitoring Enable Ganglia monitoring for the cluster. The default value is false.

security_settings

Parameter Description
ssh_public_key The SSH key to use to log in to the instances. The default value is none. (Note: This parameter is not visible to non-admin users.) The SSH key must be in the OpenSSH format and not in the PEM/PKCS format.
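
Because the key must be in the OpenSSH format, a quick client-side check can catch a PEM/PKCS key before the request is sent. The sketch below is a heuristic only: it relies on the facts that an OpenSSH public key line starts with a key type such as ssh-rsa followed by base64 data, and that PEM files start with a "-----BEGIN" header. The function name looks_like_openssh_public_key is hypothetical.

```python
OPENSSH_KEY_TYPES = (
    "ssh-rsa",
    "ssh-dss",
    "ssh-ed25519",
    "ecdsa-sha2-nistp256",
    "ecdsa-sha2-nistp384",
    "ecdsa-sha2-nistp521",
)


def looks_like_openssh_public_key(key):
    """Heuristic check for '<key type> <base64 data> [comment]' on one line."""
    text = key.strip()
    if text.startswith("-----BEGIN"):
        # PEM/PKCS encoded keys begin with a BEGIN header line.
        return False
    parts = text.split()
    return len(parts) >= 2 and parts[0] in OPENSSH_KEY_TYPES
```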

Request API Syntax

If use_account_compute_creds is set to true, then it is not required to set the compute credentials.

curl -X POST -H "X-AUTH-TOKEN:$X_AUTH_TOKEN" -H "Content-Type:application/json" -H "Accept: application/json" \
-d '{
     "cloud_config": {
          "provider": "azure",
          "compute_config": {
               "compute_validated": <default is false/set it to true>,
               "use_account_compute_creds": false,
               "compute_client_id": "<your client ID>",
               "compute_client_secret": "<your client secret key>",
               "compute_tenant_id": "<your tenant ID>",
               "compute_subscription_id": "<your subscription ID>"
          },
          "location": {
               "location": "centralus"
          },
          "network_config": {
               "vnet_name": "<vnet name>",
               "subnet_name": "<subnet name>",
               "vnet_resource_group_name": "<vnet resource group name>",
               "bastion_node_public_dns": "<bastion node public dns>",
               "persistent_security_groups": "<persistent security group>",
               "master_elastic_ip": ""
          },
          "storage_config": {
               "disk_storage_account_name": "<Disk storage account name>",
               "disk_storage_account_resource_group_name": "<Disk account resource group name>",
               //You can either configure "disk_storage_account_name" or "managed_disk_account_type"
               "managed_disk_account_type": "<standard_lrs/premium_lrs>",
               "data_disk_count": "<Count>",
               "data_disk_size": "<Disk Size>"
          }
     },
     "cluster_info": {
          "master_instance_type": "Standard_A6",
          "slave_instance_type": "Standard_A6",
          "label": ["azure1"],
          "min_nodes": 1,
          "max_nodes": 2,
          "cluster_name": "Azure1",
          "node_bootstrap": "node_bootstrap.sh"
     },
     "engine_config": {
          "flavour": "hadoop2",
          "hadoop_settings": {
               "custom_hadoop_config": <default is null>,
               "fairscheduler_settings": {
                    "default_pool": <default is null>
               }
          }
     },
     "monitoring": {
          "ganglia": <default is false/set it to true>
     }
}' "https://azure.qubole.com/api/v2/clusters"

Sample API Request

curl -X POST -H "X-AUTH-TOKEN:$X_AUTH_TOKEN" -H "Content-Type:application/json" -H "Accept: application/json" \
-d '{
     "cloud_config": {
          "provider": "azure",
          "compute_config": {
               "compute_validated": false,
               "use_account_compute_creds": false,
               "compute_client_id": "<your client ID>",
               "compute_client_secret": "<your client secret key>",
               "compute_tenant_id": "<your tenant ID>",
               "compute_subscription_id": "<your subscription ID>"
          },
          "location": {
               "location": "centralus"
          },
          "network_config": {
               "vnet_name": "<vnet name>",
               "subnet_name": "<subnet name>",
               "vnet_resource_group_name": "<vnet resource group name>",
               "persistent_security_groups": "<persistent security group>"
          },
          "storage_config": {
               "storage_access_key": "<your storage access key>",
               "storage_account_name": "<your storage account name>",
               "disk_storage_account_name": "<your disk storage account name>",
               "disk_storage_account_resource_group_name": "<your disk storage account resource group name>",
               "data_disk_count": 4,
               "data_disk_size": 300
          }
     },
     "cluster_info": {
          "master_instance_type": "Standard_A6",
          "slave_instance_type": "Standard_A6",
          "label": ["azure1"],
          "min_nodes": 1,
          "max_nodes": 2,
          "cluster_name": "Azure1",
          "node_bootstrap": "node_bootstrap.sh"
     },
     "engine_config": {
          "flavour": "hadoop2",
          "hadoop_settings": {
               "custom_hadoop_config": "mapred.tasktracker.map.tasks.maximum=3"
          }
     },
     "monitoring": {
          "ganglia": true
     }
}' "https://azure.qubole.com/api/v2/clusters"
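
The same request can be prepared from a script. The sketch below uses only the Python standard library and reuses the endpoint, headers, and a subset of the field values from the sample above; the helper name build_create_cluster_request is hypothetical, and the payload is shortened for readability. It builds the request object without sending it.

```python
import json
import os
import urllib.request

API_URL = "https://azure.qubole.com/api/v2/clusters"


def build_create_cluster_request(payload, auth_token):
    """Prepare (but do not send) the POST request for cluster creation."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "X-AUTH-TOKEN": auth_token,
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


# Shortened payload based on the sample request above.
payload = {
    "cloud_config": {
        "provider": "azure",
        "compute_config": {"use_account_compute_creds": True},
        "location": {"location": "centralus"},
    },
    "cluster_info": {
        "label": ["azure1"],
        "min_nodes": 1,
        "max_nodes": 2,
    },
    "engine_config": {"flavour": "hadoop2"},
    "monitoring": {"ganglia": True},
}

request = build_create_cluster_request(payload, os.environ.get("X_AUTH_TOKEN", ""))
```

To send the request, pass the object to urllib.request.urlopen(request) and read the JSON response. Serializing with json.dumps avoids the hand-written JSON mistakes (missing commas, trailing commas, Python-style False) that are easy to make in a curl command.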