Create a Cluster on Microsoft Azure¶

POST /api/v2/clusters/¶

Use this API to create a new cluster when you are using Qubole on Microsoft Azure. You typically create a new cluster for a workload that has to run in parallel with your existing workloads.

For example, you might want to run workloads in different geographical locations, or keep workloads with different requirements on separate clusters.

Required Role¶

The following roles can make this API call:

A user who is part of the system-user/system-admin group.
A user who is part of a group associated with a role that allows creating a cluster. See Managing Groups and Managing Roles for more information.

Parameters¶

Note

Parameters marked in bold below are mandatory. Others are optional and have default values.

Parameter	Description
cloud_config	It contains the cloud-specific configuration of the cluster: the provider, compute credentials, location, network, and storage settings.
cluster_info	It contains the configurations of a cluster, including its labels. At least one label must be provided when creating a cluster.
engine_config	It contains the configurations of the type of cluster.
monitoring	It contains the cluster monitoring configuration.
security_settings	It contains the security settings for the cluster.

cloud_config¶

Parameter	Description
provider	It defines the cloud provider. Set it to `azure` when the cluster is created on QDS-on-Azure.
compute_config	It defines the Azure account compute credentials for the cluster.
location	It is used to set the geographical Azure location. `eastus` is the default location. The other locations are `centralus`, `southcentralus`, `southeastasia`, and `westus`.
network_config	It defines the network configuration for the cluster.
storage_config	It defines the Azure account storage credentials for the cluster.
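
As a small illustration of the `location` rule above, the following Python sketch falls back to the default location and rejects values that are not in the list. The helper name `resolve_location` is hypothetical, and the list of locations is copied from the table above, so it may be incomplete for other QDS releases.

```python
# Locations copied from the cloud_config table above.
SUPPORTED_LOCATIONS = (
    "eastus",
    "centralus",
    "southcentralus",
    "southeastasia",
    "westus",
)
DEFAULT_LOCATION = "eastus"


def resolve_location(location=None):
    """Return the location to send, using the documented default if none is given.

    Hypothetical client-side helper; the API performs its own validation.
    """
    if location is None:
        return DEFAULT_LOCATION
    if location not in SUPPORTED_LOCATIONS:
        raise ValueError(
            "Unsupported location %r; expected one of %s"
            % (location, ", ".join(SUPPORTED_LOCATIONS))
        )
    return location


print(resolve_location())          # prints eastus
print(resolve_location("westus"))  # prints westus
```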

compute_config¶

Parameter	Description
compute_validated	It denotes whether the compute credentials have been validated.
use_account_compute_creds	It specifies whether to use the account compute credentials. By default, it is set to `false`. Set it to `true` to use the account compute credentials; in that case, the following four parameters are not required.
compute_client_id	The client ID of the Azure Active Directory application that has permissions over the subscription. It is required when `use_account_compute_creds` is set to `false`.
compute_client_secret	The client secret of the Azure Active Directory application. It is required when `use_account_compute_creds` is set to `false`.
compute_tenant_id	The tenant ID of the Azure Active Directory. It is required when `use_account_compute_creds` is set to `false`.
compute_subscription_id	The subscription ID of the Azure account where you want to create the compute resources. It is required when `use_account_compute_creds` is set to `false`.
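
The dependency between `use_account_compute_creds` and the four credential parameters can be sketched in Python as follows. The function `build_compute_config` is a hypothetical client-side helper, not part of any Qubole SDK; it only mirrors the rule stated in the table.

```python
# The four parameters that are required when use_account_compute_creds is false.
CREDENTIAL_KEYS = (
    "compute_client_id",
    "compute_client_secret",
    "compute_tenant_id",
    "compute_subscription_id",
)


def build_compute_config(use_account_compute_creds=False, **credentials):
    """Build a compute_config object (hypothetical helper)."""
    config = {"use_account_compute_creds": use_account_compute_creds}
    if use_account_compute_creds:
        # Account-level credentials are used, so nothing else is required.
        return config
    missing = [key for key in CREDENTIAL_KEYS if not credentials.get(key)]
    if missing:
        raise ValueError("Missing compute credentials: " + ", ".join(missing))
    for key in CREDENTIAL_KEYS:
        config[key] = credentials[key]
    return config


print(build_compute_config(use_account_compute_creds=True))
# prints {'use_account_compute_creds': True}
```

Calling `build_compute_config()` without arguments raises `ValueError`, because the default of `false` makes all four credentials mandatory.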

network_config¶

Parameter	Description
vnet_name	Set the name of the virtual network.
subnet_name	Set the name of the subnet.
vnet_resource_group_name	Set the resource group of your virtual network.
bastion_node	It is the public IP address of the bastion node, which is used to access private subnets if required.
persistent_security_group_name	It is the network security group name on the Azure account.
persistent_security_group_resource_group_name	It is the resource group of the network security group of the Azure account.

storage_config¶

Parameter	Description
disk_storage_account_name	Set your Azure storage account. You must configure either this parameter or `managed_disk_account_type`.
disk_storage_account_resource_group_name	Set your Azure disk storage account resource group name.
managed_disk_account_type	You can set it if you do not want to configure disk storage account details. Its accepted values are `standard_lrs` and `premium_lrs`. You must configure either this parameter or `disk_storage_account_name`.
data_disk_count	It is the number of reserved disks to be attached to each cluster node. For example, a data disk count of 2 in a four-node cluster provisions eight disks in all.
data_disk_size	It is used to set the Data Disk Size in gigabytes (GB). The default size is 256 GB.
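
The storage rules above can be sketched in Python. Both function names are hypothetical, and the sketch assumes that exactly one of `disk_storage_account_name` and `managed_disk_account_type` should be set, which is one reading of the table rather than a confirmed API rule.

```python
MANAGED_DISK_TYPES = ("standard_lrs", "premium_lrs")
DEFAULT_DATA_DISK_SIZE_GB = 256  # default from the table above


def build_storage_config(disk_storage_account_name=None,
                         disk_storage_account_resource_group_name=None,
                         managed_disk_account_type=None,
                         data_disk_count=None,
                         data_disk_size=DEFAULT_DATA_DISK_SIZE_GB):
    """Build a storage_config object (hypothetical helper)."""
    has_account = disk_storage_account_name is not None
    has_managed = managed_disk_account_type is not None
    # Assumption: the two options are mutually exclusive.
    if has_account == has_managed:
        raise ValueError(
            "Set exactly one of disk_storage_account_name "
            "or managed_disk_account_type"
        )
    config = {"data_disk_size": data_disk_size}
    if has_managed:
        if managed_disk_account_type not in MANAGED_DISK_TYPES:
            raise ValueError("Unknown managed disk type")
        config["managed_disk_account_type"] = managed_disk_account_type
    else:
        config["disk_storage_account_name"] = disk_storage_account_name
        if disk_storage_account_resource_group_name is not None:
            config["disk_storage_account_resource_group_name"] = (
                disk_storage_account_resource_group_name
            )
    if data_disk_count is not None:
        config["data_disk_count"] = data_disk_count
    return config


def total_data_disks(data_disk_count, node_count):
    """Disks are attached per node, so the total scales with cluster size."""
    return data_disk_count * node_count


print(total_data_disks(2, 4))  # prints 8
```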

cluster_info¶

Parameter	Description
label	A cluster can have one or more labels separated by commas. You can make a cluster the default cluster by including the label “default”.
master_instance_type	The instance type of the master node. The default is Standard_A5.
slave_instance_type	The instance type of the worker nodes. The default is Standard_A5.
min_nodes	The minimum number of worker nodes. The default is 1.
max_nodes	The maximum number of worker nodes. The default is 1.
node_bootstrap	You can append the name of a node bootstrap script to the default path.
disallow_cluster_termination	Set it to `true` if you do not want QDS to terminate idle clusters automatically. Qubole recommends that you set this parameter to `false`.
custom_tags	It is an optional parameter. Its value contains a <tag> and a <value>.
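
A minimal Python sketch of a `cluster_info` object that applies the defaults from the table is shown below. The helper `build_cluster_info` is hypothetical. The sketch passes labels as a JSON array, as the sample request on this page does, and the check that `min_nodes` does not exceed `max_nodes` is an assumption rather than a documented rule.

```python
DEFAULT_INSTANCE_TYPE = "Standard_A5"  # default from the table above


def build_cluster_info(labels,
                       master_instance_type=DEFAULT_INSTANCE_TYPE,
                       slave_instance_type=DEFAULT_INSTANCE_TYPE,
                       min_nodes=1,
                       max_nodes=1,
                       node_bootstrap=None,
                       disallow_cluster_termination=None):
    """Build a cluster_info object (hypothetical helper)."""
    if not labels:
        raise ValueError("At least one label is required")
    # Assumption: the API expects min_nodes <= max_nodes.
    if min_nodes > max_nodes:
        raise ValueError("min_nodes must not exceed max_nodes")
    info = {
        "label": list(labels),
        "master_instance_type": master_instance_type,
        "slave_instance_type": slave_instance_type,
        "min_nodes": min_nodes,
        "max_nodes": max_nodes,
    }
    # Optional parameters are only sent when explicitly set.
    if node_bootstrap is not None:
        info["node_bootstrap"] = node_bootstrap
    if disallow_cluster_termination is not None:
        info["disallow_cluster_termination"] = disallow_cluster_termination
    return info


print(build_cluster_info(["azure1"], max_nodes=2))
```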

engine_config¶

Parameter	Description
flavour	It denotes the type of cluster. The supported values are: `hadoop2`, `presto`, and `spark`.
hadoop_settings	It contains the Hadoop settings for the cluster. See hadoop_settings.
presto_settings	It contains the Presto settings for the cluster. See presto_settings.
spark_settings	It contains the Spark settings for the cluster. See spark_settings.
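
One way to keep `flavour` and the matching settings object consistent is sketched below in Python. The helper `build_engine_config` is hypothetical, and the one-to-one pairing of each flavour with a settings key is inferred from the parameter names; the API may accept other combinations (for example, `hadoop_settings` on a Spark cluster).

```python
# Assumed pairing of flavour values with their settings objects.
SETTINGS_KEY_BY_FLAVOUR = {
    "hadoop2": "hadoop_settings",
    "presto": "presto_settings",
    "spark": "spark_settings",
}


def build_engine_config(flavour, settings=None):
    """Build an engine_config object (hypothetical helper)."""
    if flavour not in SETTINGS_KEY_BY_FLAVOUR:
        raise ValueError(
            "Unsupported flavour %r; expected one of %s"
            % (flavour, ", ".join(sorted(SETTINGS_KEY_BY_FLAVOUR)))
        )
    config = {"flavour": flavour}
    if settings:
        # Copy the settings so later changes by the caller do not leak in.
        config[SETTINGS_KEY_BY_FLAVOUR[flavour]] = dict(settings)
    return config


print(build_engine_config("spark", {"spark_version": "2.1-latest"}))
# prints {'flavour': 'spark', 'spark_settings': {'spark_version': '2.1-latest'}}
```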

hadoop_settings¶

Parameter	Description
custom_hadoop_config	The custom Hadoop configuration overrides. The default value is blank.
fairscheduler_settings	The fair scheduler configuration options.

fairscheduler_settings¶

Parameter	Description
fairscheduler_config_xml	The XML string, with custom configuration parameters, for the fair scheduler. The default value is blank.
default_pool	The default pool for the fair scheduler. The default value is blank.

presto_settings¶

Parameter	Description
presto_version	Specify the Presto version to be used on the cluster. The default version is `0.142`. The stable version that is supported is `0.157`.
custom_presto_config	The custom Presto configuration overrides. The default value is blank.

spark_settings¶

Parameter	Description
zeppelin_interpreter_mode	The default mode is `legacy`. Set it to `user` if you want user-level cluster-resource management on notebooks. See Configuring a Spark Notebook for more information.
custom_spark_config	Specify the custom Spark configuration overrides. The default value is blank.
spark_version	It is the Spark version used on the cluster. The default version is `2.0-latest`. The other supported version is `2.1-latest`.

monitoring¶

Parameter	Description
enable_ganglia_monitoring	Enable Ganglia monitoring for the cluster. The default value is `false`.

security_settings¶

Parameter	Description
ssh_public_key	The SSH key to use to log in to the instances. The default value is none. (Note: This parameter is not visible to non-admin users.) The SSH key must be in the OpenSSH format and not in the PEM/PKCS format.
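
Because the key must be in the OpenSSH format, a quick client-side check before sending the request can catch a PEM-encoded key early. The following Python sketch is a rough heuristic only: the function name is hypothetical, the list of key-type prefixes is an assumption about which OpenSSH key types are in use, and it does not verify that the key material itself is valid.

```python
# Common OpenSSH public key type prefixes (assumed list, not exhaustive).
OPENSSH_PREFIXES = ("ssh-rsa ", "ssh-ed25519 ", "ecdsa-sha2-")


def looks_like_openssh_public_key(key):
    """Return True if the key text resembles a single-line OpenSSH public key."""
    key = key.strip()
    # PEM/PKCS keys start with a "-----BEGIN ...-----" armor line.
    if key.startswith("-----BEGIN"):
        return False
    return key.startswith(OPENSSH_PREFIXES)


print(looks_like_openssh_public_key("ssh-rsa AAAAB3NzaC1yc2E user@host"))
print(looks_like_openssh_public_key("-----BEGIN PUBLIC KEY-----\nMIIB\n-----END PUBLIC KEY-----"))
```

The first call prints `True` and the second prints `False`.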

Request API Syntax¶

If use_account_compute_creds is set to true, then it is not required to set the compute credentials (compute_client_id, compute_client_secret, compute_tenant_id, and compute_subscription_id).

curl -X POST -H "X-AUTH-TOKEN:$X_AUTH_TOKEN" -H "Content-Type:application/json" -H "Accept: application/json" \
-d '{
     "cloud_config" : {
          "provider" : "azure",
          "compute_config" : {
               "compute_validated": <default is false/set it to true>,
               "use_account_compute_creds": false,
               "compute_client_id": "<your client ID>",
               "compute_client_secret": "<your client secret key>",
               "compute_tenant_id": "<your tenant ID>",
               "compute_subscription_id": "<your subscription ID>"
          },
          "location": {
               "location": "centralus"
          },
          "network_config" : {
               "vnet_name" : "<vnet name>",
               "subnet_name": "<subnet name>",
               "vnet_resource_group_name": "<vnet resource group name>",
               "bastion_node_public_dns": "<bastion node public dns>",
               "persistent_security_groups": "<persistent security group>",
               "master_elastic_ip": ""
          },
          "storage_config" : {
               "disk_storage_account_name": "<Disk storage account name>",
               "disk_storage_account_resource_group_name": "<Disk account resource group name>",
               //You can either configure "disk_storage_account_name" or "managed_disk_account_type"
               "managed_disk_account_type": "<standard_lrs/premium_lrs>",
               "data_disk_count": <Count>,
               "data_disk_size": <Disk Size>
          }
     },
     "cluster_info": {
          "master_instance_type": "Standard_A6",
          "slave_instance_type": "Standard_A6",
          "label": ["azure1"],
          "min_nodes": 1,
          "max_nodes": 2,
          "cluster_name": "Azure1",
          "node_bootstrap": "node_bootstrap.sh"
     },
     "engine_config": {
          "flavour": "hadoop2",
          "hadoop_settings": {
               "custom_hadoop_config": <default is null>,
               "fairscheduler_settings": {
                    "default_pool": <default is null>
               }
          }
     },
     "monitoring": {
          "ganglia": <default is false/set it to true>
     }
}' "https://azure.qubole.com/api/v2/clusters"

Sample API Request¶

curl -X POST -H "X-AUTH-TOKEN:$X_AUTH_TOKEN" -H "Content-Type:application/json" -H "Accept: application/json" \
-d '{
     "cloud_config" : {
          "provider" : "azure",
          "compute_config" : {
               "compute_validated": false,
               "use_account_compute_creds": false,
               "compute_client_id": "<your client ID>",
               "compute_client_secret": "<your client secret key>",
               "compute_tenant_id": "<your tenant ID>",
               "compute_subscription_id": "<your subscription ID>"
          },
          "location": {
               "location": "centralus"
          },
          "network_config" : {
               "vnet_name" : "<vnet name>",
               "subnet_name": "<subnet name>",
               "vnet_resource_group_name": "<vnet resource group name>",
               "persistent_security_groups": "<persistent security group>"
          },
          "storage_config" : {
               "storage_access_key": "<your storage access key>",
               "storage_account_name": "<your storage account name>",
               "disk_storage_account_name": "<your disk storage account name>",
               "disk_storage_account_resource_group_name": "<your disk storage account resource group name>",
               "data_disk_count": 4,
               "data_disk_size": 300
          }
     },
     "cluster_info": {
          "master_instance_type": "Standard_A6",
          "slave_instance_type": "Standard_A6",
          "label": ["azure1"],
          "min_nodes": 1,
          "max_nodes": 2,
          "cluster_name": "Azure1",
          "node_bootstrap": "node_bootstrap.sh"
     },
     "engine_config": {
          "flavour": "hadoop2",
          "hadoop_settings": {
               "custom_hadoop_config": "mapred.tasktracker.map.tasks.maximum=3"
          }
     },
     "monitoring": {
          "ganglia": true
     }
}' "https://azure.qubole.com/api/v2/clusters"
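
For readers who prefer a scripting language to curl, the same kind of request can be prepared with the Python standard library. This is a sketch under stated assumptions: the function `build_request` is hypothetical, the payload is a shortened version of the sample above that reuses its field names, and the token is read from an `X_AUTH_TOKEN` environment variable as in the curl command. The request is only constructed here, not sent.

```python
import json
import os
import urllib.request

API_URL = "https://azure.qubole.com/api/v2/clusters"

# Shortened payload; field names mirror the sample request above.
payload = {
    "cloud_config": {
        "provider": "azure",
        "compute_config": {"use_account_compute_creds": True},
        "location": {"location": "centralus"},
    },
    "cluster_info": {
        "label": ["azure1"],
        "min_nodes": 1,
        "max_nodes": 2,
    },
    "engine_config": {"flavour": "hadoop2"},
}


def build_request(body, token):
    """Prepare a POST request with the same headers as the curl sample."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "X-AUTH-TOKEN": token,
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


request = build_request(payload, os.environ.get("X_AUTH_TOKEN", "<your auth token>"))
print(request.get_method(), request.full_url)

# To actually send the request, uncomment the following lines:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Serializing the body with `json.dumps` avoids the hand-written JSON mistakes (missing or trailing commas, `False` instead of `false`) that are easy to make inside a shell-quoted string.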