Configuring a Qubole Spark Cluster

Prerequisites

Contact Qubole Support to enable the hadoop2.use_hadoop28 flag for your account. This flag makes Hadoop 2.8 the default for all clusters in the account. To contact the Qubole Support team, click the Help icon in the top-right corner of the QDS UI and select Submit Support Ticket. For more information, see Using the QDS Help Center. You can also call us at (855) 423-6674 and select option 2.

Configuring a Spark Cluster

  1. Navigate to the QDS UI > Control Panel.

  2. In the Control Panel, select Environments. The Environments tab is displayed.

    (Screenshot: the Environments tab)
  3. Click New to add a new environment.

  4. Enter the Name and Description for the new environment and click Create.

    (Screenshot: creating a new environment)
  5. Attach the Spark cluster that is used with SageMaker. Wait for the status to change to Active.

    (Screenshot: environment status)
  6. Click Add to add a Python package to the new environment.

  7. Select Python Packages as the Source and enter the names of the following Python packages:

    • py4j
    • boto3 version 1.9.20 or later
    • awscli version 1.16.30 or later
    • sagemaker_pyspark

    Here is an illustrated example.

    (Screenshot: adding Python packages to the environment)
  8. Click Add and wait for the status to become Installed.

    (Screenshot: package installation status)
  9. Click the Clusters drop-down list in the top-right corner of the QDS UI and select the Spark cluster that will run the Spark jobs.

  10. On the cluster details page, click Edit.

    (Screenshot: the cluster details page)
  11. Enter the name of the bootstrap script file in the Node Bootstrap File field, as shown below:

    (Screenshot: the Node Bootstrap File field)
  12. Click Update.

  13. On the Clusters page, click “…” at the top-right corner and select Edit Node Bootstrap.

    (Screenshot: the Edit Node Bootstrap option)
  14. Copy and paste the code from Appendix 1: BootStrap Script in this document and click Save. The bootstrap downloads the aws-java-sdk-core-1.11.288.jar file required by AWS SageMaker, writes the AWS credentials to the /home/yarn/.aws/credentials file, and starts the Livy server on the Spark cluster so that the SageMaker notebook can use it.

  15. Click Run to start the cluster.
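
Step 14 says the bootstrap writes AWS credentials to /home/yarn/.aws/credentials. The real script is in Appendix 1; purely as an illustration, the Python sketch below shows the standard AWS shared credentials file format (an INI file whose [default] profile holds aws_access_key_id and aws_secret_access_key) and one way such a file could be written with owner-only permissions. The function names render_credentials and write_credentials and the placeholder key values are hypothetical, and this is not the Appendix 1 script.

```python
import configparser
import io
import os
from pathlib import Path

# Path named in step 14; the bootstrap in Appendix 1 is what actually writes it.
CREDENTIALS_PATH = Path("/home/yarn/.aws/credentials")


def render_credentials(access_key_id: str, secret_access_key: str,
                       profile: str = "default") -> str:
    """Return the text of an AWS shared credentials file with one profile."""
    # interpolation=None so characters such as '%' in a secret are kept verbatim.
    config = configparser.ConfigParser(interpolation=None)
    config[profile] = {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


def write_credentials(path: Path, text: str) -> None:
    """Write the credentials file so that only its owner can read it."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create (or truncate) the file with mode 0600 from the start.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    # Tighten permissions in case the file already existed with a wider mode.
    os.chmod(path, 0o600)


if __name__ == "__main__":
    # Placeholder keys only; load real keys from a secure source, never source code.
    print(render_credentials("AKIAEXAMPLE", "example-secret"), end="")
```

Calling write_credentials(CREDENTIALS_PATH, text) would require a user with write access to /home/yarn, which is why a node bootstrap (running with elevated privileges) is a natural place for this step.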

You have successfully configured your Qubole Spark cluster.
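
Once the cluster is running, you may want to confirm that the environment from steps 6 to 8 actually satisfies the package list in step 7. The sketch below assumes Python 3.8 or later on the cluster and uses the standard library's importlib.metadata. The helper names and the simple numeric version comparison are assumptions for illustration; a stricter check would more likely use packaging.version.

```python
import re
from importlib import metadata

# Minimums from step 7; py4j and sagemaker_pyspark have no stated minimum.
REQUIREMENTS = {
    "py4j": None,
    "boto3": "1.9.20",
    "awscli": "1.16.30",
    "sagemaker_pyspark": None,
}


def version_tuple(version: str) -> tuple:
    """Turn the leading dotted numbers of a version, e.g. '1.16.30rc1', into (1, 16, 30)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum) -> bool:
    """True if there is no minimum or the installed version is at least the minimum."""
    return minimum is None or version_tuple(installed) >= version_tuple(minimum)


def check_environment(requirements=REQUIREMENTS) -> dict:
    """Map each package name to 'ok', 'missing', or 'too old (<version>)'."""
    results = {}
    for name, minimum in requirements.items():
        try:
            # Distribution-name normalization ('_' vs '-') can vary by Python version.
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            results[name] = "missing"
            continue
        results[name] = "ok" if meets_minimum(installed, minimum) else f"too old ({installed})"
    return results


if __name__ == "__main__":
    for package, status in check_environment().items():
        print(f"{package}: {status}")
```

Any package reported as missing or too old points back to steps 7 and 8: add or update it in the environment and wait for the status to return to Installed.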