Setting the JDBC Connection String

You must set the JDBC connection string for Hive, Presto, and Spark queries.

Setting the Connection String for Hive and Presto Queries (AWS and Azure)

Use the following syntax to set the JDBC connection string for Hive and Presto queries.

jdbc:qubole://<hive or presto>/<Cluster-Label>[/<database>][?propertyName1=propertyValue1[;propertyName2=propertyValue2]...]

In the connection string, <hive or presto> (command type) and the cluster label are mandatory; database name and property name/value are optional.

Note

If you do not specify a database in the connection string, specify the database in the query or use fully qualified table names.

The following example shows a connection string for a Hive query.

jdbc:qubole://hive/default/tpcds_orc_500?endpoint=https://api.qubole.com;chunk_size=86

In this example, default is the cluster label, tpcds_orc_500 is the database, and https://api.qubole.com is one of the QDS endpoints on AWS. For a list of supported endpoints, see Supported Qubole Endpoints on Different Cloud Providers.
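The syntax above can be sketched as a small helper that assembles the connection string from its parts. This is only an illustration: the class name QuboleJdbcUrl and the method build are hypothetical and are not part of the Qubole JDBC driver. The helper enforces the two mandatory parts (command type and cluster label) and appends the database and properties only when they are supplied.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of the Qubole JDBC driver.
public class QuboleJdbcUrl {

    // Builds jdbc:qubole://<hive or presto>/<Cluster-Label>[/<database>][?name=value[;name=value]...]
    public static String build(String commandType, String clusterLabel,
                               String database, Map<String, String> properties) {
        if (!"hive".equals(commandType) && !"presto".equals(commandType)) {
            throw new IllegalArgumentException("command type must be hive or presto");
        }
        if (clusterLabel == null || clusterLabel.isEmpty()) {
            throw new IllegalArgumentException("cluster label is mandatory");
        }
        StringBuilder url = new StringBuilder("jdbc:qubole://")
                .append(commandType).append('/').append(clusterLabel);
        // The database segment is optional.
        if (database != null && !database.isEmpty()) {
            url.append('/').append(database);
        }
        // Properties are optional; they follow '?' and are separated by ';'.
        if (properties != null && !properties.isEmpty()) {
            StringJoiner joined = new StringJoiner(";", "?", "");
            for (Map.Entry<String, String> entry : properties.entrySet()) {
                joined.add(entry.getKey() + "=" + entry.getValue());
            }
            url.append(joined);
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the properties in insertion order.
        Map<String, String> properties = new LinkedHashMap<>();
        properties.put("endpoint", "https://api.qubole.com");
        properties.put("chunk_size", "86");
        System.out.println(build("hive", "default", "tpcds_orc_500", properties));
    }
}
```

With the values from the example above, the helper produces the same string as the documented example. Passing null for the database and the properties yields the shortest valid form, such as jdbc:qubole://presto/default.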

Connection String Properties

Property Name	Property Value
endpoint	The API endpoint. It is optional only when you use the `https://api.qubole.com` endpoint. You must specify it for all other QDS-on-AWS endpoints and for other Cloud providers. For the list, see Supported Qubole Endpoints on Different Cloud Providers.
chunk_size	The chunk size, in MB, used when streaming large results from Cloud storage. The default value is 100 MB. Reduce the value if you encounter out-of-memory (OOM) issues.
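To make the defaults in the table concrete, the following sketch reads the properties from a connection string and falls back to the documented defaults when a property is absent. It is a client-side illustration only: the class QuboleConnectionProperties and its methods are hypothetical, and the driver's own parsing logic is not described in this document.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the Qubole JDBC driver.
public class QuboleConnectionProperties {

    // Defaults taken from the properties table above.
    static final String DEFAULT_ENDPOINT = "https://api.qubole.com";
    static final int DEFAULT_CHUNK_SIZE_MB = 100;

    // Returns the name/value pairs that follow '?' in the connection string.
    public static Map<String, String> parse(String url) {
        Map<String, String> properties = new LinkedHashMap<>();
        int queryStart = url.indexOf('?');
        if (queryStart < 0 || queryStart == url.length() - 1) {
            return properties; // no properties were supplied
        }
        for (String pair : url.substring(queryStart + 1).split(";")) {
            int equalsAt = pair.indexOf('=');
            if (equalsAt > 0) {
                properties.put(pair.substring(0, equalsAt), pair.substring(equalsAt + 1));
            }
        }
        return properties;
    }

    // The endpoint may be omitted only for https://api.qubole.com.
    public static String endpoint(String url) {
        return parse(url).getOrDefault("endpoint", DEFAULT_ENDPOINT);
    }

    // The chunk size in MB; 100 when the property is not set.
    public static int chunkSizeMb(String url) {
        String value = parse(url).get("chunk_size");
        return value == null ? DEFAULT_CHUNK_SIZE_MB : Integer.parseInt(value);
    }

    public static void main(String[] args) {
        String url = "jdbc:qubole://hive/default/tpcds_orc_500?chunk_size=86";
        System.out.println(endpoint(url));
        System.out.println(chunkSizeMb(url));
    }
}
```

In this sketch, a connection string without properties resolves to the https://api.qubole.com endpoint and a 100 MB chunk size, which mirrors the defaults stated in the table.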

Additional Properties (Optional)

In addition, you can:

Enable Logging as described in Enabling Logging.
Enable Proxy as described in Enabling the Proxy Connection.

Setting the Connection String for Spark Queries

Use the following syntax to set the JDBC connection string for Spark queries.

jdbc:qubole://spark/<Cluster-Label>/<app-id>[/<database>][?propertyName1=propertyValue1[;propertyName2=propertyValue2]...]

For example:

jdbc:qubole://spark/spark-cluster/85/my-sql?endpoint=https://us.qubole.com;chunk_size=86

Note

Create an app with the configuration parameter zeppelin.spark.maxResult=<A VERY BIG VALUE>. A query returns at most the configured maximum number of rows.

In the connection string, spark (command type), the cluster label, and the app-id are mandatory; the database name and property name/value pairs are optional.

Note

If you do not specify a database in the connection string, specify the database in the query or use fully qualified table names.

Specifying the app-id is mandatory. An app is the main abstraction in the Spark Job Server API and stores the configuration for a Spark application. Creating an app returns an app-id. You can get the app-id by using the GET API at http://api.qubole.com/api/v1.2/apps. For more information, see Understanding the Spark Job Server.
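The Spark syntax differs from the Hive and Presto syntax only in the fixed spark command type and the extra mandatory app-id segment. The following sketch shows one way to assemble such a string; the class QuboleSparkJdbcUrl and the method build are hypothetical names used for illustration and are not part of the Qubole JDBC driver.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the Qubole JDBC driver.
public class QuboleSparkJdbcUrl {

    // Builds jdbc:qubole://spark/<Cluster-Label>/<app-id>[/<database>][?name=value[;name=value]...]
    public static String build(String clusterLabel, String appId,
                               String database, Map<String, String> properties) {
        if (clusterLabel == null || clusterLabel.isEmpty()) {
            throw new IllegalArgumentException("cluster label is mandatory");
        }
        if (appId == null || appId.isEmpty()) {
            throw new IllegalArgumentException("app-id is mandatory");
        }
        StringBuilder url = new StringBuilder("jdbc:qubole://spark/")
                .append(clusterLabel).append('/').append(appId);
        // The database segment is optional.
        if (database != null && !database.isEmpty()) {
            url.append('/').append(database);
        }
        // The first property follows '?'; later ones are separated by ';'.
        if (properties != null) {
            char separator = '?';
            for (Map.Entry<String, String> entry : properties.entrySet()) {
                url.append(separator).append(entry.getKey())
                   .append('=').append(entry.getValue());
                separator = ';';
            }
        }
        return url.toString();
    }

    public static void main(String[] args) {
        Map<String, String> properties = new LinkedHashMap<>();
        properties.put("endpoint", "https://us.qubole.com");
        properties.put("chunk_size", "86");
        System.out.println(build("spark-cluster", "85", "my-sql", properties));
    }
}
```

With the values from the Spark example in this section, the helper reproduces that connection string. Omitting the app-id raises an IllegalArgumentException, which reflects the requirement that the app-id is mandatory.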