Hive-JDBC Connector

QDS provides a Hive connector for JDBC, so you can run SQL queries to analyze data that resides in JDBC tables. Optimizations such as Support for PredicatePushDown are also available.

You can find sample queries and a POM file in Hive JDBC Storage Handler

Adding Required Jars

AWS Example:

add jar s3://paid-qubole/jars/jdbchandler/mysql-connector-java-5.1.34-bin.jar;
add jar s3://paid-qubole/jars/jdbchandler/qubole-hive-JDBC-0.0.7.jar;

Azure Blob Example:

add jar wasb://default-datasets@paidqubole.blob.core.windows.net/jars/jdbchandler/mysql-connector-java-5.1.34-bin.jar;
add jar wasb://default-datasets@paidqubole.blob.core.windows.net/paid-qubole/jars/jdbchandler/qubole-hive-JDBC-0.0.7.jar;

Creating a Table

An external Hive table connecting a JDBC table can be created as follows, allowing read and write to an underlying JDBC table.

Example

The table can be created in two ways:

  • You can explicitly give column mappings along with the table creation statement.
DROP TABLE HiveTable;
CREATE EXTERNAL TABLE HiveTable(
  id INT, id_double DOUBLE, names STRING, test INT
)
STORED BY 'org.apache.hadoop.hive.jdbc.storageHandler.JdbcStorageHandler'
TBLPROPERTIES (
  "mapred.jdbc.driver.class"="com.mysql.jdbc.Driver",
  "mapred.jdbc.url"="jdbc:mysql://localhost:3306/rstore",
  "mapred.jdbc.username"="-----",
  "mapred.jdbc.input.table.name"="JDBCTable",
  "mapred.jdbc.output.table.name"="JDBCTable",
  "mapred.jdbc.password"="------"
);
  • You can specify no table mappings; the SerDe class automatically generates the mappings.
CREATE EXTERNAL TABLE HiveTable
row format serde 'org.apache.hadoop.hive.jdbc.storagehandler.JdbcSerDe'
STORED BY 'org.apache.hadoop.hive.jdbc.storagehandler.JdbcStorageHandler'
TBLPROPERTIES (
  "mapred.jdbc.driver.class"="com.mysql.jdbc.Driver",
  "mapred.jdbc.url"="jdbc:mysql://localhost:3306/rstore",
  "mapred.jdbc.username"="root",
  "mapred.jdbc.input.table.name"="JDBCTable",
  "mapred.jdbc.output.table.name" = "JDBCTable",
  "mapred.jdbc.password"=""
);

Usage

The Hive-JDBC connector supports almost all types of SQL queries. Some examples of supported queries are:

Reading Data

> select * from HiveTable;
> select count(*) from HiveTable;
> select id from HiveTable where id > 50000;
> select names from HiveTable;
> select * from HiveTable where names like ‘D%’;
> select * FROM HiveTable ORDER BY id DESC;

Joining Tables

> select HiveTable_1.*, HiveTable_2.* from HiveTable_1 a join HiveTable_2 b
   on (a.id = b.id) where a.id > 90000 and b.id > 97000;

Writing Data

> Insert Into Table HiveTable_1 select * from HiveTable_2;
> Insert Into Table HiveTable_1 select * from HiveTable_2 where id > 50;

Group By Queries

> select id, sum(id_double) as sum_double from HiveTable group by id;

Support for PredicatePushDown

To enable/disable PredicatePushDown, add the following configuration.

set hive.optimize.ppd = true/false

Handling Unsuccessful Tez Queries While Querying JDBC Tables

In the Hive JDBC connector, the JDBC Storage handler does not work when Input Splits Grouping is enabled in Hive-on-Tez.

As a result, the following exception message is displayed.

java.io.IOException: InputFormatWrapper can not support RecordReaders that don't return same key & value objects.

HiveInputFormat is enabled by default in Tez to support Splits Grouping.

You can avoid the issue by setting the input format as CombineHiveInputFormat using this command that disables the Splits Grouping.

set hive.tez.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;