Hive-JDBC Connector¶
QDS provides a Hive connector for JDBC, so you can run SQL queries to analyze data that resides in JDBC tables. Optimizations such as Support for PredicatePushDown are also available.
You can find sample queries and a POM file in Hive JDBC Storage Handler
Adding Required Jars¶
AWS Example:
add jar s3://paid-qubole/jars/jdbchandler/mysql-connector-java-5.1.34-bin.jar;
add jar s3://paid-qubole/jars/jdbchandler/qubole-hive-JDBC-0.0.7.jar;
Azure Blob Example:
add jar wasb://default-datasets@paidqubole.blob.core.windows.net/jars/jdbchandler/mysql-connector-java-5.1.34-bin.jar;
add jar wasb://default-datasets@paidqubole.blob.core.windows.net/paid-qubole/jars/jdbchandler/qubole-hive-JDBC-0.0.7.jar;
Creating a Table¶
An external Hive table connecting a JDBC table can be created as follows, allowing read and write to an underlying JDBC table.
Example
The table can be created in two ways:
- You can explicitly give column mappings along with the table creation statement.
DROP TABLE HiveTable;
CREATE EXTERNAL TABLE HiveTable(
id INT, id_double DOUBLE, names STRING, test INT
)
STORED BY 'org.apache.hadoop.hive.jdbc.storageHandler.JdbcStorageHandler'
TBLPROPERTIES (
"mapred.jdbc.driver.class"="com.mysql.jdbc.Driver",
"mapred.jdbc.url"="jdbc:mysql://localhost:3306/rstore",
"mapred.jdbc.username"="-----",
"mapred.jdbc.input.table.name"="JDBCTable",
"mapred.jdbc.output.table.name"="JDBCTable",
"mapred.jdbc.password"="------"
);
- You can specify no table mappings; the SerDe class automatically generates the mappings.
CREATE EXTERNAL TABLE HiveTable
row format serde 'org.apache.hadoop.hive.jdbc.storagehandler.JdbcSerDe'
STORED BY 'org.apache.hadoop.hive.jdbc.storagehandler.JdbcStorageHandler'
TBLPROPERTIES (
"mapred.jdbc.driver.class"="com.mysql.jdbc.Driver",
"mapred.jdbc.url"="jdbc:mysql://localhost:3306/rstore",
"mapred.jdbc.username"="root",
"mapred.jdbc.input.table.name"="JDBCTable",
"mapred.jdbc.output.table.name" = "JDBCTable",
"mapred.jdbc.password"=""
);
Usage¶
The Hive-JDBC connector supports almost all types of SQL queries. Some examples of supported queries are:
Reading Data
> select * from HiveTable;
> select count(*) from HiveTable;
> select id from HiveTable where id > 50000;
> select names from HiveTable;
> select * from HiveTable where names like ‘D%’;
> select * FROM HiveTable ORDER BY id DESC;
Joining Tables
> select HiveTable_1.*, HiveTable_2.* from HiveTable_1 a join HiveTable_2 b
on (a.id = b.id) where a.id > 90000 and b.id > 97000;
Writing Data
> Insert Into Table HiveTable_1 select * from HiveTable_2;
> Insert Into Table HiveTable_1 select * from HiveTable_2 where id > 50;
Group By Queries
> select id, sum(id_double) as sum_double from HiveTable group by id;
Support for PredicatePushDown¶
To enable/disable PredicatePushDown, add the following configuration.
set hive.optimize.ppd = true/false
Handling Unsuccessful Tez Queries While Querying JDBC Tables¶
In the Hive JDBC connector, the JDBC Storage handler does not work when Input Splits Grouping is enabled in Hive-on-Tez.
As a result, the following exception message is displayed.
java.io.IOException: InputFormatWrapper can not support RecordReaders that don't return same key & value objects.
HiveInputFormat is enabled by default in Tez to support Splits Grouping.
You can avoid the issue by setting the input format as CombineHiveInputFormat using this command that disables the
Splits Grouping.
set hive.tez.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;