5. How can I create a table in HDFS?
A CREATE TABLE statement in QDS creates a managed table in cloud storage. To create a table in HDFS to hold intermediate data, use CREATE TMP TABLE or CREATE TEMPORARY TABLE instead. Remember that HDFS in QDS is ephemeral: the data is destroyed when the cluster shuts down, so use HDFS only for intermediate outputs.
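The following HiveQL sketch contrasts where each statement puts its data. The table and column names (`page_views`, `tmp_user_totals`, `tmp_checkout_views`) are hypothetical, and the sketch assumes that Qubole's `CREATE TMP TABLE` accepts the same `AS SELECT` form as Hive's `CREATE TEMPORARY TABLE`; check the QDS documentation for the exact syntax your Hive version supports.

```sql
-- Managed table: data is written to cloud storage and survives cluster shutdown.
CREATE TABLE page_views (user_id BIGINT, url STRING, view_time TIMESTAMP);

-- Qubole extension (CTAS form assumed): metadata is stored in the Hive
-- metastore, data is written to the cluster's ephemeral HDFS.
CREATE TMP TABLE tmp_user_totals AS
SELECT user_id, COUNT(*) AS views
FROM page_views
GROUP BY user_id;

-- Apache Hive temporary table: metadata lives only in the session's memory,
-- data is written to HDFS, and both are dropped when the Hive session ends.
CREATE TEMPORARY TABLE tmp_checkout_views AS
SELECT user_id, url
FROM page_views
WHERE url LIKE '%/checkout%';
```

Only the first statement produces data that outlives the cluster; the other two are suited to intermediate results.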
You can use either TMP or TEMPORARY when creating temporary tables in QDS. CREATE TMP TABLE is a Qubole-specific extension and is not part of Apache Hive. The differences between the two are as follows:
| Characteristic | CREATE TMP TABLE | CREATE TEMPORARY TABLE |
|---|---|---|
| Implemented by | Qubole (supported only by QDS) | Open-source Hive. See this document and the OSS Hive Wiki for details. |
| Metadata | Stored in Hive metastore | Lives only in memory |
| Table storage | HDFS | HDFS |
| Lifetime of table | QDS user session | Hive user session |
| Table clean-up | When the QDS cluster is terminated or the QDS user session ends | When the Hive user session ends |
| Advantages | Can be shared across clusters, users, and multiple query history-records (because the metadata is stored in the Hive metastore) | Short-lived, with quicker clean-up |
| Disadvantages | Expensive clean-up (the metastore must be traversed); needs more HDFS disk capacity because clean-up is less frequent | Available only within the Hive user session; does not support indexes, partitions, etc. |
| Recommended if… | The temporary table is expected to live across multiple QDS query history-records (a query history-record is the one row a user can see in the History view on the QDS Analyze page) | The temporary table is needed only in one query history-record |
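To illustrate the "Recommended if…" row, here is a hedged sketch of how each kind of table might be used across QDS commands. The names `page_views`, `tmp_user_totals`, `tmp_checkout_views`, and `tmp_by_day` are hypothetical, the `CREATE TMP TABLE ... AS SELECT` form is assumed, and the sketch assumes the cluster is not terminated and the QDS user session does not end between the two commands.

```sql
-- Command 1 (its own query history-record): build a TMP table to reuse later.
CREATE TMP TABLE tmp_user_totals AS
SELECT user_id, COUNT(*) AS views
FROM page_views
GROUP BY user_id;

-- Command 2 (a separate, later history-record): the TMP table can still be
-- resolved because its metadata is in the Hive metastore.
SELECT user_id, views FROM tmp_user_totals ORDER BY views DESC LIMIT 10;

-- A single command (one history-record) that both creates and consumes a
-- TEMPORARY table; nothing outside this Hive session needs it.
CREATE TEMPORARY TABLE tmp_checkout_views AS
SELECT user_id, url FROM page_views WHERE url LIKE '%/checkout%';
SELECT COUNT(DISTINCT user_id) FROM tmp_checkout_views;

-- Not supported for TEMPORARY tables (partition columns), so left commented out:
-- CREATE TEMPORARY TABLE tmp_by_day (user_id BIGINT) PARTITIONED BY (dt STRING);
```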