Running Hive Queries on Tez

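To run Hive queries on Tez, you need to configure and start a Hadoop 2 cluster, and then configure Tez as the Hive execution engine. Both steps are described in the sections below.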

Note

While running a Tez query on a JDBC table, you may get an exception that you can debug by using the workaround described in Handling Unsuccessful Tez Queries While Querying JDBC Tables.

Configure and Start a Hadoop 2 Cluster

A Hadoop 2 cluster is configured by default in QDS. The default cluster should work well for Hive queries on Tez, but if you modify it, make sure the instances you choose for the cluster nodes have plenty of local storage; disk space used for queries is freed up only when the Tez DAG is complete.

The ApplicationMaster also takes up more memory for multi-stage jobs than it needs for similar MapReduce jobs, because in the Tez case it must keep track of all the tasks in the DAG, whereas MapReduce processes one job at a time.

Configure ApplicationMaster Memory

To make sure that the ApplicationMaster has sufficient memory, set the following parameters for the cluster on which you are going to run Tez:

tez.am.resource.memory.mb=<Size in MB>;
tez.am.launch.cmd-opts=-Xmx<Size in MB>m;

To set these parameters in QDS, go to the Control Panel and choose the pencil item next to the cluster you are going to use; then on the Edit Cluster page enter the parameters into the Override Hadoop Configuration Variables field.

Do pre-production testing to determine the best values. Start with the value currently set for MapReduce; that is, the value of yarn.app.mapreduce.am.resource.mb (stored in the Hadoop file mapred-site.xml). You can see the current (default) value in the Recommended Configuration field on the Edit Cluster page. If out-of-memory (OOM) errors occur under a realistic workload with that setting, increase the value in multiples of yarn.scheduler.minimum-allocation-mb, but do not exceed the value of yarn.scheduler.maximum-allocation-mb.
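The tuning rule above can be sketched as a small helper that lists the ApplicationMaster memory sizes worth trying. This is only an illustration: the function name and the numbers in the example call are hypothetical, and the real values must come from your cluster's yarn.app.mapreduce.am.resource.mb, yarn.scheduler.minimum-allocation-mb, and yarn.scheduler.maximum-allocation-mb settings.

```python
def candidate_am_memory_mb(start_mb, min_alloc_mb, max_alloc_mb):
    """Return AM memory sizes (in MB) to try, in increasing order.

    Each candidate is a multiple of min_alloc_mb, is at least start_mb,
    and does not exceed max_alloc_mb.
    """
    if min_alloc_mb <= 0:
        raise ValueError("min_alloc_mb must be positive")
    # Round the starting value up to the nearest multiple of the
    # minimum allocation (ceiling division on integers).
    first = -(-start_mb // min_alloc_mb) * min_alloc_mb
    return list(range(first, max_alloc_mb + 1, min_alloc_mb))


# Hypothetical values, for illustration only:
#   start_mb     = current MapReduce AM setting
#   min_alloc_mb = yarn.scheduler.minimum-allocation-mb
#   max_alloc_mb = yarn.scheduler.maximum-allocation-mb
print(candidate_am_memory_mb(1536, 1024, 8192))
```

Work through the list from the smallest value upward, and stop at the first size that runs a realistic workload without OOM errors.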

Enable Offline Job History

To enable offline job history, set the following parameter for the cluster on which you are going to run Tez:

yarn.ahs.leveldb.backup.enabled = true

To set this parameter in QDS, go to the Control Panel and choose the pencil item next to the cluster you are going to use; then on the Edit Cluster page enter the parameter into the Override Hadoop Configuration Variables field.

Start or Restart the Cluster

To start the cluster, click on the arrow to the right of the cluster’s entry on the Clusters page in the QDS Control Panel.

Configure Tez as the Hive Execution Engine

You can configure Tez as the Hive execution engine either globally (for all queries) or for a single query at query time.

To use Tez as the execution engine for all queries, enter the following text into the bootstrap file:

set hive.execution.engine = tez;

To use Tez as the execution engine for a single Hive query, use the same command, but enter it before the query itself on the QDS Analyze page.
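If you generate query text programmatically before pasting or submitting it, the per-query approach amounts to putting the engine setting in front of the query. The following is a minimal sketch under that assumption; the helper name with_tez and the table name example_table are hypothetical, and the code only builds the text, it does not submit anything to QDS.

```python
# The engine setting from this page, terminated with a semicolon so that
# Hive treats it as a separate statement from the query that follows.
TEZ_SETTING = "set hive.execution.engine = tez;"


def with_tez(query):
    """Return the query text preceded by the Tez engine setting."""
    body = query.strip()
    if not body.endswith(";"):
        body += ";"
    return TEZ_SETTING + "\n" + body


# example_table is a placeholder table name.
print(with_tez("SELECT COUNT(*) FROM example_table"))
```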

To use Tez globally across the QDS account, set it in the account-level Hive bootstrap. For information on setting the account-level Hive bootstrap through the UI, see Managing Hive Bootstrap; to set it through the REST API, see Set and View a Hive Bootstrap in a QDS Account.