Understanding the System Metrics for Monitoring (AWS)

Qubole clusters support Datadog monitoring when it is enabled at the QDS account level. For information on enabling Datadog in Control Panel > Account Settings, see Configuring your Access Settings using IAM Keys or Managing Roles.

The following table lists the different system metrics that are published to the Datadog account.

System Metric	Metric Definition
disk_free	Total free disk space.
disk_total	Total disk space.
part_max_used	Maximum percent used on any single disk partition.
load_one	Load average over 1 minute.
load_five	Load average over 5 minutes.
load_fifteen	Load average over 15 minutes.
cpu_user	Percentage of CPU utilization while executing at the user level.
cpu_system	Percentage of CPU utilization while executing at the system level.
cpu_wio	Percentage of CPU time spent waiting for I/O.
cpu_nice	Percentage of CPU cycles spent on nice processes.
cpu_steal	Stolen time, which is the time spent in other operating systems when running in a virtualized environment.
cpu_aidle	Percentage of CPU cycles spent idle since last boot.
cpu_idle	Percentage of CPU idle time.
cpu_report	Aggregate report of CPU utilization percentage.
mem_report	Aggregate report of memory usage in bytes.
load_report	Aggregate report with the current load, the number of running processes, the number of nodes, and the CPU count.
network_report	Aggregate report with network traffic in and out of the cluster nodes.
cluster-addnodefailure	Node addition metric, used to monitor upscaling (autoscaling) events in a cluster.
cluster-removenodefailure	Node removal metric, used to monitor downscaling (autoscaling) events in a cluster.
system-rootdiskfullmaster	Disk space in the master node’s root partition.
system-ephemeral0fullmaster	Disk space in the master node’s ephemeral0 partition.
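
Some of the metrics in the table are easier to alert on once they are combined. The following Python sketch shows one possible way to derive a disk usage percentage from disk_free and disk_total, and a busy CPU percentage from cpu_idle. The function names and the sample values are hypothetical and are not part of QDS or Datadog. The sketch also assumes that disk_free and disk_total are reported in the same unit.

```python
def disk_used_percent(disk_free: float, disk_total: float) -> float:
    """Return the percentage of disk space in use.

    Assumes disk_free and disk_total are expressed in the same unit.
    """
    if disk_total <= 0:
        raise ValueError("disk_total must be positive")
    if not 0 <= disk_free <= disk_total:
        raise ValueError("disk_free must be between 0 and disk_total")
    return 100.0 * (disk_total - disk_free) / disk_total


def cpu_busy_percent(cpu_idle: float) -> float:
    """Return the non-idle CPU percentage derived from cpu_idle."""
    if not 0 <= cpu_idle <= 100:
        raise ValueError("cpu_idle must be a percentage between 0 and 100")
    return 100.0 - cpu_idle


# Hypothetical sample values, not taken from a real cluster.
sample = {"disk_free": 120.0, "disk_total": 480.0, "cpu_idle": 85.0}

print(disk_used_percent(sample["disk_free"], sample["disk_total"]))  # 75.0
print(cpu_busy_percent(sample["cpu_idle"]))  # 15.0
```

Derived values such as these can be compared against a threshold of your choice, for example to flag a node whose disk usage exceeds a limit that you define.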