Resource Monitoring

V1.1 – December 2023

Version | Author | Description
V1.0 – 2023-12-20 | Diogo Hatz 50037923 | Initial Version
V1.0 – 2023-12-21 | Wisley da Silva 00830850 | Document Review

Introduction

Cloud Eye (CES) is a free tool for monitoring Huawei Cloud resources. In addition to resource monitoring, Cloud Eye can be used to create event- or metric-based alarms, identify resource malfunctions, and react quickly to resource changes. Note that, although Cloud Eye itself is free, sending notifications when alarms are triggered incurs charges.

This document describes the main functionalities of the Cloud Eye service and guides the reader in using CES to monitor cloud resources such as ECSs, VPNs, and CBRs. It also describes how to create event- or metric-based alarms and how to customize dashboards for resource monitoring.
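As a rough illustration of what a metric-based alarm does, the sketch below checks whether a metric stayed at or above a threshold for a number of consecutive samples. The function name, the threshold, and the sample values are hypothetical and are not part of the Cloud Eye API; actual alarm rules are configured in the service itself.

```python
def alarm_triggered(values, threshold, consecutive):
    """Return True if the last `consecutive` samples are all >= threshold."""
    if consecutive <= 0 or len(values) < consecutive:
        return False
    return all(v >= threshold for v in values[-consecutive:])


# Hypothetical CPU usage samples (percent), oldest first.
cpu_samples = [50, 85, 90, 95]
should_alarm = alarm_triggered(cpu_samples, threshold=80, consecutive=3)
```

Requiring several consecutive samples above the threshold is one common way to avoid alarms on short spikes.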

Cloud Eye on the console

Overview

When you open Cloud Eye on the console, the Overview page loads first. It summarizes all resources in use in Huawei Cloud, the overall network, CPU, memory, and disk utilization, and the resources that have recently triggered alarms and need further attention.

  • Resource Overview: Allows you to view the total number of monitored resources and the alarms generated for these resources.

  • Alarm Statistics: Shows the alarms triggered in the last seven days by alarm severity.

  • Server Monitoring: Allows you to view the overall CPU and memory utilization of monitored servers and a list of the top 5 ECSs ranked by CPU or memory utilization.

  • Network Monitoring: Shows the overall bandwidth utilization of EIPs and a list of the top 5 EIPs ranked by bandwidth utilization.

  • Storage Monitoring: Allows you to view the overall disk utilization (EVS) by read and write IOPS and a list of the top 5 disks ranked by IOPS.
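The summaries listed above amount to simple rankings and counts. A minimal sketch, assuming hypothetical server names, utilization values, and severity labels (none of them taken from a real account), could look like this:

```python
from collections import Counter

# Hypothetical sample data, for illustration only.
cpu_by_ecs = {"ecs-a": 12.5, "ecs-b": 91.0, "ecs-c": 47.3,
              "ecs-d": 63.8, "ecs-e": 5.1, "ecs-f": 78.2}
alarm_severities = ["Major", "Critical", "Major", "Minor", "Major"]


def top_n(usage, n=5):
    """Return the n (name, value) pairs with the highest utilization."""
    return sorted(usage.items(), key=lambda item: item[1], reverse=True)[:n]


def count_by_severity(severities):
    """Count alarms per severity level."""
    return dict(Counter(severities))


top5 = top_n(cpu_by_ecs)
severity_counts = count_by_severity(alarm_severities)
```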

You can see what the Cloud Eye home page looks like in the images below:

Server Monitoring

ECSs and BMSs are monitored in the Server Monitoring section. Installing the agent (Telescope) on each server is recommended, since it provides more specific and accurate metrics, as listed in appendix 4.1.

The agent can be installed in three different ways: manually, automatically, or in batch mode. Regardless of the installation method chosen, you must configure the permissions for the agent in advance: in the server monitoring section, click Configure on the warning that the agent permission has not been configured for the current region.

Automatic:

To install the agent automatically, go to the Server Monitoring section, click the puzzle-piece icon in the Agent Status column of the corresponding ECS/BMS, and wait for the installation to finish.

Manual:

To install the agent manually, first go to the section related to ECS or BMS, depending on the type of server on which the agent will be installed.

Select Remote Login to log in to the desired server.

Log in with the username and password configured when the server was created. If the server is located in the LA-Santiago region, run the following command:

cd /usr/local && curl -k -O https://uniagent-la-south-2.obs.la-south-2.myhuaweicloud.com/script/agent_install.sh && bash agent_install.sh

If the region where the server is located is different from LA-Santiago, you can find the list of commands by region in the following link: https://support.huaweicloud.com/intl/en-us/usermanual-ces/ces_01_0029.html
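The LA-Santiago command embeds the region ID (la-south-2) twice in the download URL. The sketch below builds the same command from a region ID. It assumes that other regions follow the same URL pattern, which should be confirmed against the linked list before use; the function name is hypothetical.

```python
def agent_install_command(region_id: str) -> str:
    """Build the agent install command for a region ID such as 'la-south-2'.

    Assumption: the URL pattern of the LA-Santiago command applies to
    other regions as well. Check the official per-region list first.
    """
    base = f"https://uniagent-{region_id}.obs.{region_id}.myhuaweicloud.com"
    return (
        f"cd /usr/local && curl -k -O {base}/script/agent_install.sh "
        "&& bash agent_install.sh"
    )


command = agent_install_command("la-south-2")
```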

If the red message above appears at the end of the installation, the agent has been successfully installed.

Dashboard

The Dashboard section is where you create custom graphs to monitor selected services and resources using the metrics you choose.

To create a dashboard, navigate to the My Dashboards section in Dashboards and click Create Dashboard.

Choose a name for the dashboard in Name and click OK.

Graphs can be added to a dashboard to monitor specific metrics. To add a graph, click the created dashboard and click Add Graph.

Choose the type of chart to create and click OK.

Several settings are available when adding a graph to a dashboard: whether the graph shows one metric or multiple metrics, the period over which the data is collected, the type of data to be displayed (raw data, maximum, minimum, average, or sum), and the metrics to be displayed.
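To make the data types concrete, the sketch below groups raw samples into fixed periods and applies maximum, minimum, average, or sum to each period. It is only an illustration of the idea: the function name, the period length, and the sample values are assumptions, and Cloud Eye's own aggregation may differ in detail.

```python
from statistics import mean


def rollup(samples, period, method):
    """Aggregate (timestamp_seconds, value) samples into fixed periods."""
    funcs = {"average": mean, "max": max, "min": min, "sum": sum}
    buckets = {}
    for ts, value in samples:
        start = ts - ts % period  # start of the period this sample falls in
        buckets.setdefault(start, []).append(value)
    return {start: funcs[method](vals) for start, vals in sorted(buckets.items())}


# Hypothetical raw samples: (seconds since start, CPU usage in percent).
raw = [(0, 10.0), (60, 20.0), (120, 30.0), (300, 40.0), (360, 60.0)]
five_minute_average = rollup(raw, period=300, method="average")
five_minute_max = rollup(raw, period=300, method="max")
```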

Under Metric Display, select One graph for a single metric to add a single metric to the graph, or select One graph for multiple metrics to add multiple metrics to the graph.

Click Select Resource and Metric to select the resource to be monitored and the metric for that resource.

On the Select Resource and Metric page, select the type of service to be monitored on the left, the specific resource in the middle, and the metrics for that resource on the right. In this example, CPU, disk, memory, and network usage are monitored on “ecs-9152”.
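The choices made so far (metric display mode, resource, metrics) can be pictured as a small record. The class below is a hypothetical model for illustration only; it is not a Cloud Eye API object, and the metric labels simply follow the example in the text.

```python
from dataclasses import dataclass


@dataclass
class GraphConfig:
    """Hypothetical description of one dashboard graph."""
    resource: str
    metrics: list
    single_metric: bool = False  # True models "One graph for a single metric"

    def validate(self):
        """Check that the metric selection matches the display mode."""
        if not self.metrics:
            raise ValueError("select at least one metric")
        if self.single_metric and len(self.metrics) != 1:
            raise ValueError("a single-metric graph takes exactly one metric")
        return True


graph = GraphConfig(
    resource="ecs-9152",
    metrics=["CPU usage", "Disk usage", "Memory usage", "Network usage"],
)
is_valid = graph.validate()
```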

Adjust the data collection period in the upper right corner of the Add Graph page.

A sample of the generated graph will appear on the page. Click Save to confirm and add the graph to the dashboard.

On the dashboard, you can create a legend for the graph, edit it, make it full screen, reload the data shown in the graph, and move the graph.

In Cloud Eye, you can create numerous dashboards with several graphs in each, and each graph can show multiple monitoring metrics. In addition, as described in topic 3.1, the CES Overview section gives a summary of the monitored resources with the main metrics used, such as CPU, memory, and disk usage on servers, network usage, and the total number of alarms triggered in Cloud Eye.

Cloud service monitoring

In the Cloud Service Monitoring section, dashboards for ECS, EIP and bandwidth, NAT, and VPN resources are created automatically when these resources are created. The main monitoring metrics of each service are shown as graphs in this section for quick, general monitoring.

In addition to viewing the graphs related to the main monitored metrics, it is also possible to export the collected data by clicking the Export Data button.
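Exported data can then be processed outside the console. The sketch below writes metric samples to CSV text with Python's standard csv module. The column names and the sample row are assumptions made for illustration and do not describe the actual layout of a Cloud Eye export file.

```python
import csv
import io


def to_csv(rows):
    """Serialize (timestamp, metric, value) rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["timestamp", "metric", "value"])  # hypothetical header
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample row.
csv_text = to_csv([("2023-12-20T10:00:00Z", "cpu_util", 42.5)])
```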

Appendices

Server Monitoring Metrics

Metric | Agentless | Agent Installed
CPU Usage | Yes | Yes / Dedicated
Disk Usage | Yes | Yes
Memory Usage | Yes | Yes / Dedicated
Disk Write Bandwidth | Yes | Yes
Disk Read Bandwidth | Yes | Yes
Disk Write IOPS | Yes | Yes
Disk Read IOPS | Yes | Yes
In-band ingress rate | Yes | Yes
In-band egress rate | Yes | Yes
Out-of-band ingress rate | Yes | Yes
Out-of-band egress rate | Yes | Yes
CPU credit usage | Yes | Yes
CPU credit balance | Yes | Yes
CPU surplus credit balance | Yes | Yes
CPU surplus credits charged | Yes | Yes
Network connections | Yes | Yes
Inbound bandwidth per server | Yes | Yes
Outbound bandwidth per server | Yes | Yes
Inbound PPS | Yes | Yes
Outbound PPS | Yes | Yes
New connections | Yes | Yes
Aggregate ECC uncorrectable errors | Yes | Yes
Pages retired with single bit errors | Yes | Yes
Pages retired with double bit errors | Yes | Yes
GPU health status | Yes | Yes
GPU encoder usage | Yes | Yes
GPU decoder usage | Yes | Yes
ECC volatile correctable errors | Yes | Yes
ECC volatile uncorrectable errors | Yes | Yes
Idle CPU | No | Yes / Dedicated
User space CPU usage | No | Yes / Dedicated
Kernel space CPU usage | No | Yes / Dedicated
Other process CPU usage | No | Yes / Dedicated
Nice process CPU usage | No | Yes / Dedicated
Time CPU is waiting for I/O operations | No | Yes / Dedicated
CPU interrupt time | No | Yes / Dedicated
Software CPU interrupt time | No | Yes / Dedicated
Available memory | No | Yes / Dedicated
Idle memory | No | Yes / Dedicated
Buffer | No | Yes / Dedicated
Cache | No | Yes / Dedicated
Inbound bandwidth per NIC | No | Yes / Dedicated
Outbound bandwidth per NIC | No | Yes / Dedicated
Packet rate sent per NIC | No | Yes / Dedicated
Packet rate received per NIC | No | Yes / Dedicated
Packet rate with error received per NIC | No | Yes / Dedicated
Packet rate with error transmitted per NIC | No | Yes / Dedicated
Packet rate received dropped per NIC | No | Yes / Dedicated
Packet rate transmitted dropped per NIC | No | Yes / Dedicated
Processes running | No | Yes / Dedicated
Idle processes | No | Yes / Dedicated
Zombie processes | No | Yes / Dedicated
Blocked processes | No | Yes / Dedicated
Sleeping processes | No | Yes / Dedicated
Total processes | No | Yes / Dedicated
TCP retransmission rate | No | Yes / Dedicated
TCP SYN_SENT | No | Yes / Dedicated
TCP SYN_RECV | No | Yes / Dedicated
TCP FIN_WAIT1 | No | Yes / Dedicated
TCP FIN_WAIT2 | No | Yes / Dedicated
TCP CLOSE | No | Yes / Dedicated
TCP LAST_ACK | No | Yes / Dedicated
TCP LISTEN | No | Yes / Dedicated
TCP CLOSING | No | Yes / Dedicated
Average CPU load in the last minute | No | Yes / Dedicated
Average CPU load in the last 15 minutes | No | Yes / Dedicated
Average CPU load in the last 5 minutes | No | Yes / Dedicated
TCP ESTABLISHED | No | Yes / Dedicated
TCP TOTAL | No | Yes / Dedicated
UDP TOTAL | No | Yes / Dedicated
NTP Offset | No | Yes / Dedicated
Total files processed | No | Yes / Dedicated
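When planning dashboards or alarms, it can help to check which of the desired metrics require the agent. The sketch below uses a small subset of the table above; the set and the function name are hypothetical helpers, not part of any Cloud Eye tooling.

```python
# Subset of the metrics marked "No" in the Agentless column of the table.
AGENT_ONLY = {"Idle CPU", "Available memory", "Zombie processes", "NTP Offset"}


def missing_without_agent(wanted):
    """Return the wanted metrics that are unavailable without the agent."""
    return sorted(m for m in wanted if m in AGENT_ONLY)


needs_agent = missing_without_agent(["CPU Usage", "Idle CPU", "NTP Offset"])
```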

VPN Gateway Monitoring Metrics

Metric | Supported
Inbound Packet Rate | Yes
Outbound Packet Rate | Yes
Inbound Bandwidth | Yes
Outbound Bandwidth | Yes
Inbound Bandwidth Usage | Yes
Number of Connections | Yes
Outbound Bandwidth Usage | Yes

VPN Connection Monitoring Metrics

Metric | Supported
Tunnel Average RTT | Yes
Tunnel Max RTT | Yes
Tunnel Packet Loss Rate | Yes
Link Average RTT | Yes
Link Max RTT | Yes
Link Packet Loss Rate | Yes
VPN Connection Status | Yes
Packet Receive Rate | Yes
Packet Send Rate | Yes
Traffic Receive Rate | Yes
Traffic Send Rate | Yes
SA Packet Send Rate | Yes
SA Packet Receive Rate | Yes
SA Traffic Send Rate | Yes
SA Traffic Receive Rate | Yes
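The RTT and packet loss metrics in this table are standard network measurements. As a generic illustration (not Cloud Eye's actual measurement method), the sketch below derives average RTT, maximum RTT, and packet loss rate from a list of probe results, where None marks a lost probe. The probe values are hypothetical.

```python
def rtt_summary(probes_ms):
    """Summarize probe round-trip times in ms; None marks a lost probe."""
    replies = [p for p in probes_ms if p is not None]
    lost = len(probes_ms) - len(replies)
    return {
        "avg_rtt_ms": sum(replies) / len(replies) if replies else None,
        "max_rtt_ms": max(replies) if replies else None,
        "loss_rate_pct": 100.0 * lost / len(probes_ms) if probes_ms else None,
    }


# Hypothetical probe results: three replies and one lost probe.
summary = rtt_summary([20.0, 30.0, None, 40.0])
```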

NAT Monitoring Metrics

Metric | Supported
SNAT Connections | Yes
Inbound Bandwidth | Yes
Outbound Bandwidth | Yes
Inbound PPS | Yes
Outbound PPS | Yes
Inbound Traffic | Yes
Outbound Traffic | Yes
SNAT Connection Usage Rate | Yes
Ingress bandwidth usage rate | Yes
Egress bandwidth usage rate | Yes
Total egress bandwidth (UDP) | Yes
Total egress bandwidth (TCP) | Yes
Total ingress bandwidth (UDP) | Yes
Total ingress bandwidth (TCP) | Yes
Packets lost due to excessive SNAT connections | Yes
Packets lost due to excessive PPS | Yes
Packets lost by all allocated EIP ports | Yes
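Usage rate metrics such as SNAT Connection Usage Rate can be read as the current value divided by the available capacity. The sketch below assumes that interpretation; the exact formula Cloud Eye applies is not stated in this document, and the numbers are hypothetical.

```python
def usage_rate(current, capacity):
    """Return current / capacity as a percentage."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * current / capacity


# Hypothetical values: 2,500 SNAT connections in use out of a 10,000 limit.
snat_usage_pct = usage_rate(2500, 10000)
```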

References