H3C SeerFabric Agent Deployment Guide-F65xx-6W503


Overview

The SeerFabric agent is installed on servers within the DC network. Once installed, the agent can collect server information and send collected information to the SeerEngine-DC controller for server configuration.

 


Prerequisites

Server requirements

Hardware requirements

Table 1 Supported server models

| Server models | Hardware requirements |
| --- | --- |
| H3C R5500G6, H3C R5500G5, H3C R5300G6, H3C R5300G5 | CPU architecture: x86_64. GPUs: NVIDIA A100, NVIDIA H800, KUNLUNXIN R300/P800, iluvatar MR-V100, iluvatar BI-V150, and MetaX MXC500. RDMA NIC type: Mellanox CX6 and Yunmai Network Card metaScale-200. |

 

Table 2 Minimum hardware resources

| CPU cores | Memory | Disk space |
| --- | --- | --- |
| 1 core | 2 GB | 5 GB |

 

Software requirements

| Operating system | Dependencies |
| --- | --- |
| Ubuntu 22.04 | java-1.8.0-openjdk, LLDPD, and NetworkManager |
| Rocky Linux 8.8 (Green Obsidian) | LSHW, java-1.8.0-openjdk, LLDPD, and NetworkManager |

 

Kubernetes environment

SeerFabric agent containerized deployment supports Kubernetes v1.21.14.

Preparing for installation

Before deploying the SeerFabric agent, complete the following tasks:

1.     Install the SeerEngine-DC controller.

2.     Install and deploy servers.

 

IMPORTANT:

Do not place any NAT devices or NAT networks between the SeerFabric agent and SeerEngine-DC. Make sure the SeerFabric agent and SeerEngine-DC can communicate directly by IP address, without relying on address translation by a NAT gateway.

 

 


Deploying the SeerFabric agent on a bare-metal server

Configuring basic environment settings

Configuring basic environment settings for the Ubuntu 22.04 system

1.     Verify that each server has a unique system name (hostname). If any servers share a system name, change the names so that they are unique, and then restart the modified servers.

root@server1:~# vi /etc/hostname

root@server1:~# reboot

2.     Install Java.

root@server1:~# apt install openjdk-8-jdk -y

3.     Install LLDPD.

root@server1:~# apt install lldpd -y

root@server1:~# systemctl enable lldpd.service

root@server1:~# systemctl start lldpd.service

root@server1:~# systemctl status lldpd.service    # View LLDPD service status

4.     Install NetworkManager.

root@server1:~# apt install network-manager -y

root@server1:~# systemctl enable NetworkManager.service

root@server1:~# systemctl start NetworkManager.service

root@server1:~# systemctl status NetworkManager.service    # View NetworkManager service status
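
Step 1 above requires every server to have a unique system name. The following is a minimal sketch of one way to spot duplicates, assuming you have already collected the system names of all servers into a text file, one name per line. The function name and the input file are hypothetical and are not part of the SeerFabric agent package. Treating names that differ only in case as duplicates is also an assumption.

```shell
# duplicate_hostnames FILE
# FILE holds one system name per line (how you collect it is up to you).
# Prints each name that appears more than once, compared
# case-insensitively. Prints nothing if all names are unique.
duplicate_hostnames() {
    tr '[:upper:]' '[:lower:]' < "$1" \
        | sed '/^[[:space:]]*$/d' \
        | sort \
        | uniq -d
}
```

For example, running duplicate_hostnames against a file that lists server1 twice prints server1 once.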

Configuring basic environment settings for the Rocky Linux 8.8 (Green Obsidian) system

1.     Verify that each server has a unique system name (hostname). If any servers share a system name, change the names so that they are unique, and then restart the modified servers.

[root@server1 ~]# vi /etc/hostname

[root@server1 ~]# reboot

2.     Install LSHW.

[root@server1 ~]# yum install lshw -y

3.     Install Java.

The following example uses the x86_64 CPU architecture.

[root@server1 ~]# yum install java-1.8.0-openjdk.x86_64 -y

4.     Install LLDPD.

[root@server1 ~]# dnf install epel-release

[root@server1 ~]# dnf install lldpd

[root@server1 ~]# systemctl enable lldpd.service

[root@server1 ~]# systemctl start lldpd.service

[root@server1 ~]# systemctl status lldpd.service    # View LLDPD service status

5.     Install NetworkManager.

[root@server1 ~]# yum install NetworkManager -y

[root@server1 ~]# systemctl enable NetworkManager.service

[root@server1 ~]# systemctl start NetworkManager.service

[root@server1 ~]# systemctl status NetworkManager.service    # View NetworkManager service status
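
After installing the dependencies above, a quick check that the expected commands are on the PATH can catch a missed step before you deploy the agent. The sketch below is not part of the product. The command names in the usage comment (java, lldpd, nmcli, lshw) are the ones these packages usually install, which is an assumption about your distribution.

```shell
# missing_commands CMD...
# Prints the name of each command that cannot be found on PATH.
# Prints nothing if every command is available.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Example usage (run as root so that /usr/sbin is on PATH):
#   missing_commands java lldpd nmcli lshw
```

An empty result means all listed commands were found. The check does not verify that the lldpd or NetworkManager services are running, so the systemctl status commands above are still needed.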

Deploying the SeerFabric agent on servers

 

NOTE:

The deployment steps and commands are the same for the Ubuntu and Rocky Linux systems. This section uses the Ubuntu system as an example.

 

1.     Upload the SeerFabric agent installation package to any directory on each server.

The package name is SeerEngine_DC-version_SeerFabricAgent.zip, where the version parameter represents the version number.

2.     Decompress the SeerFabric agent installation package and access the decompressed directory.

root@server1:~# unzip SeerEngine_DC-E6601_SeerFabricAgent.zip

root@server1:~# cd SeerEngine_DC-E6601_SeerFabricAgent

root@server1:~/SeerEngine_DC-E6601_SeerFabricAgent# ll

total 22244

drwxr-xr-x. 1 root root      187 Oct 31 13:56 config

-rw-r--r--. 1 root root 18669657 Oct 31 13:56 dc-agent-1.0.0.jar

-rwxr-xr-x. 1 root root  3953824 Oct 31 13:56 jq

-rwxr-xr-x. 1 root root     1143 Oct 31 13:56 start_agent.sh

-rwxr-xr-x. 1 root root      459 Oct 31 13:56 stop_agent.sh

3.     Use the vi command to edit the config/agent.config.json file. Enter the northbound service VIP, username, and password of the SeerEngine-DC controller so that the agent can upload GPU information to the controller.

{
  "agent": {
    "roce_nic_config": "enable",
    "route_config": "enable"
  },
  "dc": {
    "service_ip": "192.168.10.100",
    "login_username": "admin",
    "login_password": "Pwd@12345"
  }
}

Table 3 Parameters

| Parameter | Description |
| --- | --- |
| agent_master_ip | IP address of the server where the master agent resides. Optional. If this parameter is not configured, the agent will send collected information to the SeerEngine-DC controller. |
| agent_master_port | Port of the server where the master agent resides. The default is 7008. Optional. |
| service_ip | Northbound service VIP of the SeerEngine-DC controller. |
| login_username | Username for logging in to the Web interface of the SeerEngine-DC controller. |
| login_password | Password for logging in to the Web interface of the SeerEngine-DC controller. After the agent starts, this field will be deleted. This field can be modified, and the modification takes effect after the agent restarts. |

 

CAUTION:

When you change the northbound service VIP, you must also set the username and password in the same edit. Otherwise, the change to the northbound service VIP does not take effect, and the agent continues to upload GPU information to the previously configured SeerEngine-DC controller.
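
Because a VIP change takes effect only when the username and password are present as well, it can help to confirm that all three fields are in the file before you start the agent. The following is a rough sketch, not a feature of the agent. It does a plain text match for the three key names and is not a JSON parser, so it cannot detect syntax errors or keys placed in the wrong object. The function name is hypothetical.

```shell
# missing_config_keys FILE
# Prints each of service_ip, login_username, and login_password that
# does not appear in FILE in the form "key": (text match only).
# Prints nothing if all three keys are present.
missing_config_keys() {
    for key in service_ip login_username login_password; do
        grep -q "\"$key\"[[:space:]]*:" "$1" || echo "$key"
    done
}

# Example usage, before starting the agent:
#   missing_config_keys config/agent.config.json
```

Run the check before the agent starts. The agent deletes the login_password field after startup, so the same check on a running deployment is expected to report that key as missing.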

 

4.     Start the agent service by executing the start_agent.sh script with root permissions.

If the value of the agent_master_ip parameter is the IP address of a NIC on the server, the server starts as the master node. Otherwise, the server starts as a worker node.

○     If the server starts as the master node, the agent actively sends the collected information to the SeerEngine-DC controller.

○     If the server starts as a worker node, the agent preferentially sends the collected information to the master node. If the master node operates abnormally, the agent sends the collected information to the SeerEngine-DC controller instead.

a.     Add a firewall rule to allow traffic to pass through the agent listening port.

Taking firewalld as an example, execute the following commands:

root@server1:~/SeerEngine_DC-E6601_SeerFabricAgent# sudo firewall-cmd --zone=public --add-port=7008/tcp --permanent

root@server1:~/SeerEngine_DC-E6601_SeerFabricAgent# sudo firewall-cmd --reload

Port 7008 is the default port number. Replace it with the port of the server where the master agent resides as needed. The first command permanently allows TCP traffic through the specified port, so the rule still takes effect after the firewall is restarted. The second command reloads the firewall to apply the rule.

b.     You must manually start the agent service on each server as follows.

root@server1:~/SeerEngine_DC-E6601_SeerFabricAgent# sudo ./start_agent.sh
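
If the master agent listens on a port other than 7008, the port number in the firewall commands of sub-step a must change to match. The sketch below is a hypothetical helper that validates a port number and prints the two firewalld commands without running them, so you can review them first. It assumes firewalld and the public zone, as in the example above.

```shell
# firewall_cmds PORT
# Validates PORT (digits only, 1-65535) and prints the two firewall-cmd
# commands from sub-step a. Nothing is executed.
firewall_cmds() {
    case "$1" in
        ''|*[!0-9]*|??????*)
            echo "invalid port: $1" >&2
            return 1
            ;;
    esac
    if [ "$1" -lt 1 ] || [ "$1" -gt 65535 ]; then
        echo "port out of range: $1" >&2
        return 1
    fi
    echo "firewall-cmd --zone=public --add-port=$1/tcp --permanent"
    echo "firewall-cmd --reload"
}
```

Calling firewall_cmds 7008 prints the same two commands shown in sub-step a, without the sudo prefix.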

5.     Configure routes and rules:

After the GPU and storage NICs obtain IP addresses, the agent will automatically configure routes and rules for the server GPU and storage NICs. Both VLAN and VXLAN networks are supported.

 

 

NOTE:

·     Start the agent after planning the VLAN or VXLAN network on the SeerEngine-DC controller. This enables the agent to identify the storage NICs to deploy relevant settings.

·     The agent manages the routes and rules. Do not edit them arbitrarily, because conflicting route and rule configurations can cause traffic anomalies.

 

6.     Configure RoCE settings:

Parameter network: Before starting the agent, create an RoCE policy for the parameter network fabric on the SeerEngine-DC controller.

Storage network: Before starting the agent, create an RoCE policy for the storage network fabric on the SeerEngine-DC controller. In addition, you must plan the VLAN or VXLAN network appropriately.

 

7.     Upon startup, the agent checks whether the agent.network.config.yml file (configuration file for deploying RoCE parameters) exists on the host.

○     If the file exists, the agent loads and runs the configuration file to deploy RoCE settings to the parameter NIC.

○     If the file does not exist, the agent obtains the configured server NIC parameter template from SeerEngine-DC. You can view the template on the RoCE policy details page of SeerEngine-DC. The agent then generates the agent.network.config.yml file locally, loads and runs the configuration file, and deploys the RoCE settings to the parameter NIC.

○     To obtain the configuration parameters from the server NIC parameter template again, execute the sudo ./start_agent.sh -c d command. This command deletes the agent.network.config.yml file and restarts the agent, which then obtains the server NIC parameter template from SeerEngine-DC again.

○     If you have edited the CNP or RoCE queue in the RoCE policy on SeerEngine-DC, either execute the sudo ./start_agent.sh -c d command to delete the existing file and obtain the template again, or manually edit the local agent.network.config.yml file and then restart the agent for the modification to take effect.

 

CAUTION:

·     After the agent starts correctly, it generates the agent.network.config.yml file on the host. If the file is not generated, check the logs to troubleshoot.

·     Do not delete commented-out content in the agent.network.config.yml file. To edit a configuration parameter, first uncomment it.

·     Edit parameters in the configuration file as needed. After editing, restart the agent for the modification to take effect.

·     After the RoCE settings are deployed, they are applied automatically upon startup, regardless of whether the agent is running.
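
Because the start_agent.sh option described above deletes the agent.network.config.yml file, any manual edits in that file are lost when the file is regenerated. The following sketch shows one way to keep a copy first. It is a suggestion and not part of the agent. The function name is hypothetical, and you must supply the directory that holds the file on your host.

```shell
# backup_roce_config DIR
# Copies DIR/agent.network.config.yml to a timestamped .bak file in the
# same directory and prints the path of the copy.
# Returns 1 if the file does not exist.
backup_roce_config() {
    src="$1/agent.network.config.yml"
    if [ ! -f "$src" ]; then
        echo "no file to back up: $src" >&2
        return 1
    fi
    dst="$src.$(date +%Y%m%d%H%M%S).bak"
    cp -p "$src" "$dst" && echo "$dst"
}
```

Run the function before you regenerate the file, and compare the backup with the new file afterward if you need to reapply manual changes.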

 

8.     View the agent running state:

○     To view the service process, enter the ps -ef | grep dc-agent | grep java command on the server. If process information is displayed, the process has started successfully.

○     View the agent role. The agent.role field in the process information displays the role name.

root@server1:~/SeerEngine_DC-E6601_SeerFabricAgent# ps -ef | grep dc-agent | grep java

root     18076     1  0 15:38 pts/0    00:00:20 java -jar -Xmx512m -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8000 dc-agent-1.0.0.jar --agent.role=master --dc.serviceIp=192.168.232.89 --dc.username=admin --dc.password=Pwd@12345
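
The role can also be extracted from the process information with a short filter. This is a minimal sketch that assumes the process line contains an --agent.role=... argument, as in the sample output above. The function name is hypothetical.

```shell
# agent_role_from_ps
# Reads process lines on stdin and prints the value of the first
# --agent.role=... argument found. Prints nothing if there is none.
agent_role_from_ps() {
    sed -n 's/.*--agent\.role=\([^[:space:]]*\).*/\1/p' | head -n 1
}

# Example usage:
#   ps -ef | grep dc-agent | grep java | agent_role_from_ps
```

For the sample output above, the filter prints master.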

 

 

NOTE:

The agent automatically starts upon service restart.

 

Viewing server access information

1.     On the SeerEngine-DC controller, navigate to the Monitor > Topology > Data-Center Topology > Basic Network Topology page.

2.     Right-click a leaf device and select Server Access Info to view the server access information.

3.     If a network anomaly prevents the SeerEngine-DC controller from obtaining accurate GPU information, try restarting the agent service by executing the start_agent.sh script.

Upgrading and uninstalling the software

Upgrading the software

1.     Back up the current software version for use during version rollback.

2.     Decompress the new version. Check and edit the information in the agent.config.json file, including the master IP and port, as well as the IP address, account, and password for the SeerEngine-DC controller.

3.     Start the service by executing the start_agent.sh script with root permissions.

Uninstalling the software

Execute the following command to uninstall SeerFabric agent:

root@server1:~/SeerEngine_DC-E6601_SeerFabricAgent# sudo ./stop_agent.sh


SeerFabric agent containerized deployment guide

Configuring basic environment settings

Configuring basic environment settings for the Ubuntu 22.04 system

1.     Install LLDPD on the host.

root@server1:~# apt install lldpd -y

root@server1:~# systemctl enable lldpd.service

root@server1:~# systemctl start lldpd.service

root@server1:~# systemctl status lldpd.service    # View LLDPD service status

2.     Install NetworkManager on the host.

root@server1:~# apt install network-manager -y

root@server1:~# systemctl enable NetworkManager.service

root@server1:~# systemctl start NetworkManager.service

root@server1:~# systemctl status NetworkManager.service    # View NetworkManager service status

Configuring basic environment settings for the Rocky Linux 8.8 (Green Obsidian) system

1.     Install LSHW on the host.

[root@server1 ~]# yum install lshw -y

2.     Install LLDPD on the host.

[root@server1 ~]# dnf install epel-release

[root@server1 ~]# dnf install lldpd

[root@server1 ~]# systemctl enable lldpd.service

[root@server1 ~]# systemctl start lldpd.service

[root@server1 ~]# systemctl status lldpd.service    # View LLDPD service status

3.     Install NetworkManager on the host.

[root@server1 ~]# yum install NetworkManager -y

[root@server1 ~]# systemctl enable NetworkManager.service

[root@server1 ~]# systemctl start NetworkManager.service

[root@server1 ~]# systemctl status NetworkManager.service    # View NetworkManager service status

Deploying the SeerFabric agent on k8s

 

NOTE:

The deployment steps and commands are the same for the Ubuntu and Rocky Linux systems. This section uses the Ubuntu system as an example.

 

1.     Upload the SeerFabric agent installation package to any directory on each server in the k8s cluster. The package name is SeerEngine_DC-version_SeerFabricAgent_K8S.zip, where the version parameter represents the version number.

2.     Decompress the SeerFabric agent installation package and access the decompressed directory.

root@server1:~# unzip SeerEngine_DC-E6601_SeerFabricAgent_K8S.zip

root@server1:~# cd SeerEngine_DC-E6601_SeerFabricAgent_K8S

root@server1:~/SeerEngine_DC-E6601_SeerFabricAgent_K8S# ll

total 22244

drwxr-xr-x. 1 root root      187 Oct 31 13:56 config

-rw-r--r--. 1 root root 18669657 Oct 31 13:56 dc-agent-1.0.0.jar

-rwxr-xr-x. 1 root root  3953824 Oct 31 13:56 jq

-rwxr-xr-x. 1 root root     1143 Oct 31 13:56 start_agent.sh

-rwxr-xr-x. 1 root root      459 Oct 31 13:56 stop_agent.sh

3.     Use the vi command to edit the config/agent.config.json file. Enter the northbound service VIP, username, and password of the SeerEngine-DC controller so that the agent can upload GPU information to the controller.

{
  "agent": {
    "roce_nic_config": "enable",
    "route_config": "enable"
  },
  "dc": {
    "service_ip": "192.168.10.100",
    "login_username": "admin",
    "login_password": "Pwd@12345"
  }
}

Table 4 Parameters

| Parameter | Description |
| --- | --- |
| agent_master_ip | IP address of the server where the master agent resides. Optional. If this parameter is not configured, the agent will send collected information to the SeerEngine-DC controller. |
| agent_master_port | Port of the server where the master agent resides. The default is 7008. Optional. |
| roce_nic_config | Whether to enable automatic deployment of RoCE settings for the parameter NIC. Options include enable and disable. |
| route_config | Whether to enable automatic deployment of routes and rules. Options include enable and disable. |
| service_ip | Northbound service VIP of the SeerEngine-DC controller. |
| login_username | Username for logging in to the SeerEngine-DC controller. |
| login_password | Password for logging in to the SeerEngine-DC controller. After the agent starts, this field will be deleted. This field can be modified, and the modification takes effect after the agent restarts. |

 

CAUTION:

When you change the northbound service VIP, you must also set the username and password in the same edit. Otherwise, the change to the northbound service VIP does not take effect, and the agent continues to upload GPU information to the previously configured SeerEngine-DC controller.

 

4.     Copy the agent.config.json configuration file to the /usr/data/se-agent/config directory so that it can be mounted into the container. If the directory does not exist, create it manually.

5.     Load the image.

If the Harbor image repository is not installed in the environment, use the following command to import the image:

root@server1:~/SeerEngine_DC-E6601_SeerFabricAgent_K8S# docker load -i dc-agent-*.tar

If a Harbor image repository is already installed in the environment, perform the following operations:

a.     Log in to Docker Hub or the target repository.

docker login

Enter the username and password as prompted to log in.

b.     Load the Docker image tar package.

docker load -i dc-agent-*.tar

The docker load command loads an image from a tar archive into the local Docker engine. The -i option specifies the tar file to load. After loading, the image is available in the image storage of the local Docker engine. To verify the result, use the docker images command.

c.     Add tags to the image to push it to the target repository.

docker tag local-image:tag username/repository:tag

The local-image:tag field represents the tag of the local image, and the username/repository:tag represents the tag of the image in the target repository.

d.     Push the image to the target repository.

docker push username/repository:tag

e.     Because the image tag has changed, change the value of the image field in the se-agent-deployment.yaml file to the image tag used in the Harbor repository.
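
Sub-steps c and d above use placeholder image names. The sketch below is a hypothetical helper that checks that both image references carry a tag and then prints the docker tag and docker push commands without running them. The image names in the usage comment are invented for illustration and are not the real names in the package.

```shell
# image_push_cmds LOCAL_IMAGE TARGET_IMAGE
# Checks that both references end in name:tag, then prints the
# docker tag and docker push commands. Nothing is executed.
image_push_cmds() {
    for ref in "$1" "$2"; do
        # Look only at the part after the last slash, so that a registry
        # port such as host:5000/ is not mistaken for a tag.
        case "${ref##*/}" in
            *:?*) ;;
            *)
                echo "missing tag in image reference: $ref" >&2
                return 1
                ;;
        esac
    done
    echo "docker tag $1 $2"
    echo "docker push $2"
}

# Example usage (hypothetical names):
#   image_push_cmds dc-agent:E6601 registry.example.com/library/dc-agent:E6601
```

The target reference that you pass as the second argument is the value to enter in the image field of the se-agent-deployment.yaml file.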

6.     Start the SeerFabric agent container. After all nodes in the cluster have completed the preceding steps, execute the following command to have Kubernetes start the container:

root@server1:~/SeerEngine_DC-E6601_SeerFabricAgent_K8S# kubectl apply -f se-agent-deployment.yaml

7.     If the value for the agent_master_ip parameter is the IP address of a NIC on the server, the server starts as the master node. If not, the server starts as a worker node.

○     If the server starts as the master node, the agent will send collected information to the SeerEngine-DC controller.

○     If the server starts as a worker node, the agent will preferentially send collected information to the master node. If the master node becomes faulty, the agent will send collected information to the SeerEngine-DC controller.
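
The role rule in this step can be sketched as a small function for planning purposes. This illustrates the rule as described and is not the agent's implementation. The function name is hypothetical. The text does not name a role for the case where agent_master_ip is not configured, so the sketch reports an error for an empty value instead of guessing.

```shell
# agent_role MASTER_IP LOCAL_IP...
# Prints "master" if MASTER_IP equals one of the local NIC addresses,
# otherwise prints "worker". Returns 2 if MASTER_IP is empty.
agent_role() {
    master_ip=$1
    if [ -z "$master_ip" ]; then
        echo "agent_master_ip is not set" >&2
        return 2
    fi
    shift
    for ip in "$@"; do
        if [ "$ip" = "$master_ip" ]; then
            echo master
            return 0
        fi
    done
    echo worker
}

# Example usage on a Linux host (hostname -I lists the local addresses):
#   agent_role 192.168.10.11 $(hostname -I)
```

The comparison is a plain string match, so the address must be written in the same form in the configuration file and in the list of local addresses.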

 

 

NOTE:

·     If you do not want to deploy the SeerFabric agent on a specific node, label the node by executing the kubectl label nodes <node-name> se-agent-switch=off command.

 

8.     Configure routes and rules:

After the GPU and storage NICs obtain IP addresses, the agent will automatically configure routes and rules for the server GPU and storage NICs. Both VLAN and VXLAN networks are supported.

 

 

NOTE:

·     Start the agent after planning the VLAN or VXLAN network on the SeerEngine-DC controller. This enables the agent to identify the storage NICs to deploy relevant settings.

·     The agent manages the routes and rules. Do not edit them arbitrarily, because conflicting route and rule configurations can cause traffic anomalies.

 

9.     Configure RoCE settings:

Parameter network: Before starting the agent, create an RoCE policy for the parameter network fabric on the SeerEngine-DC controller.

Storage network: Before starting the agent, create an RoCE policy for the storage network fabric on the SeerEngine-DC controller. In addition, you must plan the VLAN or VXLAN network appropriately.

10.     Upon startup, the agent checks whether the /usr/data/se-agent/config/agent.network.config.yml file (configuration file for deploying RoCE parameters) exists on the host.

○     If the file exists, the agent loads and runs the configuration file to deploy RoCE settings to the parameter NIC.

○     If the file does not exist, the agent obtains the configured server NIC parameter template from SeerEngine-DC. You can view the template on the RoCE policy details page of SeerEngine-DC. The agent then generates the agent.network.config.yml file locally, loads and runs the configuration file, and deploys the RoCE settings to the parameter NIC.

○     To obtain the configuration parameters from the server NIC parameter template again, execute the sudo ./start_agent.sh -c d command in the container. This command deletes the agent.network.config.yml file and restarts the agent, which then obtains the server NIC parameter template from SeerEngine-DC again.

○     If you have edited the CNP or RoCE queue in the RoCE policy on SeerEngine-DC, either execute the sudo ./start_agent.sh -c d command in the container to delete the existing file and obtain the template again, or manually edit the local agent.network.config.yml file and then restart the agent for the modification to take effect.

 

CAUTION:

·     After the agent starts correctly, it generates the agent.network.config.yml file on the host. If the file is not generated, check the logs to troubleshoot.

·     Do not delete commented-out content in the agent.network.config.yml file. To edit a configuration parameter, first uncomment it.

·     Edit parameters in the configuration file as needed. After editing, restart the agent for the modification to take effect.

·     After the RoCE settings are deployed, they are applied automatically upon startup of the container, regardless of whether the agent is running.

 

11.     View the agent running state by executing the following command:

root@server1:~/SeerEngine_DC-E6601_SeerFabricAgent_K8S# kubectl get pod -n addc-roce
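
On a cluster with many nodes, it can be easier to list only the pods that are not in the Running state. The following sketch filters the table that the command above prints. It assumes the default kubectl column layout (NAME, READY, STATUS, RESTARTS, AGE). The function name is hypothetical.

```shell
# pods_not_running
# Reads `kubectl get pod` table output on stdin, skips the header line,
# and prints the name of every pod whose STATUS column is not Running.
pods_not_running() {
    awk 'NR > 1 && $3 != "Running" { print $1 }'
}

# Example usage:
#   kubectl get pod -n addc-roce | pods_not_running
```

An empty result means every listed pod reports Running. It does not confirm that the agent inside each pod has connected to SeerEngine-DC.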

Viewing server access information

1.     On the SeerEngine-DC controller, navigate to the Monitor > Topology > Data-Center Topology > Basic Network Topology page.

2.     Right-click a leaf device and select Server Access Info to view server access information.

3.     If a network anomaly prevents the SeerEngine-DC controller from obtaining accurate GPU information, try restarting the agent service by running the start_agent.sh script.

Upgrading the software

1.     Back up the current software version for use during version rollback.

2.     Decompress the new version. Check and edit the information in the agent.config.json file, including the master IP, port, automatic deployment status for RoCE settings of the parameter NIC, and automatic deployment status for routes and rules, as well as the IP address, account, and password for the SeerEngine-DC controller.

3.     Copy the configuration file to the /usr/data/se-agent/config directory to overwrite the existing one, import the image file again, and then execute the kubectl apply -f se-agent-deployment.yaml command. After the command completes, view the container operating state.

 


FAQ

After I change the VLAN gateway configuration on the SeerEngine-DC controller page, the agent does not detect the change if the new and old gateways are in the same network segment. How can I resolve this issue?

To resolve this issue, log in to the server and delete the ip route configuration corresponding to the NIC. Then, the agent will automatically detect the loss of route configuration and redeploy the ip route configuration based on the gateway IP on the SeerEngine-DC controller.
