Table of Contents
Deploying the SeerFabric agent on the baremetal server
Configuring basic environment settings
Configuring basic environment settings for the Ubuntu 22.04 system
Configuring basic environment settings for the Rocky Linux 8.8 (Green Obsidian) system
Deploying the SeerFabric agent on servers
Viewing server access information
Upgrading and uninstalling the software
SeerFabric agent containerized deployment guide
Configuring basic environment settings
Configuring basic environment settings for the Ubuntu 22.04 system
Configuring basic environment settings for the Rocky Linux 8.8 (Green Obsidian) system
Deploying the SeerFabric agent on k8s
Viewing server access information
Overview
The SeerFabric agent is installed on servers within the DC network. Once installed, the agent can collect server information and send collected information to the SeerEngine-DC controller for server configuration.
Prerequisites
Server requirements
Hardware requirements
Table 1 Supported server models
Server models | Hardware requirements |
---|---|
H3C R5500G6, H3C R5500G5, H3C R5300G6, H3C R5300G5 | CPU architecture: x86_64. GPUs: NVIDIA A100, NVIDIA H800, KUNLUNXIN R300/P800, iluvatar MR-V100, iluvatar BI-V150, and MetaX MXC500. RDMA NIC types: Mellanox CX6 and Yunmai Network Card metaScale-200. |
Table 2 Minimum hardware resources
CPU cores | Memory | Disk space |
---|---|---|
1 core | 2 GB | 5 GB |
Software requirements
Operating system | Dependencies |
---|---|
Ubuntu 22.04 | java-1.8.0-openjdk, LLDPD, and NetworkManager |
Rocky Linux 8.8 (Green Obsidian) | LSHW, java-1.8.0-openjdk, LLDPD, and NetworkManager |
Kubernetes environment
SeerFabric agent containerized deployment supports Kubernetes v1.21.14.
Preparing for installation
Before deploying the SeerFabric agent, complete the following tasks:
1. Install the SeerEngine-DC controller.
2. Install and deploy servers.
IMPORTANT: The network path between the SeerFabric agent and SeerEngine-DC must not pass through any device or network that performs NAT. Make sure the SeerFabric agent and SeerEngine-DC can communicate directly by IP address, without address translation by a NAT gateway.
Deploying the SeerFabric agent on the baremetal server
Configuring basic environment settings
Configuring basic environment settings for the Ubuntu 22.04 system
1. Verify that the system names of different servers are different. If some servers have the same system name, modify their system names, and restart these servers after the modification.
root@server1:~# vi /etc/hostname
root@server1:~# reboot
2. Install Java.
root@server1:~# apt install openjdk-8-jdk -y
3. Install LLDPD.
root@server1:~# apt install lldpd -y
root@server1:~# systemctl enable lldpd.service
root@server1:~# systemctl start lldpd.service
root@server1:~# systemctl status lldpd.service  # View LLDPD service status
4. Install NetworkManager.
root@server1:~# apt install network-manager -y
root@server1:~# systemctl enable NetworkManager.service
root@server1:~# systemctl start NetworkManager.service
root@server1:~# systemctl status NetworkManager.service  # View NetworkManager service status
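Before moving on, it can help to confirm that the dependencies above are actually on the PATH and that no two servers share a system name. The following shell sketch is illustrative only: the function names are hypothetical, and it assumes `java`, `lldpcli` (installed with the lldpd package), and `nmcli` (installed with NetworkManager) are the commands worth checking. Gathering the hostnames of all servers, for example over SSH, is left to you.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers; not part of the SeerFabric agent package.

# Print every command name from the arguments that is not found on PATH.
# Returns 1 if at least one command is missing, 0 otherwise.
missing_commands() {
  local cmd rc=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      rc=1
    fi
  done
  return "$rc"
}

# Read one hostname per line on stdin and print each name that occurs more than once.
duplicate_hostnames() {
  sort | uniq -d
}

# Example usage on an Ubuntu 22.04 server:
#   missing_commands java lldpcli nmcli || echo "Install the missing dependencies first."
#   printf '%s\n' server1 server2 server1 | duplicate_hostnames    # prints server1
```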
Configuring basic environment settings for the Rocky Linux 8.8 (Green Obsidian) system
1. Verify that the system names of different servers are different. If some servers have the same system name, modify their system names, and restart these servers after the modification.
[root@server1 ~]# vi /etc/hostname
[root@server1 ~]# reboot
2. Install LSHW.
[root@server1 ~]# yum install lshw -y
3. Install Java.
Take the x86 CPU architecture as an example.
[root@server1 ~]# yum install java-1.8.0-openjdk.x86_64 -y
4. Install LLDPD.
[root@server1 ~]# dnf install epel-release
[root@server1 ~]# dnf install lldpd
[root@server1 ~]# systemctl enable lldpd.service
[root@server1 ~]# systemctl start lldpd.service
[root@server1 ~]# systemctl status lldpd.service  # View LLDPD service status
5. Install NetworkManager.
[root@server1 ~]# yum install NetworkManager -y
[root@server1 ~]# systemctl enable NetworkManager.service
[root@server1 ~]# systemctl start NetworkManager.service
[root@server1 ~]# systemctl status NetworkManager.service  # View NetworkManager service status
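The Ubuntu and Rocky Linux procedures differ mainly in the package manager and in the extra LSHW package. As a hedged sketch, the following hypothetical helper reads /etc/os-release (a standard file on both distributions) and reports which of the two environment-setup procedures applies; any other system is reported as unsupported, in line with the software requirements table.

```shell
#!/bin/sh
# Hypothetical helper: report which environment-setup procedure in this guide
# applies, based on an os-release file (default: /etc/os-release).
setup_section_for() {
  local file="${1:-/etc/os-release}" result
  # os-release is designed to be sourced by a shell; do it in a subshell so
  # variables such as ID and VERSION_ID do not leak into the caller.
  result=$( . "$file" 2>/dev/null && printf '%s:%s' "$ID" "$VERSION_ID" )
  case "$result" in
    ubuntu:22.04) echo "ubuntu-22.04" ;;
    rocky:8.8)    echo "rocky-8.8" ;;
    *)            echo "unsupported"; return 1 ;;
  esac
}

# Example usage:
#   setup_section_for    # prints ubuntu-22.04, rocky-8.8, or unsupported
```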
Deploying the SeerFabric agent on servers
NOTE: The deployment steps and commands are the same for Ubuntu and Rocky Linux. This section uses Ubuntu as an example.
1. Upload the SeerFabric agent installation package to any directory on each server.
The package name is SeerEngine_DC-version_SeerFabricAgent.zip, where the version parameter represents the version number.
2. Decompress the SeerFabric agent installation package and access the decompressed directory.
root@server1:~# unzip SeerEngine_DC-E6601_SeerFabricAgent.zip
root@server1:~# cd SeerEngine_DC-E6601_SeerFabricAgent
root@server1:~/SeerEngine_DC-E6601_SeerFabricAgent# ll
total 22244
drwxr-xr-x. 1 root root 187 Oct 31 13:56 config
-rw-r--r--. 1 root root 18669657 Oct 31 13:56 dc-agent-1.0.0.jar
-rwxr-xr-x. 1 root root 3953824 Oct 31 13:56 jq
-rwxr-xr-x. 1 root root 1143 Oct 31 13:56 start_agent.sh
-rwxr-xr-x. 1 root root 459 Oct 31 13:56 stop_agent.sh
3. Use vi to edit the config/agent.config.json file. Enter the northbound service VIP, username, and password of the SeerEngine-DC controller so that the agent can upload GPU information to the controller.
{
"agent": {
"roce_nic_config": "enable",
"route_config": "enable"
},
"dc": {
"service_ip": "192.168.10.100",
"login_username": "admin",
"login_password": "Pwd@12345"
}
}
Table 3 Parameters
Parameter | Description |
---|---|
agent_master_ip | IP address of the server where the master agent resides. Optional. If this parameter is not configured, the agent sends collected information to the SeerEngine-DC controller directly. |
agent_master_port | Port of the server where the master agent resides. Optional. The default is 7008. |
roce_nic_config | Whether to enable automatic deployment of RoCE settings for the parameter NIC. Options: enable and disable. |
route_config | Whether to enable automatic deployment of routes and rules. Options: enable and disable. |
service_ip | Northbound service VIP of the SeerEngine-DC controller. |
login_username | Username for logging in to the Web interface of the SeerEngine-DC controller. |
login_password | Password for logging in to the Web interface of the SeerEngine-DC controller. The agent deletes this field from the file after it starts. You can modify this field; the modification takes effect after the agent restarts. |
CAUTION: If you change the northbound service VIP without also updating the username and password, the VIP change does not take effect, and the agent continues to upload GPU information to the previously configured SeerEngine-DC controller.
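A quick check before the first start can catch an empty controller address or missing credentials. This is a minimal sketch with a hypothetical function name: it uses grep rather than a JSON parser, so it only confirms that service_ip, login_username, and login_password appear with non-empty string values. Because the agent deletes login_password after it starts, the check is meaningful only before the first start or after you re-enter the credentials. For a strict syntax check, the jq binary shipped in the package can be used as `./jq empty config/agent.config.json`, which exits non-zero on invalid JSON.

```shell
#!/bin/sh
# Hypothetical sanity check for config/agent.config.json before the first start.
# It only looks for non-empty string values of the three controller fields;
# it is NOT a JSON parser.
check_agent_config() {
  local file="$1" key rc=0
  for key in service_ip login_username login_password; do
    # Matches e.g.  "service_ip": "192.168.10.100"
    if ! grep -Eq "\"$key\"[[:space:]]*:[[:space:]]*\"[^\"]+\"" "$file"; then
      echo "missing or empty: $key" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Example usage from the decompressed package directory:
#   check_agent_config config/agent.config.json && ./jq empty config/agent.config.json
```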
4. Start the agent by running the start_agent.sh script with root privileges.
If the value of the agent_master_ip parameter is the IP address of a NIC on the server, the server starts as the master node. Otherwise, the server starts as a worker node.
- If the server starts as the master node, the agent actively sends the collected information to the SeerEngine-DC controller.
- If the server starts as a worker node, the agent preferentially sends the collected information to the master node. If the master node operates abnormally, the agent sends the collected information to the SeerEngine-DC controller.
a. Add a firewall rule that permits traffic to the agent listening port.
Taking firewalld as an example, execute the following commands. 7008 is the default port number; replace it with the port of the server where the master agent resides as needed.
root@server1:~/SeerEngine_DC-E6601_SeerFabricAgent# sudo firewall-cmd --zone=public --add-port=7008/tcp --permanent
root@server1:~/SeerEngine_DC-E6601_SeerFabricAgent# sudo firewall-cmd --reload
The first command adds a permanent rule that permits TCP traffic to the specified port, so the rule remains in effect after the firewall restarts. The second command reloads the firewall so that the permanent rule also takes effect immediately.
b. Manually start the agent service on each server.
root@server1:~/SeerEngine_DC-E6601_SeerFabricAgent# sudo ./start_agent.sh
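The master/worker rule in step 4 can be expressed as a small decision function. This is a sketch of the documented rule, not the agent's actual code; the function name and the "direct" label for an unset agent_master_ip are hypothetical (Table 3 only says that the agent then sends collected information to the SeerEngine-DC controller).

```shell
#!/bin/sh
# Hypothetical sketch of the role rule in step 4 (not the agent's own code).
# Usage: agent_role <agent_master_ip> <local_ip>...
agent_role() {
  local master_ip="$1" ip
  [ "$#" -gt 0 ] && shift
  if [ -z "$master_ip" ]; then
    # Table 3: without agent_master_ip, the agent reports to SeerEngine-DC itself.
    echo "direct"    # hypothetical label
    return 0
  fi
  for ip in "$@"; do
    # The master IP belongs to a local NIC: this server starts as the master node.
    if [ "$ip" = "$master_ip" ]; then
      echo "master"
      return 0
    fi
  done
  # Otherwise this server starts as a worker node.
  echo "worker"
}

# On a live server, the local IPv4 addresses can be gathered with iproute2:
#   agent_role 10.1.1.10 $(ip -o -4 addr show | awk '{split($4, a, "/"); print a[1]}')
```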
5. Configure routes and rules:
After the GPU and storage NICs obtain IP addresses, the agent will automatically configure routes and rules for the server GPU and storage NICs. Both VLAN and VXLAN networks are supported.
NOTE:
· Start the agent after planning the VLAN or VXLAN network on the SeerEngine-DC controller. This enables the agent to identify the storage NICs and deploy the relevant settings.
· The routes and rules are specified by the agent. Do not modify them manually. Manual changes can cause route and rule conflicts, which can lead to traffic anomalies.
6. Configure RoCE settings:
Parameter network: Before starting the agent, create a RoCE policy for the parameter network fabric on the SeerEngine-DC controller.
Storage network: Before starting the agent, create a RoCE policy for the storage network fabric on the SeerEngine-DC controller. In addition, plan the VLAN or VXLAN network appropriately.
7. Upon startup, the agent checks whether the agent.network.config.yml file (the configuration file for deploying RoCE parameters) exists on the host.
- If the file exists, the agent loads and runs it to deploy RoCE settings to the parameter NIC.
- If the file does not exist, the agent obtains the configured server NIC parameter template (displayed on the RoCE policy details page of SeerEngine-DC) from SeerEngine-DC. It then generates the agent.network.config.yml file locally, loads and runs it, and deploys the RoCE settings to the parameter NIC.
- To obtain the configuration parameters from the server NIC parameter template again, run the sudo ./start_agent.sh -c d command. This command deletes the agent.network.config.yml file and restarts the agent, which then obtains the server NIC parameter template from SeerEngine-DC again.
- If you have edited the CNP or RoCE queue settings in the RoCE policy on SeerEngine-DC, run the sudo ./start_agent.sh -c d command to delete the existing agent.network.config.yml file and restart the agent so that it obtains the server NIC parameter template from SeerEngine-DC again. Alternatively, manually edit the local agent.network.config.yml file, and then restart the agent for your modification to take effect.
CAUTION:
· After the agent runs correctly, the agent.network.config.yml file is generated on the host. If the file is not generated, check the logs for troubleshooting.
· Do not delete the comments in the agent.network.config.yml file. To edit a commented-out parameter, uncomment it first.
· Edit parameters in the configuration file as needed. After modification, restart the agent for the modification to take effect.
· After the RoCE settings are deployed, they are automatically reapplied at startup, regardless of whether the agent is running.
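Because the agent.network.config.yml file is generated only after the agent runs correctly, a script can wait for it before proceeding and point you to the logs otherwise. The sketch below is a hypothetical helper, not part of the package. This guide does not state where the file is created on a bare-metal host, so pass the path where you expect it (in the containerized deployment it is /usr/data/se-agent/config/agent.network.config.yml).

```shell
#!/bin/sh
# Hypothetical helper: wait up to <timeout> seconds (default 60) for <file>
# to appear. Returns 0 if it exists, 1 on timeout.
wait_for_file() {
  local file="$1" timeout="${2:-60}" waited=0
  while [ ! -f "$file" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "not found after ${timeout}s: $file; check the agent logs" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Example usage (path is an assumption; adjust to your installation):
#   wait_for_file /path/to/agent.network.config.yml 120
```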
8. View the agent running state:
- To view the service process, run the ps -ef | grep dc-agent | grep java command on the server. If process information is displayed, the agent has started successfully.
- To view the agent role, check the agent.role field in the process information.
root@server1:~/SeerEngine_DC-E6601_SeerFabricAgent# ps -ef | grep dc-agent | grep java
root 18076 1 0 15:38 pts/0 00:00:20 java -jar -Xmx512m -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8000 dc-agent-1.0.0.jar --agent.role=master --dc.serviceIp=192.168.232.89 --dc.username=admin --dc.password=Pwd@12345
NOTE: The agent starts automatically after the server restarts.
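To read only the role from the process list, the --agent.role= argument can be extracted from the ps -ef output. The sample output above also shows --dc.password in clear text, so printing just the role avoids copying the password into tickets or chat. A minimal sketch with a hypothetical function name:

```shell
#!/bin/sh
# Hypothetical helper: extract the value of --agent.role=... from the agent's
# java command line. Reads `ps -ef` output on stdin.
agent_role_from_ps() {
  # Keep only agent lines, then pull out the role argument and strip the key.
  grep 'dc-agent' | grep -o -- '--agent\.role=[^ ]*' | cut -d= -f2
}

# Example usage on a server:
#   ps -ef | grep java | agent_role_from_ps    # e.g. master or worker
```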
Viewing server access information
1. On the SeerEngine-DC controller, navigate to the Monitor > Topology > Data-Center Topology > Basic Network Topology page.
2. Right-click a leaf device and select Server Access Info to view the server access information.
3. If a network anomaly prevents the SeerEngine-DC controller from obtaining accurate GPU information, try restarting the agent service by executing the start_agent.sh script.
Upgrading and uninstalling the software
Upgrading the software
1. Back up the current software version for use during version rollback.
2. Decompress the new version, and then check and edit the agent.config.json file, including the master IP address and port, and the IP address, username, and password of the SeerEngine-DC controller.
3. Start the service by executing the start_agent.sh script with root permissions.
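Step 1 does not prescribe a backup method. One hedged option is a timestamped tar archive of the current agent directory, as in the hypothetical helper below. Because the agent deletes login_password after it starts, the backed-up agent.config.json typically no longer contains the password; re-enter it when editing the new version's configuration in step 2.

```shell
#!/bin/sh
# Hypothetical helper: archive the current agent directory into a timestamped
# tarball before upgrading, and print the archive path.
# Usage: backup_agent_dir <agent-dir> <backup-dir>
backup_agent_dir() {
  local src="$1" dest="$2" name archive
  if [ ! -d "$src" ]; then
    echo "no such directory: $src" >&2
    return 1
  fi
  mkdir -p "$dest" || return 1
  name=$(basename "$src")
  archive="$dest/${name}-$(date +%Y%m%d%H%M%S).tar.gz"
  # -C keeps paths in the archive relative to the parent directory.
  tar -czf "$archive" -C "$(dirname "$src")" "$name" || return 1
  echo "$archive"
}

# Example usage:
#   backup_agent_dir ~/SeerEngine_DC-E6601_SeerFabricAgent /root/agent-backups
```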
Uninstalling the software
Execute the following command to uninstall the SeerFabric agent:
root@server1:~/SeerEngine_DC-E6601_SeerFabricAgent# sudo ./stop_agent.sh
SeerFabric agent containerized deployment guide
Configuring basic environment settings
Configuring basic environment settings for the Ubuntu 22.04 system
1. Install LLDPD on the host.
root@server1:~# apt install lldpd -y
root@server1:~# systemctl enable lldpd.service
root@server1:~# systemctl start lldpd.service
root@server1:~# systemctl status lldpd.service  # View LLDPD service status
2. Install NetworkManager on the host.
root@server1:~# apt install network-manager -y
root@server1:~# systemctl enable NetworkManager.service
root@server1:~# systemctl start NetworkManager.service
root@server1:~# systemctl status NetworkManager.service  # View NetworkManager service status
Configuring basic environment settings for the Rocky Linux 8.8 (Green Obsidian) system
1. Install LSHW on the host.
[root@server1 ~]# yum install lshw -y
2. Install LLDPD on the host.
[root@server1 ~]# dnf install epel-release
[root@server1 ~]# dnf install lldpd
[root@server1 ~]# systemctl enable lldpd.service
[root@server1 ~]# systemctl start lldpd.service
[root@server1 ~]# systemctl status lldpd.service  # View LLDPD service status
3. Install NetworkManager on the host.
[root@server1 ~]# yum install NetworkManager -y
[root@server1 ~]# systemctl enable NetworkManager.service
[root@server1 ~]# systemctl start NetworkManager.service
[root@server1 ~]# systemctl status NetworkManager.service  # View NetworkManager service status
Deploying the SeerFabric agent on k8s
NOTE: The deployment steps and commands are the same for Ubuntu and Rocky Linux. This section uses Ubuntu as an example.
1. Upload the SeerFabric agent installation package to any directory on each server in the k8s cluster. The package name is SeerEngine_DC-version_SeerFabricAgent_K8S.zip, where the version parameter represents the version number.
2. Decompress the SeerFabric agent installation package and access the decompressed directory.
root@server1:~# unzip SeerEngine_DC-E6601_SeerFabricAgent_K8S.zip
root@server1:~# cd SeerEngine_DC-E6601_SeerFabricAgent_K8S
root@server1:~/SeerEngine_DC-E6601_SeerFabricAgent_K8S# ll
total 22244
drwxr-xr-x. 1 root root 187 Oct 31 13:56 config
-rw-r--r--. 1 root root 18669657 Oct 31 13:56 dc-agent-1.0.0.jar
-rwxr-xr-x. 1 root root 3953824 Oct 31 13:56 jq
-rwxr-xr-x. 1 root root 1143 Oct 31 13:56 start_agent.sh
-rwxr-xr-x. 1 root root 459 Oct 31 13:56 stop_agent.sh
3. Use vi to edit the config/agent.config.json file. Enter the northbound service VIP, username, and password of the SeerEngine-DC controller so that the agent can upload GPU information to the controller.
{
"agent": {
"roce_nic_config": "enable",
"route_config": "enable"
},
"dc": {
"service_ip": "192.168.10.100",
"login_username": "admin",
"login_password": "Pwd@12345"
}
}
Table 4 Parameters
Parameter | Description |
---|---|
agent_master_ip | IP address of the server where the master agent resides. Optional. If this parameter is not configured, the agent sends collected information to the SeerEngine-DC controller directly. |
agent_master_port | Port of the server where the master agent resides. Optional. The default is 7008. |
roce_nic_config | Whether to enable automatic deployment of RoCE settings for the parameter NIC. Options: enable and disable. |
route_config | Whether to enable automatic deployment of routes and rules. Options: enable and disable. |
service_ip | Northbound service VIP of the SeerEngine-DC controller. |
login_username | Username for logging in to the SeerEngine-DC controller. |
login_password | Password for logging in to the SeerEngine-DC controller. The agent deletes this field from the file after it starts. You can modify this field; the modification takes effect after the agent restarts. |
CAUTION: If you change the northbound service VIP without also updating the username and password, the VIP change does not take effect, and the agent continues to upload GPU information to the previously configured SeerEngine-DC controller.
4. On each node, copy the agent.config.json file to the /usr/data/se-agent/config directory so that it is mounted into the container. If the directory does not exist, create it first.
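Step 4 has to be repeated on every node. A small hypothetical helper that creates the directory when needed and copies the file could look like this; the default target is the /usr/data/se-agent/config directory named above.

```shell
#!/bin/sh
# Hypothetical helper for step 4 of the containerized deployment. Run it on every node.
# Usage: stage_agent_config <path-to-agent.config.json> [target-dir]
stage_agent_config() {
  local src="$1" dir="${2:-/usr/data/se-agent/config}"
  if [ ! -f "$src" ]; then
    echo "config file not found: $src" >&2
    return 1
  fi
  # Create the mounted directory if it does not exist, then copy the file into it.
  mkdir -p "$dir" && cp "$src" "$dir/agent.config.json"
}

# Example usage from the decompressed package directory:
#   stage_agent_config config/agent.config.json
```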
5. Load the image.
If no Harbor image repository is installed in the environment, use the following command to import the image:
root@server1:~/SeerEngine_DC-E6601_SeerFabricAgent_K8S# docker load -i dc-agent-*.tar
If a Harbor image repository is already installed in the environment, perform the following operations:
a. Log in to the target repository. Without a registry address, docker login logs in to Docker Hub; for a Harbor repository, specify its address.
docker login <registry-address>
Enter the username and password as prompted.
b. Load the image from the tar package.
docker load -i dc-agent-*.tar
The docker load command loads an image from a tar archive into the local Docker engine, and the -i option specifies the tar file to load. After loading, the image is available in the local image store. You can verify the result by using the docker images command.
c. Tag the image for the target repository.
docker tag local-image:tag username/repository:tag
local-image:tag is the tag of the local image, and username/repository:tag is the tag of the image in the target repository.
d. Push the image to the target repository.
docker push username/repository:tag
e. Because the image tag has changed, change the value of the image field in the se-agent-deployment.yaml file to the Harbor repository image tag.
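Sub-step e can be scripted with sed. This sketch assumes se-agent-deployment.yaml references only the agent image, so every image: line is rewritten; if the file contains other images, edit it by hand instead. The function name and the registry path in the usage comment are placeholders, and sed -i.bak keeps the original as se-agent-deployment.yaml.bak for review.

```shell
#!/bin/sh
# Hypothetical helper: replace the value of every "image:" line in a manifest
# (including the "- image:" list form) with a new image reference.
# Usage: set_manifest_image <manifest.yaml> <image-reference>
set_manifest_image() {
  local file="$1" image="$2"
  # "#" is the sed delimiter because image references contain "/" and ":".
  # "image:" must be followed directly by the colon, so imagePullPolicy is untouched.
  sed -i.bak -E "s#^([[:space:]]*(- )?image:[[:space:]]*).*#\1${image}#" "$file"
}

# Example usage (placeholders):
#   set_manifest_image se-agent-deployment.yaml <harbor-address>/<project>/<repository>:<tag>
```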
6. Have Kubernetes start the SeerFabric agent containers. After all nodes in the cluster have completed the preceding steps, execute the following command:
root@server1:~/SeerEngine_DC-E6601_SeerFabricAgent_K8S# kubectl apply -f se-agent-deployment.yaml
7. If the value of the agent_master_ip parameter is the IP address of a NIC on the server, the server starts as the master node. Otherwise, the server starts as a worker node.
- If the server starts as the master node, the agent sends collected information to the SeerEngine-DC controller.
- If the server starts as a worker node, the agent preferentially sends collected information to the master node. If the master node becomes faulty, the agent sends collected information to the SeerEngine-DC controller.
NOTE: If you do not want to deploy the SeerFabric agent on a specific node, add a label to the node by executing the kubectl label nodes <node-name> se-agent-switch=off command.
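To confirm which nodes carry the label, kubectl get nodes -L se-agent-switch --no-headers prints the label value as an extra last column. The hypothetical filter below lists the nodes whose value is off. Removing the label later uses kubectl's trailing-dash syntax, kubectl label nodes <node-name> se-agent-switch-.

```shell
#!/bin/sh
# Hypothetical filter: list nodes whose se-agent-switch label is "off".
# Input: output of  kubectl get nodes -L se-agent-switch --no-headers
# Assumes the -L label column is the last column; unlabeled nodes never match.
excluded_nodes() {
  awk '$NF == "off" { print $1 }'
}

# Example usage:
#   kubectl get nodes -L se-agent-switch --no-headers | excluded_nodes
```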
8. Configure routes and rules:
After the GPU and storage NICs obtain IP addresses, the agent will automatically configure routes and rules for the server GPU and storage NICs. Both VLAN and VXLAN networks are supported.
NOTE:
· Start the agent after planning the VLAN or VXLAN network on the SeerEngine-DC controller. This enables the agent to identify the storage NICs and deploy the relevant settings.
· The routes and rules are specified by the agent. Do not modify them manually. Manual changes can cause route and rule conflicts, which can lead to traffic anomalies.
9. Configure RoCE settings:
Parameter network: Before starting the agent, create a RoCE policy for the parameter network fabric on the SeerEngine-DC controller.
Storage network: Before starting the agent, create a RoCE policy for the storage network fabric on the SeerEngine-DC controller. In addition, plan the VLAN or VXLAN network appropriately.
10. Upon startup, the agent checks whether the /usr/data/se-agent/config/agent.network.config.yml file (the configuration file for deploying RoCE parameters) exists on the host.
- If the file exists, the agent loads and runs it to deploy RoCE settings to the parameter NIC.
- If the file does not exist, the agent obtains the configured server NIC parameter template (displayed on the RoCE policy details page of SeerEngine-DC) from SeerEngine-DC. It then generates the agent.network.config.yml file locally, loads and runs it, and deploys the RoCE settings to the parameter NIC.
- To obtain the configuration parameters from the server NIC parameter template again, run the sudo ./start_agent.sh -c d command in the container. This command deletes the agent.network.config.yml file and restarts the agent, which then obtains the server NIC parameter template from SeerEngine-DC again.
- If you have edited the CNP or RoCE queue settings in the RoCE policy on SeerEngine-DC, run the sudo ./start_agent.sh -c d command in the container to delete the existing agent.network.config.yml file and restart the agent so that it obtains the server NIC parameter template from SeerEngine-DC again. Alternatively, manually edit the local agent.network.config.yml file, and then restart the agent for your modification to take effect.
CAUTION:
· After the agent runs correctly, the agent.network.config.yml file is generated on the host. If the file is not generated, check the logs for troubleshooting.
· Do not delete the comments in the agent.network.config.yml file. To edit a commented-out parameter, uncomment it first.
· Edit parameters in the configuration file as needed. After modification, restart the agent for the modification to take effect.
· After the RoCE settings are deployed, they are automatically reapplied when the container starts, regardless of whether the agent is running.
11. View the agent running state:
To view the agent running state, execute the following command:
root@server1:~/SeerEngine_DC-E6601_SeerFabricAgent_K8S# kubectl get pod -n addc-roce
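In the kubectl get pod output, each agent pod should show STATUS Running with all containers ready (for example 1/1). The hypothetical filter below prints any pod that does not, assuming kubectl's default column order NAME, READY, STATUS, RESTARTS, AGE. For a pod it reports, kubectl logs -n addc-roce <pod-name> is a reasonable next step.

```shell
#!/bin/sh
# Hypothetical filter: print pods that are not Running or not fully ready.
# Input: output of  kubectl get pod -n addc-roce --no-headers
# Assumed columns: NAME READY STATUS RESTARTS AGE
unhealthy_pods() {
  # READY is "ready/total"; flag the pod if the two numbers differ.
  awk '{ split($2, r, "/"); if ($3 != "Running" || r[1] != r[2]) print $1, $2, $3 }'
}

# Example usage:
#   kubectl get pod -n addc-roce --no-headers | unhealthy_pods
```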
Viewing server access information
1. On the SeerEngine-DC controller, navigate to the Monitor > Topology > Data-Center Topology > Basic Network Topology page.
2. Right-click a leaf device and select Server Access Info to view server access information.
3. If a network anomaly prevents the SeerEngine-DC controller from obtaining accurate GPU information, try restarting the agent service by running the start_agent.sh script.
Upgrading the software
1. Back up the current software version for use during version rollback.
2. Decompress the new version, and then check and edit the agent.config.json file, including the master IP address and port, whether to automatically deploy RoCE settings for the parameter NIC, whether to automatically deploy routes and rules, and the IP address, username, and password of the SeerEngine-DC controller.
3. Copy the configuration file to the /usr/data/se-agent/config directory to overwrite the existing one, and import the image file again. Then execute the kubectl apply -f se-agent-deployment.yaml command to redeploy the container, and check the container operating state by using the kubectl get pod -n addc-roce command.
FAQ
When a user changes the VLAN gateway configuration on the SeerEngine-DC controller page and the new and old gateways are in the same network segment, the agent will not detect the change. How can I resolve this issue?
To resolve this issue, log in to the server and delete the ip route configuration corresponding to the NIC. The agent then detects the missing route and redeploys the ip route configuration based on the gateway IP configured on the SeerEngine-DC controller.
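The FAQ does not say exactly which routes to delete. Assuming the affected entries are the gateway (via) routes on the NIC, the sketch below prints the matching ip route del commands as a dry run so that you can review them before executing anything; the connected subnet route is left alone. The function name is hypothetical, and the input is the output of ip route show table all dev <nic>, which also covers routes in non-main routing tables.

```shell
#!/bin/sh
# Hypothetical dry run: print "ip route del" commands for gateway ("via")
# routes on a NIC, for review before execution. Nothing is deleted here.
# Usage: ip route show table all dev <nic> | gateway_route_deletes <nic>
gateway_route_deletes() {
  local nic="$1" line
  while IFS= read -r line; do
    case "$line" in
      *" via "*) printf 'ip route del %s dev %s\n' "$line" "$nic" ;;
    esac
  done
}

# Example usage (eth1 is a placeholder NIC name):
#   ip route show table all dev eth1 | gateway_route_deletes eth1
```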