Released At: 17-06-2025
URL Filtering Technology White Paper-6W100

URL Filtering Technology White Paper

No part of this manual may be reproduced or transmitted in any form or by any means without prior written consent of New H3C Technologies Co., Ltd.

Except for the trademarks of New H3C Technologies Co., Ltd., any trademarks that may be mentioned in this document are the property of their respective owners.

This document provides generic technical information, some of which might not be applicable to your products.

Contents

Overview

Technical background

URL filtering rule

URL filtering rule type

URL filtering rule matching method

URL category

URL filtering whitelist/blacklist rule

URL filtering policy

URL filtering signature library management

URL filtering signature library update

URL filtering signature library rollback

URL filtering cloud query

URL filtering logging for resource access

URL fast auditing

HTTPS URL filtering

Implementation

URL filtering mechanism

Generate and deploy URL filtering rules

Identify URLs in packets

Match URL filtering rules and return matching results

Process packets

Whitelist mode

URL category mode

URL filtering workflow

Application scenarios

Control company website access through URL filtering

Control campus website access through URL filtering

Overview

Technical background

As network security demands grow and internet traffic surges, cyber attacks become increasingly complex and covert. Traditional detection methods based on ports and protocols can no longer meet modern security needs. Enterprises and organizations face various security threats, including malware, phishing attacks, online fraud, and data breaches.

In this context, deep packet inspection (DPI) technology emerged. It analyzes network traffic in depth, inspecting packet content across all seven layers of the OSI model, not just header information. DPI identifies and manages the actual content of packets, including application-specific commands and behaviors. This capability allows it to detect and block malicious traffic while permitting legitimate traffic to pass through.

DPI technology has in turn enabled URL filtering as a deep security capability. As network environments become more complex, filtering by IP address or domain name is insufficient against sophisticated threats. Hackers and cybercriminals can easily create seemingly legitimate websites for phishing or distributing malware. Thus, a more intelligent and refined filtering mechanism is necessary to identify and block harmful websites and URLs.

URL filtering analyzes multiple dimensions, such as content, structure, behavior, and reputation, to determine if a website or URL poses security risks. It filters URLs based on static blacklists and whitelists, categorizes URLs, and integrates with cloud services to combat attacks and unknown threats. Additionally, URL filtering supports custom policies to meet the specific needs of various industries and organizations. It protects users from phishing, malware distribution, inappropriate content, and other cybersecurity threats, while also restricting internal users from accessing certain websites.

Benefits

As a DPI-based deep security protection technology, URL filtering has the following technical advantages:

· Real-time analysis and processing: URL filtering can monitor and analyze network traffic in real time. Through continuous packet inspection, it can instantly identify and intercept suspicious or non-compliant URLs, ensuring immediate network response capability.

· Dynamic updates: URL filtering typically includes an automatic update mechanism for the signature library and can query the latest URL categories from the cloud, keeping the filtering list up to date to defend against emerging threats.

· High customization: URL filtering supports highly customizable policy settings, allowing administrators to define filtering rules based on the organization's security policies and needs, such as allowing or denying access to specific types of websites.

· Easy integration and expansion: URL filtering is typically designed for easy integration into unified intelligent security policies and can seamlessly cooperate with firewalls, IPS, and security information and event management systems.

· Improving user experience: By blocking access to malicious or inappropriate websites, URL filtering enhances end users' browsing security, reducing the risk of phishing and other online scams, thus improving user experience.

· Supporting regulatory compliance: For organizations that must adhere to specific internet usage regulations, URL filtering helps ensure compliance with relevant laws by restricting access to illegal content.

· Reducing security management costs: Automated URL filtering reduces reliance on manual intervention, improves the efficiency of security operations, and lowers overall cybersecurity management costs.

· Supporting whitelist mode: URL filtering quickly allows access to whitelisted sites and blocks sites outside the whitelist.

· Encrypted traffic detection: URL filtering supports detection and filtering of HTTPS encrypted traffic.

Concepts

URL

A URL is a reference to a resource that specifies the location of the resource on a network and a mechanism for retrieving it. The syntax of a URL is protocol://host[:port]/path/[;parameters][?query]#fragment. Figure 1 shows an example URL.

Figure 1 URL syntax

Table 1 describes the fields in a URL.

Table 1 URL field descriptions

Field	Description
protocol	Transmission protocol, such as HTTP.
host	Domain name or IP address of the server where the indicated resource is located.
[:port]	Optional field that identifies the port number of the transmission protocol. If this field is omitted, the default port number of the protocol is used.
/path/	String that identifies the directory or file where the indicated resource is stored. The path is a sequence of zero or more segments separated by forward slashes.
[parameters]	Optional field that contains special parameters.
[?query]	Optional field that contains parameters to be passed to the software for querying dynamic webpages. Each parameter is a <key>=<value> pair. Different parameters are separated by an ampersand (&).
URI	Uniform resource identifier that identifies a resource on a network.
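
The fields in Table 1 can be illustrated with a short parsing sketch. It uses Python's standard urllib.parse module and is not the device's parser. The function name split_url and the sample URL are assumptions made for illustration, and the sketch treats everything after the first semicolon in the path as the parameters field.

```python
from urllib.parse import urlsplit, parse_qsl


def split_url(url: str) -> dict:
    """Split a URL into the fields described in Table 1 (illustrative only)."""
    parts = urlsplit(url)
    # urlsplit leaves ";parameters" attached to the path, so separate it here.
    path, _, parameters = parts.path.partition(";")
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,  # None when the port is omitted
        "path": path,
        "parameters": parameters,
        "query": dict(parse_qsl(parts.query)),  # <key>=<value> pairs split on "&"
        "fragment": parts.fragment,
    }


if __name__ == "__main__":
    sample = "http://www.example.com:8080/news/index.html;type=a?id=1&lang=en#top"
    for field_name, value in split_url(sample).items():
        print(f"{field_name}: {value}")
```

When the port is omitted, the port entry is None, which corresponds to using the protocol's default port number.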

URL filtering

URL filtering is an important measure in network security management. Through URL filtering, network administrators can restrict user access to specific sites based on company policies. This feature helps prevent employees from accessing unsafe or inappropriate websites, reduces cybersecurity risks, and enhances overall network security.

In addition to blocking access to harmful sites, URL filtering helps improve network bandwidth utilization. By restricting access to certain entertainment or social sites, network administrators can reduce traffic consumption and enhance speed and efficiency. Furthermore, for specific industries, URL filtering helps companies meet regulatory requirements and protect intellectual property and trade secrets.

URL filtering plays an important role in network security management and traffic control. It helps companies enhance network security and improve effective utilization of network resources, making it essential in network management.

URL filtering rule

A URL filtering rule matches URLs based on the content in the URI or hostname field.

URL filtering rule type

URL filtering provides the following types of URL filtering rules:

· Predefined URL filtering rules—Signature-based URL filtering rules. The device automatically generates them based on the local URL filtering signatures. In most cases, the predefined rules are sufficient for URL filtering.

· User-defined URL filtering rules—Regular expression- or text-based URL filtering rules that are manually configured.

URL filtering rule matching method

A URL filtering rule supports the following URL matching methods:

· Text-based matching—Matches the hostname and URI fields of a URL against text patterns.

When performing text-based matching for the hostname field of a URL, the device first determines if the text pattern contains the asterisk (*) wildcard character at the beginning or end.

¡ If the text pattern does not contain the asterisk (*) wildcard character at the beginning or end, the hostname matching succeeds if the hostname of the URL matches the text pattern.

¡ If the text pattern contains the asterisk (*) wildcard character at the beginning, the hostname matching succeeds if the hostname of the URL matches or ends with the text pattern without the wildcard character.

¡ If the text pattern contains the asterisk (*) wildcard character at the end, the hostname matching succeeds if the hostname of the URL matches or starts with the text pattern without the wildcard character.

¡ If the text pattern contains the asterisk (*) wildcard character at both the beginning and the end, the hostname matching succeeds if the hostname of the URL matches or includes the text pattern without the wildcard characters.

Text-based matching for the URI field works in the same way that text-based matching for the hostname field works.

· Regular expression-based matching—Matches the hostname and URI fields of a URL against regular expressions. For example, if you set the regular expression for hostname matching to sina.*cn, URLs that carry the news.sina.com.cn hostname will be matched.
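
The two matching methods above can be sketched as follows. This is a minimal illustration, not the device's matching engine. The function names text_match and regex_match are hypothetical, strings are compared exactly as given (case handling is not specified in this document), and the regular expression is assumed to match anywhere in the field, which is what the sina.*cn example implies.

```python
import re


def text_match(pattern: str, value: str) -> bool:
    """Text-based matching with an optional asterisk at the start and/or end."""
    leading = pattern.startswith("*")
    trailing = len(pattern) > 1 and pattern.endswith("*")
    core = pattern[1:] if leading else pattern
    if trailing:
        core = core[:-1]
    if leading and trailing:
        return core in value           # matches or includes the text
    if leading:
        return value.endswith(core)    # matches or ends with the text
    if trailing:
        return value.startswith(core)  # matches or starts with the text
    return value == core               # no wildcard: exact match


def regex_match(expression: str, value: str) -> bool:
    """Regular expression-based matching (assumed unanchored)."""
    return re.search(expression, value) is not None


if __name__ == "__main__":
    print(text_match("*example.com", "www.example.com"))
    print(text_match("www.example*", "www.example.org"))
    print(regex_match(r"sina.*cn", "news.sina.com.cn"))
```

The same text_match function applies to both the hostname and URI fields, because the document states that both fields are matched in the same way.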

URL category

URL filtering provides the URL categorization feature to facilitate filtering rule management.

You can classify multiple URL filtering rules to a URL category and specify an action for the category. If a matching rule is in multiple URL categories, the system takes the action for the category with the highest severity level.

URL filtering supports the following types of URL categories:

· Predefined URL categories.

The predefined URL categories contain the predefined URL filtering rules. Each predefined URL category has a unique severity level in the range of 1 to 999, and a category name that begins with Pre-. Predefined URL categories cannot be modified.

The device supports two levels of predefined URL categories: child URL category and parent URL category.

A predefined parent URL category contains only predefined child URL categories.

· User-defined URL categories.

You can manually create URL categories and configure filtering rules for them. The severity level of a user-defined URL category is in the range of 1000 to 65535. You can edit the filtering rules and change the severity level for a user-defined URL category.
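
The severity ranges and the highest-severity rule described above can be modeled in a few lines. The class and function names, the category names, and the actions in the example are all hypothetical. The sketch only checks the ranges stated in this section and picks the action of the category with the highest severity level.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class UrlCategory:
    name: str
    severity: int
    action: str
    predefined: bool = False

    def __post_init__(self) -> None:
        # Ranges taken from this section: 1-999 predefined, 1000-65535 user-defined.
        low, high = (1, 999) if self.predefined else (1000, 65535)
        if not low <= self.severity <= high:
            raise ValueError(f"severity {self.severity} outside {low}-{high}")


def resolve_action(categories: Iterable[UrlCategory]) -> str:
    """Return the action of the matched category with the highest severity level."""
    return max(categories, key=lambda c: c.severity).action


if __name__ == "__main__":
    matched = [
        UrlCategory("Social", 1200, "drop"),
        UrlCategory("Partners", 2000, "permit"),
    ]
    print(resolve_action(matched))  # the category with severity 2000 wins
```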

URL filtering whitelist/blacklist rule

The device supports using URL-based whitelist and blacklist rules to filter packets. If the URL in a packet matches a blacklist rule, the packet is dropped. If the URL matches a whitelist rule, the packet is permitted to pass through.

URL filtering policy

A URL filtering policy can contain the following settings:

· URL categories and filtering actions. URL filtering actions include drop, permit, block source, reset, redirect, and logging.

· URL filtering whitelist and blacklist rules.

· URL filtering cloud query.

You can also specify the default action on packets that do not match any filtering rules (including URL categories and URL filtering whitelist and blacklist rules) in the policy.

URL filtering signature library management

The device uses the local URL filtering signature library to identify URLs in the HTTP packets.

You can update the URL filtering signature library on the device to the most up-to-date version or roll it back to the factory default version.

URL filtering signature library update

The following methods are available for updating the URL filtering signature library on the device.

Automatic update

The device periodically accesses the company's website and automatically downloads the most up-to-date URL filtering signature file to update its local signature library. If the device can access the signature library service section on the official website, you can use the automatic update method to update the URL filtering signature library on the device.

Triggered update

The device downloads the most up-to-date URL filtering signature file from the company's website and updates its local signature library as soon as you trigger the operation. When the administrator finds that the URL filtering signature library in the signature library service section on the official website has been updated, the administrator can trigger an update to upgrade the local library promptly.

Manual update

If the device cannot access the signature database services on the company's website, use one of the following methods to manually update the URL filtering signature library on the device:

· Local update—Updates the URL filtering signature library on the device by using the locally stored update URL filtering signature file.

· FTP/TFTP update—Updates the URL filtering signature library on the device by using the file stored on the FTP or TFTP server.

To specify the source IP address of the request packets sent to the TFTP or FTP server during a manual signature library update, use the source keyword in the url-filter signature update command. For example, if packets from the device must be translated by NAT before they reach the TFTP or FTP server, specify a source IP address that matches the NAT rules. If NAT is performed by a separate NAT device, make sure the source IP address specified in the url-filter signature update command can reach the NAT device at Layer 3.

URL filtering signature library rollback

If filtering false alarms or filtering exceptions occur frequently, you can roll back the URL filtering signature library to the factory default version.

URL filtering cloud query

The URL filtering cloud query feature enables the system to send URLs that do not match any local URL filtering rules to the cloud server for further query. This helps improve URL filtering accuracy for HTTP traffic.

The device caches the URL filtering rules returned from the cloud query server in the URL filtering cache. You can set the maximum number of rules that can be cached, and the minimum cache period for the cached rules.
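
The two cache settings can be pictured with a small sketch. The document does not describe the eviction behavior, so the following is an assumption: an entry can be evicted only after it has been cached for at least the minimum cache period, and a new result is not cached when the cache is full and no entry is old enough. The class name CloudQueryCache and its parameters are hypothetical.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional


class CloudQueryCache:
    """Bounded cache of cloud query results (URL -> category name)."""

    def __init__(self, max_entries: int, min_period: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self.max_entries = max_entries
        self.min_period = min_period  # seconds an entry must stay cached
        self.clock = clock
        self._entries = OrderedDict()  # url -> (category, time stored)

    def get(self, url: str) -> Optional[str]:
        entry = self._entries.get(url)
        return entry[0] if entry else None

    def put(self, url: str, category: str) -> bool:
        """Cache a query result. Return False if it could not be cached."""
        now = self.clock()
        if url in self._entries:
            del self._entries[url]  # refresh an existing entry
        elif len(self._entries) >= self.max_entries:
            oldest_url, (_, stored_at) = next(iter(self._entries.items()))
            if now - stored_at < self.min_period:
                return False  # assumed: entries younger than min_period stay
            del self._entries[oldest_url]
        self._entries[url] = (category, now)
        return True
```

The clock parameter exists only so that the behavior can be exercised without waiting for real time to pass.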

URL filtering logging for resource access

URL filtering logs user access to resources after you specify the logging action for a URL category or as a default action for a URL filtering policy.

You can use either of the following methods to configure URL filtering to log access to specific types of resources:

· Configure URL filtering to log access to only resources in the root directories of websites.

· Enable or disable URL filtering logging for access to resources of specific types.
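
The two logging options can be sketched as a filter applied before a log entry is written. How the device identifies the root directory and resource types is not described in this document, so this sketch assumes that a root-directory resource has no directory component other than "/" and that a resource type is identified by its file extension. The function name should_log and its parameters are hypothetical.

```python
import posixpath
from typing import AbstractSet
from urllib.parse import urlsplit


def should_log(url: str, root_only: bool = False,
               skip_extensions: AbstractSet[str] = frozenset()) -> bool:
    """Decide whether an access to the URL is logged (illustrative only)."""
    path = urlsplit(url).path or "/"
    # Option 1: log only resources located directly in the root directory.
    if root_only and posixpath.dirname(path) != "/":
        return False
    # Option 2: logging disabled for resources of specific types.
    extension = posixpath.splitext(path)[1].lstrip(".").lower()
    return extension not in skip_extensions


if __name__ == "__main__":
    print(should_log("http://www.example.com/index.html", root_only=True))
    print(should_log("http://www.example.com/news/index.html", root_only=True))
    print(should_log("http://www.example.com/static/logo.png",
                     skip_extensions={"png", "css"}))
```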

URL fast auditing

By default, URL filtering inspects and audits URLs in packets and determines the packet processing actions based on the inspection results in the software forwarding process. However, both software forwarding and URL filtering consume CPU resources and the packet forwarding performance will be degraded if the CPU usage is high.

URL fast auditing enables the device to send copies of HTTP packets to the CPU for audit (logging) by the URL filtering module during the hardware forwarding process. URL filtering only logs HTTP packets that match the logging action. All other URL filtering actions are ignored.

HTTPS URL filtering

By default, the device supports only HTTP URL filtering. To enable filtering on HTTPS traffic, use either of the following methods:

· Use SSL decryption to decrypt the HTTPS traffic and then perform HTTP URL filtering on the decrypted traffic.

· Enable HTTPS URL filtering. This feature performs URL filtering on undecrypted HTTPS traffic. The device directly detects the Client Hello message from the client, and extracts the server name from the Server Name Indication (SNI) extension to match the URL filtering policy.

SSL decryption involves a large number of encryption and decryption operations, which might degrade device forwarding performance. As a best practice, if the device needs to perform only URL filtering on HTTPS traffic, enable HTTPS URL filtering instead of SSL decryption.
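
To show what extracting the server name from an undecrypted Client Hello involves, the following sketch walks the TLS record layout defined in the TLS specifications and reads the server_name extension (extension type 0). It is not the device's implementation. It assumes that the whole Client Hello arrives in a single TLS record and that the name is sent in clear text. The function name extract_sni is hypothetical.

```python
import struct
from typing import Optional


def extract_sni(record: bytes) -> Optional[str]:
    """Return the SNI host name from a TLS Client Hello record, or None."""
    try:
        # Content type 0x16 is handshake; handshake type 0x01 is Client Hello.
        if record[0] != 0x16 or record[5] != 0x01:
            return None
        pos = 5 + 4    # skip the record header and the handshake header
        pos += 2 + 32  # skip client_version and random
        pos += 1 + record[pos]                               # session_id
        pos += 2 + struct.unpack_from("!H", record, pos)[0]  # cipher_suites
        pos += 1 + record[pos]                               # compression_methods
        ext_end = pos + 2 + struct.unpack_from("!H", record, pos)[0]
        pos += 2
        while pos + 4 <= min(ext_end, len(record)):
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0:  # server_name extension
                # Layout: list length (2), name type (1), name length (2), name.
                name_type = record[pos + 2]
                name_len = struct.unpack_from("!H", record, pos + 3)[0]
                name = record[pos + 5 : pos + 5 + name_len]
                if name_type != 0 or len(name) != name_len:
                    return None
                return name.decode("ascii")
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        return None  # truncated or malformed input
    return None
```

The returned host name carries no path, so in this sketch only the hostname part of a filtering rule can be compared against HTTPS traffic that is not decrypted.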

Implementation

URL filtering mechanism

URL filtering determines actions on packets by matching identified URLs with filtering rules. First, the URL filtering module sends filtering rules to the application layer inspection engine. Next, after the engine identifies a URL in the packet, it matches the URL with the filtering rules and returns the matching results to the URL filtering module. Finally, the URL filtering module decides the action to take on the packet based on the matching results.

Figure 2 URL filtering mechanism

Generate and deploy URL filtering rules

The URL filtering module generates the corresponding URL filtering rules based on the URL filtering signature library and the administrator's configuration, and then delivers them to the DPI engine.

URL filtering rules include the following types:

· Predefined rules

Predefined rules are generated from the URL signature library loaded on the device. The library covers most mainstream websites (i.e., URLs) and their category information. Predefined rules meet the requirements of users in most scenarios.

· User-defined rules

User-defined rules are generated based on user-defined URL categories configured by the administrator. User-defined categories are manually created by the administrator based on specific requirements. They facilitate unified access control for URLs with the same access restrictions.

· Whitelist rules

Whitelist rules are generated from the whitelist configured by the administrator. The whitelist is a collection of URLs to which the administrator permits access.

· Blacklist rules

Blacklist rules are generated from the blacklist configured by the administrator. The blacklist is a collection of URLs to which the administrator blocks access.

Figure 3 Generate and deploy URL filtering rules

Identify URLs in packets

The DPI engine parses, decodes, and segments packets to extract URLs, which consist of the Host and request URI fields. These URLs are used for subsequent matching with filtering rules.

Figure 4 Identify URLs in packets
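
As a rough illustration of this step, the following sketch pulls the Host header and the request URI out of a plain HTTP/1.x request. It assumes the complete request header is available in one buffer, whereas a real DPI engine also has to reassemble and decode traffic. The function name extract_url is hypothetical.

```python
from typing import Optional, Tuple


def extract_url(request: bytes) -> Optional[Tuple[str, str]]:
    """Return (host, request URI) from an HTTP/1.x request, or None."""
    head = request.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    # Request line: METHOD SP request-URI SP HTTP-version
    parts = lines[0].split(" ")
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        return None
    uri = parts[1]
    for line in lines[1:]:
        header, sep, value = line.partition(":")
        if sep and header.strip().lower() == "host":
            return value.strip().lower(), uri
    return None  # no Host header found


if __name__ == "__main__":
    sample = (b"GET /news/index.html?id=1 HTTP/1.1\r\n"
              b"Host: www.example.com\r\n"
              b"User-Agent: demo\r\n\r\n")
    print(extract_url(sample))
```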

Match URL filtering rules and return matching results

When the engine identifies the URL in the packet, it matches the URL with filtering rules and returns the results to the URL filtering module. The URL filtering module then processes the packet based on the matching results.

Figure 5 Match URL filtering rules and return matching results

Process packets

URL filtering supports two modes: whitelist mode and URL category mode. The URL filtering module processes packets differently in each mode.

Whitelist mode

In this mode, users can only access websites defined in the whitelist. All other websites are blocked. This applies to scenarios with strict restrictions on accessible websites.

URL category mode

In this mode, the URL filtering module controls the websites users can access by category. The administrator can configure different actions for each category, including block-source, drop, permit, redirect, reset, and logging, as well as severity levels. The severity level determines which action to take when a URL belongs to multiple categories, prioritizing the highest severity level. Only user-defined categories support configurable severity levels; predefined categories have their severity levels defined by the URL signature library.

To perform actions on certain URLs of different categories, the administrator can add them to the blacklist or whitelist. The blacklist or whitelist takes precedence over URL categories. This mode is suitable for scenarios that require flexible control over accessible websites.

Local filtering rules matched

If the engine detects that a packet matches URL filtering rules, the URL filtering module looks up the corresponding category information or blacklist/whitelist based on rule IDs, and then processes the packet accordingly. For both user-defined and predefined rules, if multiple rules are matched, the URL filtering module executes the action of the highest severity level among the matched categories.

Figure 6 Match filtering rules

No local filtering rules matched (cloud query performed)

If no URL filtering rules are matched, the URL filtering module sends the URL to the cloud server for querying. The cloud server contains a large number of URLs and their category information. The URL filtering module processes the packet based on the query results returned from the cloud server. If the query succeeds, it executes the corresponding action based on the returned category information. If the query fails, it executes the default action configured by the administrator.

Figure 7 No local filtering rules matched

URL filtering workflow

When HTTP is used to access a network resource through the device, the device will perform URL filtering on HTTP messages.

URL filtering takes effect after you apply a URL filtering policy to a DPI application profile and use the DPI application profile in a security policy rule.

As shown in Figure 8, upon receiving a packet, the device performs the following operations:

1. The device compares the packet with the security policy rules.

If the packet matches a rule that is associated with a URL filtering policy (through a DPI application profile), the device extracts the URL from the packet.

2. The device compares the extracted URL with the whitelist and blacklist rules in the URL filtering policy.

If both the whitelist and blacklist features are enabled, the device uses the following process to handle the packet:

a. If the URL matches a whitelist rule, the packet is permitted to pass through.

b. If the URL does not match a whitelist rule, the device identifies whether the URL matches a blacklist rule.

- If the URL matches a blacklist rule, the packet is dropped.

- If the URL does not match a blacklist rule, the device performs step 3.

If only the whitelist feature is enabled, the device handles the packet as follows:

¡ If the URL matches a whitelist rule, the packet is permitted to pass through.

¡ If the URL does not match a whitelist rule, the device drops the packet.

If both the whitelist and blacklist features are disabled, the device performs step 3.

3. The device compares the extracted URL with the URL filtering rules in the URL filtering policy.

a. If the URL matches a URL filtering rule that belongs to a user-defined URL category, the device takes the action specified for the URL category. If the URL filtering rule belongs to multiple user-defined URL categories, the action specified for the URL category with the highest severity level applies.

If no matching URL filtering rule belongs to a user-defined URL category, the device moves to step b.

b. If the URL matches a URL filtering rule that belongs to a predefined URL category, the device takes the action specified for the URL category.

If the URL filtering rule belongs to multiple predefined URL categories, the action specified for the URL category with the highest severity level applies.

4. If the URL does not match any rule in the policy, and cloud query is enabled in the policy, the device identifies whether the URL matches a cached URL filtering rule (history query result from the cloud server, including the URL and its category name).

¡ If a matching cached rule is found for the URL, the device determines the action to take on the packet as described in substep b of step 3.

¡ If no matching cached rule is found for the URL, the default action specified for the policy applies. If the default action is not configured, the device permits the packet to pass through. In addition, the device sends the URL to the cloud server for further query and caches the query result.

If the URL does not match any rule in the policy and cloud query is disabled in the URL filtering policy, go to the next step.

5. If the URL does not match any rule in the policy, the default action specified for the policy applies. If the default action is not configured, the device permits the packet to pass through.

Figure 8 URL filtering workflow
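
Steps 2 through 5 of the workflow can be condensed into one decision function. This is a sketch of the documented order of checks, not device code. All names (Category, Policy, decide, send_to_cloud) are hypothetical, rules are modeled as simple predicates on the URL string, and a list set to None means that the feature is disabled. The document does not describe the case where only the blacklist is enabled, so the sketch assumes that a blacklist match drops the packet and everything else continues to step 3.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Matcher = Callable[[str], bool]


@dataclass
class Category:
    name: str
    severity: int
    action: str
    rules: List[Matcher]


@dataclass
class Policy:
    whitelist: Optional[List[Matcher]] = None  # None: feature disabled
    blacklist: Optional[List[Matcher]] = None
    user_categories: List[Category] = field(default_factory=list)
    predefined_categories: List[Category] = field(default_factory=list)
    cloud_query: bool = False
    cache: Dict[str, str] = field(default_factory=dict)  # URL -> category name
    default_action: Optional[str] = None


def _highest_severity_action(categories: List[Category], url: str) -> Optional[str]:
    hits = [c for c in categories if any(rule(url) for rule in c.rules)]
    return max(hits, key=lambda c: c.severity).action if hits else None


def decide(policy: Policy, url: str,
           send_to_cloud: Callable[[str], None] = lambda url: None) -> str:
    # Step 2: whitelist first, then blacklist.
    if policy.whitelist is not None:
        if any(rule(url) for rule in policy.whitelist):
            return "permit"
        if policy.blacklist is None:
            return "drop"  # only the whitelist is enabled
    if policy.blacklist is not None and any(rule(url) for rule in policy.blacklist):
        return "drop"
    # Step 3: user-defined categories take precedence over predefined ones.
    for categories in (policy.user_categories, policy.predefined_categories):
        action = _highest_severity_action(categories, url)
        if action is not None:
            return action
    # Step 4: cached cloud query results, if cloud query is enabled.
    if policy.cloud_query:
        cached = policy.cache.get(url)
        for category in policy.predefined_categories:
            if category.name == cached:
                return category.action
        if cached is None:
            send_to_cloud(url)  # result is cached for later packets
    # Step 5: default action, or permit when none is configured.
    return policy.default_action or "permit"
```

In this sketch the first packet for an unknown URL receives the default action while the query is sent to the cloud, which mirrors step 4 of the workflow.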