HTTP/2 fingerprinting: A relatively unknown method for web fingerprinting

HTTP/2 fingerprinting is a method by which web servers can identify which client is sending them a request¹. For instance, it can identify the browser type and version, or whether a script is being used. The method relies on the internals of the HTTP/2 protocol, which are less widely known than those of its simpler predecessor, HTTP/1.1. In this post I will first give a short description of the HTTP/2 protocol, then explain how a web server can use the protocol's various parameters to identify the client. Finally, I will list methods of checking and controlling a client's HTTP/2 signature.

This is the second part of a two-part series about web fingerprinting. Read the previous post about TLS fingerprinting here.

Back to HTTP/1.1
A short introduction to HTTP/2
- Frames and streams
Client fingerprinting with HTTP/2
Where is HTTP/2 fingerprinting being used?
Controlling your HTTP/2 signature
Checking a client’s HTTP/2 signature
- The TS1 method and library
Concluding

Back to HTTP/1.1

With HTTP/1.1 - the older, more familiar protocol - a client sends a textual request to the server (usually encrypted with TLS). Here's what Chrome's request looks like by default:

```
GET / HTTP/1.1
Host: www.wikipedia.org
sec-ch-ua: " Not A;Brand";v="99", "Chromium";v="101", "Google Chrome";v="101"
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "Windows"
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Sec-Fetch-Site: none
Sec-Fetch-Mode: navigate
Sec-Fetch-User: ?1
Sec-Fetch-Dest: document
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
```

The User-Agent header contains the client's exact version and can thus be used to identify the client. However, it is trivial to fake with any HTTP library or command-line tool, and it is no longer considered a reliable fingerprinting method by any means. A less well-known fact is that the Accept header also takes different values depending on the client. However, this header is just as easy to fake.
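To show how little effort this takes, here is a minimal Python sketch using the standard library's `urllib.request`. It only builds the request (nothing is sent over the network), and `build_spoofed_request` is a hypothetical helper name. Note that urllib speaks plain HTTP/1.1, so this changes the text headers but nothing about the lower-level signature discussed below.

```python
import urllib.request

# Header values copied from the Chrome request above; any string would be
# accepted just the same, which is exactly why these headers are unreliable.
CHROME_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36"
)
CHROME_ACCEPT = (
    "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,"
    "image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9"
)


def build_spoofed_request(url: str) -> urllib.request.Request:
    """Build (but do not send) a request that claims to be Chrome 101."""
    return urllib.request.Request(
        url,
        headers={"User-Agent": CHROME_UA, "Accept": CHROME_ACCEPT},
    )


req = build_spoofed_request("https://www.wikipedia.org/")
# urllib normalises header names with str.capitalize(), hence "User-agent".
print(req.get_header("User-agent"))
```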

A short introduction to HTTP/2

HTTP/2 is a major revision of the HTTP protocol and has been around since 2015. About half of all websites now use HTTP/2², and virtually all popular sites use it by default. A great in-depth overview of the HTTP/2 protocol can be found in this article. Here I will cover only the parts most relevant to this post.

You can check whether a website uses HTTP/2 with the Chrome or Firefox developer tools. For example, in Firefox it looks like the following:

[Image: HTTP/2 in the Firefox developer tools]

The primary goal of HTTP/2 is to improve the performance of websites and web applications. It achieves that goal by implementing a few core features:

Multiplexing - Multiple requests and responses can share the same TCP connection simultaneously, thus reducing the time to fetch sites with a large number of resources (images, scripts, etc.).
Prioritization - HTTP/2 supports prioritizing certain requests and responses.
Server push - In HTTP/2, the server can send resources to the client before the client requests them.

The application semantics of HTTP are unchanged, however: it still uses the familiar request/response model with URIs, HTTP methods, HTTP headers and status codes.

Frames and streams

HTTP/2 is a binary protocol, as opposed to the textual HTTP/1.1. Messages in HTTP/2 are composed of frames, and there are ten frame types serving different purposes. Frames are always part of a stream. A single stream is usually used to fetch a single resource from the server (HTML, script, image, etc.). Frames from multiple streams can be interleaved over the same connection, which is how multiplexing is achieved. A typical HTTP/2 connection looks like the following:

[Image: Sample HTTP/2 connection]

In this illustration the following frames are exchanged:

SETTINGS - The first frame sent by the client, containing HTTP/2-specific settings. It is sent on stream 0, which is reserved for the connection as a whole; no resource is retrieved on stream 0.
WINDOW_UPDATE - Increases the receiver's flow-control window, i.e. how much more data the other side is allowed to send. More on this later.
HEADERS - Contains the actual request from the client to the server. It contains the URI, the HTTP method and the client’s HTTP headers.
DATA - Contains the response from the server with the requested resource’s data.
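To make the "binary protocol" point concrete: every frame, whatever its type, starts with a fixed 9-byte header holding a 24-bit payload length, an 8-bit type, 8 bits of flags, and a reserved bit followed by a 31-bit stream identifier (RFC 7540, section 4.1). The Python sketch below decodes that header; `parse_frame_header` and `FrameHeader` are illustrative names, not part of any library.

```python
import struct
from typing import NamedTuple

# Frame type codes from RFC 7540, section 6 (the ten frame types).
FRAME_TYPES = {
    0x0: "DATA", 0x1: "HEADERS", 0x2: "PRIORITY", 0x3: "RST_STREAM",
    0x4: "SETTINGS", 0x5: "PUSH_PROMISE", 0x6: "PING", 0x7: "GOAWAY",
    0x8: "WINDOW_UPDATE", 0x9: "CONTINUATION",
}


class FrameHeader(NamedTuple):
    length: int      # payload length in bytes (24-bit)
    type_name: str
    flags: int
    stream_id: int   # 31-bit stream identifier; 0 means the whole connection


def parse_frame_header(data: bytes) -> FrameHeader:
    """Decode the fixed 9-byte header that precedes every HTTP/2 frame."""
    if len(data) < 9:
        raise ValueError("an HTTP/2 frame header is 9 bytes long")
    # '>' = network byte order, no padding; 'B' + 'H' hold the 24-bit length.
    len_hi, len_lo, ftype, flags, stream = struct.unpack(">BHBBI", data[:9])
    return FrameHeader(
        length=(len_hi << 16) | len_lo,
        type_name=FRAME_TYPES.get(ftype, f"UNKNOWN(0x{ftype:x})"),
        flags=flags,
        stream_id=stream & 0x7FFFFFFF,  # drop the reserved high bit
    )


# Header of Chrome's SETTINGS frame from the nghttpd log further down.
print(parse_frame_header(bytes.fromhex("000018040000000000")))
# FrameHeader(length=24, type_name='SETTINGS', flags=0, stream_id=0)
```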

Client fingerprinting with HTTP/2

Let's take a deeper look at some of these frames. Each of them carries parameters whose values differ between clients, which makes it easy for the server to fingerprint the client.

The `SETTINGS` frame

With the SETTINGS frame, the client informs the server about its HTTP/2 preferences. There are six different settings³, with which the client can control parameters such as the maximum number of concurrent streams, the maximum size of the header list, the default window size and whether it supports the server push feature. Each HTTP/2 client uses a different set of settings, and the same client will usually send the same settings regardless of the actual HTTP request.

To see what SETTINGS are sent by a client, I usually use nghttpd, a small HTTP/2 server that can log these parameters. Here are Chrome’s settings taken from the log:

```
recv SETTINGS frame <length=24, flags=0x00, stream_id=0>
    [SETTINGS_HEADER_TABLE_SIZE(0x01):65536]
    [SETTINGS_MAX_CONCURRENT_STREAMS(0x03):1000]
    [SETTINGS_INITIAL_WINDOW_SIZE(0x04):6291456]
    [SETTINGS_MAX_HEADER_LIST_SIZE(0x06):262144]
```

Chrome sends four settings, each with a fixed value. Here are Firefox's settings for comparison:

```
recv SETTINGS frame <length=18, flags=0x00, stream_id=0>
    [SETTINGS_HEADER_TABLE_SIZE(0x01):65536]
    [SETTINGS_INITIAL_WINDOW_SIZE(0x04):131072]
    [SETTINGS_MAX_FRAME_SIZE(0x05):16384]
```

Both the kind of settings and their values are different, making the browsers easily distinguishable. As another example, curl sets the SETTINGS_ENABLE_PUSH setting to 0 to disable the server push feature, which makes it distinguishable from a browser. Because the settings aren’t easily controllable by the user, they become a reliable method for client fingerprinting.
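The payload of a SETTINGS frame is simply a list of 6-byte entries (a 16-bit identifier followed by a 32-bit value, per RFC 7540), which is why Chrome's four settings give `length=24` and Firefox's three give `length=18` in the logs above. The Python sketch below encodes and decodes such payloads and reduces them to a comparable string. The function names and the `id:value;...` string format are illustrative assumptions, not a standard fingerprint format.

```python
import struct

# Setting identifiers defined in RFC 7540, section 6.5.2.
SETTING_NAMES = {
    1: "HEADER_TABLE_SIZE", 2: "ENABLE_PUSH", 3: "MAX_CONCURRENT_STREAMS",
    4: "INITIAL_WINDOW_SIZE", 5: "MAX_FRAME_SIZE", 6: "MAX_HEADER_LIST_SIZE",
}

# (identifier, value) pairs copied from the nghttpd logs above.
CHROME = [(1, 65536), (3, 1000), (4, 6291456), (6, 262144)]
FIREFOX = [(1, 65536), (4, 131072), (5, 16384)]


def encode_settings(settings: list[tuple[int, int]]) -> bytes:
    """SETTINGS payload: a 16-bit identifier and a 32-bit value per entry."""
    return b"".join(struct.pack(">HI", sid, value) for sid, value in settings)


def decode_settings(payload: bytes) -> list[tuple[int, int]]:
    if len(payload) % 6:
        raise ValueError("SETTINGS payload length must be a multiple of 6")
    return [struct.unpack(">HI", payload[i:i + 6]) for i in range(0, len(payload), 6)]


def describe_settings(settings: list[tuple[int, int]]) -> list[str]:
    return [f"{SETTING_NAMES.get(sid, sid)}={value}" for sid, value in settings]


def settings_fingerprint(settings: list[tuple[int, int]]) -> str:
    """Hypothetical fingerprint string that keeps both order and values."""
    return ";".join(f"{sid}:{value}" for sid, value in settings)


print(len(encode_settings(CHROME)), len(encode_settings(FIREFOX)))  # 24 18
print(settings_fingerprint(CHROME))   # 1:65536;3:1000;4:6291456;6:262144
print(settings_fingerprint(FIREFOX))  # 1:65536;4:131072;5:16384
```

Keeping the order as well as the values matters: two clients could send the same settings in a different order, and that alone would tell them apart.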

The `WINDOW_UPDATE` frame

HTTP/2 implements a flow-control mechanism, which gives the receiving side a means to regulate the flow of traffic on a per-stream basis. It is implemented using a window size: a number specifying how many bytes the receiver is currently willing to accept. There is a window for each stream and a window for the connection as a whole. The mechanism is similar to TCP flow control, but since multiple streams are multiplexed over a single TCP connection, HTTP/2 implements its own stream-level flow control. For a full explanation, refer to the RFC or to this article.

The default stream-level window size is controlled by the SETTINGS_INITIAL_WINDOW_SIZE setting in the SETTINGS frame, visible in the logs above: Chrome uses 6MB (6291456 bytes) and Firefox uses 128KB (131072 bytes).

As the client consumes the data it receives, it enlarges the window again by sending a WINDOW_UPDATE frame, which increases the window size by a given increment.
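The accounting itself fits in a few lines. This is a toy, receiver-side Python sketch (`FlowWindow` is a hypothetical name) that assumes a single window and ignores the interplay between stream and connection windows: DATA shrinks the window, WINDOW_UPDATE grows it, and RFC 7540 caps any window at 2^31-1 bytes.

```python
MAX_WINDOW = 2**31 - 1  # largest flow-control window RFC 7540 allows


class FlowWindow:
    """Toy model of one HTTP/2 flow-control window (stream or connection)."""

    def __init__(self, initial: int = 65535):
        self.available = initial

    def consume(self, nbytes: int) -> None:
        """The peer sent DATA: it must stay within the remaining window."""
        if nbytes > self.available:
            raise RuntimeError("FLOW_CONTROL_ERROR: peer overran the window")
        self.available -= nbytes

    def window_update(self, increment: int) -> None:
        """We sent WINDOW_UPDATE: the peer may now send `increment` more bytes."""
        if not 1 <= increment <= MAX_WINDOW:
            raise ValueError("increment must be between 1 and 2^31-1")
        if self.available + increment > MAX_WINDOW:
            raise RuntimeError("FLOW_CONTROL_ERROR: window would exceed 2^31-1")
        self.available += increment


# A stream opened with Firefox's SETTINGS_INITIAL_WINDOW_SIZE (128KB).
stream = FlowWindow(initial=131072)
stream.consume(100_000)        # the server sends 100,000 bytes of DATA
stream.window_update(100_000)  # the client hands that budget back
print(stream.available)        # 131072
```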

The connection-level window size is 65535 bytes by default and can only be increased by sending a WINDOW_UPDATE frame on the special stream ID 0. Most clients send a WINDOW_UPDATE frame for stream 0 at the very beginning of the connection, immediately after the SETTINGS frame. This is what it looks like for Chrome:

```
recv WINDOW_UPDATE frame <length=4, flags=0x00, stream_id=0>
          (window_size_increment=15663105)
```

Chrome is in effect increasing the connection-level window size to 15MB (15663105 + 65535 = 15728640 bytes = 15 × 1024 × 1024). Firefox, on the other hand, increases it to 12MB, and curl uses 32MB⁴. Hence this parameter can be used for fingerprinting as well.
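The same arithmetic works in reverse: a client that wants a given connection window has to send the target minus the 65535-byte default. The Python sketch below computes that increment and serializes the resulting WINDOW_UPDATE frame (type 0x8, 4-byte payload). The helper names are hypothetical, and the 12MB Firefox figure is checked against the increment that appears in the TS1 JSON below.

```python
import struct

DEFAULT_CONNECTION_WINDOW = 65535  # RFC 7540 initial connection window
MiB = 1024 * 1024


def window_update_frame(increment: int, stream_id: int = 0) -> bytes:
    """Serialize a WINDOW_UPDATE frame (type 0x8) with its 4-byte payload."""
    if not 1 <= increment <= 2**31 - 1:
        raise ValueError("increment must be between 1 and 2^31-1")
    # 9-byte frame header: 24-bit length (4), type 0x8, flags 0, stream id.
    header = struct.pack(">BHBBI", 0, 4, 0x8, 0, stream_id & 0x7FFFFFFF)
    return header + struct.pack(">I", increment)  # reserved bit left as 0


def increment_for(target_window: int) -> int:
    """Increment a client must send on stream 0 to reach target_window."""
    return target_window - DEFAULT_CONNECTION_WINDOW


print(increment_for(15 * MiB))  # 15663105, Chrome's value from the log above
print(increment_for(12 * MiB))  # 12517377, Firefox's value
```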

The `HEADERS` frame

The HEADERS frame contains, broadly speaking, all the functionality of an HTTP/1.1 request in a single frame: the server's host, the resource URI, the method (GET/POST/etc.) and the client's headers. An important difference, however, is that everything is now a "header", including the method and the URI. Here's what it looks like for Chrome:

```
recv (stream_id=3) :method: GET
recv (stream_id=3) :authority: localhost:8443
recv (stream_id=3) :scheme: https
recv (stream_id=3) :path: /favicon.ico
recv (stream_id=3) sec-ch-ua: " Not A;Brand";v="99", "Chromium";v="101", "Google Chrome";v="101"
recv (stream_id=3) sec-ch-ua-mobile: ?0
recv (stream_id=3) user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.64 Safari/537.36
recv (stream_id=3) sec-ch-ua-platform: "Linux"
recv (stream_id=3) accept: image/avif,image/webp,image/apng,image/svg+xml,image/*,*/*;q=0.8
recv (stream_id=3) sec-fetch-site: same-origin
recv (stream_id=3) sec-fetch-mode: no-cors
recv (stream_id=3) sec-fetch-dest: image
recv (stream_id=3) accept-encoding: gzip, deflate, br
recv (stream_id=3) accept-language: en-GB,en;q=0.9
recv HEADERS frame <length=121, flags=0x25, stream_id=3>
```

The method is encoded in the special :method pseudo-header, the host in :authority, the scheme in :scheme and the URI in :path. The interesting thing here is that the order of these pseudo-headers is fixed for a given client but differs between clients. From the protocol's standpoint all orders are valid, yet each client has settled on its own. The pseudo-header order for some common clients (using the first letter of each pseudo-header to denote it):

| Client | Pseudo-header order |
|---|---|
| Chrome | `masp` |
| Firefox | `mpas` |
| Safari | `mspa` |
| curl | `mpsa` |

This seemingly small difference once again makes it easy to fingerprint the clients.
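Extracting this feature from a received header list is almost a one-liner. The Python sketch below computes the order string and looks it up in the table above; `pseudo_header_order` and `guess_client` are hypothetical names, and a guess based on this single feature is weak on its own, so in practice it would be combined with the other parameters in this article.

```python
# Pseudo-header orders from the table above.
KNOWN_ORDERS = {"masp": "Chrome", "mpas": "Firefox", "mspa": "Safari", "mpsa": "curl"}


def pseudo_header_order(header_names: list[str]) -> str:
    """First letter of each pseudo-header, in the order the client sent them."""
    return "".join(name[1] for name in header_names if name.startswith(":"))


def guess_client(header_names: list[str]) -> str:
    return KNOWN_ORDERS.get(pseudo_header_order(header_names), "unknown")


# Header names in the order Chrome sent them in the nghttpd log above.
chrome_headers = [":method", ":authority", ":scheme", ":path", "sec-ch-ua", "user-agent"]
print(pseudo_header_order(chrome_headers), guess_client(chrome_headers))  # masp Chrome
```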

The `PRIORITY` frame

In HTTP/2 the client can assign priorities to streams. For example, the client may want to receive JS scripts before images. Since this article is long enough already, I will not describe the mechanism in full detail. However, it is important to know two things:

The client can define a tree of streams, by specifying for each stream a parent stream. This tree defines dependencies for prioritization purposes.
The client can define for each stream a weight, which sets its priority relative to its siblings in the tree.

Both the parent of each stream and its weight are communicated via the PRIORITY frame. Firefox, for example, builds a rather complex tree of streams that looks like the following:

Stream priorities in Firefox

To create this tree, Firefox by default sends a PRIORITY frame for each of the streams 3, 5, 7, 9, 11 and 13, defining their parents and weights. The nghttpd logs show this as follows:

```
recv PRIORITY frame <length=5, flags=0x00, stream_id=3>
	(dep_stream_id=0, weight=201, exclusive=0)
recv PRIORITY frame <length=5, flags=0x00, stream_id=5>
	(dep_stream_id=0, weight=101, exclusive=0)
recv PRIORITY frame <length=5, flags=0x00, stream_id=7>
	(dep_stream_id=0, weight=1, exclusive=0)
recv PRIORITY frame <length=5, flags=0x00, stream_id=9>
	(dep_stream_id=7, weight=1, exclusive=0)
recv PRIORITY frame <length=5, flags=0x00, stream_id=11>
	(dep_stream_id=3, weight=1, exclusive=0)
recv PRIORITY frame <length=5, flags=0x00, stream_id=13>
	(dep_stream_id=0, weight=241, exclusive=0)
```

The use of this specific tree structure and these specific weights is thus very indicative of Firefox.
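Each of these frames has a fixed 5-byte payload: an exclusive bit and a 31-bit parent stream ID, followed by one byte holding the weight minus one (RFC 7540, section 6.3), so weights run from 1 to 256. The Python sketch below serializes the six Firefox frames from the log and rebuilds the parent/child tree. It assumes that nghttpd prints the effective weight (wire value plus one); `priority_frame` and `build_tree` are hypothetical names.

```python
import struct

# (stream_id, dep_stream_id, weight) as logged by nghttpd for Firefox above.
FIREFOX_PRIORITIES = [(3, 0, 201), (5, 0, 101), (7, 0, 1), (9, 7, 1), (11, 3, 1), (13, 0, 241)]


def priority_frame(stream_id: int, dep_stream_id: int, weight: int,
                   exclusive: bool = False) -> bytes:
    """Serialize a PRIORITY frame (type 0x2, fixed 5-byte payload)."""
    if not 1 <= weight <= 256:
        raise ValueError("weight must be between 1 and 256")
    header = struct.pack(">BHBBI", 0, 5, 0x2, 0, stream_id & 0x7FFFFFFF)
    # Exclusive flag in the top bit, then the 31-bit parent stream ID.
    dep = (int(exclusive) << 31) | (dep_stream_id & 0x7FFFFFFF)
    return header + struct.pack(">IB", dep, weight - 1)  # wire value is weight - 1


def build_tree(priorities) -> dict[int, list[tuple[int, int]]]:
    """Map each parent stream to the (child, weight) pairs that depend on it."""
    tree: dict[int, list[tuple[int, int]]] = {}
    for stream_id, parent, weight in priorities:
        tree.setdefault(parent, []).append((stream_id, weight))
    return tree


frames = b"".join(priority_frame(*p) for p in FIREFOX_PRIORITIES)
print(len(frames))  # 84, i.e. 6 frames of 9 + 5 bytes
print(build_tree(FIREFOX_PRIORITIES))
```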

Where is HTTP/2 fingerprinting being used?

HTTP/2 fingerprinting lets the server identify the client reliably before responding with any data. It is therefore used for purposes similar to those of TLS fingerprinting, usually by commercial anti-DDoS and anti-bot solutions that attempt to block automated tools while letting real browsers through.

I've personally witnessed this method being used in the wild: real browsers were served the site's real content, while curl-impersonate, for example, got blocked. This was before HTTP/2 impersonation was fully implemented in curl-impersonate.

Controlling your HTTP/2 signature

As seen above, the HTTP/2 protocol involves many details, and the parameters involved are not always configurable by the user. Tools and libraries usually try to abstract the HTTP/2 details away, and as a result each of them ends up with its own unique HTTP/2 signature that cannot easily be altered.

There are three methods I'm aware of for controlling your HTTP/2 signature:

Use a headless browser through a framework such as Puppeteer or Playwright. By using a real browser, you get that browser’s HTTP/2 signature.
curl-impersonate, my own fork of the popular curl tool, which supports impersonating real browsers. Its latest version has much better HTTP/2 impersonation support: it can impersonate the HTTP/2 signatures of Firefox and Chrome quite accurately, including all the parameters mentioned in this article. Its main advantage is that it reproduces the matching TLS signature as well.
Write your own HTTP/2 client code through a low-level library such as nghttp2, which gives you full control over all parameters.
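To give a feel for what "full control" means at this level, here is a Python sketch (plain `struct`, not nghttp2's API) that assembles the first bytes a client writes after the TLS handshake: the fixed 24-byte connection preface from RFC 7540 section 3.5, followed by a SETTINGS frame and a connection-level WINDOW_UPDATE frame carrying whatever values you choose. The helper names are hypothetical, and a working client would still need TLS with ALPN "h2", HPACK-encoded HEADERS frames and handling of the server's frames.

```python
import struct

PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"  # fixed client connection preface


def frame(ftype: int, flags: int, stream_id: int, payload: bytes) -> bytes:
    """Prefix a payload with the 9-byte HTTP/2 frame header."""
    n = len(payload)
    return struct.pack(">BHBBI", n >> 16, n & 0xFFFF, ftype, flags, stream_id) + payload


def client_opening(settings: list[tuple[int, int]], window_increment: int) -> bytes:
    """Preface + SETTINGS (type 0x4) + connection-level WINDOW_UPDATE (type 0x8)."""
    settings_payload = b"".join(struct.pack(">HI", sid, val) for sid, val in settings)
    return (
        PREFACE
        + frame(0x4, 0, 0, settings_payload)
        + frame(0x8, 0, 0, struct.pack(">I", window_increment))
    )


# Reproduce the Chrome values from the nghttpd logs above.
opening = client_opening([(1, 65536), (3, 1000), (4, 6291456), (6, 262144)], 15663105)
print(len(opening))  # 70 = 24 (preface) + 9 + 24 (SETTINGS) + 9 + 4 (WINDOW_UPDATE)
```

Changing a single number in the call above changes the signature the server sees, which is precisely the control that higher-level libraries do not expose.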

Checking a client’s HTTP/2 signature

You may wonder how to check a client's HTTP/2 signature. Unlike TLS fingerprinting, which relies on the unencrypted TLS Client Hello message, HTTP/2 frames are almost always encrypted, which makes them harder to inspect. There are two options I like to use:

Capture the encrypted session in Wireshark with the SSLKEYLOGFILE environment variable set. Most clients will then write a key log file, which Wireshark can use to decrypt the session. Full instructions are available here. The decrypted frames will look like the following (note the presence of the frames discussed above):
Use nghttpd, a small HTTP/2 server. It is already packaged for most Linux distributions and macOS. To use it, first create a self-signed SSL key and certificate, then run it as follows:
```
nghttpd -v 8443 server.key server.crt
```
Connect a client to https://localhost:8443 and nghttpd will log all the frames it receives with all the parameters.

The TS1 method and library

TS1 is a method and a Python package I developed for the purpose of checking and comparing clients’ signatures. It is available at https://github.com/lwthiker/ts1 or on PyPI.

TS1 takes all the HTTP/2 frames the client sends up to and including the HEADERS frame, and encodes them into a JSON format that looks like the following (shown is a truncated version):

```
{
    "frames": [
        {
            "frame_type": "SETTINGS",
            "stream_id": 0,
            "settings": [
                {
                    "id": 1,
                    "value": 65536
                },
                {
                    "id": 4,
                    "value": 131072
                },
                {
                    "id": 5,
                    "value": 16384
                }
            ]
        },
        {
            "frame_type": "WINDOW_UPDATE",
            "stream_id": 0,
            "window_size_increment": 12517377
        },
        {
            "frame_type": "PRIORITY",
            "stream_id": 3,
            "priority": {
                "dep_stream_id": 0,
                "weight": 201,
                "exclusive": false
            }
        },
        {
            "frame_type": "HEADERS",
            "stream_id": 15,
            "pseudo_headers": [
                ":method",
                ":path",
                ":authority",
                ":scheme"
            ]
        }
    ]
}
```

The JSON is then turned into a canonical form, a compact single-line string produced according to certain rules:

```
{"frames": [{"frame_type": "SETTINGS", "settings": [{"id": 1, "value": 65536}, {"id": 4, "value": 131072}, {"id": 5, "value": 16384}], "stream_id": 0}, {"frame_type": "WINDOW_UPDATE", "stream_id": 0, "window_size_increment": 12517377}, {"frame_type": "PRIORITY", "priority": {"dep_stream_id": 0, "exclusive": false, "weight": 201}, "stream_id": 3}, {"frame_type": "HEADERS", "pseudo_headers": [":method", ":path", ":authority", ":scheme"], "stream_id": 15}]}
```

Then the SHA-1 hash of that string is calculated to produce the TS1 signature hash:

```
c9bb208868a10863867841a2e5bcb3b903719784
```

Different clients will have different hashes, and the hashes can easily be stored in a database for quick comparison of clients' signatures.
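The canonical string shown above is exactly what Python's `json.dumps` produces for the truncated signature when keys are sorted and the default separators are kept. Assuming that this is all the canonicalization amounts to, the pipeline fits in a few lines. TS1 may apply further rules, so treat this as an approximation rather than TS1's API; `canonical_form` and `signature_hash` are hypothetical names, and the sketch does not claim to reproduce the hash quoted above.

```python
import hashlib
import json


def canonical_form(signature: dict) -> str:
    """One-line JSON with sorted keys (assumed to approximate TS1's rules)."""
    return json.dumps(signature, sort_keys=True)


def signature_hash(signature: dict) -> str:
    return hashlib.sha1(canonical_form(signature).encode("utf-8")).hexdigest()


# The truncated Firefox signature shown above.
firefox = {
    "frames": [
        {"frame_type": "SETTINGS", "stream_id": 0,
         "settings": [{"id": 1, "value": 65536}, {"id": 4, "value": 131072},
                      {"id": 5, "value": 16384}]},
        {"frame_type": "WINDOW_UPDATE", "stream_id": 0, "window_size_increment": 12517377},
        {"frame_type": "PRIORITY", "stream_id": 3,
         "priority": {"dep_stream_id": 0, "weight": 201, "exclusive": False}},
        {"frame_type": "HEADERS", "stream_id": 15,
         "pseudo_headers": [":method", ":path", ":authority", ":scheme"]},
    ]
}

print(canonical_form(firefox))
print(signature_hash(firefox))
```

Sorting the keys is what makes the hash stable: two JSON documents that differ only in key order map to the same string, while any difference in frames, values or ordering of list items changes it.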

More details about using the TS1 library can be found on the GitHub page.

Concluding

I will conclude with the same words as in the previous post: Fingerprinting has become extremely common throughout the web, and while it is used for legitimate purposes such as blocking DDoS attacks, it is also making the web less open, less private and much more restrictive towards specific web clients. I have witnessed websites mark certain browsers as suspicious while letting others in (probably not intentionally), with TLS and HTTP fingerprinting being the main methods used to achieve that.

With more awareness of the prevalence of such techniques, I hope that browsers, web clients and future protocol designers will be more attentive to these kinds of issues.

¹ This method, though relatively unknown, is not new. After doing my own research on the subject for curl-impersonate, I found this BlackHat presentation describing research with similar conclusions.
² https://w3techs.com/technologies/details/ce-http2
³ https://httpwg.org/specs/rfc7540.html#SettingValues
⁴ Source code reference for curl's window size
