TLS fingerprinting: How it works, where it is used and how to control your signature
In this two-part series of posts I would like to expand on server-side browser fingerprinting. Server-side fingerprinting is a collection of techniques used by web servers to identify which web client is making a request based on network parameters sent by the client. By web client I mean the type of client, as in which browser or CLI tool, and not a specific user, which is what a cookie identifies.
A different technique from server-side fingerprinting is client-side fingerprinting, which is when JavaScript is injected to test the client. This may be the subject of a future post, and I’ll focus on server-side fingerprinting for now.
TLS fingerprinting is a widely deployed server-side technique. It allows web servers to identify the client to a high degree of accuracy based on the first packet of the connection alone. I will give examples below to demonstrate just how easy it is to tell the client from its TLS parameters.
This is the first part of a two-part series about web fingerprinting. Read the second post about HTTP/2 fingerprinting here.
Table of contents
- How does TLS fingerprinting work
- Methods for signature calculation
- Where is TLS fingerprinting being used?
- Controlling your TLS signature
- What’s next for TLS fingerprinting?
How does TLS fingerprinting work
TLS is the evolution of SSL, the protocol previously responsible for handling encrypted connections between web clients and servers. SSL is no longer in common use, but its name is still mistakenly used to refer to TLS as well.
Whenever a web client (a browser, a script or a command line tool) accesses a TLS-encrypted site (https://...), it first performs a TLS handshake with the server. Here is a schematic diagram, courtesy of Wikipedia:
The first message is the TLS client hello, sent by the client to the server. In this message the client declares to the server what parts of the TLS protocol it supports. The following are examples of parameters sent by the client:
- The versions of the TLS protocol the client supports (from TLS 1.0 up to TLS 1.3).
- The cryptographic algorithms the client supports for data encryption, known as cipher suites.
- The cryptographic algorithms the client supports for digital signatures.
As it happens, each client uses a different TLS library: Firefox uses NSS, Chrome uses BoringSSL, Safari uses Secure Transport, and Python uses OpenSSL. The result is that the above parameters differ significantly between clients. Here is an example of the cipher suites list declared by Chrome in the TLS client hello, as captured by Wireshark:
This list, both its contents and the order of ciphers, differs depending on the TLS client in use. In addition, TLS is such a complex protocol that it has many extensions, each with its own set of additional parameters [1]. To give some examples:
- Some clients support compressing the exchanged certificates through a dedicated TLS extension.
- Some clients support negotiating settings for the application-layer protocol (e.g. HTTP/2) through a dedicated TLS extension called ALPS.
- Some clients add a fake TLS extension called GREASE.
Here is what Chrome’s list of TLS extensions looks like in Wireshark:
For each browser the above list of extensions is different, and the order of extensions may differ as well.
The following is a comparison table demonstrating notable differences in TLS signatures of common clients [2]:
| | Chrome | Safari | Firefox | Python |
---|---|---|---|---|
No. of cipher suites | 16 | 27 | 17 | 43 |
No. of signature algorithms | 8 | 11 | 11 | 20 |
ALPS extension | Yes | No | No | No |
Certificate compression method | Brotli | Zlib | None | None |
GREASE extension | Yes | Yes | No | No |
With this in mind it is obvious that web clients can be easily distinguished based on their TLS signature. The remarkable thing is that this information is all available upon the very first packet of the session to the server. The server can thus infer which client is connected even before responding back with any kind of data. Moreover, until encrypted client hello becomes the standard, any third-party listener on the network can infer this as well.
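To make this concrete, here is a minimal Python sketch of how a server-side tool might pull the cipher suites and extension types out of a raw Client Hello. The function names `parse_client_hello` and `build_demo_hello` are hypothetical, the demo message is synthetic rather than a capture from a real browser, and the parser assumes the whole Client Hello fits in a single TLS record.

```python
import struct


def parse_client_hello(record: bytes) -> dict:
    """Extract the version, cipher suites and extension types from a Client Hello."""
    # TLS record header: content type (1 byte), version (2), length (2).
    if record[0] != 0x16:
        raise ValueError("not a TLS handshake record")
    pos = 5
    # Handshake header: message type (1 byte), length (3).
    if record[pos] != 0x01:
        raise ValueError("not a Client Hello")
    pos += 4
    (version,) = struct.unpack_from("!H", record, pos)
    pos += 2 + 32  # skip the version and the 32-byte random
    pos += 1 + record[pos]  # skip the session ID (1-byte length prefix)
    (ciphers_len,) = struct.unpack_from("!H", record, pos)
    pos += 2
    ciphers = list(struct.unpack_from(f"!{ciphers_len // 2}H", record, pos))
    pos += ciphers_len
    pos += 1 + record[pos]  # skip the compression methods
    extensions = []
    if pos < len(record):
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = pos + ext_total
        while pos < end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            extensions.append(ext_type)  # keep the type and its position in the list
            pos += 4 + ext_len
    return {"version": version, "ciphers": ciphers, "extensions": extensions}


def build_demo_hello(ciphers, extensions) -> bytes:
    """Build a synthetic Client Hello with empty extension bodies (demo only)."""
    body = struct.pack("!H", 0x0303) + bytes(32) + b"\x00"  # version, random, no session ID
    body += struct.pack("!H", 2 * len(ciphers))
    body += b"".join(struct.pack("!H", c) for c in ciphers)
    body += b"\x01\x00"  # one compression method: null
    ext = b"".join(struct.pack("!HH", t, 0) for t in extensions)
    body += struct.pack("!H", len(ext)) + ext
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake


demo = build_demo_hello([4865, 4866, 4867], [0, 23, 65281])
print(parse_client_hello(demo))
```

The order of the returned lists is preserved on purpose, since the ordering of ciphers and extensions is part of what distinguishes one client from another.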
Methods for signature calculation
JA3
JA3 is a popular method used to formalize the notion of a TLS fingerprint. It takes a Client Hello packet and produces a hash identifying the client.
JA3 works by concatenating multiple fields of the Client Hello and then hashing them. The fields are:
SSLVersion,Cipher,SSLExtension,EllipticCurve,EllipticCurvePointFormat
For example, for a Chrome browser this would be:
771,39578-4865-4866-4867-49195-49199-49196-49200-52393-52392-49171-49172-156-157-47-53,23130-0-23-65281-10-11-35-16-5-13-18-51-45-43-27-17513-39578-21,39578-29-23-24,0
This is then hashed with MD5 to produce the JA3 signature:
e3501e1725c83830dd40f12930cc6eaa
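The concatenate-then-hash step can be sketched in a few lines of Python. The helper names `ja3_string` and `ja3_hash` are my own for illustration and are not taken from the JA3 reference implementation; the input values below are a shortened, made-up example rather than a real browser's Client Hello.

```python
import hashlib


def ja3_string(version, ciphers, extensions, curves, point_formats) -> str:
    """Join the five JA3 fields: values within a field by '-', fields by ','."""
    fields = [ciphers, extensions, curves, point_formats]
    parts = [str(version)] + ["-".join(str(v) for v in f) for f in fields]
    return ",".join(parts)


def ja3_hash(ja3: str) -> str:
    """MD5 of the JA3 string, as a 32-character hex digest."""
    return hashlib.md5(ja3.encode("ascii")).hexdigest()


example = ja3_string(771, [4865, 4866, 4867], [0, 23, 65281], [29, 23, 24], [0])
print(example)
print(ja3_hash(example))
```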
JA3 is the de facto standard in this regard and has been integrated, for example, into Wireshark.
It is important to note that JA3 does not take into account all the different parameters in the Client Hello. This means that it is possible to have two different Client Hellos with the same JA3 signature [3].
TS1
TS1 is my take on creating a unique hash per TLS signature. It was inspired by JA3 but is more comprehensive in that it encodes all the parameters of the TLS Client Hello message. I’ve created and used it myself while working on curl-impersonate.
TS1 encodes the parameters of the Client Hello message in JSON format according to certain rules:
{"client_hello": {"ciphersuites": [4865, 4867, 4866, 49195, 49199, 52393, 52392, 49196, 49200, 49162, 49161, 49171, 49172, 156, 157, 47, 53], "comp_methods": [0], "extensions": [{"type": "server_name"}, {"length": 0, "type": "extended_master_secret"}, {"length": 1, "type": "renegotiation_info"}, {"length": 14, "supported_groups": [29, 23, 24, 25, 256, 257], "type": "supported_groups"}, {"ec_point_formats": [0], "length": 2, "type": "ec_point_formats"}, {"length": 0, "type": "session_ticket"}, {"alpn_list": ["h2", "http/1.1"], "length": 14, "type": "application_layer_protocol_negotiation"}, {"length": 5, "status_request_type": 1, "type": "status_request"}, {"length": 10, "sig_hash_algs": [1027, 1283, 1539, 515], "type": "delegated_credentials"}, {"key_shares": [{"group": 29, "length": 32}, {"group": 23, "length": 65}], "length": 107, "type": "keyshare"}, {"length": 5, "supported_versions": ["TLS_VERSION_1_3", "TLS_VERSION_1_2"], "type": "supported_versions"}, {"length": 24, "sig_hash_algs": [1027, 1283, 1539, 2052, 2053, 2054, 1025, 1281, 1537, 515, 513], "type": "signature_algorithms"}, {"length": 2, "psk_ke_mode": 1, "type": "psk_key_exchange_modes"}, {"length": 2, "record_size_limit": 16385, "type": "record_size_limit"}, {"type": "padding"}], "handshake_version": "TLS_VERSION_1_2", "record_version": "TLS_VERSION_1_0", "session_id_length": 32}}
and then calculates its SHA1 hash to produce the TS1 signature:
889b4383dcfee0d3dc4c472d3d40568028842b3e
Different clients will have different hashes, and the hashes can be stored in a database to make comparing clients’ signatures easy.
TS1 signatures encode more parameters than JA3 and therefore give a more accurate picture of the client. Another advantage is that, thanks to the use of JSON, the format can accommodate TLS extensions that are not yet defined and which may hold crucial client-identifying information in the future. The disadvantage of TS1 is that its JSON format is much more verbose than JA3’s simple format.
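The general idea of hashing a JSON description can be sketched as follows. This is not the TS1 implementation: the exact encoding rules are not spelled out here, so sorting the keys and using Python's default `json.dumps` separators are assumptions on my part, and the function name `json_fingerprint` is hypothetical. The sketch is not expected to reproduce the hash shown above.

```python
import hashlib
import json


def json_fingerprint(client_hello: dict) -> str:
    """SHA-1 of a deterministic JSON serialization of the Client Hello parameters."""
    # Assumption: keys are sorted so that dict ordering does not affect the hash.
    canonical = json.dumps({"client_hello": client_hello}, sort_keys=True)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


hello_a = {
    "ciphersuites": [4865, 4867, 4866],
    "comp_methods": [0],
    "extensions": [{"type": "server_name"}, {"length": 0, "type": "session_ticket"}],
}
# Same parameters, but the cipher order differs: this should be a different client.
hello_b = dict(hello_a, ciphersuites=[4865, 4866, 4867])

print(json_fingerprint(hello_a))
print(json_fingerprint(hello_b))
```

Lists keep their order while dictionary keys are sorted, so two clients that send the same ciphers in a different order still get different hashes.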
Where is TLS fingerprinting being used?
TLS fingerprinting is naturally used by anti-bot and anti-DDoS solutions to protect web pages against massive crawling or DDoS attacks. By checking whether the client is a browser or a script (i.e. a bot), they can decide whether to allow the request, block it, or introduce an additional JavaScript-based challenge to further test the client.
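As a rough illustration of that decision, here is a toy Python sketch. The policy, the `decide` function and the lookup table are all hypothetical; real products use their own (unpublished) logic. The first key is the Chrome JA3 hash from the example above, and the second is a placeholder, not a real fingerprint.

```python
# Hypothetical lookup table: fingerprint hash -> kind of client.
KNOWN_FINGERPRINTS = {
    "e3501e1725c83830dd40f12930cc6eaa": "browser",  # Chrome example from above
    "0" * 32: "script",  # placeholder standing in for a known script signature
}


def decide(fingerprint: str) -> str:
    """Return 'allow', 'block' or 'challenge' for a TLS fingerprint hash."""
    kind = KNOWN_FINGERPRINTS.get(fingerprint)
    if kind == "browser":
        return "allow"
    if kind == "script":
        return "block"
    # Unknown signature: test the client further, e.g. with a JavaScript challenge.
    return "challenge"


print(decide("e3501e1725c83830dd40f12930cc6eaa"))
```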
Another interesting use case which got my attention, though I haven’t seen it myself, is that of phishing campaigns. A phishing website will use TLS fingerprinting to detect whether the client is a browser or not. It will serve the phishing content to unsuspecting victims using a browser, but will block automatic crawling by security products attempting to identify phishing websites.
Controlling your TLS signature
Most of the parameters in the TLS client hello message are not controllable by scripts or command line tools. In Python, for example, you can control the cipher suites list, but it pretty much ends there. Even with that in place, the underlying TLS library may not send the exact list you specified, as is the case with Python and OpenSSL.
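Here is a small sketch of that limited control, using Python's standard `ssl` module. The cipher name is just an example I picked. On builds linked against OpenSSL 1.1.1 or later, the printed list will typically still include TLS 1.3 suites, because `set_ciphers()` does not disable them.

```python
import ssl

# Start from the default client-side context and request a single cipher suite.
ctx = ssl.create_default_context()
ctx.set_ciphers("ECDHE-RSA-AES128-GCM-SHA256")

# List what the context will actually offer; this may be more than was requested.
names = [cipher["name"] for cipher in ctx.get_ciphers()]
print(names)
```

Even when the cipher list matches, the rest of the Client Hello (extensions, their order, signature algorithms) is still chosen by the underlying library.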
The best currently available methods that I’m aware of for controlling the full TLS signature are:
- Puppeteer, which allows you to run a headless Chrome browser and control it with a script. By using a real browser, you get the TLS signature of that browser.
- curl-impersonate, my own fork of the popular curl tool with support for faking TLS signatures to impersonate a few popular browsers. It also comes with a fork of libcurl, called libcurl-impersonate, so you can use it programmatically in your code. Another option is to inject libcurl-impersonate into an already running application that uses the regular libcurl. You can read about the technical aspects of curl-impersonate in my previous posts (part 1, part 2), and find more documentation in the GitHub repository. An advantage of curl-impersonate is that the correct HTTP/2 fingerprint will be used as well. More on this in the next post.
- JA3Transport, a Go library that intends to fake JA3 signatures. I didn’t test it myself.
What’s next for TLS fingerprinting?
TLS fingerprinting has become extremely common throughout the web, and while it is used for legitimate purposes such as blocking DDoS attacks, it is also making the web less open, less private and much more restrictive towards specific web clients.
It is my impression that current tools for faking a client’s TLS signature are still immature. Using curl-impersonate, for example, requires you to write your own C code or inject it into existing applications that use libcurl.
The best solution would be for one of the TLS libraries to provide more fine-grained control for users. The kind of functionality that might be needed:
- Allowing users to control the order of TLS extensions.
- Allowing users to control the exact list of ciphers.
- Supporting the latest TLS extensions that some browsers use.
When this happens, packages for popular programming languages can emerge that take advantage of this functionality to control their TLS signatures.
[1] The large number of available TLS extensions can be seen at https://www.iana.org/assignments/tls-extensiontype-values/tls-extensiontype-values.xhtml.
[2] Chrome 101, Firefox 100, Safari 15.4, Python 3.8.10 with OpenSSL 1.1.1f and the requests library.
[3] For example, the parameters inside the TLS compressed-certificate extension are not taken into account.