This is a continuation of the previous post. If you didn’t read it, please go ahead and read at least until the TL;DR section. In summary, various web services perform TLS fingerprinting to identify whether you run a real browser like Chrome or Firefox, or a tool like curl or a Python script. I created curl-impersonate, a modified version of curl that performs TLS handshakes identical to Firefox’s, thereby tricking said services into believing it is a real browser.
After uploading the repository I posted it to Hacker News. On the thread, someone suggested:
“They should really be impersonating Chrome. If this takes off, Firefox has such a small user share that I could see sites just banning Firefox altogether, like they do with Tor.”
- I re-compiled curl with BoringSSL, Chrome’s TLS library.
- I tweaked curl’s TLS code to perform a similar TLS handshake to Chrome, enabling some Google-specific TLS extensions on the way.
- This was still detected by TLS fingerprinters, so I had to dive deeper into the encrypted session.
- Two small but crucial differences in the HTTP/2 frames revealed further how those fingerprinters work.
- I then patched the HTTP/2 code as well to impersonate Chrome.
- You can find the updated curl-impersonate, with full Chrome 98 impersonation, in the GitHub repository.
Let’s look at the details.
The first part of impersonating a browser is using the same TLS library. Otherwise you are going to hit a wall of missing features and varying implementations, as we shall see below. For Firefox I used NSS, as mentioned in the previous post. Chrome uses BoringSSL, described as “a fork of OpenSSL that is designed to meet Google’s needs”. At first, looking at curl’s list of SSL libraries, I didn’t find BoringSSL and concluded that it was not supported. It is, in fact, supported: you just replace OpenSSL with BoringSSL at build time and it works.
The full build procedure is in the Dockerfile.
The Client Hello message
The first message sent by TLS clients is called Client Hello. It contains a list of parameters and extensions, all of which can be used to fingerprint the client. For example, the ja3 method calculates a hash of some of them to create a unique fingerprint for each client. Our goal here is to match curl’s Client Hello and make it completely identical to Chrome’s. Here’s the important part of Chrome’s Client Hello message (Chrome 98, Windows 10, non-incognito):
Handshake Protocol: Client Hello
    Handshake Type: Client Hello (1)
    Length: 508
    Version: TLS 1.2 (0x0303)
    Random: b46aad...
    Session ID Length: 32
    Session ID: 74c03b...
    Cipher Suites Length: 32
    Cipher Suites (16 suites)
    Compression Methods Length: 1
    Compression Methods (1 method)
    Extensions Length: 403
    Extension: Reserved (GREASE) (len=0)
    Extension: server_name (len=17)
    Extension: extended_master_secret (len=0)
    Extension: renegotiation_info (len=1)
    Extension: supported_groups (len=10)
    Extension: ec_point_formats (len=2)
    Extension: session_ticket (len=0)
    Extension: application_layer_protocol_negotiation (len=14)
    Extension: status_request (len=5)
    Extension: signature_algorithms (len=18)
    Extension: signed_certificate_timestamp (len=0)
    Extension: key_share (len=43)
    Extension: psk_key_exchange_modes (len=2)
    Extension: supported_versions (len=7)
    Extension: compress_certificate (len=3)
    Extension: application_settings (len=5)
    Extension: Reserved (GREASE) (len=1)
    Extension: padding (len=203)
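To make the fingerprinting idea concrete, here is a minimal Python sketch of how a ja3-style hash can be derived from Client Hello fields. It follows the publicly documented ja3 recipe (version, cipher suites, extensions, supported groups and point formats, with GREASE values dropped, then MD5). The function names and the sample values are illustrative assumptions, not Chrome’s real lists.

```python
import hashlib


def is_grease(value):
    # GREASE values are 0x0A0A, 0x1A1A, ..., 0xFAFA (RFC 8701):
    # both bytes are equal and the low nibble of each byte is 0xA.
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3_string(version, ciphers, extensions, groups, point_formats):
    """Build the comma-separated ja3 string; GREASE values are skipped."""
    def join(values):
        return "-".join(str(v) for v in values if not is_grease(v))

    return ",".join([
        str(version),
        join(ciphers),
        join(extensions),
        join(groups),
        join(point_formats),
    ])


def ja3_hash(version, ciphers, extensions, groups, point_formats):
    """MD5 of the ja3 string, as a hex digest."""
    text = ja3_string(version, ciphers, extensions, groups, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()


# Toy values for illustration only (NOT a real browser's Client Hello).
hello = (771, [0x0A0A, 4865, 4866], [0x1A1A, 0, 23, 65281], [0x2A2A, 29, 23], [0])
print(ja3_string(*hello))
print(ja3_hash(*hello))
```

Note that the order of the values is part of the hash input, so two clients that offer the same extensions in a different order end up with different fingerprints.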
The process of matching curl’s Client Hello consists of:
- Matching the Cipher Suites list, by using curl’s built-in option for selecting ciphers.
- Enabling, disabling and modifying various extensions by patching curl’s TLS code.
I detailed some of the process in the previous post, the main difference now being the use of BoringSSL instead of NSS. There were, however, some interesting Google-specific extensions to be dealt with.
As can be seen above, Chrome adds two extensions called GREASE before and after the main extension list. Firefox doesn’t do that, and in fact I don’t think NSS even supports it. The purpose of GREASE is to ensure TLS servers are future-proof by mixing in non-existent extensions, expecting the servers to ignore them until they become supported. There is a good explanation in this Cloudflare blog post. To enable GREASE in curl, all that was needed was a single function call.
Because we are using the same BoringSSL implementation as Chrome, this adds the GREASE extensions at exactly the same place.
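The server side of the GREASE contract can be sketched in a few lines of Python: walk the extensions block, keep what you know, and silently skip the rest. This is only an illustration of the idea. The `KNOWN_EXTENSIONS` table is a deliberately tiny assumption, and the two GREASE values in the sample are arbitrary picks from the reserved set.

```python
import struct

# The 16 reserved GREASE values from RFC 8701: 0x0A0A, 0x1A1A, ..., 0xFAFA.
GREASE_VALUES = {0x0A0A + 0x1010 * i for i in range(16)}

# A tiny, incomplete table of extensions this toy "server" understands.
KNOWN_EXTENSIONS = {0: "server_name", 23: "extended_master_secret", 21: "padding"}


def parse_extensions(block):
    """Split a TLS extensions block into (type, body) pairs."""
    pos, parsed = 0, []
    while pos < len(block):
        ext_type, length = struct.unpack_from(">HH", block, pos)
        pos += 4
        parsed.append((ext_type, block[pos:pos + length]))
        pos += length
    return parsed


# GREASE (len=0), extended_master_secret (len=0), GREASE (len=1).
block = (
    struct.pack(">HH", 0x0A0A, 0)
    + struct.pack(">HH", 23, 0)
    + struct.pack(">HH", 0x1A1A, 1) + b"\x00"
)

parsed = parse_extensions(block)
# A tolerant server acts only on what it knows and ignores everything else.
understood = [KNOWN_EXTENSIONS[t] for t, _ in parsed if t in KNOWN_EXTENSIONS]
grease_seen = [t for t, _ in parsed if t in GREASE_VALUES]
print(understood, [hex(t) for t in grease_seen])
```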
Chrome adds the compress_certificate extension. This is what it looks like:
Extension: compress_certificate (len=3)
    Type: compress_certificate (27)
    Length: 3
    Algorithms Length: 2
    Algorithm: brotli (2)
Chrome is telling the server here that it supports receiving certificates compressed using the Brotli compression algorithm. Brotli was developed at Google and is the br in the Accept-Encoding: gzip, deflate, br HTTP header that most browsers send out today. Going through the Chromium source code, we find that this TLS extension is enabled in cert_compression.cc. Again, it is a matter of a single line.
DecompressBrotliCert is a simple proxy function between BoringSSL and the Brotli library. Copying the one-liner and the function over to curl enables the compress_certificate extension there as well.
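For reference, the bytes behind the Wireshark view above are easy to reproduce. The following Python sketch assembles the extension as it would appear on the wire, assuming the layout shown in the dump (a 2-byte type, a 2-byte length, a 1-byte algorithms length, then 2 bytes per algorithm). The helper name is hypothetical.

```python
import struct

COMPRESS_CERTIFICATE = 27  # extension type, as shown in the dump
BROTLI = 2                 # algorithm identifier, as shown in the dump


def build_compress_certificate(algorithms):
    """Serialize a compress_certificate extension advertising the given algorithms."""
    algs = b"".join(struct.pack(">H", alg) for alg in algorithms)
    body = struct.pack(">B", len(algs)) + algs          # 1-byte list length + list
    return struct.pack(">HH", COMPRESS_CERTIFICATE, len(body)) + body


ext = build_compress_certificate([BROTLI])
print(ext.hex())
```

The extension body is 3 bytes long, which matches the len=3 in the capture.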
In the previous post I mentioned the ALPN extension, which allows the client and server to decide whether to use HTTP/1.1 or HTTP/2 during the TLS handshake. It’s being used by both Firefox and Chrome. Google took this one step further and proposed the ALPS extension, which allows the client to send its HTTP/2 SETTINGS during the TLS handshake (more about SETTINGS later). This is the application_settings extension in the Client Hello. As of this writing, it is a non-standard TLS extension, but Google being Google, they love experimenting with our browsers and Chrome already adds it to its extension list. Here is the commit that enabled ALPS in Chrome about a year ago.
In the end, it was again a matter of adding a one-liner to curl, and now curl supports ALPS as well [1].
Comparing the TLS fingerprint
By the end of this process, the Client Hello is identical. Here is Chrome’s TLS fingerprint from ja3er.com:
And here is ours:
Remarkably, even with an identical TLS fingerprint, Protectify was still able to identify and block our dear curl-impersonate (Protectify is the fake name of the company from the previous post). To understand how, we must dive deeper into the encrypted TLS session.
Decrypting the TLS session
To inspect what’s inside the TLS session we first need to capture it in Wireshark and decrypt it. This is easily done by defining the SSLKEYLOGFILE environment variable. Both Chrome and Firefox will then write a keylog file to the specified location. You can then feed this file to Wireshark and it will decrypt the session for you. Handy!
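The keylog file itself is plain text in the NSS key log format: one secret per line, as a label, the client random in hex, and the secret in hex. If you want to peek into it without Wireshark, a minimal Python sketch could look like this. The function names are made up for illustration, and the sample uses obviously fake all-zero and all-one values rather than real secrets.

```python
def parse_keylog_line(line):
    """Parse one NSS key log line into (label, client_random, secret), or None."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # blank line or comment
    label, client_random, secret = line.split()
    return label, bytes.fromhex(client_random), bytes.fromhex(secret)


def load_keylog(text):
    """Parse a whole keylog file's contents, skipping comments and blanks."""
    entries = (parse_keylog_line(line) for line in text.splitlines())
    return [entry for entry in entries if entry is not None]


# Dummy sample data: fake values, shaped like TLS 1.2 and TLS 1.3 entries.
sample = "\n".join([
    "# comment lines are ignored",
    "CLIENT_RANDOM " + "00" * 32 + " " + "11" * 48,
    "CLIENT_HANDSHAKE_TRAFFIC_SECRET " + "00" * 32 + " " + "22" * 32,
])

for label, client_random, secret in load_keylog(sample):
    print(label, len(client_random), len(secret))
```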
Here’s what a decrypted Chrome session to wikipedia.org looks like:
The session begins as follows:
- Chrome sends the Client Hello message.
- The server responds with the Server Hello message.
- The server sends its certificate and the TLS handshake is done.
- The client and server immediately begin an HTTP/2 session (Remember ALPN?).
- Chrome sends a SETTINGS frame.
- Chrome sends a HEADERS frame with the request.
The SETTINGS frame
The SETTINGS frame is used to notify the server about a few HTTP/2-specific settings. Here’s what it looks like in Chrome:
Stream: SETTINGS, Stream ID: 0, Length 30
    ...
    Settings - Header table size : 65536
    Settings - Max concurrent streams : 1000
    Settings - Initial Windows size : 6291456
    Settings - Max header list size : 262144
    Settings - Unknown (10858) : 1359919199
Therein lies our first problem. Curl’s SETTINGS look completely different:
Stream: SETTINGS, Stream ID: 0, Length 18
    ...
    Settings - Max concurrent streams : 100
    Settings - Initial Windows size : 33554432
    Settings - Enable PUSH : 0
There are four notable differences:
- Curl is sending different values for its settings.
- Curl is missing Header table size and Max header list size.
- Curl disables HTTP/2 server push because the command line curl doesn’t support it. This sticks out like a sore thumb in the SETTINGS frame.
- Chrome throws in a random setting at the end (shown as Unknown). My guess is that this is another Google invention with a similar purpose to the TLS GREASE explained above.
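To see why these differences are so easy to spot, it helps to look at how a SETTINGS payload is laid out: each setting is a 16-bit identifier followed by a 32-bit value, 6 bytes in total. The Python sketch below encodes both frames from the captures above. The identifiers are the standard ones from the HTTP/2 specification, while the function names are my own.

```python
import struct

# Setting identifiers defined by the HTTP/2 specification (RFC 7540, section 6.5.2).
HEADER_TABLE_SIZE = 0x1
ENABLE_PUSH = 0x2
MAX_CONCURRENT_STREAMS = 0x3
INITIAL_WINDOW_SIZE = 0x4
MAX_HEADER_LIST_SIZE = 0x6


def encode_settings(settings):
    """Encode (identifier, value) pairs as a SETTINGS frame payload."""
    return b"".join(struct.pack(">HI", ident, value) for ident, value in settings)


def decode_settings(payload):
    """Decode a SETTINGS payload back into (identifier, value) pairs, in order."""
    return [struct.unpack_from(">HI", payload, off) for off in range(0, len(payload), 6)]


# Values and order taken from the two captures above.
chrome = [
    (HEADER_TABLE_SIZE, 65536),
    (MAX_CONCURRENT_STREAMS, 1000),
    (INITIAL_WINDOW_SIZE, 6291456),
    (MAX_HEADER_LIST_SIZE, 262144),
    (10858, 1359919199),  # the "Unknown" setting
]
curl = [
    (MAX_CONCURRENT_STREAMS, 100),
    (INITIAL_WINDOW_SIZE, 33554432),
    (ENABLE_PUSH, 0),
]

print(len(encode_settings(chrome)), len(encode_settings(curl)))
print(decode_settings(encode_settings(curl)))
```

The payload lengths come out as 30 and 18 bytes, matching the Length fields in the captures. A fingerprinter only needs to compare the decoded list of identifiers and values, including their order, to tell the two clients apart.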
The HEADERS frame
In HTTP/2, the HEADERS frame combines the method (e.g. GET), the URI and the HTTP headers all into a unified format. Here’s Chrome’s:
Stream: HEADERS, Stream ID: 1, Length 438, GET /
    ...
    Header: :method: GET
    Header: :authority: wikipedia.org
    Header: :scheme: https
    Header: :path: /
    ... (Regular HTTP headers follow)
It always begins with the pseudo-headers :method, :authority, :scheme and :path, whose meaning is clear. But here’s the funny thing: curl sends them out in a different order! Look:
Stream: HEADERS, Stream ID: 1, Length 434, GET /
    ...
    Header: :method: GET
    Header: :path: /
    Header: :scheme: https
    Header: :authority: wikipedia.org
    ...
This is completely fine from an HTTP standpoint, but is being leveraged to fingerprint our client. curl, Firefox, Chrome - each sends them out in a different order.
You can’t control the order of the pseudo-headers from the curl command line. It’s hard-coded into curl’s code, and it’s always the same. Luckily, the fix is simple and involves re-ordering them into the desired order.
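A fingerprinter could reduce this to a very small signature. The Python sketch below is a guess at how such a check might work, not Protectify’s actual method: it takes the decoded header list and records the first letter of each leading pseudo-header, in the order it was sent. The function name and the signature format are assumptions.

```python
def pseudo_header_order(headers):
    """Return the order of the leading pseudo-headers, e.g. 'm,a,s,p'."""
    order = []
    for name, _value in headers:
        if not name.startswith(":"):
            break  # regular headers start here; pseudo-headers always come first
        order.append(name[1])  # first letter after the colon
    return ",".join(order)


# Header lists as seen in the two captures above (regular headers shortened).
chrome_headers = [
    (":method", "GET"),
    (":authority", "wikipedia.org"),
    (":scheme", "https"),
    (":path", "/"),
    ("accept", "*/*"),
]
curl_headers = [
    (":method", "GET"),
    (":path", "/"),
    (":scheme", "https"),
    (":authority", "wikipedia.org"),
    ("accept", "*/*"),
]

print(pseudo_header_order(chrome_headers))
print(pseudo_header_order(curl_headers))
```

The two clients request exactly the same thing, yet the signatures differ, which is all a fingerprinter needs.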
After matching the TLS signature and the HTTP/2 signature, curl-impersonate now behaves similarly enough to Chrome to trick TLS fingerprinters. In the repository you may find curl_chrome98, a wrapper script that launches curl-impersonate with all the correct headers and flags to make it impersonate Chrome 98 on a Windows 10 machine.
Impersonating browsers is an endless cat-and-mouse game. The rapid release of new browser versions means TLS signatures change by the month. Tomorrow Chrome may come up with another Google-specific extension, or start using Encrypted Client Hello, or even turn on HTTP/3 by default. Each such change will require a different set of modifications for curl-impersonate to work.
[1] curl adds the ALPS extension to the Client Hello. For ALPS to fully work, the server needs to respond with an encrypted ALPS extension, and the client needs to send its application settings back (e.g. the HTTP/2 SETTINGS frame). I couldn’t test how curl behaves in this situation as no server seems to support it right now, not even google.com.