Analyzing a stock exchange's API
This was a fun afternoon reverse engineering project so I figured I’d write a bit about it.
I’m developing a web app, Pumbaa Backtester, which is a small tool to simulate the historical performance of index-based investments. As part of the development I wanted to fetch long-term historical data for an ETF traded at a medium-size stock exchange. I won’t write exactly which one, but if you are curious you’ll figure it out.
Each day a closing price for the ETF is determined, which is pretty much like the price of a stock at the end of the trading day. What I needed were the closing prices going back to the ETF's creation 22 years ago. Browsing a bit at the stock exchange’s site I got to the following form:
Great! This gives the data I want. The goal is to automate fetching these prices - I want it to be done automatically once a day. So let’s fire up Firefox network monitor (Ctrl+Shift+E) and see what happens when we press “Search”:
Looks simple enough - an API with the parameters isin (unique id of the ETF), minDate and maxDate.
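As a rough sketch, the request URL could be assembled like this in Python. The endpoint, the placeholder ISIN and the date format are all made up for illustration, since the post does not name the exchange; only the three parameter names come from the network monitor.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the post does not reveal the real URL.
BASE_URL = "https://api.exchange.example/price_history"


def build_url(isin: str, min_date: str, max_date: str) -> str:
    """Return the request URL carrying the isin, minDate and maxDate parameters."""
    # The YYYY-MM-DD date format used by callers below is an assumption.
    query = urlencode({"isin": isin, "minDate": min_date, "maxDate": max_date})
    return f"{BASE_URL}?{query}"


print(build_url("XX0000000000", "2000-01-01", "2022-01-01"))
```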
First attempts
If we attempt to access the API with curl:
we get back an empty JSON response. At this point the most likely explanation is that we are missing one of the HTTP headers; it could be a Cookie header or something else. Looking at the original request’s headers, everything is quite standard except for the trio Client-Date, X-Client-TraceId and X-Security:
These are not documented on MDN so they must be something unique to this API. You might wonder whether we could just send the exact same headers again. That does work for a few minutes, but then it stops. We’ll have to find out the logic behind them.
Client-Date is simple enough: it’s just the current time. The other two are 16-byte hex-encoded strings, so maybe they are just random UUIDs? Let’s try:
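One way to run this experiment from Python, as a sketch: uuid.uuid4().hex yields 32 hex characters, the same shape as the two mystery headers. The helper name is made up for illustration.

```python
import uuid


def random_hex_id() -> str:
    """Return 32 lowercase hex characters (a random version-4 UUID without dashes)."""
    return uuid.uuid4().hex


# Candidate header values that merely look right.
headers = {
    "X-Client-TraceId": random_hex_id(),
    "X-Security": random_hex_id(),
}
print(headers)
```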
Nope, another empty JSON. So there must be some logic in the page's JavaScript that generates these headers.
Finding the origin
Searching for the string X-Client-TraceId through the JS scripts that the page uses, we find the culprit:
The script main-es2015.3f13e42ead3dc41c6dc3.js is a one-line, minified script, probably generated by webpack. Why a page with a single form needs 3MB of JavaScript is really beyond me. Anyway, after beautifying it we can look at the snippet that generates the three headers:
At first I tried to approach this like a programmer, understanding where each variable comes from. But in a 90k-line script where everything is called t, i, and r it’s quite impossible. It doesn’t help that the surrounding code looks like some form of alien code:
So let’s just use some common sense and go header-by-header:
Client-Date
This is the current time, converted to a string with JavaScript’s toISOString() function.
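For anyone reproducing this outside the browser, here is a minimal Python sketch of the same format. toISOString() always reports UTC with millisecond precision and a trailing Z. The function name is my own.

```python
from datetime import datetime, timezone


def js_iso_string(moment: datetime) -> str:
    """Format a datetime the way JavaScript's Date.prototype.toISOString() does."""
    utc = moment.astimezone(timezone.utc)
    # isoformat() truncates to milliseconds here; swap the UTC offset for "Z".
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


client_date = js_iso_string(datetime.now(timezone.utc))
print(client_date)
```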
X-Security
Here is the snippet again for convenience:
We can guess that it’s a hash of the current time, after being converted to the format YYYYMMDDHHmm. Which hash? The result is 16 bytes long so the most probable candidate is md5. Let’s check:
Doesn’t match… maybe we need to use the local time instead?
It matches! So we got this header as well.
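The check that matched can be sketched in Python as follows. The function name is hypothetical, and "local" here means whatever timezone the machine running the code is set to.

```python
import hashlib
from datetime import datetime


def x_security(local_now: datetime) -> str:
    """Return the hex md5 of the given local time formatted as YYYYMMDDHHmm."""
    stamp = local_now.strftime("%Y%m%d%H%M")
    return hashlib.md5(stamp.encode("ascii")).hexdigest()


# datetime.now() without arguments returns naive local time.
print(x_security(datetime.now()))
```

Since the format stops at minutes, the value only changes once a minute.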
X-Client-TraceId
Here’s the relevant part again:
Leveraging what we found out already, this header is generated as follows:
- The current time, e, is concatenated to two unknown strings, t and salt.
- X-Client-TraceId is the md5 hash of the result.
Now the fastest thing to do is to use a JavaScript debugger to find out what t and salt are.
The Firefox debugger (Ctrl+Shift+Z) lets us beautify the script and put a breakpoint on this line. Hitting “Search” again the breakpoint is triggered, and we can see the variables’ values:
So apparently:
- t is the requested URL, including the query string.
- salt is a fixed string, in this case w4icATTGtnjAZMbkL3kJwxMfEAKDa3MN. Apparently it appears in the source code as-is so it must be constant.
- X-Client-TraceId is the md5 of time + url + salt.
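The recipe translates to a few lines of Python. This is only a sketch: the function name is made up, and the time string is left as a parameter because the exact representation of the time that goes into the hash is not spelled out here.

```python
import hashlib

# Constant found in the page's JavaScript, as quoted above.
SALT = "w4icATTGtnjAZMbkL3kJwxMfEAKDa3MN"


def x_client_trace_id(time_str: str, url: str, salt: str = SALT) -> str:
    """Return the hex md5 of time + url + salt."""
    payload = time_str + url + salt
    return hashlib.md5(payload.encode("utf-8")).hexdigest()
```

Note that url must be the full request URL including the query string, so every distinct query gets a different trace id.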
Now we have all the information needed to generate valid requests to the API:
- Take the current time and hash it to generate X-Security.
- Construct the URL with the parameters, add it to the time and salt and hash everything together to generate X-Client-TraceId.
And it works! Here is a Python snippet to generate the headers for a given URL:
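What follows is a sketch reconstructed from the steps above, not a verbatim copy of the original code. Two things in it are assumptions: the time hashed into X-Client-TraceId is taken to be the same ISO string sent as Client-Date, and "local time" is taken to be the timezone of the machine running the script. The function names are my own.

```python
import hashlib
from datetime import datetime, timezone
from typing import Dict, Optional

SALT = "w4icATTGtnjAZMbkL3kJwxMfEAKDa3MN"  # constant quoted in the post


def md5_hex(text: str) -> str:
    """Return the hex-encoded md5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def make_headers(url: str, now: Optional[datetime] = None) -> Dict[str, str]:
    """Build the three custom headers for `url` (full URL including query string)."""
    now = now or datetime.now(timezone.utc)
    # Client-Date: same shape as JavaScript's toISOString().
    utc = now.astimezone(timezone.utc)
    client_date = utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")
    # X-Security input: local time as YYYYMMDDHHmm (system timezone, an assumption).
    local_stamp = now.astimezone().strftime("%Y%m%d%H%M")
    return {
        "Client-Date": client_date,
        # Assumption: the hashed time is the Client-Date string.
        "X-Client-TraceId": md5_hex(client_date + url + SALT),
        "X-Security": md5_hex(local_stamp),
    }
```

The returned dictionary can be passed as the headers of whichever HTTP client you use, together with the same URL that was hashed.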
Concluding
What was the purpose of these headers? I’m really not sure. It could be protection against bots or maybe a user-tracking mechanism. Anyway, it didn’t take much work to understand it. I guess if you are exposing your API on the internet, expect someone to figure it out and use it.