How to easily scrape numerical data? (or find an API endpoint)

Brice Vergnou
4 min read · Jan 16, 2022

A quick note: I’m going to use my latest project as an example, which is a Pokedex based on strategy. You can have a look at the website here if you want.

Photo by Stephen Phillips - Hostreviews.co.uk on Unsplash

Context

So you are working on a data-based project, either to analyze the data or to build a model from it. But after cleaning your data, you realize some of it is missing. You look through various sources and finally find a website with the data you need, so you decide to scrape it. You open the dev tools and hover over the div block to find its class name.

Picture from the author — Website Pikalytics

But quickly enough, you realize the only thing you are going to get is a headache because the class name does not make sense, or is non-existent.

“How am I supposed to scrape both the name and the value?”

Step 1: Find the API endpoint

Most of the time, websites displaying numerical data retrieve it from an API. And when they do, the endpoint is surprisingly simple to find once you know where to look.

1.1 Open the dev tools

Yes, the same ones that gave us a headache a moment ago. If you don’t know how to open them, right-click on the page and click on Inspect, or simply press F12.

1.2 Open the Network tab

This shows every file loaded when you open the website (style sheets, scripts, images, or even the data we’re looking for!). If nothing shows up, reload the page with F5 or CTRL+R.

1.3 Find the right file

If the website uses an API, this should be pretty easy, as the API call is one of the few files without a fancy name. If you hover over it, you should be able to see “api” somewhere in the URL.

You can also use the filter section and either type “ip.json” in the text bar, or select the “Fetch/XHR” option. In my case, it worked with the Fetch/XHR button because, as you can see, the type displayed is xhr (XMLHttpRequest).

1.4 Open the link

Just right-click the file and choose “Open in new tab”. You should then see the JSON-formatted data you were looking for!

If you want the data nicely displayed, you can install a browser extension for this purpose: JSON Formatter for Chrome, or JSONView for all browsers (I didn’t test that one).
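If you would rather not install anything, a few lines of Python can pretty-print the same data. This is only a minimal sketch: the `raw` string is a made-up sample standing in for whatever text you copied from the browser tab, not real Pikalytics data.

```python
import json

# Made-up sample standing in for the text copied from the browser tab.
raw = '{"name": "Pikachu", "moves": [{"move": "Thunderbolt", "percent": 42.0}]}'

data = json.loads(raw)               # parse the text into Python objects
pretty = json.dumps(data, indent=2)  # re-serialize with 2-space indentation
print(pretty)
```

The `indent` argument is what makes the nested structure readable; without it, `json.dumps` puts everything on a single line.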

Step 2: Retrieve the data

Great, you have found the data you’ve been looking for, but how do you retrieve it?
Well, there are countless ways of doing this depending on your programming language. Fortunately, there are quick and easy tutorials out there:

  • And more…
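As one illustration in Python, here is a minimal sketch that downloads a JSON endpoint using only the standard library. The function name `fetch_json`, the User-Agent string, and the URL in the usage comment are all placeholders I made up for the example, not the real Pikalytics endpoint. Some sites also require extra headers or refuse automated requests, so treat this as a starting point.

```python
import json
import urllib.request


def fetch_json(url, timeout=10):
    """Download `url` and parse the response body as JSON."""
    # Some servers reject requests without a User-Agent, so we send a simple one.
    request = urllib.request.Request(
        url, headers={"User-Agent": "my-data-project/0.1"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Assumes the body is UTF-8 encoded JSON, the usual case for APIs.
        return json.loads(response.read().decode("utf-8"))


# Example usage (placeholder URL, replace it with the one from the Network tab):
# data = fetch_json("https://example.com/api/placeholder.json")
```

The result is a regular Python dictionary or list, depending on what the endpoint returns, so you can loop over it or index into it directly.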

Since parsed JSON maps onto dictionaries and lists, you can turn several JSON files into an SQL database, a Pandas DataFrame, and so on, and then merge the result with the rest of the data you already had for your project.
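To make the SQL route concrete, here is a small sketch using `sqlite3`, which ships with Python (a Pandas DataFrame would work just as well). All table names, column names, and values below are hypothetical sample data, chosen only to show the idea of loading JSON records and joining them with data you already have.

```python
import json
import sqlite3

# Made-up sample records standing in for JSON fetched from an endpoint.
raw = '[{"name": "Pikachu", "usage_pct": 12.5}, {"name": "Eevee", "usage_pct": 3.1}]'
records = json.loads(raw)  # a JSON array becomes a list of dictionaries

conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
conn.execute("CREATE TABLE usage_stats (name TEXT PRIMARY KEY, usage_pct REAL)")
# Named placeholders (:name, :usage_pct) are filled from each dictionary's keys.
conn.executemany("INSERT INTO usage_stats VALUES (:name, :usage_pct)", records)

# A hypothetical table representing the data the project already had.
conn.execute("CREATE TABLE pokemon (name TEXT PRIMARY KEY, type TEXT)")
conn.executemany(
    "INSERT INTO pokemon VALUES (?, ?)",
    [("Pikachu", "Electric"), ("Eevee", "Normal")],
)

# Merge the two sources on the shared "name" column.
rows = conn.execute(
    "SELECT p.name, p.type, u.usage_pct "
    "FROM pokemon AS p JOIN usage_stats AS u ON u.name = p.name "
    "ORDER BY p.name"
).fetchall()
print(rows)
```

Swapping `":memory:"` for a file path would keep the database on disk between runs.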

Thanks a lot for reading this article. It is the first one I have actually written and I would like to improve, so any feedback (either positive or negative) is welcome!

Connect with me
