Every website needs data to function properly, whether it is displaying the latest product prices, reporting current news from around the world, booking a hotel, or simply showing the latest weather forecast for cities around the world.
Getting access to this data comes at a cost. You could source it manually, but that is unrealistic given the time it takes to collect the data, implement it on your website, and update it whenever new information appears. Alternatively, you could scrape it with popular tools such as Puppeteer, Cheerio, or Playwright. However, this is not always effective, because many companies actively block third-party websites from scraping their data. So what do we do? This is where a platform like Bright Data comes into play.
With Bright Data, companies and developers can access large datasets or APIs by using the Bright Data proxy network to scrape data that would otherwise be blocked, and receive it in CSV or JSON format, ready to use on any website.
In this blog, we’ll learn how to use the Bright Data Web Scraper IDE to scrape product data from Ikea.
By the end of this blog, you should be able to scrape data using the Bright Data Web Scraper IDE.
Prerequisites
This tutorial assumes the following:
- Knowledge of JavaScript
- Basic knowledge of web scraping
- Familiarity with using an API
Getting Started
To get started, head to the official Bright Data website and sign up with Google.
From the modal pop-up, select Continue with Google. This will take you to the Bright Data Dashboard.
Next, from the Dashboard, select the Datasets and Web Scraper IDE panel.
Next, from the Web Scraper IDE, click the Get Started button.
The Web Scraper IDE comes with ready-made templates for large companies' websites such as LinkedIn, Amazon, eBay, Walmart, YouTube, TikTok, Ikea, and many others, which you can use to scrape data for your project.
Next, in the modal pop-up, scroll to the bottom, hover over Ikea Products, and click Use template.
Bright Data’s Web Scraper IDE is a JavaScript integrated development environment (IDE) with ready-made functions and coding templates for developing scrapers quickly and at scale, and for collecting publicly available online data in real time.
Your Web Scraper IDE should look just like this.
Before we start scraping, let's break down the Web Scraper IDE to understand how it works.
Web Scraper IDE Interface
The Web Scraper IDE interface is made up of four main sections:
Interaction code
Here you can write code to interact with the website. This is also where you should filter out navigation to unwanted pages.
The interaction code also shows how Bright Data's Web Scraper IDE interacts with Ikea’s website, based on the template we chose. Its result is then passed on to the parser code.
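To make "filter out navigation to unwanted pages" more concrete, here is a minimal sketch in plain JavaScript of the kind of check you might run before visiting a URL. It does not use Bright Data's IDE functions: shouldVisit is a hypothetical helper, and it assumes Ikea product pages contain a "p" path segment (as in /us/en/p/<product>/), which you should confirm against the URLs your template actually visits.

```javascript
// Hypothetical URL filter; not part of Bright Data's API.
// Assumption: Ikea product pages have a "p" path segment, e.g. /us/en/p/<slug>/.
function shouldVisit(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl); // throws on malformed or relative URLs
  } catch {
    return false;
  }
  const onIkea = url.hostname === "ikea.com" || url.hostname.endsWith(".ikea.com");
  const isProductPage = url.pathname.split("/").includes("p");
  return url.protocol === "https:" && onIkea && isProductPage;
}

console.log(shouldVisit("https://www.ikea.com/us/en/p/example-product-12345678/")); // true
console.log(shouldVisit("https://www.ikea.com/us/en/cat/chairs/")); // false: category page
console.log(shouldVisit("https://example.com/p/item/")); // false: another domain
```

Skipping pages early like this keeps the scraper from spending requests on pages the parser would discard anyway.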
Parser code
Here you can parse the HTML results from the interactions you have performed.
The parser code takes the result from the interaction code and extracts the specified values from Ikea’s website, such as prices, image URLs, names, and model numbers.
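As a rough illustration of what the parser step produces, the sketch below turns raw extracted strings into a clean product record. The helpers parsePrice and toProductRecord and the field names are hypothetical, not the template's actual output, and the price parsing assumes "." is the decimal separator.

```javascript
// Hypothetical helpers; field names are illustrative, not the template's exact output.
// Assumption: prices use "." as the decimal separator, e.g. "$1,299.99".
function parsePrice(text) {
  // Drop currency symbols and thousands separators, keeping digits and the decimal point.
  const cleaned = String(text ?? "").replace(/[^0-9.]/g, "");
  const value = Number.parseFloat(cleaned);
  return Number.isNaN(value) ? null : value;
}

function toProductRecord(raw) {
  return {
    name: (raw.name ?? "").trim(),
    price: parsePrice(raw.price),
    image: raw.image ?? null,
    model_number: (raw.model_number ?? "").trim(),
  };
}

const record = toProductRecord({
  name: " BILLY ",
  price: "$1,299.99",
  image: "https://example.com/billy.jpg",
  model_number: " 002.638.50 ",
});
console.log(record.price); // 1299.99
```

Keeping prices as numbers rather than display strings makes the output easier to sort and compare once you download it.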
We can then access this via the Debugging Tabs.
Debugging Tabs
This section contains tabs for performing the following functions.
The Input tab contains the input URL and the values returned by the parser code. We can copy the link into a browser tab to see the page, or simply click the Preview button.
For example, when you click the Preview button on the Input tab, the results of the request appear in the Output, Run log, and Browser network tabs, as indicated below:
- Output: shows the properties and values of the scraped data in a table. You can download these values with the Download button in the bottom-right corner.
- Run log: logs all activity during the run.
- Browser network: shows each GET request along with its status code, so you can see which requests passed and which failed. The URL is also loaded in the Browser preview section.
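The passed/failed distinction in the Browser network tab follows standard HTTP status codes. This hypothetical sketch groups request log entries the same way, treating 2xx and 3xx responses as passed and everything else as failed. The { url, status } entry shape is an assumption for illustration, not the IDE's export format.

```javascript
// Hypothetical log entries: { url, status }, where status 0 stands for a request with no response.
function summarizeRequests(entries) {
  const summary = { passed: [], failed: [] };
  for (const { url, status } of entries) {
    // 2xx (success) and 3xx (redirect) count as passed; 4xx, 5xx, and 0 count as failed.
    if (status >= 200 && status < 400) {
      summary.passed.push(url);
    } else {
      summary.failed.push(url);
    }
  }
  return summary;
}

const result = summarizeRequests([
  { url: "https://www.ikea.com/us/en/p/example-product-12345678/", status: 200 },
  { url: "https://www.ikea.com/missing-page/", status: 404 },
]);
console.log(result.passed.length, result.failed.length); // 1 1
```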
Browser Preview
Here we can preview the pages we have scraped from the input URLs.
Once the debugging section has finished loading, the output is displayed in the Browser preview.
We can check the HTML structure of the preview page by clicking the See HTML button or open the page in a new tab by clicking the Pop-out button.
To run the entire code, click the Finish button, as indicated below.
This runs the entire code, makes sure everything is working, and then generates a collector (this is the list of datasets we will want to run).
Generating an API token
By now you should be on the Initiate manually panel. Now let's generate an API token so that we can use it to run the provided script and get the scraped data.
Click the gear icon and select Account settings.
In the API tokens section, add and generate your token. You can learn more about generating an API token in Bright Data's documentation.
Once you have generated your token, click the back arrow and select the Initiate by API tab.
Generating the scraped dataset via the CLI
Now that you have your API token, copy the Windows CMD curl command by clicking the copy icon.
Next, paste the command into Notepad and replace the API_TOKEN placeholder with the token you generated.
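If you would rather not keep your token in a Notepad file, you could do the same substitution in a small Node.js script. This is a sketch under assumptions: BRIGHTDATA_API_TOKEN is a hypothetical environment variable name (in Windows CMD you could set it with set BRIGHTDATA_API_TOKEN=your_token), and the template string is illustrative, so paste in the command you actually copied.

```javascript
// Swap every API_TOKEN placeholder in a copied command for your real token.
function fillToken(command, token) {
  if (!token) throw new Error("Missing API token");
  return command.split("API_TOKEN").join(token); // replaces all occurrences
}

// Illustrative template only; use the command you copied from the dashboard.
const template = 'curl -H "Authorization: Bearer API_TOKEN" "https://example.com/hypothetical-endpoint"';

// BRIGHTDATA_API_TOKEN is a hypothetical variable name; set it in your shell first.
const token = process.env.BRIGHTDATA_API_TOKEN;
if (token) {
  console.log(fillToken(template, token));
} else {
  console.log("Set BRIGHTDATA_API_TOKEN to print the completed command.");
}
```

Reading the token from the environment means it never ends up in a saved file or a screenshot of your command.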
Next, escape special characters in the URLs. Characters such as &, ?, =, and others should be properly encoded in the URLs: use %26 for &, %3F for ?, %3D for =, and so on.
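The substitutions above can be sketched in JavaScript. escapeUrlForCmd is a hypothetical helper that replaces only &, ? and =, mirroring the manual edits; the Ikea search URL is just an example input.

```javascript
// Percent-encode only the characters listed above, mirroring the manual edits.
const PERCENT_CODES = { "&": "%26", "?": "%3F", "=": "%3D" };

function escapeUrlForCmd(url) {
  return url.replace(/[&?=]/g, (ch) => PERCENT_CODES[ch]);
}

console.log(escapeUrlForCmd("https://www.ikea.com/us/en/search/?q=chair&page=2"));
// https://www.ikea.com/us/en/search/%3Fq%3Dchair%26page%3D2

// The built-in encodeURIComponent uses the same codes but also encodes ":" and "/".
console.log(encodeURIComponent("q=chair&page=2")); // q%3Dchair%26page%3D2
```

encodeURIComponent is the general-purpose built-in, but because it also encodes characters such as : and /, applied to a whole URL it changes more than the manual edits do. Pick whichever matches how your command expects the URL.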
Once you have properly formatted your URL, your command should look something like this:
Before
After
The command highlighted in green has been properly formatted. Now copy the entire command, paste it into your command line as indicated below, and press Enter.
By now, the Result API section should be active.
Copy its command, paste it into Notepad, replace the API_TOKEN text with your actual API token, and then paste it into your Windows command line.
If you get the response {"status":"building","message":"Dataset is not ready yet, try again in 10s"}, the dataset is still being prepared; wait about 10 seconds and paste the command again.
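Rather than re-pasting the command by hand, you could poll until the dataset is ready. In this sketch, fetchResult is a hypothetical function you supply that resolves to the parsed JSON response (for example, by wrapping fetch() around the Result API URL from your dashboard). The 10-second default delay follows the API message above, and maxAttempts is an arbitrary cap.

```javascript
// Poll until the response is no longer {"status":"building", ...}.
// fetchResult is a hypothetical function you provide; it must resolve to the parsed JSON body.
async function waitForDataset(fetchResult, { delayMs = 10000, maxAttempts = 30 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const body = await fetchResult();
    // Anything other than a "building" status (e.g. the dataset array) is returned as-is.
    if (!body || body.status !== "building") return body;
    if (attempt < maxAttempts) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error(`Dataset still building after ${maxAttempts} attempts`);
}

// Usage with a fake fetcher: the first call reports "building", the second returns data.
const fakeResponses = [
  { status: "building", message: "Dataset is not ready yet, try again in 10s" },
  [{ name: "BILLY", price: 129.99 }],
];
waitForDataset(async () => fakeResponses.shift(), { delayMs: 0 }).then((dataset) => {
  console.log(dataset.length); // 1
});
```

Injecting fetchResult keeps the retry logic separate from the HTTP details, so you can test it without a network connection or a real token.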
Your scraped dataset should then be displayed on the command line.
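If you want the dataset as a CSV file rather than JSON, a small converter like the sketch below works for flat records. toCsv is a hypothetical helper: it quotes fields containing commas, quotes, or line breaks in the RFC 4180 style, and it serializes nested values as JSON strings.

```javascript
// Convert an array of flat JSON records into CSV text.
function toCsv(records) {
  if (records.length === 0) return "";
  // Collect every key that appears in any record, preserving first-seen order.
  const headers = [...new Set(records.flatMap((record) => Object.keys(record)))];
  const escape = (value) => {
    if (value === null || value === undefined) return "";
    const text = typeof value === "object" ? JSON.stringify(value) : String(value);
    // Quote fields containing commas, quotes, or line breaks; double any inner quotes.
    return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
  };
  const rows = records.map((record) => headers.map((h) => escape(record[h])).join(","));
  return [headers.map(escape).join(","), ...rows].join("\r\n");
}

const data = [
  { name: "BILLY", price: 129.99 },
  { name: 'Chair, "oak"', price: 49 },
];
console.log(toCsv(data));
// name,price
// BILLY,129.99
// "Chair, ""oak""",49
```

You could then save the result with Node's fs.writeFileSync, for example fs.writeFileSync("ikea-products.csv", toCsv(data)).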
Conclusion
Using a proxy network such as Bright Data's can make collecting data for your website far more reliable, because it reduces the constant blocking most developers run into when trying to scrape data.
In this blog, we discussed how Bright Data can be used to scrape publicly available data without getting blocked, how to use a Bright Data template, and how to generate your dataset.