
How to Scrape Walmart Data with Ease: A Step-by-Step Guide

How to seamlessly scrape and collect public web data from Walmart using Node.js, while avoiding website blocks.

Introduction

Web scraping is a powerful way to extract information from websites and put it to use. It has become an indispensable strategy for businesses that want to stay ahead of their competitors. Scraping e-commerce websites, in particular, can help you track market trends and pricing and understand customer preferences.

Walmart is one of the top players in e-commerce, and its marketplace is a treasure trove of data. Scraping it gives you the raw material for analyzing trends, prices, and customer preferences.

Scraping data from e-commerce websites, however, is not as easy as it sounds. These websites combine complex page structures with anti-scraping mechanisms to keep competitors and scrapers from collecting data. Choosing the right tool can help you overcome these challenges and collect data without interruption.

Nimble is a solution that can significantly reduce the effort required to scrape websites like Walmart. Nimble's built-in proxy network helps you avoid being detected as a bot, so you can collect relevant data seamlessly. In addition, Nimble has a dedicated scraper for e-commerce platforms, its E-commerce API, which comes equipped with AI-powered data structuring that reads and organizes dynamic web data.


This article aims to guide you on leveraging Nimble to gather data from Walmart quickly and export the data to a CSV file. You will scrape a page of hair dryers and generate a CSV file containing the product ID, name, image URL, price, and rating from the scraped data.

But before diving into scraping the data, let's discuss the challenges you might face when scraping Walmart.

Challenges of Scraping Walmart Data

Scraping e-commerce websites like Walmart introduces challenges primarily because of sophisticated web architecture and strict anti-scraping measures. Understanding these challenges is crucial for any business or individual gathering data from these retail sites. Some of the significant problems you might face while scraping sites like Walmart are:

  • Complex Website Structure: Walmart's website is a complex maze of product pages, categories, and user reviews. Navigating this structure to access relevant data requires a scraper capable of handling deep and dynamic web pages.
  • Dynamic Content: The content on Walmart's website, including prices and product availability, is highly dynamic. It changes frequently, sometimes several times a day. Capturing real-time data accurately is a significant challenge for conventional scraping methods.
  • Anti-Scraping Technologies: Walmart employs advanced anti-scraping techniques to block or mislead scrapers. These include IP blocking, CAPTCHA challenges, and rendering important data through JavaScript, which is difficult for basic scrapers to process.
  • Data Volume and Diversity: The volume and variety of data on Walmart's website are overwhelming. Efficiently extracting and organizing this data into a usable format is daunting.

Against such challenges, Nimble can prove to be a lifesaver. It's a robust web scraping tool that simplifies data extraction from complex sites like Walmart by offering features such as:

  • Advanced Navigation Capabilities: Nimble can navigate complex website structures like Walmart's, ensuring comprehensive data extraction.
  • Real-time Data Processing: Nimble's E-commerce API allows real-time product data collection from various e-commerce platforms, making it adept at handling dynamic content.
  • Fast and Easy Integration: You can use any platform or language to start quickly with Nimble using its industry standard REST API for e-commerce.
  • Scalability and Flexibility: Nimble can handle large volumes of data efficiently, and it offers flexible data delivery options, delivering structured data directly to various storage solutions like S3/GCS buckets.
  • Batch Processing: Nimble supports batch processing, enabling the scraping of up to 1,000 URLs in a single request. This feature is particularly beneficial for collecting large amounts of data in a streamlined manner.

Get Started with Nimble

👉Learn more about the capabilities of Nimble's AI-powered scraper

Utilizing these features for scraping data from e-commerce websites like Walmart gives you a massive edge over traditional methods.

Now that you understand the features and how they can help you overcome the challenges, let's set up the tools required for this tutorial.

Tools and Setup for Scraping Walmart Data

This guide will help you leverage Nimble's E-commerce API to extract data from Walmart. The API is a fully managed service that lets you scrape data through a simple REST interface. It has proxies built in to support robust unblocking, so you can access data from sites like Amazon or Walmart without getting blocked.

To get started with the E-commerce API, you must have a Nimble account. You can sign up for a free trial from here (click on 'Start Free Trial' under the 'Free Trial' plan) and get 100 credits to start your web scraping journey.

For the code part, you'll be using Node.js. Setting up a project with Node.js is very straightforward. Just open a new directory with your favorite code editor and run this command from the terminal:

npm init -y 

Once you run the above command, it will initialize your project with NPM. Now, you are ready to install the necessary packages.

As mentioned earlier, this article scrapes a specific type of product (hair dryers) and generates a CSV file containing the product ID, name, image URL, price, and average rating. For that, you'll need an HTTP client and a JSON-to-CSV converter.

The primary packages required for the example are:

  • Axios: An HTTP client that will help you send requests and process the responses
  • @json2csv/node: For converting JSON to a CSV file
  • Dotenv: For loading the variables stored in the .env file

To install these three packages using NPM, run the following command in your terminal:

npm i axios @json2csv/node dotenv

Once the installation is complete, you are ready to start scraping.

How to Scrape Walmart Data?

Nimble provides an easy-to-use playground in its dashboard. There, you can choose the E-commerce API, paste the URL, click the Run Request button, and inspect the output. The playground shows both the equivalent cURL command, which gives you an idea of how the Axios request should look, and the structure of the JSON response.

Here's a screenshot of the dashboard:

Nimble for Scraping Walmart

The Nimble API uses a Basic authorization header. The header value is generated by base64-encoding the username and password, which must be joined in the format username:password to produce the correct value.

Let's write some code now.

Inside the project directory, create a new file called app.js, and import and initialize the necessary packages here:

require("dotenv").config();

const axios = require("axios");
const fs = require("fs");
const { AsyncParser } = require("@json2csv/node");

You can use Node's built-in Buffer to generate the authorization token. Here's how to do it:

const token = Buffer.from(
  process.env.NIMBLE_USERNAME + ":" + process.env.NIMBLE_PASSWORD
).toString("base64");
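
The snippet above assumes that NIMBLE_USERNAME and NIMBLE_PASSWORD are defined in a .env file at the project root. If either one is missing, the token is silently built from the string "undefined". As a minimal sketch, a small helper can fail fast instead; the name buildBasicToken is hypothetical and not part of Nimble or dotenv:

```javascript
// Hypothetical helper (not part of Nimble or dotenv): builds the base64
// credential for a Basic auth header and fails fast when a value is missing.
function buildBasicToken(username, password) {
  if (!username || !password) {
    throw new Error("Set NIMBLE_USERNAME and NIMBLE_PASSWORD in your .env file");
  }
  return Buffer.from(`${username}:${password}`).toString("base64");
}

// Example with placeholder credentials (not real ones).
console.log(buildBasicToken("user", "pass")); // prints "dXNlcjpwYXNz"
```

In app.js, you could then call buildBasicToken(process.env.NIMBLE_USERNAME, process.env.NIMBLE_PASSWORD) to produce the token.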

The URL that will be scraped is this: https://www.walmart.com/search?q=hair+dryer&typeahead=hair+dry&page=1, and the page shows the list of hair dryers.

Walmart List of Hair Dryers

The endpoint for scraping e-commerce websites like Amazon or Walmart using Nimble is this: https://api.webit.live/api/v1/realtime/ecommerce.

Here's an example of a function that sends a request to this API to scrape data from Walmart:

async function scrapeWalmartData(credential, data) {
  // Nimble's real-time E-commerce API endpoint
  const url = "https://api.webit.live/api/v1/realtime/ecommerce";

  // Errors (network failures, non-2xx responses) propagate to the caller
  const response = await axios.post(url, data, {
    headers: {
      Authorization: `Basic ${credential}`,
      "Content-Type": "application/json",
    },
  });

  // Return only the response body
  return response.data;
}

The above scrapeWalmartData function takes two parameters:

  • credential: The authorization token for accessing the Nimble API. The token variable will be passed in this case.
  • data: The data object contains the necessary options for the API, including the URL to scrape along with other information like the format of the response, country, locale, etc.
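
When a request fails, Axios rejects with an error object whose shape depends on what went wrong. The sketch below shows one way to turn that error into a readable message. The helper name describeRequestError is hypothetical, and the messages are illustrative only; it relies on the response, request, and message properties that Axios errors carry.

```javascript
// Hypothetical helper: turns an Axios-style error into a readable message.
function describeRequestError(error) {
  if (error.response) {
    // The server replied with a status code outside the 2xx range.
    return `Request failed with status ${error.response.status}`;
  }
  if (error.request) {
    // The request was sent but no response arrived.
    return "No response received from the API";
  }
  // Something went wrong before the request was sent.
  return `Request could not be sent: ${error.message}`;
}

// Example with a hand-made error object (no network call involved).
console.log(describeRequestError({ response: { status: 401 } }));
// prints "Request failed with status 401"
```

A 401 status, for example, would suggest checking the username and password used to build the token.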

Let's take a look at the usage of the above function:

const requestData = {
  parse: true,
  vendor: "walmart",
  url: "https://www.walmart.com/search?q=hair%20dryer&typeahead=hair%20dry",
  format: "json",
  render: true,
  country: "ALL",
  locale: "en",
};

(async () => {
  try {
    const data = await scrapeWalmartData(token, requestData);
    console.log(data);
  } catch (error) {
    console.error("Error:", error);
  }
})();

The above snippet uses an IIFE (Immediately Invoked Function Expression) to run the scrapeWalmartData function, passing it the token and the requestData object. For now, the data is only logged to the console.

When you run the code, you'll get a JSON object in return. It contains information such as the response headers, the HTML content, the query time, and the input URL. It also contains an object called parsing, which holds the parsed data from the website. Here's the complete structure of the response:

Structure of Walmart Data

From the above image, you can see that the SearchResult key stores an array of objects. Each object inside this array holds the product details of the scraped URL. Here's how a product object looks in the case of Walmart data:

Structure of a Search Result

The object contains much more data, but the screenshot could capture only a portion.
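
Since the product array sits several levels deep in the response, it can help to check that the path exists before using it. The sketch below assumes only the parsing.entities.SearchResult path and the five field names used in this article. The helper name extractSearchResults is hypothetical, and the sample values are made up rather than real Walmart data.

```javascript
// Hypothetical helper: returns the product array from a response,
// or throws a descriptive error when the expected keys are missing.
function extractSearchResults(response) {
  const results = response?.parsing?.entities?.SearchResult;
  if (!Array.isArray(results)) {
    throw new Error("Response has no parsing.entities.SearchResult array");
  }
  return results;
}

// Made-up sample that mirrors only the keys used in this article.
const sampleResponse = {
  parsing: {
    entities: {
      SearchResult: [
        {
          id: "123",
          name: "Example Hair Dryer",
          image: "https://example.com/a.jpg",
          price: 19.99,
          averageRating: 4.5,
        },
      ],
    },
  },
};

console.log(extractSearchResults(sampleResponse).length); // prints 1
```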

To create a CSV file using the name, image, ID, price, and rating, you can update the IIFE code as follows:

(async () => {
  try {
    const data = await scrapeWalmartData(token, requestData);

    const fields = ["id", "name", "image", "price", "averageRating"];

    const parser = new AsyncParser({ fields });

    const csv = await parser
      .parse(data.parsing.entities.SearchResult)
      .promise();

    fs.writeFile("output.csv", csv, function (err) {
      if (err) throw err;
      console.log("CSV file successfully saved.");
    });
  } catch (error) {
    console.error("Error:", error);
  }
})();

In the updated code, an instance of the AsyncParser class named parser is created, with the fields array passed as a configuration option. This array lists the fields used to create the CSV file.

The parse method of the parser instance is called with data.parsing.entities.SearchResult as its argument because the array of product objects is stored under this key. Calling promise() on the result returns a promise, which is then awaited. This ensures that execution waits until the parsing is complete before moving forward. The resulting CSV data is stored in the csv variable.

Finally, the fs.writeFile function is called to write the CSV data to a file named output.csv. If the write fails, the callback throws the error; otherwise, a success message is logged to the console. Errors raised while scraping or parsing are caught by the catch block and logged to the console. Note that the writeFile callback runs after the try block has finished, so an error thrown inside it is not handled by that catch block. And you are all set.
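
As a sketch of an alternative, the promise-based fs.promises.writeFile can be awaited, so a write failure rejects inside the try block and reaches the catch block. The helper name saveCsv, the temporary file path, and the sample CSV text below are all made up for illustration.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Sketch: write CSV text with the promise-based fs API so that a failure
// rejects the awaited promise instead of throwing inside a callback.
async function saveCsv(filePath, csv) {
  await fs.promises.writeFile(filePath, csv, "utf8");
  return filePath;
}

// Example run with made-up CSV text written to a temporary directory.
const demoPath = path.join(os.tmpdir(), "output-demo.csv");
const demoDone = saveCsv(demoPath, '"id","name"\n"123","Example Hair Dryer"\n')
  .then((savedPath) => {
    console.log("CSV file successfully saved to", savedPath);
    return savedPath;
  });
```

Inside the IIFE, the equivalent would be a single line such as await saveCsv("output.csv", csv), placed where fs.writeFile is called.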

Try to run the code now using the command:

node app.js

If the code runs successfully, you'll be able to see this message in the terminal:

$ node app.js
CSV file successfully saved.

You'll also find a new CSV file called output.csv in the root directory of your project. The CSV should look like this:

Walmart Product Data as CSV

You can find the code for this example in this gist. This CSV contains the ID of the product, the name, image URL, price, and rating of each of the products.

If you want to scrape multiple pages at once, you can either write a loop that scrapes the details of each page (which might not be the best solution) or use the power of batch processing that Nimble allows.

Utilizing a batch request significantly improves workflow efficiency when scraping multiple products. This approach consolidates what would otherwise be individual requests for each product into a single request that can handle up to 1,000 product URLs at once.

Moreover, the gathered data can be conveniently stored in your chosen cloud storage solution. You can read more about batch processing from Nimble's official documentation.
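
Whichever approach you choose, you first need the list of page URLs. The sketch below builds them and groups them so that no group exceeds the 1,000-URL limit mentioned above. It assumes that the page query parameter from the example URL in this article selects the results page. Both helper names are hypothetical, and the batch request itself is not shown because its format should be taken from Nimble's documentation.

```javascript
// Hypothetical helper: builds a Walmart search URL for a given page.
// Assumes the `page` query parameter selects the results page.
function buildSearchUrl(query, page) {
  const url = new URL("https://www.walmart.com/search");
  url.searchParams.set("q", query);
  url.searchParams.set("page", String(page));
  return url.toString();
}

// Hypothetical helper: splits a list into groups of at most `size` items.
function chunk(items, size) {
  const groups = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

const pageUrls = [1, 2, 3].map((page) => buildSearchUrl("hair dryer", page));
console.log(pageUrls[1]);
// prints "https://www.walmart.com/search?q=hair+dryer&page=2"

// Group the URLs so each group stays within the 1,000-URL limit.
const batches = chunk(pageUrls, 1000);
console.log(batches.length); // prints 1
```

With the loop approach, each URL in pageUrls would be placed in the url field of the request data and passed to scrapeWalmartData in turn.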

Conclusion

The potential uses of Walmart data are vast and varied. From market analysis, sentiment analysis, and competitive research to price optimization and trend forecasting, the insights from this data can significantly empower your business strategies.

With Nimble's advanced scraping capabilities, accessing and utilizing Walmart's rich data becomes simple and efficient. The tool's real-time data gathering, AI-powered parsing, and batch processing features, among others, make the complicated task of scraping large-scale e-commerce data manageable and effective. You can streamline your data acquisition process by choosing Nimble, ensuring that you stay ahead of your competitors.

Sign up now to get started with your scraping journey with Nimble.



