If you’re tasked with extracting data from an e-commerce website, whether as a data analyst or marketer, there are two approaches you might take:
- Manual data gathering: This approach is time-consuming and tedious.
- Web scraping: Writing a script for automated data extraction.
While web scraping is more efficient, it has its challenges. You’ll often need to inspect web pages to locate the relevant data, and frequent updates to website element classes or anti-scraping mechanisms can cause your scripts to break, requiring constant maintenance.
Robotic Process Automation (RPA) offers a better alternative by automating the entire data extraction process. Tools like Bright Data’s Web Scraper API, a prime example of RPA, can log into a designated URL, navigate through pages, extract specific data, and transform it into the desired format seamlessly.
What You’ll Learn in This Article
- How to extract data from any website using the Web Scraper API.
- How to collect product reviews and details from e-commerce platforms like Amazon.
- How to use extracted data for competitor analysis.
What Is RPA and How Does It Work?
RPA involves deploying software bots that mimic human actions, such as clicking, typing, and navigating websites. Unlike traditional web scraping methods that directly query web elements, RPA bots interact with websites in the same way a human would, bypassing many anti-scraping defences. These bots rely on predefined rules and workflows to extract, organize, and store data efficiently.
For example, an RPA bot designed for e-commerce data extraction could:
- Log in to an e-commerce platform.
- Search for a specific product category or keyword.
- Extract product details such as prices, reviews, and availability.
- Export the collected data to a structured format like Excel or a database.
Bright Data has an RPA tool called the Web Scraper API, which you will use in this tutorial. The Web Scraper API simplifies web data extraction with advanced features such as automated IP rotation, CAPTCHA solving, and data parsing into structured formats. What sets it apart is its specialised capabilities, including bulk request handling, data discovery, and automated validation. These features, combined with technologies like Residential Proxies and JavaScript Rendering, make it a powerful tool for seamless and efficient data collection.
What are the benefits of using RPA over traditional web scraping for data extraction?
- Dynamic Interaction: Bots can adapt to changes in website layouts or structures by following user-defined workflows, reducing the risk of failure when element classes are updated.
- Scalability: Whether you’re extracting data from a single webpage or thousands of pages, RPA can handle large-scale operations without compromising accuracy.
- Compliance-Friendly: Unlike aggressive scraping methods that may violate website terms of service, RPA mimics human browsing behaviour, making it a more compliant approach.
- Low Maintenance: With proper configuration, RPA workflows require minimal updates, even if websites implement changes to their design or structure.
Using Bright Data’s RPA tool to collect product prices and reviews from Amazon
The Bright Data Web Scraper API provides a robust, scalable solution for extracting data from websites, including dynamic e-commerce platforms like Amazon. Here’s how you can use it to collect product prices and reviews.
Getting Started with the Web Scraper API
Before diving into the implementation, ensure you have the following:
- Bright Data Account: Sign up at Bright Data.
- API Key: Obtain your unique API key from the Bright Data dashboard to authenticate your requests.
Steps to Extract Data from Amazon Using the Scraper API
1. Define Your Target Data
Decide on the specific data points you want to extract, such as:
- Product names
- Prices
- Customer reviews and ratings
- Product descriptions
2. Find the Appropriate Scraper Template
Bright Data’s scraper marketplace offers prebuilt scrapers for various use cases. For this tutorial, you’ll use two templates:
- “Amazon Reviews — collect by URL”
- “Amazon Products — collect by URL”
If you don’t find a suitable prebuilt scraper, you can create one using the no-code or coding method.
Using the “Amazon Reviews — Collect by URL” Template
1. Access the Template:
- Open the Bright Data marketplace and select the “Amazon Reviews — collect by URL” template.
- Choose Scraper API and proceed.
2. Set Up Data Collection:
- Add the Amazon product URL in the “Data Collection APIs” section.
- Paste your API token and copy the generated cURL request.
3. Fetch Your Snapshot ID:
- Use Postman to create a new POST request. Paste the cURL request and send it.
- Copy the returned snapshot ID for further steps.
A snapshot ID is a unique identifier used by the Bright Data Web Scraper API to reference a specific dataset or data snapshot. It allows you to retrieve consistent and structured data that has already been processed and stored by the API. So, instead of fetching raw data directly from a website in real time, the snapshot ID ensures access to pre-scraped data, reducing latency and improving efficiency.
4. Retrieve Reviews:
- In the Bright Data dashboard for the Amazon review, paste your snapshot ID and copy the cURL request.
- In Postman, create a new GET request and paste the cURL request with the snapshot ID.
- Send the request and inspect the response to confirm successful data extraction.
- The response will include essential data such as:
- Review ID
- Rating
- Author Name
- Review Header
- Review Text
- Review Date
- Verified Purchase
- Helpful Count
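Step 4 above boils down to a GET request against the snapshot endpoint. As a minimal sketch (using the base URL and endpoint path that appear later in this tutorial; the snapshot ID below is a placeholder), a helper for building that retrieval URL:

```javascript
// Sketch: build the snapshot retrieval URL used by the Web Scraper API.
// The base URL and endpoint path match the ones used later in this tutorial;
// the snapshot ID here is a placeholder, not a real one.
function buildSnapshotUrl(baseUrl, snapshotId, format = 'json') {
  return `${baseUrl}/datasets/v3/snapshot/${encodeURIComponent(snapshotId)}?format=${format}`;
}

const url = buildSnapshotUrl('https://api.brightdata.com', 's_example_snapshot_id');
// → https://api.brightdata.com/datasets/v3/snapshot/s_example_snapshot_id?format=json
```

You would then send a GET request to this URL with your API key in the Authorization header, which is exactly what Postman does behind the scenes.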
Using the “Amazon Products — Collect by URL” Template
Repeat the same steps as above for the “Amazon Products — collect by URL” template. The response will include essential data such as:
- ASIN
- Title
- Brand
- Price
- Ratings
- Reviews count
- Availability
- Seller Name
With the Web Scraper API, you don’t need to worry about changes to website structures or anti-scraping mechanisms. Your data remains accessible whenever you need it.
Before proceeding to the next step, ensure you have the snapshot IDs for both the reviews and the products.
Building a Product Analysis System using the Web Scraper API
Step 1: Set Up the Environment
1. Install Node.js
Make sure you have Node.js installed. You can download it from the Node.js official site. Verify the installation:
node -v
npm -v
2. Create a Project Directory
Create and navigate to a new project directory:
mkdir product-analysis-system
cd product-analysis-system
Open the project with your preferred code editor.
3. Initialise the Project
Initialise a new Node.js project:
npm init -y
This will create a package.json file.
4. Install Required Packages
Add the necessary dependencies:
npm install axios winston moment danfojs-node dotenv
5. Set Up Environment Variables
Create a .env file in the project directory to store your Bright Data API key:
BRIGHT_DATA_API_KEY=your_api_key_here
Step 2: Understand the Project Structure
Core Functionalities
- Fetching Product Data: The system interacts with the Bright Data API to retrieve product reviews and details.
- Data Processing: Converts raw data into structured objects for analysis.
- Data Analysis: Performs analysis like rating distribution, purchase verification stats, and review trends.
- Report Generation: Summarizes the analysis into a report.
Step 3: Code Walkthrough
1. Environment Configuration
require('dotenv').config();
Loads the .env file for securely managing sensitive information like the API key.
2. Logger Setup
Using the winston library for logging:
const winston = require('winston');
const logger = winston.createLogger({
level: 'info',
format: winston.format.combine(
winston.format.timestamp(),
winston.format.json()
),
transports: [
new winston.transports.Console(),
new winston.transports.File({ filename: 'error.log', level: 'error' }),
new winston.transports.File({ filename: 'combined.log' }),
],
});
- Uses the winston library to log information, errors, and warnings.
- Logs are stored in the error.log and combined.log files.
3. Data Models
ProductReview Class: Defines and structures review data:
const moment = require('moment');

class ProductReview {
constructor({ review_id, rating, author_name, review_header, review_text, review_posted_date, is_verified, helpful_count }) {
this.review_id = review_id;
this.rating = rating;
this.author_name = author_name;
this.review_header = review_header;
this.review_text = review_text;
this.review_date = moment(review_posted_date, 'MMMM DD, YYYY').toDate();
this.verified_purchase = is_verified;
this.helpful_count = helpful_count;
}
}
- Models each review as an object.
- Converts review_posted_date to a JavaScript Date object.
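If you would rather avoid the moment dependency, the same “MMMM DD, YYYY” conversion can be sketched in plain JavaScript (parseReviewDate is a hypothetical helper, not part of the tutorial code):

```javascript
// Sketch: parse dates like "January 15, 2024" without moment.
const MONTHS = [
  'January', 'February', 'March', 'April', 'May', 'June',
  'July', 'August', 'September', 'October', 'November', 'December',
];

function parseReviewDate(text) {
  const match = /^(\w+) (\d{1,2}), (\d{4})$/.exec(text.trim());
  if (!match) return null; // unexpected format
  const monthIndex = MONTHS.indexOf(match[1]);
  if (monthIndex === -1) return null;
  // Use UTC so the result does not shift with the local timezone.
  return new Date(Date.UTC(Number(match[3]), monthIndex, Number(match[2])));
}
```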
ProductDetails Class: Models product details:
class ProductDetails {
constructor({ asin, title, brand, final_price, rating, reviews_count, availability, seller_name }) {
this.asin = asin;
this.title = title;
this.brand = brand;
this.price = parseFloat(final_price);
this.rating = parseFloat(rating);
this.reviews_count = reviews_count;
this.availability = availability;
this.seller_name = seller_name;
}
}
- Models product details like title, price, and reviews_count.
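One caveat: parseFloat returns NaN if final_price ever arrives with a leading currency symbol (for example "$29.99"). A slightly more defensive parser, assuming such formats can occur (parsePrice is a hypothetical helper):

```javascript
// Sketch: strip currency symbols and thousands separators before parsing.
function parsePrice(value) {
  if (typeof value === 'number') return value;
  const cleaned = String(value).replace(/[^0-9.]/g, ''); // drop "$", ",", etc.
  const price = parseFloat(cleaned);
  return Number.isNaN(price) ? null : price; // null signals an unparseable price
}
```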
4. Data Fetching
DataFetcher Class — Handles API requests:
const axios = require('axios');

class DataFetcher {
constructor(baseUrl, apiKey) {
this.baseUrl = baseUrl;
this.axiosInstance = axios.create({
baseURL: baseUrl,
timeout: 10000,
headers: {
Authorization: `Bearer ${apiKey}`,
Accept: 'application/json',
},
});
}
async makeRequest(endpoint) {
try {
const response = await this.axiosInstance.get(endpoint);
return response.data;
} catch (error) {
logger.error(`API request failed: ${error.message}`);
throw new Error(`Failed to fetch data: ${error.message}`);
}
}
async getProductReviews(snapshotId) {
return this.makeRequest(`/datasets/v3/snapshot/${snapshotId}?format=json`);
}
async getProductDetails(snapshotId) {
return this.makeRequest(`/datasets/v3/snapshot/${snapshotId}?format=json`);
}
}
- Fetches data from the Bright Data API using axios.
- Requires snapshot IDs (s_m60gwuw4181n8ikiob and s_m5y480dm2l7109x73g in this case) to get reviews and details.
5. Data Processing
DataProcessor Class — Converts raw API data into usable formats:
class DataProcessor {
static processReviews(rawReviews) {
return rawReviews.map((review) => {
try {
return new ProductReview(review);
} catch (error) {
logger.warn(`Error processing review: ${error.message}`);
return null;
}
}).filter((review) => review !== null);
}
static processProductDetails(rawDetails) {
try {
return new ProductDetails(rawDetails[0]);
} catch (error) {
logger.error(`Error processing product details: ${error.message}`);
throw error;
}
}
}
- Converts raw API data into ProductReview and ProductDetails objects.
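The map-then-filter pattern in DataProcessor (parse each record, log and skip the bad ones) can be exercised in isolation. A minimal, dependency-free sketch, with a hypothetical parse callback standing in for the ProductReview constructor:

```javascript
// Sketch: map raw records to parsed objects, dropping any that throw.
function processRecords(rawRecords, parse) {
  return rawRecords
    .map((record) => {
      try {
        return parse(record);
      } catch {
        return null; // the tutorial code logs a warning at this point
      }
    })
    .filter((record) => record !== null);
}

const parsed = processRecords(
  [{ rating: '5' }, { rating: 'not-a-number' }],
  (raw) => {
    const rating = Number(raw.rating);
    if (Number.isNaN(rating)) throw new Error('invalid rating');
    return { rating };
  }
);
// Only the valid record survives.
```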
6. Data Analysis
DataAnalyzer Class — Analyzes review trends and statistics using danfojs-node:
const { DataFrame } = require('danfojs-node');

class DataAnalyzer {
constructor(reviews, productDetails) {
this.reviews = reviews;
this.productDetails = productDetails;
this.reviewsDataFrame = new DataFrame(
reviews.map((review) => ({
...review,
review_date: review.review_date.toISOString(),
}))
);
}
getRatingDistribution() { /* Calculate rating counts */ }
getVerifiedPurchaseStats() { /* Calculate verified vs. unverified reviews */ }
getReviewTrends() { /* Trends by month */ }
}
- Performs statistical analysis using danfojs-node.
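The method bodies are elided above. One possible plain-JavaScript sketch of the first two analyses (written without danfojs-node so the example stays self-contained; the real class would operate on its DataFrame instead):

```javascript
// Sketch: count reviews per star rating (1-5).
function getRatingDistribution(reviews) {
  const distribution = { 1: 0, 2: 0, 3: 0, 4: 0, 5: 0 };
  for (const review of reviews) {
    if (distribution[review.rating] !== undefined) distribution[review.rating] += 1;
  }
  return distribution;
}

// Sketch: compare verified vs. unverified purchase reviews.
function getVerifiedPurchaseStats(reviews) {
  const verified = reviews.filter((r) => r.verified_purchase).length;
  return {
    verified,
    unverified: reviews.length - verified,
    verified_ratio: reviews.length ? verified / reviews.length : 0,
  };
}
```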
7. Report Generation
ReportGenerator Class — Creates a comprehensive analysis report:
class ReportGenerator {
constructor(analyzer) {
this.analyzer = analyzer;
}
generateSummaryReport() {
return {
product_summary: { /* Summary of product details */ },
review_analysis: { /* Summary of review data */ },
};
}
}
- Combines all analyses into a comprehensive report.
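A sketch of what generateSummaryReport might assemble, assuming the analyzer exposes the analyses shown earlier (field names follow the data models above, but the exact report shape is up to you):

```javascript
// Sketch: combine product details and review analyses into one report object.
function generateSummaryReport(product, ratingDistribution, verifiedStats) {
  return {
    product_summary: {
      title: product.title,
      brand: product.brand,
      price: product.price,
      rating: product.rating,
      reviews_count: product.reviews_count,
    },
    review_analysis: {
      rating_distribution: ratingDistribution,
      verified_purchase_stats: verifiedStats,
    },
  };
}
```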
8. Main Function
The function brings everything together:
async function main() {
const fetcher = new DataFetcher('https://api.brightdata.com', process.env.BRIGHT_DATA_API_KEY);
const [rawReviews, rawProductDetails] = await Promise.all([
fetcher.getProductReviews('s_m60gwuw4181n8ikiob'),
fetcher.getProductDetails('s_m5y480dm2l7109x73g'),
]);
const reviews = DataProcessor.processReviews(rawReviews);
const productDetails = DataProcessor.processProductDetails(rawProductDetails);
const analyzer = new DataAnalyzer(reviews, productDetails);
const reportGenerator = new ReportGenerator(analyzer);
const report = reportGenerator.generateSummaryReport();
console.log(JSON.stringify(report, null, 2));
}
- Fetches and processes the data, runs the analysis, and generates the final report.
Step 4: Run the Program
1. Ensure the .env file contains your API key.
2. Run the program:
node index.js
3. Review the output in the console or logs for errors.
You can find the complete code for this tutorial here. Feel free to extend this project; for example, you could build a dashboard on top of it or incorporate AI for sentiment analysis.
Final Thoughts
By using RPA tools like Bright Data’s Web Scraper API, you can simplify the data extraction process, overcome the challenges of traditional web scraping, and focus on utilising the extracted data for strategic business decisions.
You can sign up for Bright Data to test the Web Scraper API for free!