The online retail, or e-commerce, industry has grown rapidly since its inception. This boom has also brought the rise of 10-minute delivery models, which are changing how customers interact with e-commerce platforms.
This article will explore how to scrape data from e-commerce platforms using Python and the E-commerce Data API.
What is E-commerce Scraping?
E-commerce web scraping involves extracting publicly available data from e-commerce platforms such as Amazon, Walmart, and Flipkart. This data can be used to compare prices, track competitors, understand customer preferences, make data-driven decisions, and stand out in the fierce competition.
It offers various use cases for businesses to grow their digital presence, including price monitoring, market trends forecasting, price prediction, product data enrichment, and more.
So far, we have covered the basics of e-commerce scraping. Before implementing it with Python, let us look briefly at the legal side.
Is it legal to scrape E-Commerce platforms?
In short, scraping e-commerce platforms is generally considered legal as long as the data being extracted is publicly available. Businesses typically collect product information, customer reviews, and pricing data, which anyone can view, so you are not accessing private information belonging to the platform or its users.
However, it is important to review and respect the website’s terms of service before extracting data, and to avoid overloading its servers with excessive requests, which can harm both the website and your own data collection process.
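One simple way to avoid sending excessive requests is to enforce a minimum delay between consecutive calls. The following is a minimal sketch of that idea; the `Throttle` class and the chosen interval are illustrative assumptions, not part of any library or of the API used later in this article.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive calls (hypothetical helper)."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between calls
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Sleep just long enough to keep calls min_interval seconds apart."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now
```

Calling `throttle.wait()` right before each request keeps a loop of requests at or below one call per `min_interval` seconds. A suitable interval depends on the target site and on your API plan.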
Prerequisites
If you have not installed Python yet, you can download it from the official Python website. Once Python is installed, install the library we will use in this project.
pip install requests
So, now we are done with the setup. Let’s create a new file in our project folder and start the project.
Building the Scraper
To build our scraper, we need to first import the library we installed earlier.
import requests
As we will use the EcommerceAPI to retrieve the data, you will need an API Key from its dashboard to collect the data. If you haven’t registered already, you can sign up to get the API Key and 1000 free credits for testing purposes.
After successfully registering, you can add the API Key to your code.
api_key = "xxxx8977ac"
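Hard-coding the key is fine for a quick test, but it is easy to leak it by committing the file. A common alternative is to read it from an environment variable. The sketch below assumes a variable named `ECOMMERCE_API_KEY`; that name and the `load_api_key` helper are illustrative choices, not something the API requires.

```python
import os


def load_api_key(env_var="ECOMMERCE_API_KEY"):
    """Read the API key from an environment variable (the name is an assumption)."""
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return api_key
```

With the variable set in your shell, `api_key = load_api_key()` can replace the hard-coded assignment.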
For the sake of this tutorial, we will be scraping Walmart.
Making an API request on EcommerceAPI is straightforward. You just need to pass the API key and the platform URL to scrape the results.
base_url = "https://api.ecommerceapi.io/walmart_search"
params = {
    "api_key": api_key,
    "url": "https://www.walmart.com/search?q=football"
}
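Note that the Walmart URL is itself passed as a query parameter, so its reserved characters have to be percent-encoded. Requests handles this when you pass a `params` dictionary. The sketch below uses only the standard library to show roughly what the final request URL looks like; the exact string Requests produces may differ in minor details.

```python
from urllib.parse import urlencode

base_url = "https://api.ecommerceapi.io/walmart_search"
params = {
    "api_key": "xxxx8977ac",  # placeholder key from this tutorial
    "url": "https://www.walmart.com/search?q=football",
}

# urlencode percent-encodes reserved characters in each value, so the
# Walmart URL survives as a single query parameter.
request_url = f"{base_url}?{urlencode(params)}"
print(request_url)
```

To scrape a different search, only the `q=` value inside the nested Walmart URL needs to change.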
Now that we have the base URL and parameters ready, we will send an HTTP GET request using Python’s Requests library.
response = requests.get(base_url, params=params)
print(response.json())
This returns the meta information and the search results from the Walmart search page. However, we only need the list of products from the search results to access the pricing information.
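The call above assumes the request succeeds. In practice a request can time out or come back with an error status, so a slightly more defensive version is worth considering. The sketch below is one possible shape: `fetch_json` is a hypothetical helper, and the 30-second timeout is an arbitrary assumption. It takes the `get` callable as an argument so it can be exercised without network access.

```python
def fetch_json(get, base_url, params, timeout=30):
    """Send a GET request and return the decoded JSON body.

    `get` is any callable with the same signature as requests.get.
    """
    response = get(base_url, params=params, timeout=timeout)
    response.raise_for_status()  # raises an exception on 4xx/5xx responses
    return response.json()
```

In the scraper this would be called as `data = fetch_json(requests.get, base_url, params)`, wrapped in a `try`/`except requests.RequestException` block if you want to log the failure and continue.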
If you examine the returned response, you will find that the products are within the search results array. Let’s access it.
data = response.json()
search_results = data.get('search_results', [])
print(search_results)
Printing search_results shows the raw list of products. To retrieve the pricing and other details of each product, you can loop through the items instead.
# Extract and print the current_price for each product
for product in search_results:
    for item in product['item']:
        print(f"Product Title: {item['title']}")
        print(f"Current Price: {item['current_price']}")
        print('-' * 40)
Easy, isn’t it? You don’t even need to parse complex HTML structures; ready-made JSON data is available to you within seconds.
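The loop above raises a `KeyError` if an entry lacks one of the expected keys. A more tolerant variant can use `dict.get` and collect the results into a list for later use. The sketch below assumes the same response shape as the loop above (an `item` list inside each search result); the `extract_prices` helper and the sample data are made up for illustration and are not real API output.

```python
def extract_prices(search_results):
    """Return (title, current_price) pairs, skipping entries that lack a title."""
    rows = []
    for product in search_results:
        for item in product.get("item", []):
            title = item.get("title")
            if title is None:
                continue
            # A missing price is kept as None rather than dropped.
            rows.append((title, item.get("current_price")))
    return rows


# Made-up sample shaped like the structure the tutorial's loop assumes.
sample = [
    {"item": [
        {"title": "Example Football A", "current_price": 19.99},
        {"title": "Example Football B"},
    ]},
    {},
]

for title, price in extract_prices(sample):
    print(f"{title}: {price if price is not None else 'price unavailable'}")
```

Returning a list of tuples instead of printing directly makes it straightforward to sort the products by price or write them to a file later.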
Here is the complete code:
import requests
api_key = "xxxx8977ac"
base_url = "https://api.ecommerceapi.io/walmart_search"
params = {
    "api_key": api_key,
    "url": "https://www.walmart.com/search?q=football"
}
response = requests.get(base_url, params=params)
data = response.json()
search_results = data.get('search_results', [])
print(search_results)
for product in search_results:
    for item in product['item']:
        print(f"Product Title: {item['title']}")
        print(f"Current Price: {item['current_price']}")
        print('-' * 40)
Conclusion
The web scraping community has developed various techniques for extracting data from e-commerce platforms, making it easier than ever. These include handling CAPTCHAs and other blocking mechanisms with configurations that keep your IP address from being blocked by the website.
However, if you need to scrape at scale, relying on a single IP address with basic infrastructure may not suffice. In such cases, an e-commerce scraper API is a better fit: it helps you collect data at scale without running into blocks, and at an economical price.
In this article, we learned how to use Python for scraping e-commerce platforms. With this basic technique, you can develop your scraper to perform data extraction at scale.