The Scenario: A Competitor Price Bot That Lasted 10 Minutes
It was a Tuesday morning. My team had just finished a custom Python scraper to track competitor pricing on a major e-commerce site. We tested it locally with a few requests. Everything worked perfectly. We deployed it to a cheap VPS with a datacenter IP and walked away.
Ten minutes later, the error logs were full of 403s. The bot was completely blocked. All our data was useless.
The site didn’t just block us. It also served us a fake empty page with no prices. Our entire monitoring system was blind. We needed a fix, and fast.
The Problem: Datacenter IPs Are Too Obvious
The root cause was simple. The e-commerce site was using a basic bot detection system that flagged any IP range belonging to a known cloud provider (AWS, DigitalOcean, Hetzner).
Datacenter IPs are easy to identify. They have clean, predictable network ranges, they don’t change often, and they are used by thousands of other scrapers at the same time. If you send 100 requests per minute from one of those IPs, you are screaming “I am a bot.”
The site’s anti-bot system wasn’t sophisticated. It just checked the IP against a blacklist of datacenter ranges. We were caught in the first batch.
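To make that concrete, here is a minimal sketch of what a range-based check might look like on the site's side. We never saw their code, so this is only an illustration: the function name `is_datacenter_ip` is made up, and the ranges are reserved documentation blocks standing in for real cloud-provider ranges.

```python
import ipaddress

# Hypothetical blocklist. These are reserved documentation ranges (RFC 5737),
# used here as stand-ins for published cloud-provider ranges.
DATACENTER_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_datacenter_ip(ip: str) -> bool:
    """Return True if the address falls inside any listed range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in network for network in DATACENTER_RANGES)


print(is_datacenter_ip("192.0.2.44"))   # True: inside the first listed range
print(is_datacenter_ip("203.0.113.7"))  # False: not in any listed range
```

A check like this costs almost nothing per request, which is why even unsophisticated sites can afford to run it on every visitor.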
What Went Wrong: The Specific Headers and Rate Limits That Got Us Banned
Our initial script had three fatal mistakes:
- No IP rotation. We fired all requests from a single datacenter IP.
- Perfect consistency. We sent requests every 30 seconds, on the dot. A human never clicks with that rhythm.
- Missing browser headers. We didn’t spoof a realistic User-Agent or Accept-Language header.
The site’s first line of defense was a simple rate limiter. After about 30 requests, it returned a 429 (Too Many Requests). We ignored it and retried immediately. That triggered a permanent block.
The Step-by-Step Fix: Switching to a Residential Proxy Pool
Here is exactly what we did, in order.
Step 1: Stop the bot immediately. We paused the scraper to avoid getting the VPS IP permanently blacklisted.
Step 2: Choose a proxy type. Datacenter proxies were out. We needed IPs that looked like real home broadband connections. We evaluated two options: ISP proxies (fast, clean IPs from ISP partnerships) and true residential proxies (IPs from actual home devices, slower but more anonymous). For this project, we needed speed for price updates, so we went with a high-quality ISP proxy pool from a reputable provider.
Step 3: Configure rotation. We set the proxy pool to rotate the IP with every single request. This was critical. Even if one IP got flagged, the next request came from a completely different address.
Step 4: Fix the request headers and timing. We added a realistic User-Agent (Chrome on Windows) and an Accept-Language: en-US,en;q=0.9 header. We also replaced the fixed 30-second interval with a random delay of 8 to 15 seconds between requests.
Step 5: Add error handling. We wrote code to detect 403 and 429 responses. If a request failed, the bot would wait 60 seconds, switch to a new proxy IP, and retry up to 3 times.
Step 6: Test with a small batch. We ran the scraper on 50 product pages first. No blocks. Then 500. Still clean.
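A minimal sketch of Steps 3 to 5, using only the Python standard library. Several things here are assumptions for illustration: the function names (`http_get`, `fetch_with_retry`, `scrape`) are hypothetical, the User-Agent string is just an example of a Chrome-on-Windows value, and rotation is done client-side by picking from a list of proxy URLs. Many providers instead rotate for you behind a single gateway address, in which case the list would hold one entry.

```python
import random
import time
import urllib.error
import urllib.request

# Example header values; the exact User-Agent string is an assumption.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
}
BLOCK_STATUSES = {403, 429}


def http_get(url, proxy_url):
    """Fetch url through proxy_url and return (status_code, body_text)."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    )
    request = urllib.request.Request(url, headers=HEADERS)
    try:
        with opener.open(request, timeout=30) as response:
            return response.status, response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions; report the status code.
        return err.code, ""


def fetch_with_retry(url, proxies, get=http_get, sleep=time.sleep,
                     max_retries=3, block_wait=60):
    """On 403/429, wait, switch proxy, and retry up to max_retries times."""
    last_proxy = None
    for attempt in range(max_retries + 1):
        # Prefer any proxy other than the one that was just used.
        candidates = [p for p in proxies if p != last_proxy] or proxies
        last_proxy = random.choice(candidates)
        status, body = get(url, last_proxy)
        if status not in BLOCK_STATUSES:
            return status, body
        if attempt < max_retries:
            sleep(block_wait)
    return status, body  # still blocked after all retries


def scrape(urls, proxies, get=http_get, sleep=time.sleep):
    """Fetch each URL, pausing a random 8 to 15 seconds between pages."""
    results = {}
    for url in urls:
        results[url] = fetch_with_retry(url, proxies, get=get, sleep=sleep)
        sleep(random.uniform(8, 15))
    return results
```

The `get` and `sleep` parameters exist so the retry logic can be exercised with stand-ins, without touching the network or actually waiting 60 seconds. Connection-level failures (timeouts, refused connections) are not handled here and would need their own handling in a real scraper.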
The Results: From 10 Minutes to 3 Months of Stable Data
After the fix, the bot ran for three months without a single block. We collected over 2 million price data points. The proxy IPs were never flagged because each IP made very few requests per day. To the e-commerce site, they looked like real shoppers browsing products.
The cost was higher than datacenter proxies (about 3x the price per GB of traffic), but the data was reliable. A broken bot costs more than a good proxy.
Lessons Learned: What I’d Do Differently Next Time
- Test the anti-bot system first. Before writing the full scraper, I should have sent the same request once from the datacenter IP and once through a residential proxy, then compared the responses to see if the site even blocked datacenter IPs.
- Start slow. We launched the bot at full speed. A gradual ramp-up (e.g., 10 requests per minute, then 20) would have been less suspicious.
- Monitor response quality. Some of the responses were fake empty pages rather than 403s. We nearly missed them. I now always check that the response contains actual product data.
- Don’t cheap out on proxies for production. Testing is fine on datacenter IPs. Production scraping of competitive data needs residential or ISP proxies.
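The response-quality lesson is the easiest one to automate. Below is a minimal sketch of such a check. It assumes prices appear in the HTML as a currency symbol followed by digits and that a real product page is at least a few hundred characters long. Both are assumptions to adjust per site, and `looks_like_product_page` is a hypothetical helper name.

```python
import re

# Assumed price format: a currency symbol followed by digits, e.g. $1,299.00
PRICE_PATTERN = re.compile(r"[$€£]\s?\d[\d,]*(?:\.\d{2})?")


def looks_like_product_page(html: str, min_length: int = 500) -> bool:
    """Return False for suspiciously short pages or pages with no price."""
    if len(html) < min_length:
        return False
    return PRICE_PATTERN.search(html) is not None
```

Treating a failed check the same way as a 403 (wait, switch IP, retry) keeps fake pages out of the dataset instead of silently recording them as missing prices.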
Practical Checklist: Setting Up a Residential Proxy for Web Scraping
- [ ] Choose a provider that offers a pool of residential or ISP IPs (not just datacenter).
- [ ] Set rotation to change IPs per request or per batch.
- [ ] Add realistic browser headers (User-Agent, Accept-Language, Referer).
- [ ] Implement random delays (8–15 seconds is a good baseline).
- [ ] Write error handling for 403, 429, and empty responses.
- [ ] Start with a small test batch (50–100 requests) before scaling.
- [ ] Monitor response quality for fake or empty pages.
- [ ] Log all proxy IPs used to identify any that get blocked.
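For the last checklist item, a small in-memory tally is often enough to start with. The sketch below is one possible shape, not a prescribed design: the `ProxyLog` class, its thresholds, and the example addresses are all hypothetical.

```python
from collections import Counter


class ProxyLog:
    """Count requests and block responses (403/429) per proxy IP."""

    def __init__(self):
        self.requests = Counter()
        self.blocks = Counter()

    def record(self, proxy_ip, status):
        self.requests[proxy_ip] += 1
        if status in (403, 429):
            self.blocks[proxy_ip] += 1

    def flagged(self, min_requests=5, max_block_rate=0.5):
        """Return IPs with enough traffic and a block rate above the limit."""
        return sorted(
            ip for ip, total in self.requests.items()
            if total >= min_requests and self.blocks[ip] / total > max_block_rate
        )


log = ProxyLog()
for _ in range(5):
    log.record("198.51.100.1", 200)
for status in (403, 403, 200, 403, 429):
    log.record("198.51.100.2", status)
print(log.flagged())  # ['198.51.100.2']
```

IPs returned by `flagged()` are candidates to drop from the pool or report to the provider.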
FAQ
Q: Are residential proxies legal to use for web scraping?
A: Generally yes, as long as you comply with the site’s terms of service and local laws. Collecting public data is usually acceptable, but you must respect rate limits and not overload servers.
Q: What’s the difference between residential and ISP proxies?
A: Residential proxies route traffic through actual home devices (slower, more anonymous). ISP proxies are IPs owned by an ISP but routed through a datacenter (faster, slightly less anonymous). For scraping, ISP proxies are often a good middle ground.
Q: Can I use free residential proxies?
A: Avoid them. Free proxies are slow, unreliable, often blacklisted, and can steal your data. Paid pools are worth the cost for any serious project.
Q: How many IPs do I need?
A: For a bot making 1000 requests per day, a pool of 50–100 IPs is usually enough. For high-volume scraping, you might need hundreds or thousands.
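The arithmetic behind that estimate is simple: divide daily requests by pool size to get the load each IP carries. A quick sketch, assuming requests are spread evenly across the pool (the function names and the 20-per-day budget in the last line are illustrative only):

```python
import math


def requests_per_ip(requests_per_day: int, pool_size: int) -> float:
    """Average daily requests each IP carries if load is spread evenly."""
    return requests_per_day / pool_size


def pool_size_needed(requests_per_day: int, max_per_ip: int) -> int:
    """Smallest pool that keeps every IP at or under max_per_ip per day."""
    return math.ceil(requests_per_day / max_per_ip)


print(requests_per_ip(1000, 50))    # 20.0
print(requests_per_ip(1000, 100))   # 10.0
print(pool_size_needed(50000, 20))  # 2500
```

So a pool of 50 to 100 IPs keeps each address at roughly 10 to 20 requests per day for a 1000-request workload, which is the kind of volume a single shopper could plausibly generate.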





