{"id":6507,"date":"2023-10-18T14:47:43","date_gmt":"2023-10-18T14:47:43","guid":{"rendered":"https:\/\/royadata.io\/blog\/?p=6507"},"modified":"2023-10-18T14:47:43","modified_gmt":"2023-10-18T14:47:43","slug":"how-to-prevent-proxy-banned","status":"publish","type":"post","link":"http:\/\/royadata.io\/blog\/how-to-prevent-proxy-banned\/","title":{"rendered":"How to Avoid Proxies Get banned or blocked? (2022)"},"content":{"rendered":"<blockquote>\n<p>You don\u2019t want your proxies banned while you\u2019re harvesting data or scraping the web, right? So how are proxy IPs detected, and can you keep yours from being flagged? Of course you can!<\/p>\n<\/blockquote>\n<p><picture class=\"aligncenter size-full wp-image-861 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/how-to-avoid-proxies-get-blocked.jpg.webp 926w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/how-to-avoid-proxies-get-blocked-300x165.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/how-to-avoid-proxies-get-blocked-768x422.jpg.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20926%20509'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 926px) 100vw, 926px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20926%20509'%3E%3C\/svg%3E\" alt=\"how to avoid proxy ban\" width=\"926\" height=\"509\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/how-to-avoid-proxies-get-blocked.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/how-to-avoid-proxies-get-blocked.jpg 926w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/how-to-avoid-proxies-get-blocked-300x165.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/how-to-avoid-proxies-get-blocked-768x422.jpg 768w\" data-sizes=\"(max-width: 926px) 100vw, 926px\" 
loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter size-full wp-image-861\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/how-to-avoid-proxies-get-blocked.jpg.webp 926w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/how-to-avoid-proxies-get-blocked-300x165.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/how-to-avoid-proxies-get-blocked-768x422.jpg.webp 768w\" sizes=\"(max-width: 926px) 100vw, 926px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/how-to-avoid-proxies-get-blocked.jpg\" alt=\"how to avoid proxy ban\" width=\"926\" height=\"509\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/how-to-avoid-proxies-get-blocked.jpg 926w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/how-to-avoid-proxies-get-blocked-300x165.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/how-to-avoid-proxies-get-blocked-768x422.jpg 768w\" sizes=\"(max-width: 926px) 100vw, 926px\"\/>\n<\/picture>\n<\/noscript><\/p>\n<p>Pretty much any time you\u2019re using high-quality proxies in any significant number, you\u2019re doing it because you want to use some kind of bot. You\u2019re harvesting data, you\u2019re performing bulk search queries, or something of the sort.<\/p>\n<p>This is all perfectly legitimate, of course. It\u2019d be a different story if you were trying to use that proxy list and set of bots to DDoS someone, but that\u2019s just not a good idea. For one thing, it\u2019d be a very mediocre, ineffective DDoS. You really need a <a href=\"https:\/\/royadata.io\/blog\/botnet\/\">botnet<\/a> for that.<\/p>\n<p>Anyway, the point is, <strong>you don\u2019t want to get your proxies banned while you\u2019re in the middle of using them to harvest data<\/strong>. 
Your data will end up incomplete and, in instances where the data changes frequently, you\u2019ll end up with an unusable table. By the time you\u2019ve set up new proxies to harvest the rest, the first chunk may have changed.<\/p>\n<p>That\u2019s not always the case. Still, it\u2019s an annoyance at best when a proxy IP gets banned out from under you. It prevents the smooth operation of your task, it drags you away from whatever else you were doing to fix it, and it wastes time. So, why not take steps to avoid getting those IPs banned in the first place?<\/p>\n<p>To understand ban avoidance, <strong>you need to know how proxy IPs are detected in the first place<\/strong>. Think about what, to a site like Google or Amazon, ends up looking like a red flag.<\/p>\n<ul>\n<li><em>A bunch of similar queries coming in all at once.<\/em><\/li>\n<li><em>A bunch of similar queries coming in from the same identified browser.<\/em><\/li>\n<li><em>A bunch of similar queries coming from irrelevant geolocations.<\/em><\/li>\n<li><em>A bunch of queries using high-risk search terms.<\/em><\/li>\n<\/ul>\n<p>These are the sorts of actions that get an IP flagged, but they\u2019re also the sorts of actions you might be performing.<\/p>\n<p>If you wanted to scrape the top 10 pages of Google search results to analyze the titles of blog posts for a certain search term, all on one website, you\u2019d want to use the site: operator, right? 
Operators like that may eventually <a href=\"https:\/\/support.google.com\/a\/answer\/1217728?hl=en\">trigger captchas<\/a>, and failure can get an IP blocked.<\/p>\n<p><picture class=\"aligncenter size-full wp-image-503 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Prevent-Your-Proxies-from-Being-Banned.jpg.webp 800w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Prevent-Your-Proxies-from-Being-Banned-300x149.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Prevent-Your-Proxies-from-Being-Banned-768x380.jpg.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20800%20396'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 800px) 100vw, 800px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20800%20396'%3E%3C\/svg%3E\" alt=\"Prevent Your Proxies from Being Banned\" width=\"800\" height=\"396\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Prevent-Your-Proxies-from-Being-Banned.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Prevent-Your-Proxies-from-Being-Banned.jpg 800w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Prevent-Your-Proxies-from-Being-Banned-300x149.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Prevent-Your-Proxies-from-Being-Banned-768x380.jpg 768w\" data-sizes=\"(max-width: 800px) 100vw, 800px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter size-full wp-image-503\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Prevent-Your-Proxies-from-Being-Banned.jpg.webp 800w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Prevent-Your-Proxies-from-Being-Banned-300x149.jpg.webp 300w, 
https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Prevent-Your-Proxies-from-Being-Banned-768x380.jpg.webp 768w\" sizes=\"(max-width: 800px) 100vw, 800px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Prevent-Your-Proxies-from-Being-Banned.jpg\" alt=\"Prevent Your Proxies from Being Banned\" width=\"800\" height=\"396\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Prevent-Your-Proxies-from-Being-Banned.jpg 800w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Prevent-Your-Proxies-from-Being-Banned-300x149.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Prevent-Your-Proxies-from-Being-Banned-768x380.jpg 768w\" sizes=\"(max-width: 800px) 100vw, 800px\"\/>\n<\/picture>\n<\/noscript><\/p>\n<p>Let\u2019s talk about the various steps you should take to avoid being flagged, shall we?<\/p>\n<hr\/>\n<h2 id=\"set-a-unique-user-agent-for-each-ip\" class=\"ftwp-heading\"><span class=\"ez-toc-section\" id=\"Set_a_Unique_User-Agent_for_Each_IP\"><\/span>Set a Unique User-Agent for Each IP<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><a href=\"https:\/\/royadata.io\/blog\/user-agent\/\">A user agent<\/a> is a string, sent in a header, that accompanies every request from your computer to the server of the website you visit.<\/p>\n<p>The user agent includes some anonymous information about your configuration: essentially, the browser and version you\u2019re running and, in some browsers, your language. It often includes your operating system version as well, and sometimes other data.<\/p>\n<p>Someone using an up-to-date Chrome installation in English will have the same user agent data as someone else using the same software. 
Someone using the same version of Chrome but in French will get a slightly different user agent.<\/p>\n<p><picture class=\"aligncenter size-full wp-image-502 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Get-a-slightly-different-user-agent.jpg.webp 800w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Get-a-slightly-different-user-agent-300x155.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Get-a-slightly-different-user-agent-768x397.jpg.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20800%20414'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 800px) 100vw, 800px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20800%20414'%3E%3C\/svg%3E\" alt=\"Get a slightly different user agent\" width=\"800\" height=\"414\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Get-a-slightly-different-user-agent.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Get-a-slightly-different-user-agent.jpg 800w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Get-a-slightly-different-user-agent-300x155.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Get-a-slightly-different-user-agent-768x397.jpg 768w\" data-sizes=\"(max-width: 800px) 100vw, 800px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter size-full wp-image-502\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Get-a-slightly-different-user-agent.jpg.webp 800w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Get-a-slightly-different-user-agent-300x155.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Get-a-slightly-different-user-agent-768x397.jpg.webp 768w\" sizes=\"(max-width: 
800px) 100vw, 800px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Get-a-slightly-different-user-agent.jpg\" alt=\"Get a slightly different user agent\" width=\"800\" height=\"414\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Get-a-slightly-different-user-agent.jpg 800w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Get-a-slightly-different-user-agent-300x155.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Get-a-slightly-different-user-agent-768x397.jpg 768w\" sizes=\"(max-width: 800px) 100vw, 800px\"\/>\n<\/picture>\n<\/noscript><\/p>\n<p>The problem with the user agent is that it\u2019s an identifying piece of information, no matter how anonymous it is. If Google sees 10 search queries performed in the same second, all from the same two-updates-back version of Firefox, all looking for the same sort of information, it can reasonably assume that those 10 queries are part of one job run by 10 bots.<\/p>\n<p>User-agent information can vary from connection to connection, and from bot to bot. <strong>You can change it yourself and <a href=\"https:\/\/www.privateproxyreviews.com\/luminati-multilogin\/\">configure each of your proxies to use a different user agent<\/a><\/strong>. This further obscures the connection between them, so it looks more like legitimate traffic. The more you can avoid patterns, the better off you are.<\/p>\n<p>The Electronic Frontier Foundation did an interesting study on how identifying this \u201canonymous\u201d information can really be. 
You can see some examples of user-agent strings, and what kind of information they convey, <a href=\"https:\/\/www.eff.org\/deeplinks\/2010\/01\/tracking-by-user-agent\"><strong>in their post here<\/strong><\/a>.<\/p>\n<hr\/>\n<h2 id=\"avoid-high-risk-geolocations\" class=\"ftwp-heading\"><span class=\"ez-toc-section\" id=\"Avoid_High-Risk_Geolocations\"><\/span>Avoid High-Risk Geolocations<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><a href=\"https:\/\/royadata.io\/blog\/ip-address\/\">IP addresses<\/a> are just that: addresses. They are identifying information about the origin of the connection being received. From the IP address alone, I can usually tell what country a user is coming from.<\/p>\n<p>Now, a proxy server hides that, obviously. <strong>It changes your IP by essentially becoming a middleman in the communications<\/strong>. I can be in California, sending a connection to New York, but if I use a proxy IP in Algeria, that server in New York will see traffic coming from Algeria. It can\u2019t see beyond the proxy server, so it doesn\u2019t see that I\u2019m actually in California.<\/p>\n<p>Now, Algeria may seem like a <a href=\"https:\/\/www.maxmind.com\/en\/ip-authentication-with-ip-geolocation-proxy-detection\"><strong>strange source for traffic<\/strong><\/a>, and it is. Traffic from strange locations is a warning sign of many things, from proxy usage to fraud. If you\u2019ve ever gotten a phone call claiming to be from your bank, but with a Nigerian prince on the other end, you know how big of an issue this kind of spoofed communication can be.<\/p>\n<p>The solution to this problem is to use <a href=\"https:\/\/royadata.io\/blog\/best-proxy-services\/\"><strong>high-quality proxies<\/strong><\/a> in non-high-risk countries. Ditch the Russian, the Ukrainian, and the Middle Eastern proxies.<\/p>\n<p>Instead, opt for proxies that tend to originate from North America or Western Europe. 
People in these areas are much more likely to be browsing the sites you\u2019re targeting than people in Russia.<\/p>\n<p>Always try to consider the service area of the site you\u2019re targeting. If you\u2019re trying to harvest data from Google, try to avoid using proxies from a location that has its own version of Google.<\/p>\n<p>Related:\u00a0<a href=\"https:\/\/royadata.io\/blog\/why-the-harvester-on-your-scrapebox-isnt-working\/\">Why the Harvester on Your ScrapeBox Isn\u2019t Working<\/a>?<\/p>\n<p>Yes, a lot of people will still use the main .com version of Google rather than the non-American version, but it\u2019s still one more warning sign. This alone won\u2019t get your proxy banned, most of the time, but combined with other signals it can be a deciding factor.<\/p>\n<hr\/>\n<h2 id=\"set-a-native-referrer-source\" class=\"ftwp-heading\"><span class=\"ez-toc-section\" id=\"Set_a_Native_Referrer_Source\"><\/span>Set a Native Referrer Source<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>The referrer is a different sort of information, but again, it\u2019s another piece of information you\u2019re giving the site that receives your traffic. 
As with the above, any information you send can be used to identify what you\u2019re doing.<\/p>\n<p><picture class=\"aligncenter wp-image-860 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/referrer.jpg.webp 770w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/referrer-300x221.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/referrer-768x566.jpg.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20621%20458'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 621px) 100vw, 621px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20621%20458'%3E%3C\/svg%3E\" alt=\"referrer\" width=\"621\" height=\"458\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/referrer.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/referrer.jpg 770w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/referrer-300x221.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/referrer-768x566.jpg 768w\" data-sizes=\"(max-width: 621px) 100vw, 621px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter wp-image-860\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/referrer.jpg.webp 770w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/referrer-300x221.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/referrer-768x566.jpg.webp 768w\" sizes=\"(max-width: 621px) 100vw, 621px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/referrer.jpg\" alt=\"referrer\" width=\"621\" height=\"458\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/referrer.jpg 770w, 
https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/referrer-300x221.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/referrer-768x566.jpg 768w\" sizes=\"(max-width: 621px) 100vw, 621px\"\/>\n<\/picture>\n<\/noscript><\/p>\n<p>In this case, a referrer is where the site thinks you came from. If I go to a new tab in my browser and type in <strong>www.google.com<\/strong> and hit enter, that shows up as direct traffic with no referrer.<\/p>\n<p>That\u2019s fine, but it only really works for homepages like that. If I type in a full search query string, that\u2019s a lot less plausible. Google would expect the people landing on a results page to be coming from its homepage, so showing it as direct traffic is a warning sign.<\/p>\n<p>Likewise, if you\u2019re <strong><a href=\"https:\/\/royadata.io\/blog\/scrape-amazon\/\">scraping data from Amazon<\/a>,<\/strong> they would expect you to be referred by Amazon, not direct traffic.<\/p>\n<p>A worse issue is if your referrer somehow gets set to some other site or even your own site. Then Google or Amazon or whoever will be able to see a bunch of different queries coming in very quickly, all referred by your site. That makes it painfully easy to identify as <a href=\"https:\/\/royadata.io\/blog\/bot-traffic\/\">bot traffic<\/a> and makes it very likely that they will block it.<\/p>\n<p>The solution here is to make sure your referrer is native to, and sensible for, the site you\u2019re querying. 
If you\u2019re sending a bunch of traffic to various search results pages on Google, you want it to look like your traffic is coming from <strong>www.google.com<\/strong>, so that\u2019s what you should set.<\/p>\n<hr\/>\n<h2 id=\"set-a-rate-limit-on-requests\" class=\"ftwp-heading\"><span class=\"ez-toc-section\" id=\"Set_a_Rate_Limit_on_Requests\"><\/span>Set a Rate Limit on Requests<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><strong>Rate limits are perhaps the number one tip for keeping your proxies from being blocked<\/strong>. One of the dirty little secrets about the internet is that websites, in general, don\u2019t care all that much about bots. There are a lot of bots floating around.<\/p>\n<p>Google\u2019s search spiders are bots, as are all the other spiders for all the other search engines out there. There are bots going around searching for common security holes. There are bots just browsing content and clicking links. There are malicious bots and benign bots, and for the most part, they just exist.<\/p>\n<p>The only time bots become an issue is when they start to cause trouble. Malicious bots attempting to brute force a web login is one example. That right there illustrates why rate limits are a good thing.<\/p>\n<p>Think about it; when a bot is making 10 requests a second, it\u2019s trying to do something either in bulk or very quickly. <strong>Legitimate bots like Google\u2019s crawlers don\u2019t need to be in that kind of hurry<\/strong>. Malicious bots can be caught and blocked at any time, so they need to hurry to try to get in first.<\/p>\n<ul>\n<li><a href=\"https:\/\/royadata.io\/blog\/proxies-for-scraping-google\/\">Proxies for Preventing Bans and Captchas When Scraping Google<\/a><\/li>\n<\/ul>\n<p>With your own data scraping, chances are you\u2019re not trying to be malicious. 
You want to harvest your data quickly, though, because the longer it takes to harvest the volumes you\u2019re scraping, the longer it takes to complete your project.<\/p>\n<p>By <a href=\"https:\/\/developer.rackspace.com\/blog\/rate-limiting-with-repose-the-restful-proxy-service-engine\/\"><strong>implementing a rate limit<\/strong><\/a>, you\u2019re telling the web server that even if you look like a bot, you\u2019re not trying to do anything malicious. You\u2019re not worth watching. Heck, you might not even be a bot at all. It\u2019s that element of plausible deniability that keeps your proxies safe.<\/p>\n<hr\/>\n<h2 id=\"run-requests-asynchronously\" class=\"ftwp-heading\"><span class=\"ez-toc-section\" id=\"Run_Requests_Asynchronously\"><\/span>Run Requests Asynchronously<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>This is the other tip I would say is tied for #1 most useful. If you have 100 queries to make, and you have 10 bots on 10 proxies to do it, you might think that you would just send 1 query per bot per second, have the whole thing over in about ten seconds, and you\u2019re good to go.<\/p>\n<p>From the perspective of the server, though, that\u2019s 10 nearly identical queries arriving at the same instant, every second. That\u2019s a huge warning sign because no legitimate user browses in that fashion. Real people \u2013 what you\u2019re trying to imitate, more or less \u2013 browse from one item to the next to the next.<\/p>\n<p>If you have 10 bots, then, what you should be doing is staggering them out so there is a 1-2 second delay in between queries. Ten bots should look like ten individual users with different browsing habits, not like ten identical users offset by a second from each other.<\/p>\n<p>The problem here is one of patterns. 
Every anti-fraud and anti-bot system in the world is attempting to detect how bots are different from people, and generally, that comes down to patterns.<\/p>\n<p>When your bots are operating in a pattern, particularly if it\u2019s a lot happening in a short span of time, it becomes easier to detect. The more volume, the more rigid the pattern, the easier it is.<\/p>\n<p>Asynchronous requests, combined with rate limits, stretch out those patterns so they get lost in the noise. Change up identifying information so even that\u2019s not the same, and the patterns can almost disappear.<\/p>\n<p>You may also like:\u00a0<a href=\"https:\/\/royadata.io\/blog\/proxies-for-followliker\/#the-best-proxy-setting-for-followliker\">What is the best proxy setting for followliker?<\/a><\/p>\n<hr\/>\n<h2 id=\"avoid-red-flag-search-operators\" class=\"ftwp-heading\"><span class=\"ez-toc-section\" id=\"Avoid_Red_Flag_Search_Operators\"><\/span>Avoid Red Flag Search Operators<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Google has a lot of search operators, but some of them require more caution to use than others. For example, with a normal query, you can page through results all day with no issue. Performing site searches can get you hit with a captcha trap, like what happened to <a href=\"https:\/\/nakedsecurity.sophos.com\/2016\/12\/06\/are-you-human-or-a-bot-googles-invisible-recaptcha-will-decide\/\"><strong>this guy searching for LinkedIn resumes<\/strong><\/a>. Searching with the intitle: or inurl: operators is even worse, typically because those operators are used to find pirated material.<\/p>\n<p>If at all possible, try to avoid using search operators when you\u2019re running bulk bot searches via proxies. 
They can be a red flag that greatly amplifies the problems other parts of this list bring up.<\/p>\n<p>If you can\u2019t avoid using search operators, such as when you\u2019re searching a specific site or searching for a specific character string in URLs, you will need to take the previous steps and turn them up to 11.<\/p>\n<p>Use longer timers, run even more asynchronously, pick better locations, and so forth. In fact, it might even be better to use more proxies, so you can use more user agents and different configurations to harvest your data, to further minimize the risk of getting caught.<\/p>\n<p>You may also like:\u00a0<a href=\"https:\/\/royadata.io\/blog\/how-to-scrape-linkedin-using-proxies\/\">How to Scrape Data from Linkedin Using Proxies<\/a><\/p>\n<hr\/>\n<h2 id=\"rotate-through-a-proxy-list\" class=\"ftwp-heading\"><span class=\"ez-toc-section\" id=\"Rotate_Through_a_Proxy_List\"><\/span>Rotate Through a Proxy List<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>This is another great tip, because it further minimizes patterns. <strong>If it\u2019s hard to detect a pattern across five bots on five proxies, use ten<\/strong>. If it\u2019s harder to detect with ten, use 20. Of course, if you\u2019re trying to use 20 all at once, you run into issues with the volume of similar queries establishing a different pattern. So what you do is run, instead of 20 all at once, one set of 5, then another, then another, then another, then back to the first.<\/p>\n<p><strong>A rotating proxy list, if the list is sufficiently long, will minimize the number of duplicates you have<\/strong>, making it even harder to detect. Of course, a sufficiently long proxy list might be expensive, so you have to balance out how much you\u2019re willing to pay for access to a high-quality list versus how much you\u2019re willing to deal with the effects of getting caught.<\/p>\n<p>Speaking of getting caught, different sites have different means of dealing with bots. 
Google, for example, will time you out for 14 minutes unless you fill out a captcha.<\/p>\n<p>This might be an incentive to use a <a href=\"https:\/\/royadata.io\/blog\/best-captcha-breaking-service-with-proxies\/\">captcha breaker<\/a>, and that\u2019s your call. Some of them work, some of them are difficult to get working properly, and some are hit or miss. It\u2019s also difficult to get past Google\u2019s \u201cI am not a robot\u201d checkbox.<\/p>\n<p>Honestly, in many cases, it\u2019s best to just wait and watch, keep an eye on your bots, and fix issues when they come up. You can always do the captcha manually and then re-initialize the bot.<\/p>\n<p>Related: <a href=\"https:\/\/royadata.io\/blog\/how-backconnect-proxies-work\/\">How Backconnect &#038; Rotating Proxies Work?<\/a><\/p>\n<hr\/>\n<h2 id=\"use-a-supplier-that-replaces-proxies\" class=\"ftwp-heading\"><span class=\"ez-toc-section\" id=\"Use_a_Supplier_that_Replaces_Proxies\"><\/span>Use a Supplier that Replaces Proxies<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Some proxy suppliers don\u2019t care if an IP gets blocked, temporarily or permanently. They have disclaimers about the usage of their proxies.<\/p>\n<p>You see this a lot with public proxy lists in particular; they are used and abused so much that some sites go out of their way to harvest public proxy IPs and ban them before they can be used against them. This is why it can take ages to find a public proxy that works, and finding enough to harvest data is nearly impossible.<\/p>\n<p><strong>Private proxy lists are the better deal here, for two reasons<\/strong>.<\/p>\n<p>One, they don\u2019t have the past usage and prior history. Essentially, you\u2019re not starting on strike 2 like you might be with a public proxy list.<\/p>\n<p>Two, many private proxy list managers will offer you a list of X number of proxies and will keep them rotated to keep them fresh. 
If a proxy is banned, they will replace the proxy, so that there\u2019s always that selection available. In other words, the lists don\u2019t degrade.<\/p>\n<p>Honestly, it\u2019s not all that difficult to harvest data from a site like Google as long as you set things up properly. It\u2019s only when you don\u2019t put thought into it, when you slam traffic into their face that screams bot, that you end up being blocked. Of course, you can always just <a href=\"http:\/\/www.bishopfox.com\/blog\/2014\/08\/searchdiggity-avoid-bot-detection-issues-leveraging-google-bing-shodan-apis\"><strong>use APIs<\/strong><\/a> to get your data, but sometimes what you want isn\u2019t available.<\/p>\n<hr\/>\n<ul>\n<li><a href=\"https:\/\/royadata.io\/blog\/proxies-for-bypassing-blocked-search-engines\/\">Why Your Proxies Are Blocked by Search Engines<\/a><\/li>\n<li><a href=\"https:\/\/royadata.io\/blog\/residential-proxy-guide\/#do-residential-proxies-ever-get-blocked\">Do Residential proxies ever get blocked?<\/a><\/li>\n<li><a href=\"https:\/\/royadata.io\/blog\/seo-proxies\/\">Scraping Search Engines without Block and Captchas!<\/a><\/li>\n<li><strong><a href=\"https:\/\/royadata.io\/blog\/scrape-a-website-never-get-blacklisted\/\">How to Scrape a Website and Never Get Blocked!<\/a><\/strong><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>You don\u2019t want to get your proxies banned when harvesting data or web scraping, Right? How your proxy IPs are detected and Can we avoid proxies being flagged? of course! Pretty much any time you\u2019re using high-quality proxies in any significant number, you\u2019re doing it because you want to use some kind of bot. You\u2019re &#8230; <a title=\"How to Avoid Proxies Get banned or blocked? (2022)\" class=\"read-more\" href=\"http:\/\/royadata.io\/blog\/how-to-prevent-proxy-banned\/\" aria-label=\"More on How to Avoid Proxies Get banned or blocked? 
(2022)\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":683,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"_links":{"self":[{"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/posts\/6507"}],"collection":[{"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/comments?post=6507"}],"version-history":[{"count":0,"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/posts\/6507\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/media\/683"}],"wp:attachment":[{"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/media?parent=6507"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/categories?post=6507"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/tags?post=6507"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}