{"id":5995,"date":"2023-10-18T14:47:43","date_gmt":"2023-10-18T14:47:43","guid":{"rendered":"https:\/\/royadata.io\/blog\/?p=5995"},"modified":"2023-10-18T14:47:43","modified_gmt":"2023-10-18T14:47:43","slug":"avoid-getting-blocked-with-python","status":"publish","type":"post","link":"http:\/\/royadata.io\/blog\/avoid-getting-blocked-with-python\/","title":{"rendered":"How to Avoid Getting Blocked with Python: 8 Tips And Tricks"},"content":{"rendered":"<blockquote>\n<p>Do you want to avoid getting blocked while scraping data from the web or carrying out other tasks using Python? Then you are on the right page, as the article below discusses the key methods of avoiding blocks in Python.<\/p>\n<\/blockquote>\n<p><picture class=\"aligncenter size-full wp-image-20264 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Avoid-Getting-Blocked-with-Python.jpg.webp 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Avoid-Getting-Blocked-with-Python-300x167.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Avoid-Getting-Blocked-with-Python-768x426.jpg.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%201000%20555'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 1000px) 100vw, 1000px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%201000%20555'%3E%3C\/svg%3E\" alt=\"Avoid Getting Blocked with Python\" width=\"1000\" height=\"555\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Avoid-Getting-Blocked-with-Python.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Avoid-Getting-Blocked-with-Python.jpg 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Avoid-Getting-Blocked-with-Python-300x167.jpg 300w, 
https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Avoid-Getting-Blocked-with-Python-768x426.jpg 768w\" data-sizes=\"(max-width: 1000px) 100vw, 1000px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter size-full wp-image-20264\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Avoid-Getting-Blocked-with-Python.jpg.webp 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Avoid-Getting-Blocked-with-Python-300x167.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Avoid-Getting-Blocked-with-Python-768x426.jpg.webp 768w\" sizes=\"(max-width: 1000px) 100vw, 1000px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Avoid-Getting-Blocked-with-Python.jpg\" alt=\"Avoid Getting Blocked with Python\" width=\"1000\" height=\"555\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Avoid-Getting-Blocked-with-Python.jpg 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Avoid-Getting-Blocked-with-Python-300x167.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Avoid-Getting-Blocked-with-Python-768x426.jpg 768w\" sizes=\"(max-width: 1000px) 100vw, 1000px\"\/>\n<\/picture>\n<\/noscript><\/p>\n<p>Web automation makes our tasks on the Internet easier. Some tasks are even impossible to carry out without it, especially at a large scale.<\/p>\n<p>Despite its importance, web automation is disliked by most web services. Few websites want automated access, whether it is used to scrape their data or to make purchases.<\/p>\n<p>If you engage in web scraping or other forms of automation, you will agree that blocks are normal unless you take deliberate steps to avoid them. 
Fortunately, you can avoid getting blocked.<\/p>\n<p>If you are a Python developer looking to avoid getting blocked, this article is for you. Websites are becoming smarter at detecting bot activity, so you need to take several factors into consideration and put the right techniques into play to avoid getting blocked.<\/p>\n<p>One thing is certain: if you know how a website detects bot activity, you can bypass its checks and make your bot look as human as possible.<\/p>\n<hr\/>\n<h2 id=\"8-proven-tips-to-avoid-getting-blocked-with-python\" class=\"ftwp-heading\" style=\"text-align: center;\"><span class=\"ez-toc-section\" id=\"8_Proven_Tips_to_Avoid_Getting_Blocked_with_Python\"><\/span>8 Proven Tips to Avoid Getting Blocked with Python<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<hr\/>\n<p>Python is just one of the programming languages used to develop web scrapers, but it is one of the most popular languages for bot development in general. Even if you are not a Python developer, you can apply the methods described here in your programming language of choice. 
Below are some of the ways you can avoid getting blocked with Python.<\/p>\n<h3 id=\"1-use-rotating-proxies\" class=\"ftwp-heading\"><span class=\"ez-toc-section\" id=\"1_Use_Rotating_Proxies\"><\/span><strong>1: <a href=\"https:\/\/royadata.io\/blog\/rotating-proxies\/\">Use Rotating Proxies<\/a><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><picture class=\"aligncenter size-full wp-image-20311 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Exploring-Rotating-Proxie.png.webp 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Exploring-Rotating-Proxie-300x176.png.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Exploring-Rotating-Proxie-768x450.png.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%201000%20586'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 1000px) 100vw, 1000px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%201000%20586'%3E%3C\/svg%3E\" alt=\"Exploring-Rotating-Proxie\" width=\"1000\" height=\"586\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Exploring-Rotating-Proxie.png\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Exploring-Rotating-Proxie.png 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Exploring-Rotating-Proxie-300x176.png 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Exploring-Rotating-Proxie-768x450.png 768w\" data-sizes=\"(max-width: 1000px) 100vw, 1000px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter size-full wp-image-20311\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Exploring-Rotating-Proxie.png.webp 1000w, 
https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Exploring-Rotating-Proxie-300x176.png.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Exploring-Rotating-Proxie-768x450.png.webp 768w\" sizes=\"(max-width: 1000px) 100vw, 1000px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Exploring-Rotating-Proxie.png\" alt=\"Exploring-Rotating-Proxie\" width=\"1000\" height=\"586\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Exploring-Rotating-Proxie.png 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Exploring-Rotating-Proxie-300x176.png 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Exploring-Rotating-Proxie-768x450.png 768w\" sizes=\"(max-width: 1000px) 100vw, 1000px\"\/>\n<\/picture>\n<\/noscript><\/p>\n<p>The most basic way of <a href=\"https:\/\/royadata.io\/blog\/scrape-a-website-never-get-blacklisted\/\">avoiding blocks<\/a> when carrying out automation on the web is to use proxies. Proxies are intermediary servers that forward your requests, so websites see an alternative IP address instead of your own.<\/p>\n<p>Rotating proxies go further: instead of giving you a single IP address, they frequently change the IP address assigned to you. <a href=\"https:\/\/royadata.io\/blog\/how-to-rotate-proxies-with-python\/\">Frequent changes of IP address<\/a> are important if you want to avoid getting blocked.<\/p>\n<p>Most websites enforce a request limit per IP address. If you send more requests than that from the same IP address, you will most likely get blocked. These limits are not made public and vary depending on the website and task.<\/p>\n<p>What is certain is that changing your IP address frequently helps you avoid blocks caused by sending too many requests from one address. 
Bots, by nature, send many requests within a short period of time, so they need rotating proxies to get past the anti-spam systems of websites.<\/p>\n<p>We recommend using a high-quality <a href=\"https:\/\/royadata.io\/blog\/residential-proxies\/\">residential proxy network<\/a> with automatic IP rotation support. Bright Data and Smartproxy are among the top recommended residential proxy networks: they have huge IP pools and good location support, and their IPs are hard to detect.<\/p>\n<div class=\"su-note\" style=\"border-color:#e5e5e5;border-radius:5px;-moz-border-radius:5px;-webkit-border-radius:5px;\">\n<div class=\"su-note-inner su-u-clearfix su-u-trim\" style=\"background-color:#ffffff;border-color:#ffffff;color:#000000;border-radius:5px;-moz-border-radius:5px;-webkit-border-radius:5px;\">\n<ul>\n<li><a class=\"thirstylink\" title=\"luminati\" href=\"###luminati\/\"  rel=\"nofollow noopener noreferrer\" data-linkid=\"78\" data-nojs=\"false\"><strong>BrightData<\/strong> (Luminati Proxy)<\/a> \u2013 Best Proxy Overall &lt;<strong>Experts&#8217; #1 for Scraping<\/strong>&gt;<\/li>\n<li><a class=\"thirstylink\" title=\"smartproxy\" href=\"###smartproxy\/\"  rel=\"nofollow noopener noreferrer\" data-linkid=\"83\" data-nojs=\"false\"><strong>Smartproxy<\/strong><\/a> \u2013 Fast Residential Proxy pool &lt;Best Value Choice&gt;<\/li>\n<li><a class=\"thirstylink\" title=\"soax\" href=\"###soax\/\"  rel=\"nofollow noopener noreferrer\" data-linkid=\"3329\" data-nojs=\"false\"><strong>Soax<\/strong><\/a> \u2013 Best Mobile Proxy pool &lt;Cleanest for Instagram automation&gt;<\/li>\n<\/ul>\n<\/div>\n<\/div>\n<p>For some tasks, residential proxies will not work, and you will need mobile proxies instead. Bright Data also sells rotating mobile proxies, and Soax is another provider of rotating mobile proxies that work well. Using proxies in Python code is simple. 
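<\/p>
<p>If your provider rotates IP addresses for you automatically, a single gateway address is usually all you need. If you instead have a list of individual proxy endpoints, you can rotate through them on the client side. The following is a minimal sketch under that assumption: the proxy URLs are hypothetical placeholders, and rotating_proxies() is an illustrative helper, not part of any library. It cycles through the pool with itertools.cycle and yields dictionaries in the format that the proxies argument of requests expects.<\/p>

```python
import itertools


def rotating_proxies(pool):
    '''Yield requests-style proxies dicts, cycling through the pool endlessly.'''
    # itertools.cycle starts again from the first proxy after the last one
    for proxy in itertools.cycle(pool):
        # Send both plain HTTP and HTTPS traffic through the same proxy
        yield {'http': proxy, 'https': proxy}


# Hypothetical endpoints: substitute the ones your provider gives you
PROXY_POOL = [
    'http://proxy1.example.com:8080',
    'http://proxy2.example.com:8080',
    'http://proxy3.example.com:8080',
]

if __name__ == '__main__':
    rotation = rotating_proxies(PROXY_POOL)
    for _ in range(4):
        # With requests installed, each call would look like:
        # requests.get(url, proxies=next(rotation), timeout=10)
        print(next(rotation)['http'])
```

<p>Each request then goes out through the next proxy in the list, wrapping around to the first one after the last. Round-robin order is the simplest choice; picking proxies at random or dropping ones that keep failing are possible refinements.<\/p>
<p>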
Below is sample code using the third-party requests library.<\/p>\n<pre>import requests\n\n# Map each URL scheme to the proxy that should carry that traffic\nproxies = {\n    'http': 'http:\/\/proxy.example.com:8080',\n    'https': 'http:\/\/secureproxy.example.com:8090',\n}\n\nurl = 'http:\/\/mywebsite.com\/example'\n\n# The request is routed through the proxy that matches the scheme of the URL\nresponse = requests.get(url, proxies=proxies, timeout=10)<\/pre>\n<hr\/>\n<h3 id=\"2-use-captcha-solver\" class=\"ftwp-heading\"><span class=\"ez-toc-section\" id=\"2_Use_Captcha_Solver\"><\/span><strong>2: Use Captcha Solver<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><picture class=\"aligncenter size-full wp-image-20262 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Captcha-Solver.png.webp 900w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Captcha-Solver-300x193.png.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Captcha-Solver-768x495.png.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20900%20580'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 900px) 100vw, 900px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20900%20580'%3E%3C\/svg%3E\" alt=\"Use Captcha Solver\" width=\"900\" height=\"580\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Captcha-Solver.png\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Captcha-Solver.png 900w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Captcha-Solver-300x193.png 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Captcha-Solver-768x495.png 768w\" data-sizes=\"(max-width: 900px) 100vw, 900px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter size-full wp-image-20262\"><source type=\"image\/webp\" 
srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Captcha-Solver.png.webp 900w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Captcha-Solver-300x193.png.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Captcha-Solver-768x495.png.webp 768w\" sizes=\"(max-width: 900px) 100vw, 900px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Captcha-Solver.png\" alt=\"Use Captcha Solver\" width=\"900\" height=\"580\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Captcha-Solver.png 900w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Captcha-Solver-300x193.png 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Captcha-Solver-768x495.png 768w\" sizes=\"(max-width: 900px) 100vw, 900px\"\/>\n<\/picture>\n<\/noscript><\/p>\n<p>Websites are getting smarter by the day, and proxies alone are not enough. Even when you use proxies, websites can still make an educated guess about whether you are a bot. One of the most common forms of blocking you will encounter as a bot developer is the <a href=\"https:\/\/royadata.io\/blog\/how-to-avoid-captcha\/\">Captcha<\/a>.<\/p>\n<p>When you are hit with one, your task stops until the captcha is solved. The way to deal with this is simple: use a captcha solver. A captcha solver solves the captchas that appear, allowing you to continue your automation task without interruption.<\/p>\n<p>There are many <a href=\"https:\/\/www.privateproxyreviews.com\/captcha-solving-service\/\"  rel=\"noopener noreferrer\">captcha-solving services<\/a> on the market; 2Captcha and DeathByCaptcha are among the popular options. 
While some captchas can be solved by AI, many of today\u2019s captchas still require humans, so these services employ human workers, often in developing countries, to solve them.<\/p>\n<p>For this reason, do not expect free captcha solvers to work, especially for complex captchas that AI cannot solve.<\/p>\n<hr\/>\n<h3 id=\"3-set-custom-user-agents-and-other-relevant-headers-and-rotate-them\" class=\"ftwp-heading\"><span class=\"ez-toc-section\" id=\"3_Set_Custom_User_Agents_and_Other_Relevant_Headers_%E2%80%94-_and_Rotate_Them\"><\/span><strong>3: Set Custom User Agents and Other Relevant Headers, and Rotate Them<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><picture class=\"aligncenter size-full wp-image-20261 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Set-Custom-User-Agents-and-Other-Relevant-Headers.png.webp 899w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Set-Custom-User-Agents-and-Other-Relevant-Headers-300x150.png.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Set-Custom-User-Agents-and-Other-Relevant-Headers-768x384.png.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20899%20450'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 899px) 100vw, 899px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20899%20450'%3E%3C\/svg%3E\" alt=\"Set Custom User Agents and Other Relevant Headers\" width=\"899\" height=\"450\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Set-Custom-User-Agents-and-Other-Relevant-Headers.png\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Set-Custom-User-Agents-and-Other-Relevant-Headers.png 899w, 
https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Set-Custom-User-Agents-and-Other-Relevant-Headers-300x150.png 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Set-Custom-User-Agents-and-Other-Relevant-Headers-768x384.png 768w\" data-sizes=\"(max-width: 899px) 100vw, 899px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter size-full wp-image-20261\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Set-Custom-User-Agents-and-Other-Relevant-Headers.png.webp 899w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Set-Custom-User-Agents-and-Other-Relevant-Headers-300x150.png.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Set-Custom-User-Agents-and-Other-Relevant-Headers-768x384.png.webp 768w\" sizes=\"(max-width: 899px) 100vw, 899px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Set-Custom-User-Agents-and-Other-Relevant-Headers.png\" alt=\"Set Custom User Agents and Other Relevant Headers\" width=\"899\" height=\"450\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Set-Custom-User-Agents-and-Other-Relevant-Headers.png 899w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Set-Custom-User-Agents-and-Other-Relevant-Headers-300x150.png 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Set-Custom-User-Agents-and-Other-Relevant-Headers-768x384.png 768w\" sizes=\"(max-width: 899px) 100vw, 899px\"\/>\n<\/picture>\n<\/noscript><\/p>\n<p>One of the easiest ways web services detect bots is by checking their user agents and other relevant headers. Python is a popular programming language for web scraping, and websites recognize the default headers sent by popular Python HTTP libraries.<\/p>\n<p>For instance, the requests library sends \u201cpython-requests\/\u201d followed by its version number (for example, \u201cpython-requests\/2.25.1\u201d) as its default user agent string. This gives you away immediately. 
In the past, I tried scraping Amazon with Python without setting a custom user agent header, and I was blocked. After I set the user agent to that of my Chrome browser, the request went through.<\/p>\n<p>The user agent is meant to identify the client. Since websites want to serve regular users, you are better off using the user agent of a popular browser. <a href=\"https:\/\/www.whatismybrowser.com\/guides\/the-latest-user-agent\/\"  rel=\"noopener noreferrer nofollow\">This web page<\/a> lists the latest user agents of popular web browsers. Besides the user agent, there are other relevant headers you need to set.<\/p>\n<p>Which headers matter depends on the website. Use the Network tab in the Developer Tools of your browser to check the headers your browser sends when requesting your target website.<\/p>\n<p>Common request headers include \u201cAccept\u201d, \u201cAccept-Encoding\u201d, and \u201cAccept-Language\u201d. The developer tools will also reveal any headers that are unique to, and required by, your target website. Setting a user agent is not enough on its own; you also need to rotate it. 
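<\/p>
<p>One simple way to rotate is to pick a user agent at random for each request and send it along with the other headers a browser normally includes. The sketch below assumes a small, hand-maintained pool: the user agent strings are illustrative examples that will go out of date, and random_headers() is a hypothetical helper name. Copy current values, plus any site-specific headers, from the Network tab of your own browser.<\/p>

```python
import random

# Illustrative desktop browser user agents; refresh these from a real browser
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 '
    '(KHTML, like Gecko) Version/17.0 Safari/605.1.15',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:118.0) '
    'Gecko/20100101 Firefox/118.0',
]


def random_headers():
    '''Return browser-like request headers with a randomly chosen user agent.'''
    return {
        'User-Agent': random.choice(USER_AGENTS),
        # Headers that browsers typically send alongside the user agent
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.9',
        # Limited to encodings that requests can decode without extra packages
        'Accept-Encoding': 'gzip, deflate',
    }


if __name__ == '__main__':
    # With requests installed: requests.get(url, headers=random_headers(), timeout=10)
    print(random_headers()['User-Agent'])
```

<p>Calling random_headers() before every request gives each one a different browser identity. Keep the headers consistent with each other, though: a Firefox user agent paired with headers only Chrome sends can itself look suspicious.<\/p>
<p>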
Below is code showing how to set the user agent string in Python.<\/p>\n<pre>import requests\n\n# Example Chrome user agent; copy a current one from your own browser\nheaders = {\"User-Agent\": \"Mozilla\/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/70.0.3538.77 Safari\/537.36\"}\n\n# Send the browser user agent instead of the python-requests default\nresponse = requests.get(\"http:\/\/www.kite.com\", headers=headers, timeout=10)<\/pre>\n<hr\/>\n<h3 id=\"4-use-a-headless-browser\" class=\"ftwp-heading\"><span class=\"ez-toc-section\" id=\"4_Use_a_Headless_Browser\"><\/span><strong>4: Use a Headless Browser<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><picture class=\"aligncenter size-full wp-image-20258 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Headless-Browser.png.webp 900w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Headless-Browser-300x150.png.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Headless-Browser-768x384.png.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20900%20450'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 900px) 100vw, 900px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20900%20450'%3E%3C\/svg%3E\" alt=\"Headless Browser\" width=\"900\" height=\"450\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Headless-Browser.png\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Headless-Browser.png 900w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Headless-Browser-300x150.png 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Headless-Browser-768x384.png 768w\" data-sizes=\"(max-width: 900px) 100vw, 900px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter size-full wp-image-20258\"><source type=\"image\/webp\" 
srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Headless-Browser.png.webp 900w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Headless-Browser-300x150.png.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Headless-Browser-768x384.png.webp 768w\" sizes=\"(max-width: 900px) 100vw, 900px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Headless-Browser.png\" alt=\"Headless Browser\" width=\"900\" height=\"450\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Headless-Browser.png 900w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Headless-Browser-300x150.png 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Headless-Browser-768x384.png 768w\" sizes=\"(max-width: 900px) 100vw, 900px\"\/>\n<\/picture>\n<\/noscript><\/p>\n<p>Another way to avoid getting blocked is to use a headless browser. Headless browsers are browsers that run without a graphical user interface (UI). They are mostly used for automated testing and web automation in general.<\/p>\n<p>In the past, the only reason to use a headless browser for web scraping or other forms of automation was that the target website depended on JavaScript to render its content. Today, websites also use JavaScript to collect various data, which they use to generate browser fingerprints or simply to monitor behavior.<\/p>\n<p>If you use a regular HTTP library such as requests, your target website can tell that you are using a bot and not a browser, since none of that JavaScript ever runs. For Python developers, Selenium is the tool for the job: it automates real web browsers, so your bot behaves like a real browser.<\/p>\n<p>It can trigger events such as clicks and scrolls, among many others. Because your activity looks more realistic, this can even reduce how often you encounter captchas. 
The main drawback of Selenium, or any other browser automation tool, is that it is slower than regular HTTP libraries.<\/p>\n<hr\/>\n<h3 id=\"5-set-random-delays-between-requests\" class=\"ftwp-heading\"><span class=\"ez-toc-section\" id=\"5_Set_Random_Delays_Between_Requests\"><\/span><strong>5: Set Random Delays Between Requests<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>One reason you get blocked easily is that your bot sends too many requests within a short period of time. If you are logged into an account on a website, proxies will not help you, because the website can tie your requests to your account regardless of your IP address. Instead of relying on proxies, you can throttle the rate at which you send requests.<\/p>\n<p>As stated earlier, most websites will block you if you exceed their request limit. The main way to deal with this is to add delays to your code. In Python, you can use the \u201csleep()\u201d function from the \u201ctime\u201d module to pause between requests. 
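<\/p>
<p>A minimal sketch of randomized throttling is shown below. It assumes you supply your own fetch function, for example one that wraps requests.get. The fetch_with_random_delays() helper and its default range of 2 to 6 seconds are illustrative assumptions rather than limits published by any website, so tune them for your target. The sleep parameter defaults to time.sleep and is exposed only so the pause can be swapped out, for instance in tests.<\/p>

```python
import random
import time


def fetch_with_random_delays(urls, fetch, min_delay=2.0, max_delay=6.0,
                             sleep=time.sleep):
    '''Call fetch(url) for each URL, pausing a random interval between calls.'''
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # A random pause avoids the fixed rhythm that gives bots away
            sleep(random.uniform(min_delay, max_delay))
        results.append(fetch(url))
    return results


if __name__ == '__main__':
    # Stand-in fetch function; with requests you might pass
    # lambda url: requests.get(url, timeout=10)
    pages = fetch_with_random_delays(
        ['https://example.com/page1', 'https://example.com/page2'],
        fetch=lambda url: 'fetched ' + url,
        min_delay=0.5,
        max_delay=1.5,
    )
    print(pages)
```

<p>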
Beyond adding delays, you are also better off making them random, as sending requests at fixed intervals will also give you away as a bot.<\/p>\n<hr\/>\n<h3 id=\"6-avoid-honeypots\" class=\"ftwp-heading\"><span class=\"ez-toc-section\" id=\"6_Avoid_Honeypots\"><\/span><strong>6: Avoid Honeypots<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><picture class=\"aligncenter size-full wp-image-20257 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Avoid-Honeypots.png.webp 930w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Avoid-Honeypots-300x179.png.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Avoid-Honeypots-768x457.png.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20930%20554'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 930px) 100vw, 930px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20930%20554'%3E%3C\/svg%3E\" alt=\"Avoid Honeypots\" width=\"930\" height=\"554\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Avoid-Honeypots.png\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Avoid-Honeypots.png 930w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Avoid-Honeypots-300x179.png 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Avoid-Honeypots-768x457.png 768w\" data-sizes=\"(max-width: 930px) 100vw, 930px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter size-full wp-image-20257\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Avoid-Honeypots.png.webp 930w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Avoid-Honeypots-300x179.png.webp 300w, 
https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Avoid-Honeypots-768x457.png.webp 768w\" sizes=\"(max-width: 930px) 100vw, 930px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Avoid-Honeypots.png\" alt=\"Avoid Honeypots\" width=\"930\" height=\"554\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Avoid-Honeypots.png 930w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Avoid-Honeypots-300x179.png 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Avoid-Honeypots-768x457.png 768w\" sizes=\"(max-width: 930px) 100vw, 930px\"\/>\n<\/picture>\n<\/noscript><\/p>\n<p>Websites are becoming sneaky with their <a href=\"https:\/\/royadata.io\/blog\/web-scraping-practices\/\">anti-scraping techniques<\/a>. One of the ways they detect web scrapers is by setting honeypot traps. A honeypot is typically an invisible link added to a page, disguised so that regular Internet users will not see it.<\/p>\n<p>Such a link has its CSS display property set to none {display:none} or its visibility property set to hidden {visibility:hidden}. With these values, the link is invisible to the human eye, but automated bots still see it in the HTML. Once such a URL is visited, the website blocks further requests.<\/p>\n<p>Some websites get even smarter. Instead of using either of these properties, they simply set the link color to match the background, for example white text on a white background. This way, web scrapers that only avoid links hidden with display or visibility settings still get trapped.<\/p>\n<p>For this reason, you should programmatically check every URL you plan to crawl and make sure its link does not have attributes or CSS settings that make it hidden. 
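<\/p>
<p>A minimal sketch of such a check, using only the html.parser module from the Python standard library, is shown below. VisibleLinkParser is a hypothetical helper: it treats a link as a likely honeypot if its inline style contains display:none or visibility:hidden, or if it carries the HTML hidden attribute, and it sorts link targets into visible and hidden lists. It cannot see rules applied through external stylesheets or color tricks such as white-on-white text; catching those requires inspecting the rendered page, for example with a headless browser.<\/p>

```python
from html.parser import HTMLParser

# Inline style fragments that hide an element from human visitors
HIDDEN_MARKERS = ('display:none', 'visibility:hidden')


class VisibleLinkParser(HTMLParser):
    '''Collect link targets, separating visible links from hidden ones.'''

    def __init__(self):
        super().__init__()
        self.visible = []
        self.hidden = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags with an href are crawlable links
        if tag != 'a':
            return
        attributes = dict(attrs)
        href = attributes.get('href')
        if not href:
            return
        # Strip all whitespace and lowercase, so 'display: none' and
        # 'DISPLAY:NONE' both match the markers above
        style = ''.join((attributes.get('style') or '').split()).lower()
        if 'hidden' in attributes or any(m in style for m in HIDDEN_MARKERS):
            self.hidden.append(href)
        else:
            self.visible.append(href)
```

<p>To use it, call parser.feed() with the HTML of a page, then queue only the URLs collected in parser.visible for crawling.<\/p>
<p>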
Skip any hidden link you find, so that you do not get flagged and blocked.<\/p>\n<hr\/>\n<h3 id=\"7-scrape-google-cache-instead\" class=\"ftwp-heading\"><span class=\"ez-toc-section\" id=\"7_Scrape_Google_Cache_Instead\"><\/span><strong>7: Scrape Google Cache Instead<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><picture class=\"aligncenter size-full wp-image-20260 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Scrape-Google-Cache-Instead.png.webp 900w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Scrape-Google-Cache-Instead-300x169.png.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Scrape-Google-Cache-Instead-768x432.png.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20900%20506'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 900px) 100vw, 900px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20900%20506'%3E%3C\/svg%3E\" alt=\"Scrape Google Cache Instead\" width=\"900\" height=\"506\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Scrape-Google-Cache-Instead.png\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Scrape-Google-Cache-Instead.png 900w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Scrape-Google-Cache-Instead-300x169.png 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Scrape-Google-Cache-Instead-768x432.png 768w\" data-sizes=\"(max-width: 900px) 100vw, 900px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter size-full wp-image-20260\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Scrape-Google-Cache-Instead.png.webp 900w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Scrape-Google-Cache-Instead-300x169.png.webp 300w, 
https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Scrape-Google-Cache-Instead-768x432.png.webp 768w\" sizes=\"(max-width: 900px) 100vw, 900px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Scrape-Google-Cache-Instead.png\" alt=\"Scrape Google Cache Instead\" width=\"900\" height=\"506\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Scrape-Google-Cache-Instead.png 900w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Scrape-Google-Cache-Instead-300x169.png 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Scrape-Google-Cache-Instead-768x432.png 768w\" sizes=\"(max-width: 900px) 100vw, 900px\"\/>\n<\/picture>\n<\/noscript><\/p>\n<p>Sometimes, your target site might just be a difficult nut to crack. If you do not want to deal with the hassles of trying to avoid getting blocked, you can scrape from the Google index.<\/p>\n<p>Fortunately for us, Google keeps a cache of the pages available in its index. And the good news is it is not as protected as the Google Search platform itself. You can scrap from this index and save yourself the headache of dealing with anti-spam systems. To scrap from the Google cache, use this URL: \u201c<a href=\"https:\/\/webcache.googleusercontent.com\/search?q=cache:YOUR_URL\">http:\/\/webcache.googleusercontent.com\/search?q=cache:YOUR_URL<\/a>\u201c. Replace the YOUR_URL with the URL of your target page.<\/p>\n<p>However, it is important you know that not all pages are available in Google Cache. Any webpage not available on Google, such as password-protected pages, can\u2019t be found in Google Cache.<\/p>\n<p>Also important is the fact that some websites, even though available on Google, stop Google from caching their pages for public access. The issue of freshness is also something to consider. 
If the data on a page changes often, the cached copy may already be out of date, and the problem is worse for less popular websites, which Google crawls less frequently.<\/p>\n<hr\/>\n<h3 id=\"8-use-scraping-apis\" class=\"ftwp-heading\"><span class=\"ez-toc-section\" id=\"8_Use_Scraping_APIs\"><\/span><strong>8: Use Scraping APIs<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><picture class=\"aligncenter size-full wp-image-20263 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Scraping-APIs.png.webp 900w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Scraping-APIs-300x145.png.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Scraping-APIs-768x371.png.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20900%20435'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 900px) 100vw, 900px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20900%20435'%3E%3C\/svg%3E\" alt=\"Use Scraping APIs\" width=\"900\" height=\"435\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Scraping-APIs.png\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Scraping-APIs.png 900w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Scraping-APIs-300x145.png 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Scraping-APIs-768x371.png 768w\" data-sizes=\"(max-width: 900px) 100vw, 900px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter size-full wp-image-20263\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Scraping-APIs.png.webp 900w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Scraping-APIs-300x145.png.webp 300w, 
https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Scraping-APIs-768x371.png.webp 768w\" sizes=\"(max-width: 900px) 100vw, 900px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Scraping-APIs.png\" alt=\"Use Scraping APIs\" width=\"900\" height=\"435\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Scraping-APIs.png 900w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Scraping-APIs-300x145.png 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Scraping-APIs-768x371.png 768w\" sizes=\"(max-width: 900px) 100vw, 900px\"\/>\n<\/picture>\n<\/noscript><\/p>\n<p>Your last resort for avoiding blocks is a scraping API. Scraping APIs are REST APIs that extract data from websites for you, so you do not have to deal with blocks yourself. Most scraping APIs handle proxy management, headless browsers, and CAPTCHAs, and some even include parsers that make it easier to extract specific data points.<\/p>\n<p>Another advantage is that many scraping APIs charge only for successful requests, which gives providers a strong incentive to deliver, since that is the only time they get paid. With a scraping API, you can focus on the data rather than on blocks.<\/p>\n<p>A scraping API also frees you from managing your own <a href=\"https:\/\/royadata.io\/blog\/web-scraping-tools\/\">web scrapers<\/a> and keeping up with website changes. Currently, ScraperAPI, ScrapingBee, and WebScraperAPI are among the best scraping APIs available. 
They are also affordable.<\/p>\n<blockquote>\n<p><a href=\"https:\/\/royadata.io\/blog\/web-scraping-api\/\">Top 3 Best Web Scraping APIs<\/a><\/p>\n<\/blockquote>\n<blockquote>\n<ul>\n<li id=\"apify-proxy\" class=\"ftwp-heading\"><a class=\"thirstylink\" title=\"apify\" href=\"###apify\/\"  rel=\"nofollow noopener noreferrer\" data-linkid=\"9501\" data-nojs=\"false\"><b>Apify Proxy<\/b><\/a><\/li>\n<li><a class=\"thirstylink\" title=\"scraperapi\" href=\"###scraperapi\/\"  rel=\"nofollow noopener noreferrer\">ScraperAPI<\/a><\/li>\n<li><a class=\"thirstylink\" title=\"scrapingbee\" href=\"###scrapingbee\/\"  rel=\"nofollow noopener noreferrer\">ScrapingBee<\/a><\/li>\n<\/ul>\n<\/blockquote>\n<hr\/>\n<pre style=\"text-align: center;\"><strong>Conclusion<\/strong><\/pre>\n<p>The methods described above are among the most effective ways to avoid getting blocked when automating tasks in Python.<\/p>\n<p>However, none of them is unique to Python. Avoiding blocks during web scraping and other forms of automation does not depend on the programming language, so you can apply the same methods in other languages as well.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Do you want to avoid getting blocked while scraping data from the web or carrying out other tasks using Python? Then you are on the right page, as the article below discusses the key methods of avoiding blocks in Python. Web automation makes our tasks on the Internet easier. 
Some tasks are even impossible to &#8230; <a title=\"How to Avoid Getting Blocked with Python: 8 Tips And Tricks\" class=\"read-more\" href=\"http:\/\/royadata.io\/blog\/avoid-getting-blocked-with-python\/\" aria-label=\"More on How to Avoid Getting Blocked with Python: 8 Tips And Tricks\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":182,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"_links":{"self":[{"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/posts\/5995"}],"collection":[{"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/comments?post=5995"}],"version-history":[{"count":0,"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/posts\/5995\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/media\/182"}],"wp:attachment":[{"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/media?parent=5995"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/categories?post=5995"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/tags?post=5995"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}