{"id":6296,"date":"2023-10-18T14:47:43","date_gmt":"2023-10-18T14:47:43","guid":{"rendered":"https:\/\/royadata.io\/blog\/?p=6296"},"modified":"2023-10-18T14:47:43","modified_gmt":"2023-10-18T14:47:43","slug":"use-chrome-headless-and-dedicated-proxies-to-scrape-any-website","status":"publish","type":"post","link":"http:\/\/royadata.io\/blog\/use-chrome-headless-and-dedicated-proxies-to-scrape-any-website\/","title":{"rendered":"Use Chrome Headless and Dedicated Proxies to Scrape Any Website"},"content":{"rendered":"<p><picture class=\"aligncenter wp-image-609 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Chrome-Headless-and-Dedicated-Proxies.jpg.webp 1333w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Chrome-Headless-and-Dedicated-Proxies-300x179.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Chrome-Headless-and-Dedicated-Proxies-768x458.jpg.webp 768w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Chrome-Headless-and-Dedicated-Proxies-1024x611.jpg.webp 1024w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%201137%20678'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 1137px) 100vw, 1137px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%201137%20678'%3E%3C\/svg%3E\" alt=\"Use Chrome Headless and Dedicated Proxies\" width=\"1137\" height=\"678\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Chrome-Headless-and-Dedicated-Proxies.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Chrome-Headless-and-Dedicated-Proxies.jpg 1333w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Chrome-Headless-and-Dedicated-Proxies-300x179.jpg 300w, 
https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Chrome-Headless-and-Dedicated-Proxies-768x458.jpg 768w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Chrome-Headless-and-Dedicated-Proxies-1024x611.jpg 1024w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Chrome-Headless-and-Dedicated-Proxies-1200x716.jpg 1200w\" data-sizes=\"(max-width: 1137px) 100vw, 1137px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter wp-image-609\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Chrome-Headless-and-Dedicated-Proxies.jpg.webp 1333w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Chrome-Headless-and-Dedicated-Proxies-300x179.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Chrome-Headless-and-Dedicated-Proxies-768x458.jpg.webp 768w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Chrome-Headless-and-Dedicated-Proxies-1024x611.jpg.webp 1024w\" sizes=\"(max-width: 1137px) 100vw, 1137px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Chrome-Headless-and-Dedicated-Proxies.jpg\" alt=\"Use Chrome Headless and Dedicated Proxies\" width=\"1137\" height=\"678\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Chrome-Headless-and-Dedicated-Proxies.jpg 1333w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Chrome-Headless-and-Dedicated-Proxies-300x179.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Chrome-Headless-and-Dedicated-Proxies-768x458.jpg 768w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Chrome-Headless-and-Dedicated-Proxies-1024x611.jpg 1024w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Use-Chrome-Headless-and-Dedicated-Proxies-1200x716.jpg 1200w\" sizes=\"(max-width: 1137px) 100vw, 
1137px\"\/>\n<\/picture>\n<\/noscript><\/p>\n<p>Everybody knows Google Chrome is the market leader when it comes to web browsing.<\/p>\n<blockquote>\n<p>At least, that\u2019s what the statistics say: an overwhelming 61.20 percent of Internet users are browsing with Chrome as of 2018.<\/p>\n<\/blockquote>\n<p>If Chrome is the leading web browser, then it makes sense that Chrome Headless will be the leading browser for <a href=\"https:\/\/medium.com\/@briananderson2209\/best-automation-testing-tools-for-2018-top-10-reviews-8a4a19f664d2\">automated application testing<\/a>, <a href=\"https:\/\/www.privateproxyreviews.com\/web-scraping-python-scraper-tools\/\">web scraping<\/a>, and more. Google\u2019s release of <strong><a href=\"https:\/\/github.com\/GoogleChrome\/puppeteer\"  rel=\"noopener noreferrer\">Puppeteer<\/a><\/strong>, the Node.js API that makes automating web actions simple for Chrome users, sets the stage for easy, robust web scraping.<\/p>\n<p><picture class=\"aligncenter wp-image-610 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Pyramid-of-puppeteer-of-proxies.jpg.webp 1341w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Pyramid-of-puppeteer-of-proxies-300x165.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Pyramid-of-puppeteer-of-proxies-768x423.jpg.webp 768w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Pyramid-of-puppeteer-of-proxies-1024x564.jpg.webp 1024w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%201125%20619'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 1125px) 100vw, 1125px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%201125%20619'%3E%3C\/svg%3E\" alt=\"Pyramid of puppeteer of proxies\" width=\"1125\" height=\"619\" 
data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Pyramid-of-puppeteer-of-proxies.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Pyramid-of-puppeteer-of-proxies.jpg 1341w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Pyramid-of-puppeteer-of-proxies-300x165.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Pyramid-of-puppeteer-of-proxies-768x423.jpg 768w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Pyramid-of-puppeteer-of-proxies-1024x564.jpg 1024w\" data-sizes=\"(max-width: 1125px) 100vw, 1125px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter wp-image-610\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Pyramid-of-puppeteer-of-proxies.jpg.webp 1341w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Pyramid-of-puppeteer-of-proxies-300x165.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Pyramid-of-puppeteer-of-proxies-768x423.jpg.webp 768w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Pyramid-of-puppeteer-of-proxies-1024x564.jpg.webp 1024w\" sizes=\"(max-width: 1125px) 100vw, 1125px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Pyramid-of-puppeteer-of-proxies.jpg\" alt=\"Pyramid of puppeteer of proxies\" width=\"1125\" height=\"619\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Pyramid-of-puppeteer-of-proxies.jpg 1341w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Pyramid-of-puppeteer-of-proxies-300x165.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Pyramid-of-puppeteer-of-proxies-768x423.jpg 768w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Pyramid-of-puppeteer-of-proxies-1024x564.jpg 1024w\" sizes=\"(max-width: 1125px) 100vw, 1125px\"\/>\n<\/picture>\n<\/noscript><\/p>\n<p>The only problem is 
there is no built-in privacy or identity protection. You want to use all the features Chrome offers, automate nearly anything you like, and do it anonymously through <a href=\"https:\/\/royadata.io\/blog\/proxy-server\/\">proxies<\/a>.<\/p>\n<blockquote>\n<p><strong>This can prove to be a challenge.<\/strong><\/p>\n<\/blockquote>\n<p><strong>It turns out there are ways to use Chrome Headless with proxies to safely scrape website data without exposing your identity<\/strong>. To see how, we\u2019ll start with what makes Chrome what it is, and what makes a headless browser what it is.<\/p>\n<hr\/>\n<h2 id=\"what-is-chrome-headless\" class=\"ftwp-heading\" style=\"text-align: center;\"><span class=\"ez-toc-section\" id=\"What_is_Chrome_Headless\"><\/span><a href=\"https:\/\/royadata.io\/blog\/headless-browser\/\">What is Chrome Headless?<\/a><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Google describes <strong><a href=\"https:\/\/developers.google.com\/web\/updates\/2017\/04\/headless-chrome\"  rel=\"noopener noreferrer\">Chrome Headless<\/a><\/strong>\u00a0as \u201c\u2026a way to run the Chrome browser in a headless environment. Essentially, running Chrome without chrome!\u201d<\/p>\n<p>Perfect. Clear as mud.<\/p>\n<p>In plain terms, Chrome Headless is Chrome running without its graphical user interface (the \u201cchrome\u201d of windows, tabs, and toolbars).<\/p>\n<p>That\u2019s actually the definition of a headless browser: instead of clicking on things, you control the browser with code, from the command line or over a network protocol.<\/p>\n<p>There are plenty of reasons why you may want to do that. Without a headless mode, automating Google Chrome is very difficult, which is why web application developers historically relied on other headless browsers such as the now-defunct PhantomJS. 
But if so much of the Internet uses Chrome, it just makes sense to have Headless Chrome in your toolkit.<\/p>\n<p><picture class=\"aligncenter wp-image-611 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Phantom-JS-of-proxies.jpg.webp 1332w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Phantom-JS-of-proxies-300x179.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Phantom-JS-of-proxies-768x459.jpg.webp 768w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Phantom-JS-of-proxies-1024x612.jpg.webp 1024w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%201161%20694'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 1161px) 100vw, 1161px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%201161%20694'%3E%3C\/svg%3E\" alt=\"Phantom JS of proxies\" width=\"1161\" height=\"694\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Phantom-JS-of-proxies.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Phantom-JS-of-proxies.jpg 1332w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Phantom-JS-of-proxies-300x179.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Phantom-JS-of-proxies-768x459.jpg 768w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Phantom-JS-of-proxies-1024x612.jpg 1024w\" data-sizes=\"(max-width: 1161px) 100vw, 1161px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter wp-image-611\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Phantom-JS-of-proxies.jpg.webp 1332w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Phantom-JS-of-proxies-300x179.jpg.webp 300w, 
https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Phantom-JS-of-proxies-768x459.jpg.webp 768w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Phantom-JS-of-proxies-1024x612.jpg.webp 1024w\" sizes=\"(max-width: 1161px) 100vw, 1161px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Phantom-JS-of-proxies.jpg\" alt=\"Phantom JS of proxies\" width=\"1161\" height=\"694\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Phantom-JS-of-proxies.jpg 1332w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Phantom-JS-of-proxies-300x179.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Phantom-JS-of-proxies-768x459.jpg 768w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Phantom-JS-of-proxies-1024x612.jpg 1024w\" sizes=\"(max-width: 1161px) 100vw, 1161px\"\/>\n<\/picture>\n<\/noscript><\/p>\n<p>If you need to scrape data from a variety of websites and need a browser that can handle all the HTML, CSS, and JavaScript without generating error after error, Headless Chrome is the solution for you. Once you learn how to use proxies with it, you\u2019ll be able to scrape just about any website on the Internet with ease and style.<\/p>\n<hr\/>\n<h2 id=\"how-chrome-headless-works\" class=\"ftwp-heading\" style=\"text-align: center;\"><span class=\"ez-toc-section\" id=\"How_Chrome_Headless_Works\"><\/span>How Chrome Headless Works<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>The basic environment offered by Chrome Headless should be familiar to anyone who has spent time with another headless browser: a command-line interface well suited to quickly scanning and scraping website data.<\/p>\n<p>On top of that, you get Chrome\u2019s fast, modern V8 JavaScript engine and the Chrome DevTools Protocol. 
Because Chrome Headless runs the same browser engine as desktop Chrome, pages generally render the same way they do for ordinary Chrome users.<\/p>\n<p>To control Chrome Headless, though, you need a client library. There are lots of options here, and Puppeteer is one of the most popular, partly because it, too, is a Google product. Libraries exist for a variety of languages, but Node.js libraries are a common recommendation because JavaScript is also the language that runs on the webpages you plan on scraping, so one language covers both your scraper and the extraction code it runs inside each page.<\/p>\n<p>If you want a simple, no-nonsense API designed for web scraping and not much else, you can use <strong><a href=\"https:\/\/nickjs.org\/\"  rel=\"noopener noreferrer\">NickJS<\/a><\/strong>. If you want a complete, general-purpose API that closely mirrors Chrome\u2019s DevTools, go with Puppeteer. You can even use a <strong><a href=\"https:\/\/docs.google.com\/document\/d\/1rlqcp8nk-ZQvldNJWdbaMbwfDbJoOXvahPCDoPGOwhQ\/edit\"  rel=\"noopener noreferrer\">C++ API<\/a><\/strong> if you want.<\/p>\n<p><picture class=\"aligncenter size-full wp-image-612 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Working-procedure-of-Chrome-Headless.jpg.webp 1288w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Working-procedure-of-Chrome-Headless-300x156.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Working-procedure-of-Chrome-Headless-768x398.jpg.webp 768w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Working-procedure-of-Chrome-Headless-1024x531.jpg.webp 1024w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%201288%20668'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 1288px) 100vw, 1288px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%201288%20668'%3E%3C\/svg%3E\" alt=\"Working procedure of Chrome Headless\" 
width=\"1288\" height=\"668\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Working-procedure-of-Chrome-Headless.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Working-procedure-of-Chrome-Headless.jpg 1288w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Working-procedure-of-Chrome-Headless-300x156.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Working-procedure-of-Chrome-Headless-768x398.jpg 768w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Working-procedure-of-Chrome-Headless-1024x531.jpg 1024w\" data-sizes=\"(max-width: 1288px) 100vw, 1288px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter size-full wp-image-612\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Working-procedure-of-Chrome-Headless.jpg.webp 1288w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Working-procedure-of-Chrome-Headless-300x156.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Working-procedure-of-Chrome-Headless-768x398.jpg.webp 768w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Working-procedure-of-Chrome-Headless-1024x531.jpg.webp 1024w\" sizes=\"(max-width: 1288px) 100vw, 1288px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Working-procedure-of-Chrome-Headless.jpg\" alt=\"Working procedure of Chrome Headless\" width=\"1288\" height=\"668\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Working-procedure-of-Chrome-Headless.jpg 1288w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Working-procedure-of-Chrome-Headless-300x156.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Working-procedure-of-Chrome-Headless-768x398.jpg 768w, 
https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Working-procedure-of-Chrome-Headless-1024x531.jpg 1024w\" sizes=\"(max-width: 1288px) 100vw, 1288px\"\/>\n<\/picture>\n<\/noscript><\/p>\n<p>But to get the most out of Chrome Headless for scraping, you need to use it with proxies. As mentioned in the introduction, that can be tricky.<\/p>\n<hr\/>\n<h2 id=\"how-to-use-proxies-with-chrome-headless\" class=\"ftwp-heading\"><span class=\"ez-toc-section\" id=\"How_to_Use_Proxies_with_Chrome_Headless\"><\/span>How to Use Proxies with Chrome Headless<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>If you use a custom proxy that requires authentication, you may have found Puppeteer\u2019s support lacking: Chrome\u2019s <code>--proxy-server<\/code> flag accepts a proxy address but not a username and password. Instead, you call <code>page.authenticate()<\/code>, which supplies your credentials when the proxy asks Chrome to log in.<\/p>\n<p>Example code would look like this:<\/p>\n<pre><code>const puppeteer = require('puppeteer');\n\n(async () => {\n  \/\/ Example proxy endpoint and credentials; replace with your own\n  const proxyUrl = 'http:\/\/proxy.example.com:8000';\n  const username = 'bob';\n  const password = 'password123';\n\n  \/\/ The flag carries only the proxy address, never the credentials\n  const browser = await puppeteer.launch({\n    args: [`--proxy-server=${proxyUrl}`],\n    headless: true,\n  });\n\n  const page = await browser.newPage();\n  \/\/ Answer the proxy's authentication challenge for this page\n  await page.authenticate({ username, password });\n\n  await page.goto('https:\/\/www.example.com');\n\n  await browser.close();\n})();<\/code><\/pre>\n<p>This is a simple way to use an authenticated proxy with Headless Chrome for web scraping. However, it can\u2019t do everything you may need your authenticated proxy browser to do. For instance, <code>page.authenticate()<\/code> applies only to the page you call it on, and it\u2019s not clear from the code how the headless browser will handle multiple authentication requests, so there is a chance it will hang on a page that requires authentication.<\/p>\n<p>In that case, you can use <strong>Apify\u2019s <a href=\"https:\/\/github.com\/apifytech\/proxy-chain\"  rel=\"noopener noreferrer\">proxy-chain<\/a><\/strong> package. It lets you use an authenticated proxy with Puppeteer by routing traffic through a local proxy server first: its <code>anonymizeProxy()<\/code> function starts a local server that needs no credentials and forwards every request to your upstream proxy, adding the username and password along the way. It supports HTTP proxy forwarding and tunneling through HTTP CONNECT, so you can also use it when accessing HTTPS and FTP.<\/p>\n<p><picture class=\"aligncenter wp-image-613 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/HTTP-CONNECT-of-proxies.jpg.webp 1415w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/HTTP-CONNECT-of-proxies-300x132.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/HTTP-CONNECT-of-proxies-768x338.jpg.webp 768w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/HTTP-CONNECT-of-proxies-1024x450.jpg.webp 1024w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%201063%20467'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 1063px) 100vw, 1063px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%201063%20467'%3E%3C\/svg%3E\" alt=\"HTTP CONNECT of proxies\" width=\"1063\" height=\"467\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/HTTP-CONNECT-of-proxies.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/HTTP-CONNECT-of-proxies.jpg 1415w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/HTTP-CONNECT-of-proxies-300x132.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/HTTP-CONNECT-of-proxies-768x338.jpg 
768w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/HTTP-CONNECT-of-proxies-1024x450.jpg 1024w\" data-sizes=\"(max-width: 1063px) 100vw, 1063px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter wp-image-613\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/HTTP-CONNECT-of-proxies.jpg.webp 1415w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/HTTP-CONNECT-of-proxies-300x132.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/HTTP-CONNECT-of-proxies-768x338.jpg.webp 768w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/HTTP-CONNECT-of-proxies-1024x450.jpg.webp 1024w\" sizes=\"(max-width: 1063px) 100vw, 1063px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/HTTP-CONNECT-of-proxies.jpg\" alt=\"HTTP CONNECT of proxies\" width=\"1063\" height=\"467\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/HTTP-CONNECT-of-proxies.jpg 1415w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/HTTP-CONNECT-of-proxies-300x132.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/HTTP-CONNECT-of-proxies-768x338.jpg 768w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/HTTP-CONNECT-of-proxies-1024x450.jpg 1024w\" sizes=\"(max-width: 1063px) 100vw, 1063px\"\/>\n<\/picture>\n<\/noscript><\/p>\n<p>Here is what using Apify\u2019s proxy-chain package with Puppeteer looks like in Node.js:<\/p>\n<pre><code>const puppeteer = require('puppeteer');\nconst proxyChain = require('proxy-chain');\n\n(async () => {\n  \/\/ Upstream proxy that requires a username and password\n  const oldProxyUrl = 'http:\/\/bob:password123@proxy.example.com:8000';\n  \/\/ Start a local, credential-free proxy that forwards to the upstream one\n  const newProxyUrl = await proxyChain.anonymizeProxy(oldProxyUrl);\n\n  \/\/ Prints the local proxy URL, something like http:\/\/127.0.0.1:45678\n  console.log(newProxyUrl);\n\n  const browser = await puppeteer.launch({\n    args: [`--proxy-server=${newProxyUrl}`],\n  });\n\n  const page = await browser.newPage();\n  await page.goto('https:\/\/www.example.com');\n  await page.screenshot({ path: 'example.png' });\n\n  await browser.close();\n  \/\/ Stop the local proxy and close any connections still open\n  await proxyChain.closeAnonymizedProxy(newProxyUrl, true);\n})();<\/code><\/pre>\n<p>This setup covers most of the web scraping jobs you\u2019re likely to run. proxy-chain also offers <a href=\"https:\/\/www.npmjs.com\/package\/proxy-chain\"><strong>more advanced options<\/strong><\/a> for custom responses and connecting to external APIs, but in many cases this code will do the job.<\/p>\n<hr\/>\n<p>You may also like to read:<\/p>\n<ul>\n<li><a href=\"https:\/\/royadata.io\/blog\/proxy-api-for-scraping\/\">Best Proxy APIs for Scraping<\/a><\/li>\n<li><a href=\"https:\/\/royadata.io\/blog\/how-to-build-a-web-crawler-using-selenium-proxies\/\">Building a Web Crawler Using Selenium and Proxies<\/a><\/li>\n<li><a href=\"https:\/\/royadata.io\/blog\/scrapy-vs-selenium-vs-beautifulsoup-for-web-scraping\/\">Scrapy Vs. Beautifulsoup Vs. Selenium for Web Scraping<\/a><\/li>\n<\/ul>\n<hr\/>\n<h2 id=\"tips-for-web-scraping-with-chrome-headless\" class=\"ftwp-heading\"><span class=\"ez-toc-section\" id=\"Tips_for_Web_Scraping_with_Chrome_Headless\"><\/span>Tips for Web Scraping with Chrome Headless<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Now that you can <a href=\"https:\/\/royadata.io\/blog\/proxies-for-puppeteer\/\">use proxies with Puppeteer<\/a>, it\u2019s time to look at ways to make your headless scraping run more smoothly and successfully. 
<strong>Use these headless scraping tips to make your web scraping more efficient.<\/strong><\/p>\n<hr\/>\n<h3 id=\"extract-data-with-jquery\" class=\"ftwp-heading\"><span class=\"ez-toc-section\" id=\"Extract_Data_with_jQuery\"><\/span>Extract Data with jQuery<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<blockquote>\n<p>Why go through the difficulty of writing custom data extraction paths when you can use jQuery for the purpose?<\/p>\n<\/blockquote>\n<p>Every page that Chrome loads has a Document Object Model (DOM), a structured tree of elements containing all the data on that page, so any website can be scraped this way. Query the DOM with jQuery selectors and you have immediate access to the data you\u2019re looking for.<\/p>\n<p><picture class=\"aligncenter size-full wp-image-614 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Extract-Data-with-jQuery.jpg.webp 1305w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Extract-Data-with-jQuery-300x134.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Extract-Data-with-jQuery-768x344.jpg.webp 768w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Extract-Data-with-jQuery-1024x459.jpg.webp 1024w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%201305%20585'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 1305px) 100vw, 1305px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%201305%20585'%3E%3C\/svg%3E\" alt=\"Extract Data with jQuery\" width=\"1305\" height=\"585\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Extract-Data-with-jQuery.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Extract-Data-with-jQuery.jpg 1305w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Extract-Data-with-jQuery-300x134.jpg 
300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Extract-Data-with-jQuery-768x344.jpg 768w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Extract-Data-with-jQuery-1024x459.jpg 1024w\" data-sizes=\"(max-width: 1305px) 100vw, 1305px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter size-full wp-image-614\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Extract-Data-with-jQuery.jpg.webp 1305w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Extract-Data-with-jQuery-300x134.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Extract-Data-with-jQuery-768x344.jpg.webp 768w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Extract-Data-with-jQuery-1024x459.jpg.webp 1024w\" sizes=\"(max-width: 1305px) 100vw, 1305px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Extract-Data-with-jQuery.jpg\" alt=\"Extract Data with jQuery\" width=\"1305\" height=\"585\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Extract-Data-with-jQuery.jpg 1305w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Extract-Data-with-jQuery-300x134.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Extract-Data-with-jQuery-768x344.jpg 768w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Extract-Data-with-jQuery-1024x459.jpg 1024w\" sizes=\"(max-width: 1305px) 100vw, 1305px\"\/>\n<\/picture>\n<\/noscript><\/p>\n<p>On websites that already load jQuery, a couple of lines of code run inside the page (for example, through Puppeteer\u2019s <code>page.evaluate()<\/code>) are enough to pull out what you\u2019re looking for. Otherwise, you can load jQuery yourself, for instance by injecting it with <code>page.addScriptTag()<\/code>.<\/p>\n<ul>\n<li><a href=\"https:\/\/royadata.io\/blog\/data-parsing\/\">What is Data Parsing and Parsing Techniques involved?<\/a><\/li>\n<li><a href=\"https:\/\/royadata.io\/blog\/web-scraping-practices\/\">Best Web Scraping Practices &#038; Techniques Tips<\/a><\/li>\n<\/ul>\n<hr\/>\n<h3 id=\"solve-captchas-on-the-fly\" class=\"ftwp-heading\"><span class=\"ez-toc-section\" id=\"Solve_CAPTCHAs_on_the_Fly\"><\/span>Solve CAPTCHAs on the Fly<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>What happens when you\u2019re busy extracting data from various websites and your proxies suddenly get stuck behind CAPTCHA pages? In most cases, you have to call it quits for a while and wait until the website stops asking for human verification. This is the most reliable and recommended solution.<\/p>\n<p>But if your scraper sends each CAPTCHA to a CAPTCHA-solving service over HTTP, you can solve CAPTCHAs automatically, with the service using optical character recognition or actual human workers (although this method is not recommended).<\/p>\n<p>While most CAPTCHAs can be solved by machine learning, most of these services also employ a small number of full-time human workers to solve the tougher ones. Even the toughest generally take no longer than half a minute to solve.<\/p>\n<p>A quick Google search turns up plenty of <a href=\"https:\/\/royadata.io\/blog\/best-captcha-breaking-service-with-proxies\/\">CAPTCHA solving services<\/a> with APIs, and they generally cost a few dollars for every thousand CAPTCHAs solved. 
Again, this is not recommended: in the long run, paying for CAPTCHA solving usually costs more than simply using a larger pool of proxies.<\/p>\n<ul>\n<li><a href=\"https:\/\/royadata.io\/blog\/proxies-for-scraping-google\/\">Proxies for Preventing Bans and Captchas When Scraping Google<\/a><\/li>\n<li><a href=\"https:\/\/royadata.io\/blog\/scrape-a-website-never-get-blacklisted\/\">How to Scrape a Website and Never Get Blacklisted<\/a><\/li>\n<\/ul>\n<p><picture class=\"aligncenter size-full wp-image-615 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Solve-CAPTCHAs-on-the-Fly.jpg.webp 1115w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Solve-CAPTCHAs-on-the-Fly-300x134.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Solve-CAPTCHAs-on-the-Fly-768x344.jpg.webp 768w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Solve-CAPTCHAs-on-the-Fly-1024x458.jpg.webp 1024w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%201115%20499'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 1115px) 100vw, 1115px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%201115%20499'%3E%3C\/svg%3E\" alt=\"Solve CAPTCHAs on the Fly\" width=\"1115\" height=\"499\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Solve-CAPTCHAs-on-the-Fly.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Solve-CAPTCHAs-on-the-Fly.jpg 1115w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Solve-CAPTCHAs-on-the-Fly-300x134.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Solve-CAPTCHAs-on-the-Fly-768x344.jpg 768w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Solve-CAPTCHAs-on-the-Fly-1024x458.jpg 1024w\" data-sizes=\"(max-width: 1115px) 100vw, 
1115px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter size-full wp-image-615\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Solve-CAPTCHAs-on-the-Fly.jpg.webp 1115w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Solve-CAPTCHAs-on-the-Fly-300x134.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Solve-CAPTCHAs-on-the-Fly-768x344.jpg.webp 768w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Solve-CAPTCHAs-on-the-Fly-1024x458.jpg.webp 1024w\" sizes=\"(max-width: 1115px) 100vw, 1115px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Solve-CAPTCHAs-on-the-Fly.jpg\" alt=\"Solve CAPTCHAs on the Fly\" width=\"1115\" height=\"499\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Solve-CAPTCHAs-on-the-Fly.jpg 1115w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Solve-CAPTCHAs-on-the-Fly-300x134.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Solve-CAPTCHAs-on-the-Fly-768x344.jpg 768w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Solve-CAPTCHAs-on-the-Fly-1024x458.jpg 1024w\" sizes=\"(max-width: 1115px) 100vw, 1115px\"\/>\n<\/picture>\n<\/noscript><\/p>\n<p>If you\u2019re using <a href=\"https:\/\/hub.phantombuster.com\/reference#buster-solvecaptchabase64\"><strong>PhantomBuster\u2019s Buster Library<\/strong><\/a>, you can easily make automatic calls to several CAPTCHA-solving services directly from your Chrome Headless script.<\/p>\n<p>That kind of automation turns a time-consuming task into a simple one. 
With just a few lines of code, your proxy script can pass as a real human for the thirty seconds or so it takes to solve a CAPTCHA.<\/p>\n<ul>\n<li><a href=\"https:\/\/royadata.io\/blog\/phantombuster-proxies\/\">Phantombuster Proxies for Web Scraper &#038; Automation Tools<\/a><\/li>\n<\/ul>\n<hr\/>\n<h3 id=\"now-youre-ready-to-scrape\" class=\"ftwp-heading\"><span class=\"ez-toc-section\" id=\"Now_Youre_Ready_to_Scrape\"><\/span>Now You\u2019re Ready to Scrape<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>As you gain experience scraping with Chrome Headless, you\u2019ll learn where and when to add human-like delays to keep your scraper in the target website\u2019s good graces.<\/p>\n<hr\/>\n<ul>\n<li><a href=\"https:\/\/www.stupidproxy.com\/puppeteer-proxies\/\"  rel=\"noopener noreferrer\"><span id=\"How_to_connect_Puppeteer_with_Luminatis_Super_Proxies\">How to connect Puppeteer with Luminati\u2019s Super Proxies<\/span><\/a><\/li>\n<li><a href=\"https:\/\/royadata.io\/blog\/rotating-proxies-api-with-curl\/\">How to Use Rotating Proxy API &#038; Proxy lists with CURL for data mining<\/a><\/li>\n<li><a href=\"https:\/\/www.privateproxyreviews.com\/avoid-ip-ban-scraping-never-blocked-blacklisted\/#use-selenium-or-puppeteer-headless-browser\">Headless browser to prevent getting blacklisted or blocked when scraping<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Everybody knows Google Chrome is the market leader when it comes to web browsing. At least, that\u2019s what the statistics say: an overwhelming 61.20 percent of Internet users were browsing with Chrome as of 2018. 
If Chrome is the leading web browser, then it makes sense that Chrome Headless will be the leading browser for &#8230; <a title=\"Use Chrome Headless and Dedicated Proxies to Scrape Any Website\" class=\"read-more\" href=\"http:\/\/royadata.io\/blog\/use-chrome-headless-and-dedicated-proxies-to-scrape-any-website\/\" aria-label=\"More on Use Chrome Headless and Dedicated Proxies to Scrape Any Website\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":475,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"_links":{"self":[{"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/posts\/6296"}],"collection":[{"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/comments?post=6296"}],"version-history":[{"count":0,"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/posts\/6296\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/media\/475"}],"wp:attachment":[{"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/media?parent=6296"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/categories?post=6296"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/tags?post=6296"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}