{"id":6309,"date":"2023-10-18T14:47:43","date_gmt":"2023-10-18T14:47:43","guid":{"rendered":"https:\/\/royadata.io\/blog\/?p=6309"},"modified":"2023-10-18T14:47:43","modified_gmt":"2023-10-18T14:47:43","slug":"scrape-amazon","status":"publish","type":"post","link":"http:\/\/royadata.io\/blog\/scrape-amazon\/","title":{"rendered":"7 Things to Know Before Scraping Amazon Product Results"},"content":{"rendered":"<p><picture class=\"aligncenter size-full wp-image-407 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/scrape-Amazon.png.webp 800w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/scrape-Amazon-300x169.png.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/scrape-Amazon-768x432.png.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20800%20450'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 800px) 100vw, 800px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20800%20450'%3E%3C\/svg%3E\" alt=\"scrape Amazon\" width=\"800\" height=\"450\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/scrape-Amazon.png\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/scrape-Amazon.png 800w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/scrape-Amazon-300x169.png 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/scrape-Amazon-768x432.png 768w\" data-sizes=\"(max-width: 800px) 100vw, 800px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter size-full wp-image-407\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/scrape-Amazon.png.webp 800w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/scrape-Amazon-300x169.png.webp 300w, 
https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/scrape-Amazon-768x432.png.webp 768w\" sizes=\"(max-width: 800px) 100vw, 800px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/scrape-Amazon.png\" alt=\"scrape Amazon\" width=\"800\" height=\"450\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/scrape-Amazon.png 800w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/scrape-Amazon-300x169.png 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/scrape-Amazon-768x432.png 768w\" sizes=\"(max-width: 800px) 100vw, 800px\"\/>\n<\/picture>\n<\/noscript><\/p>\n<p>There are a lot of reasons you might want to scrape data from Amazon. As a competing retailer, you might want to keep a database of their pricing data, so you can try to match them. You might want to keep an eye on competitors selling through the Amazon Marketplace. Maybe you want to aggregate review scores from around the Internet, and Amazon is one of the sources you\u2019ll want to use. You could even be selling on Amazon yourself, and using the scraper to keep ahead of others doing the same.<\/p>\n<p>I don\u2019t recommend some of the more black hat uses for data scraping. If you\u2019re scraping product descriptions to use for your own site, all you\u2019re doing is shooting yourself in the foot as far as SEO is concerned. You should avoid basing your business model on scraped Amazon data; more on that later.<\/p>\n<p>There are a lot of pieces of software out there designed to help you scrape Amazon data, as well as some that are general use screen scraping tools. You can get a lot of mileage out of them, but always exercise caution. You\u2019ll want to be very sure of the validity of a piece of software before you drop $400 on it. 
Oh, and if the product you\u2019re researching is primarily marketed through low-view YouTube videos with affiliate links in the descriptions, I recommend staying away.<\/p>\n<p>On the other hand, you can use scripts rather than software. Scripts have the benefit of being highly configurable, and because they are distributed as readable source code, you can inspect them. As long as you have some idea of what you\u2019re doing with the code, you can read it to make sure there\u2019s nothing tricky going on, and you can change it to work exactly the way you want it to. Of course, that relies on you having enough code knowledge to be able to create and change a script. Knowing <a href=\"https:\/\/royadata.io\/blog\/php-detect-proxy-anonymity-level\/\">PHP<\/a>, <a href=\"https:\/\/royadata.io\/blog\/rotating-proxies-api-with-curl\/\">cURL<\/a>, XML, JavaScript and related tools is a good idea.<\/p>\n<p>If you\u2019ve looked into this before, you might ask yourself why you would want to use some third-party scraper or a script you barely understand when you could just use <a href=\"https:\/\/aws.amazon.com\/api-gateway\/\"  rel=\"noopener noreferrer\">Amazon\u2019s API<\/a>. It\u2019s true that, for some purposes, Amazon\u2019s API is a good alternative. However, it doesn\u2019t provide you with all of the data you might want. The API is primarily designed to let affiliates advertise in custom ways that aren\u2019t covered by affiliate links or the product box widgets Amazon provides. There\u2019s a lot of data it won\u2019t let you harvest.<\/p>\n<p>Technically, scraping data has been against Amazon\u2019s policy for a long time. It wasn\u2019t until 2012 that they really started enforcing it, however, so a lot of people got away with scraping data for a long time. 
Finally, when they did, many people considered it an <a href=\"http:\/\/resources.distilnetworks.com\/h\/i\/68216674-disruption-for-sellers-who-use-amazon-repricing-tools\"  rel=\"noopener noreferrer\"><strong>insulting disruption<\/strong><\/a> to their business model, never considering how they were never in the right from the start.<\/p>\n<p><picture class=\"aligncenter size-full wp-image-326 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Know-Before-Scraping-Amazon-Product.jpg.webp 936w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Know-Before-Scraping-Amazon-Product-300x142.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Know-Before-Scraping-Amazon-Product-768x363.jpg.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20936%20442'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 936px) 100vw, 936px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20936%20442'%3E%3C\/svg%3E\" alt=\"Know Before Scraping Amazon Product\" width=\"936\" height=\"442\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Know-Before-Scraping-Amazon-Product.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Know-Before-Scraping-Amazon-Product.jpg 936w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Know-Before-Scraping-Amazon-Product-300x142.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Know-Before-Scraping-Amazon-Product-768x363.jpg 768w\" data-sizes=\"(max-width: 936px) 100vw, 936px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter size-full wp-image-326\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Know-Before-Scraping-Amazon-Product.jpg.webp 936w, 
https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Know-Before-Scraping-Amazon-Product-300x142.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Know-Before-Scraping-Amazon-Product-768x363.jpg.webp 768w\" sizes=\"(max-width: 936px) 100vw, 936px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Know-Before-Scraping-Amazon-Product.jpg\" alt=\"Know Before Scraping Amazon Product\" width=\"936\" height=\"442\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Know-Before-Scraping-Amazon-Product.jpg 936w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Know-Before-Scraping-Amazon-Product-300x142.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Know-Before-Scraping-Amazon-Product-768x363.jpg 768w\" sizes=\"(max-width: 936px) 100vw, 936px\"\/>\n<\/picture>\n<\/noscript><\/p>\n<p>The point I\u2019m trying to make here is that if you\u2019re scraping data from Amazon, what you\u2019re doing is against their terms of service. That means you\u2019re always at risk of numerous penalties. Usually, Amazon just shrugs and bans your IP. However, if you\u2019ve been an especially tenacious pest or are using their data in a way they don\u2019t approve of, they are perfectly within their rights to take you to court over it. This is, obviously, something to be avoided.<\/p>\n<p>All of that said, Amazon seems to have slackened up in recent years. <a href=\"https:\/\/sellercentral.amazon.com\/forums\/message.jspa?messageID=2723828\"  rel=\"noopener noreferrer\">This thread<\/a>\u00a0from 2014 indicates that Amazon doesn\u2019t bother with enforcing low-scale scraping blocks. They have automated systems that will slap you with a ban if you cross their path, but they aren\u2019t actively and persistently seeking out and banning all data scrapers. 
It makes sense; a retailer of their size has so much data to filter through on an hourly basis that it would be impossible to ban every single data scraper.<\/p>\n<p>Before you continue, here are seven things you should know about making Amazon the target of your data scraping. By keeping them in mind, you should be able to keep yourself safe from both automated bans and legal action.<\/p>\n<hr\/>\n<h2 id=\"1-amazon-is-very-liberal-with-ip-bans\" class=\"ftwp-heading\"><span class=\"ez-toc-section\" id=\"1_Amazon_is_Very_Liberal_with_IP_Bans\"><\/span>1. Amazon is Very Liberal with IP Bans<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><picture class=\"aligncenter size-full wp-image-325 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Amazon-is-Very-Liberal-with-IP-Bans.jpg.webp 931w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Amazon-is-Very-Liberal-with-IP-Bans-300x102.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Amazon-is-Very-Liberal-with-IP-Bans-768x261.jpg.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20931%20316'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 931px) 100vw, 931px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20931%20316'%3E%3C\/svg%3E\" alt=\"Amazon is Very Liberal with IP Bans\" width=\"931\" height=\"316\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Amazon-is-Very-Liberal-with-IP-Bans.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Amazon-is-Very-Liberal-with-IP-Bans.jpg 931w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Amazon-is-Very-Liberal-with-IP-Bans-300x102.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Amazon-is-Very-Liberal-with-IP-Bans-768x261.jpg 768w\" 
data-sizes=\"(max-width: 931px) 100vw, 931px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter size-full wp-image-325\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Amazon-is-Very-Liberal-with-IP-Bans.jpg.webp 931w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Amazon-is-Very-Liberal-with-IP-Bans-300x102.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Amazon-is-Very-Liberal-with-IP-Bans-768x261.jpg.webp 768w\" sizes=\"(max-width: 931px) 100vw, 931px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Amazon-is-Very-Liberal-with-IP-Bans.jpg\" alt=\"Amazon is Very Liberal with IP Bans\" width=\"931\" height=\"316\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Amazon-is-Very-Liberal-with-IP-Bans.jpg 931w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Amazon-is-Very-Liberal-with-IP-Bans-300x102.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Amazon-is-Very-Liberal-with-IP-Bans-768x261.jpg 768w\" sizes=\"(max-width: 931px) 100vw, 931px\"\/>\n<\/picture>\n<\/noscript><\/p>\n<p>The first thing to keep in mind if you\u2019re going to be harvesting data from Amazon is that Amazon is very liberal with its bans. You won\u2019t be harvesting data while logged into an account, at least not if you\u2019re smart. That means the only way you can be banned is through an IP ban.<\/p>\n<p>The nice thing is that IP bans can be circumvented. The bad thing is that an IP ban from Amazon is not a punishment to be taken lightly. As far as I know, such bans are permanent.<\/p>\n<p>There\u2019s a prevailing attitude across the web, at least among tech circles, that IP bans are ineffectual and inefficient. They can also be detrimental to the site that uses them, depending on how broad an IP ban they use. 
You don\u2019t want to ban a whole IP block; you might be eliminating an entire neighborhood from your potential customer base. At the same time, anyone dedicated to getting around an IP ban can do it with relative ease.<\/p>\n<p>This is, more or less, true. IP bans are easy enough to get around, especially if you\u2019re expecting and preparing for a ban in the first place. That\u2019s what proxy servers are for.<\/p>\n<p>A proxy server, in case you aren\u2019t aware, is an intermediary that masks your IP address. The website, in this case Amazon, will see your connection as coming from the proxy server rather than your home connection. If they ban you, they <a href=\"https:\/\/royadata.io\/blog\/how-to-prevent-proxy-banned\/\">ban the proxy<\/a>, and you can just use a different proxy.<\/p>\n<p>Therefore, in order to maximize your chances of successfully harvesting all of the data you need on an ongoing basis, you\u2019ll need numerous proxy servers. You want to be able to cycle through them to avoid any one IP being flagged for bot-like activity. You also want to have backups in case any of your proxies are banned, so you can keep harvesting without issues.<\/p>\n<hr\/>\n<h2 id=\"2-amazon-is-very-good-at-detecting-bots\" class=\"ftwp-heading\"><span class=\"ez-toc-section\" id=\"2_Amazon_is_Very_Good_at_Detecting_Bots\"><\/span>2. Amazon is Very Good at Detecting Bots<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>The number one mistake that scrapers make when harvesting data from Amazon, or any other site with a high profile and a plan to ban scrapers, is using <a href=\"https:\/\/www.stupidproxy.com\/web-scraping-tools\/\">scraper software<\/a> without configuring it properly.<\/p>\n<p>Think about it. If you were tasked with detecting bots and filtering them out from legitimate traffic, what would you look for? There are simple things, like the user agent and whether or not it identifies itself as a bot. 
Those are easily spoofed, though.<\/p>\n<p>A more accurate way of detecting bots is by their behavior. A poorly programmed bot will try to make as many requests as possible as quickly as possible or will make them on a fixed timer. Bots are, by definition, robotic. They repeat the same set of actions in the same order with the same timing, again and again.<\/p>\n<p>Amazon is very good at distinguishing between bot actions and human actions. Therefore, to avoid your bots being banned, you need to mimic human behavior as much as possible. Don\u2019t be repetitive. Don\u2019t be predictable. Vary your actions, your timing, and your IP. It\u2019s harder to identify a bot when each IP address only accesses a couple of pages. From your end, you have an unbroken stream of data; from their end, a hundred different users came and performed in a similar way.<\/p>\n<p>Safer for you, harder for them to handle.<\/p>\n<hr\/>\n<h2 id=\"3-you-absolutely-must-follow-laws-and-keep-a-low-profile\" class=\"ftwp-heading\"><span class=\"ez-toc-section\" id=\"3_You_Absolutely_Must_Follow_Laws_and_Keep_a_Low_Profile\"><\/span>3. You Absolutely Must Follow Laws and Keep a Low Profile<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>There are some regulations that apply to bots of all sorts, including content scrapers. I hesitate to call them laws, because there\u2019s very little actual legal precedent about all of this. One of the big high-profile cases, though, is <a href=\"https:\/\/en.wikipedia.org\/wiki\/Craigslist_Inc._v._3Taps_Inc.\"><strong>Craigslist versus PadMapper<\/strong><\/a>. PadMapper took Craigslist data for rental listings and laid it over a Google Maps interface. This is a decent business model, and PadMapper still exists today, but Craigslist took offense to the use of their data in a way that didn\u2019t benefit them.<\/p>\n<p>In the end, Craigslist won that case, though it didn\u2019t go to a court judgment. Instead, it was settled out of court. 
This is a good lesson to take to heart. If you step on the wrong toes, and don\u2019t comply with cease-and-desist letters immediately, you can be the subject of legal action, and you\u2019re usually in the wrong. You can read about that case and others <a href=\"https:\/\/www.quora.com\/What-is-the-legality-of-web-scraping\"  rel=\"noopener noreferrer\"><strong>here<\/strong><\/a>.<\/p>\n<p>Data scrapers on their own generally do not violate any laws unless you\u2019re harvesting private data, or you\u2019re harvesting at a rate that disrupts the operation of the site, the way a DDoS attack would. Your scraper must act as a public visitor and cannot access internal data or data that requires an account to access.<\/p>\n<p>Otherwise, all of the restrictions placed upon you are more about the way you use that data, rather than the way you obtain it.<\/p>\n<ul>\n<li><a href=\"https:\/\/royadata.io\/blog\/scraping-craigslist\/\">The Ultimate Guide to Scraping Craigslist Data with Software<\/a><\/li>\n<\/ul>\n<hr\/>\n<h2 id=\"4-never-sell-scraped-data-or-use-it-to-make-a-profit\" class=\"ftwp-heading\"><span class=\"ez-toc-section\" id=\"4_Never_Sell_Scraped_Data_or_Use_it_to_Make_a_Profit\"><\/span>4. 
Never Sell Scraped Data or Use it to Make a Profit<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><picture class=\"aligncenter size-full wp-image-327 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Never-Sell-Scraped-Data.jpg.webp 933w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Never-Sell-Scraped-Data-300x144.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Never-Sell-Scraped-Data-768x369.jpg.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20933%20448'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 933px) 100vw, 933px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20933%20448'%3E%3C\/svg%3E\" alt=\"Never Sell Scraped Data\" width=\"933\" height=\"448\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Never-Sell-Scraped-Data.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Never-Sell-Scraped-Data.jpg 933w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Never-Sell-Scraped-Data-300x144.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Never-Sell-Scraped-Data-768x369.jpg 768w\" data-sizes=\"(max-width: 933px) 100vw, 933px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter size-full wp-image-327\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Never-Sell-Scraped-Data.jpg.webp 933w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Never-Sell-Scraped-Data-300x144.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Never-Sell-Scraped-Data-768x369.jpg.webp 768w\" sizes=\"(max-width: 933px) 100vw, 933px\"\/><img loading=\"lazy\" decoding=\"async\" 
src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Never-Sell-Scraped-Data.jpg\" alt=\"Never Sell Scraped Data\" width=\"933\" height=\"448\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Never-Sell-Scraped-Data.jpg 933w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Never-Sell-Scraped-Data-300x144.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Never-Sell-Scraped-Data-768x369.jpg 768w\" sizes=\"(max-width: 933px) 100vw, 933px\"\/>\n<\/picture>\n<\/noscript><\/p>\n<p>I say \u201cdon\u2019t use it to make a profit,\u201d but what I really mean is don\u2019t use it as the foundation of your business model. Harvesting pricing data so you know what deals exist and what price point you can use to undercut people is fine. Harvesting product data that you pass off as your own to sell your own products is not.<\/p>\n<p>I mentioned above that you shouldn\u2019t copy product descriptions, because you\u2019ll end up shooting yourself in the foot. This is because of Google\u2019s algorithm, which heavily penalizes copied content. Google knows, obviously, that the product descriptions originated on Amazon. When they see your content, they\u2019ll penalize it, because it\u2019s just low-effort copying from a bigger retailer.<\/p>\n<p>Essentially, there are three core rules you should abide by when using the data you scrape.<\/p>\n<p>1. Never harvest or use data that isn\u2019t normally accessible to the public without an account.<\/p>\n<p>2. Never sell harvested data or make some attempt to profit off it via a third party.<\/p>\n<p>3. Never base your business model on the data you scrape from any site, Amazon included.<\/p>\n<p>The first rule is very important, because it protects you from the issues that come up with data privacy. 
That\u2019s the kind of thing that can really get you in trouble if you violate it, and it has actual laws attached.<\/p>\n<p>The second and third are just ways to hide the fact that you scraped Amazon data and to make it less likely that Amazon will target you with any sort of legal action.<\/p>\n<ul>\n<li><a href=\"https:\/\/royadata.io\/blog\/proxies-for-scraping-google\/\">Proxies for Preventing Bans and Captchas When Scraping Google<\/a><\/li>\n<\/ul>\n<hr\/>\n<h2 id=\"5-always-review-scraping-software-before-using\" class=\"ftwp-heading\"><span class=\"ez-toc-section\" id=\"5_Always_Review_Scraping_Software_Before_Using\"><\/span>5. Always Review Scraping Software Before Using<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>This is just a general tip for any time you\u2019re downloading software from the Internet, particularly in a gray hat or black hat arena. Things like scraping software may not be illegal, but they have a bad reputation, and as such are often targeted by malicious actors.<\/p>\n<p>The first thing to do, before you buy, is investigate. Make sure that there are positive reviews that don\u2019t look like they were paid for by the scraper software developers themselves. You should also look into pricing. There are high-quality scrapers that cost a ton, but there are also pretty good scrapers for cheap or free.<\/p>\n<p>If you\u2019re getting a script or open-source code, you\u2019ll want to look into the code yourself or pay someone to review it for you, so you know what the script is doing. You don\u2019t want to scrape data and save it to a database only to find that the scraper script is also sending the data to a remote location.<\/p>\n<p>This is even more important if you\u2019re using a scraper that requires you to log in, either with credentials for Amazon or credentials for anything else. 
Always assume that any password you give to the scraper is stolen, unless you verify yourself that it\u2019s not going to be.<\/p>\n<p>Also, scan the app for viruses. An embedded virus will generally cause antivirus programs to block the download, but some are hidden well enough that only a detailed scan will catch them.<\/p>\n<ul>\n<li>\n<p class=\"dd b de dp dg dq di dr dk ds dm dt dc\"><a href=\"https:\/\/medium.com\/@jesaltnl\/how-to-scrape-amazon-reviews-with-python-code-5fd8ab62d165\">How to Scrape Amazon reviews with Python<\/a><\/p>\n<\/li>\n<\/ul>\n<hr\/>\n<h2 id=\"6-implement-a-limit-on-queries-per-second\" class=\"ftwp-heading\"><span class=\"ez-toc-section\" id=\"6_Implement_a_Limit_on_Queries_per_Second\"><\/span>6. Implement a Limit on Queries per Second<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>This is something else I mentioned earlier, but I\u2019ll go into more detail now. There are two reasons to avoid making too many queries in too short a span.<\/p>\n<p>The first reason is that you\u2019ll be detected as a bot. As I said, it\u2019s very easy to be seen as a bot when you\u2019re making identical requests to different URLs with the same timing, over and over. It\u2019s a sure-fire way to get your proxy banned.<\/p>\n<p>The second reason is to avoid being accused of DDoSing the site you\u2019re harvesting. It\u2019s not very likely that a bot or script you\u2019re using will have the power to even slightly lag Amazon, but that doesn\u2019t mean their <a href=\"https:\/\/www.netscout.com\/what-is-ddos\"><strong>DDoS protection<\/strong><\/a> won\u2019t filter your requests if you make too many too quickly.<\/p>\n<p>Your own system\u2019s limits are another possible issue. 
There are only so many operations your computer can handle at a time. A script making too many requests can overload a network card or modem, and saving all of the data could bog down your hard drive if it\u2019s not sufficiently fast.<\/p>\n<p>Of course, there\u2019s also the issue with proxies. They don\u2019t always allow high throughput, and they might add a significant amount of lag. That\u2019s the problem with using most proxies; they\u2019re often located overseas, which adds significantly to response times.<\/p>\n<hr\/>\n<h2 id=\"7-rotate-your-summer-proxies\" class=\"ftwp-heading\"><span class=\"ez-toc-section\" id=\"7_Rotate_Your_Summer_Proxies\"><\/span>7. Rotate Your Summer Proxies<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><picture class=\"aligncenter size-full wp-image-328 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Way-to-want-a-number-of-different-proxies.jpg.webp 800w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Way-to-want-a-number-of-different-proxies-300x154.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Way-to-want-a-number-of-different-proxies-768x395.jpg.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20800%20411'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 800px) 100vw, 800px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20800%20411'%3E%3C\/svg%3E\" alt=\"Way to want a number of different proxies\" width=\"800\" height=\"411\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Way-to-want-a-number-of-different-proxies.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Way-to-want-a-number-of-different-proxies.jpg 800w, 
https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Way-to-want-a-number-of-different-proxies-300x154.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Way-to-want-a-number-of-different-proxies-768x395.jpg 768w\" data-sizes=\"(max-width: 800px) 100vw, 800px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter size-full wp-image-328\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Way-to-want-a-number-of-different-proxies.jpg.webp 800w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Way-to-want-a-number-of-different-proxies-300x154.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Way-to-want-a-number-of-different-proxies-768x395.jpg.webp 768w\" sizes=\"(max-width: 800px) 100vw, 800px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Way-to-want-a-number-of-different-proxies.jpg\" alt=\"Way to want a number of different proxies\" width=\"800\" height=\"411\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Way-to-want-a-number-of-different-proxies.jpg 800w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Way-to-want-a-number-of-different-proxies-300x154.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Way-to-want-a-number-of-different-proxies-768x395.jpg 768w\" sizes=\"(max-width: 800px) 100vw, 800px\"\/>\n<\/picture>\n<\/noscript><\/p>\n<p>This is, again, something I touched on earlier. You\u2019re going to want a number of different proxies, so as to spread out your requests and make them look much less suspicious. That means buying access to lists of proxies, or to a service that will give you access to a lot of them at once.<\/p>\n<p>You should also consider using <a href=\"https:\/\/royadata.io\/blog\/private-proxy-guide\/\"><strong>private proxies rather than public proxies<\/strong><\/a>. 
Private proxies avoid most of the issues that plague public proxies. For one thing, they don\u2019t need to interrupt your browsing with ads in order to make a buck. They\u2019re also much less laggy, because only a limited number of users are allowed to access them.<\/p>\n<p>They\u2019re also much less likely to be banned already. Public proxies are the low-quality solution for people who don\u2019t know what they\u2019re doing. Private proxies are the answer for those who demand higher levels of quality from their scraping.<\/p>\n<hr\/>\n<ul>\n<li><a href=\"https:\/\/royadata.io\/blog\/how-to-scrape-linkedin-using-proxies\/\">How to Scrape Data from LinkedIn Using Proxies<\/a><\/li>\n<li><a href=\"https:\/\/royadata.io\/blog\/use-chrome-headless-and-dedicated-proxies-to-scrape-any-website\/\">Use Chrome Headless and Dedicated Proxies to Scrape Any Website<\/a><\/li>\n<li><a href=\"https:\/\/royadata.io\/blog\/residential-proxies\/\">Picking the Best Residential Proxies for Amazon Scraping<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>There are a lot of reasons you might want to scrape data from Amazon. As a competing retailer, you might want to keep a database of their pricing data, so you can try to match them. You might want to keep an eye on competitors selling through the Amazon Marketplace. 
Maybe you want to aggregate &#8230; <a title=\"7 Things to Know Before Scraping Amazon Product Results\" class=\"read-more\" href=\"http:\/\/royadata.io\/blog\/scrape-amazon\/\" aria-label=\"More on 7 Things to Know Before Scraping Amazon Product Results\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":488,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"_links":{"self":[{"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/posts\/6309"}],"collection":[{"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/comments?post=6309"}],"version-history":[{"count":0,"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/posts\/6309\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/media\/488"}],"wp:attachment":[{"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/media?parent=6309"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/categories?post=6309"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/tags?post=6309"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}