{"id":6417,"date":"2023-10-18T14:47:43","date_gmt":"2023-10-18T14:47:43","guid":{"rendered":"https:\/\/royadata.io\/blog\/?p=6417"},"modified":"2023-10-18T14:47:43","modified_gmt":"2023-10-18T14:47:43","slug":"open-source-web-scraper","status":"publish","type":"post","link":"http:\/\/royadata.io\/blog\/open-source-web-scraper\/","title":{"rendered":"15 Best Open-Source Web Scrapers for 2022"},"content":{"rendered":"<blockquote>\n<p>Are you looking for open-source web scrapers to use for your next web scraping project? On this page, we list some of the best open-source web scrapers on the market.<\/p>\n<\/blockquote>\n<p><picture class=\"aligncenter size-full wp-image-8906 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Open-Source-Web-Scraper.jpg.webp 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Open-Source-Web-Scraper-300x167.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Open-Source-Web-Scraper-768x426.jpg.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%201000%20555'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 1000px) 100vw, 1000px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%201000%20555'%3E%3C\/svg%3E\" alt=\"Open-Source Web Scraper\" width=\"1000\" height=\"555\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Open-Source-Web-Scraper.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Open-Source-Web-Scraper.jpg 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Open-Source-Web-Scraper-300x167.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Open-Source-Web-Scraper-768x426.jpg 768w\" data-sizes=\"(max-width: 1000px) 100vw, 1000px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter 
size-full wp-image-8906\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Open-Source-Web-Scraper.jpg.webp 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Open-Source-Web-Scraper-300x167.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Open-Source-Web-Scraper-768x426.jpg.webp 768w\" sizes=\"(max-width: 1000px) 100vw, 1000px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Open-Source-Web-Scraper.jpg\" alt=\"Open-Source Web Scraper\" width=\"1000\" height=\"555\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Open-Source-Web-Scraper.jpg 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Open-Source-Web-Scraper-300x167.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Open-Source-Web-Scraper-768x426.jpg 768w\" sizes=\"(max-width: 1000px) 100vw, 1000px\"\/>\n<\/picture>\n<\/noscript><\/p>\n<p><a href=\"https:\/\/royadata.io\/blog\/web-scraping\/\">Web scraping<\/a> is the automated use of computer programs to extract data from web pages. It is incredibly important for gathering data available online, and as you already know, the Internet is an enormous source of data. As a programmer, you can develop web scrapers from scratch, but that is a lot of work, and unless you are experienced, you will likely end up with a bug-filled web scraper that is hard to upgrade and scale.<\/p>\n<p>What then is the best option for you? My advice is to make use of <a href=\"https:\/\/royadata.io\/blog\/web-scraping-tools\/\">web scraping libraries and frameworks<\/a> that make the development of web scrapers easy. This means you do not reinvent the wheel, and it also saves you development time.<\/p>\n<p>One thing you will come to like about open-source web scraping libraries and frameworks is that they are free to use. 
I have used a good number of them across multiple programming languages to speed up development and keep code clean and easy to understand.<\/p>\n<p>In this article, I will be discussing some of the best open-source web scrapers out there.<\/p>\n<hr\/>\n<h2 id=\"scrapy-python\" class=\"ftwp-heading\" style=\"text-align: center;\"><span class=\"ez-toc-section\" id=\"Scrapy_Python\"><\/span><a href=\"https:\/\/scrapy.org\/\"  rel=\"noopener noreferrer\"><strong>Scrapy<\/strong><\/a> (Python)<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><a href=\"https:\/\/scrapy.org\/\"  rel=\"noopener noreferrer\"><picture class=\"aligncenter size-full wp-image-8823 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Scrapy-Homepage.jpg.webp 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Scrapy-Homepage-300x142.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Scrapy-Homepage-768x363.jpg.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%201000%20473'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 1000px) 100vw, 1000px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%201000%20473'%3E%3C\/svg%3E\" alt=\"Scrapy Homepage\" width=\"1000\" height=\"473\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Scrapy-Homepage.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Scrapy-Homepage.jpg 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Scrapy-Homepage-300x142.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Scrapy-Homepage-768x363.jpg 768w\" data-sizes=\"(max-width: 1000px) 100vw, 1000px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter 
size-full wp-image-8823\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Scrapy-Homepage.jpg.webp 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Scrapy-Homepage-300x142.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Scrapy-Homepage-768x363.jpg.webp 768w\" sizes=\"(max-width: 1000px) 100vw, 1000px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Scrapy-Homepage.jpg\" alt=\"Scrapy Homepage\" width=\"1000\" height=\"473\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Scrapy-Homepage.jpg 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Scrapy-Homepage-300x142.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Scrapy-Homepage-768x363.jpg 768w\" sizes=\"(max-width: 1000px) 100vw, 1000px\"\/>\n<\/picture>\n<\/noscript><\/a><\/p>\n<p>The Scrapy web scraping framework is arguably the most popular web scraping framework you can use to develop scalable and high-performing web scrapers. This is because it is the number one web scraping framework for developing scrapers and crawlers using the Python programming language, and Python is the most popular programming language among web scraper developers.<\/p>\n<p>This framework is a completely open-source tool maintained by Zyte (formerly Scrapinghub), a popular name in the web scraping industry. Scrapy is fast, powerful, and incredibly easy to extend with new functionality. One thing you will come to like about it is that it is a complete framework that comes with both an HTTP library and a parsing tool.<\/p>\n<ul>\n<li><a href=\"https:\/\/royadata.io\/blog\/scrapy-vs-selenium-vs-beautifulsoup-for-web-scraping\/\">Scrapy Vs. Beautifulsoup Vs. 
Selenium for Web Scraping<\/a><\/li>\n<\/ul>\n<hr\/>\n<h2 id=\"pyspider-python\" class=\"ftwp-heading\" style=\"text-align: center;\"><span class=\"ez-toc-section\" id=\"Pyspider_Python\"><\/span><a href=\"http:\/\/docs.pyspider.org\/en\/latest\/\"  rel=\"noopener noreferrer\"><strong>Pyspider<\/strong><\/a> (Python)<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><a href=\"http:\/\/docs.pyspider.org\/en\/latest\/\"  rel=\"noopener noreferrer\"><picture class=\"aligncenter size-full wp-image-8824 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Pyspider-Homepage.jpg.webp 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Pyspider-Homepage-300x126.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Pyspider-Homepage-768x322.jpg.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%201000%20419'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 1000px) 100vw, 1000px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%201000%20419'%3E%3C\/svg%3E\" alt=\"Pyspider Homepage\" width=\"1000\" height=\"419\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Pyspider-Homepage.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Pyspider-Homepage.jpg 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Pyspider-Homepage-300x126.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Pyspider-Homepage-768x322.jpg 768w\" data-sizes=\"(max-width: 1000px) 100vw, 1000px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter size-full wp-image-8824\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Pyspider-Homepage.jpg.webp 1000w, 
https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Pyspider-Homepage-300x126.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Pyspider-Homepage-768x322.jpg.webp 768w\" sizes=\"(max-width: 1000px) 100vw, 1000px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Pyspider-Homepage.jpg\" alt=\"Pyspider Homepage\" width=\"1000\" height=\"419\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Pyspider-Homepage.jpg 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Pyspider-Homepage-300x126.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Pyspider-Homepage-768x322.jpg 768w\" sizes=\"(max-width: 1000px) 100vw, 1000px\"\/>\n<\/picture>\n<\/noscript><\/a><\/p>\n<p>The Pyspider framework is another framework that you can use to develop scalable web scrapers. From the name, you can tell that it is also a Python-based tool. This framework was initially developed for writing web crawlers, but you can adapt it and use it for coding powerful web scrapers.<\/p>\n<p>This tool comes with a web UI that includes a script editor, project manager, task monitor, and result viewer, among other features. Pyspider supports a good number of databases. 
It is based on a distributed architecture and can crawl JavaScript pages, a feature the Scrapy framework lacks out of the box.<\/p>\n<ul>\n<li><a href=\"https:\/\/royadata.io\/blog\/web-scraping-with-python\/\">Python Web Scraping Libraries and Framework<\/a><\/li>\n<\/ul>\n<hr\/>\n<h2 id=\"heritrix-javascript\" class=\"ftwp-heading\" style=\"text-align: center;\"><span class=\"ez-toc-section\" id=\"Heritrix_JavaScript\"><\/span><a href=\"https:\/\/github.com\/internetarchive\/heritrix3\"  rel=\"noopener noreferrer\"><strong>Heritrix<\/strong><\/a> (Java)<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><a href=\"https:\/\/github.com\/internetarchive\/heritrix3\"  rel=\"noopener noreferrer\"><picture class=\"aligncenter size-full wp-image-8825 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Heritrix3-Homepage.jpg.webp 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Heritrix3-Homepage-300x115.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Heritrix3-Homepage-768x294.jpg.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%201000%20383'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 1000px) 100vw, 1000px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%201000%20383'%3E%3C\/svg%3E\" alt=\"Heritrix3 Homepage\" width=\"1000\" height=\"383\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Heritrix3-Homepage.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Heritrix3-Homepage.jpg 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Heritrix3-Homepage-300x115.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Heritrix3-Homepage-768x294.jpg 768w\" data-sizes=\"(max-width: 1000px) 100vw, 1000px\" loading=\"lazy\" 
\/>\n<\/picture>\n<noscript><picture class=\"aligncenter size-full wp-image-8825\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Heritrix3-Homepage.jpg.webp 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Heritrix3-Homepage-300x115.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Heritrix3-Homepage-768x294.jpg.webp 768w\" sizes=\"(max-width: 1000px) 100vw, 1000px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Heritrix3-Homepage.jpg\" alt=\"Heritrix3 Homepage\" width=\"1000\" height=\"383\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Heritrix3-Homepage.jpg 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Heritrix3-Homepage-300x115.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Heritrix3-Homepage-768x294.jpg 768w\" sizes=\"(max-width: 1000px) 100vw, 1000px\"\/>\n<\/picture>\n<\/noscript><\/a><\/p>\n<p>Unlike the other tools described above, the Heritrix software is a complete crawler that you can use to crawl the Internet. It was developed by the Internet Archive for web archiving. This crawler was written in Java.<\/p>\n<p>Unlike the tools above, which leave you free to ignore robots.txt directives, Heritrix has been designed to respect them. This tool, just like the above, is completely free to use. It is open-source software, and you can contribute to it too. 
This one is battle-tested for collecting large amounts of data, so you are unlikely to have a performance problem using this tool.<\/p>\n<ul>\n<li><a href=\"https:\/\/royadata.io\/blog\/web-scraping-javascript-tutorials\/\">How to scrape HTML from a website Using Javascript?<\/a><\/li>\n<\/ul>\n<hr\/>\n<h2 id=\"web-harvest-java\" class=\"ftwp-heading\" style=\"text-align: center;\"><span class=\"ez-toc-section\" id=\"Web-Harvest_Java\"><\/span><a href=\"http:\/\/web-harvest.sourceforge.net\/\"  rel=\"noopener noreferrer\"><strong>Web-Harvest<\/strong><\/a> (Java)<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><a href=\"http:\/\/web-harvest.sourceforge.net\/\"  rel=\"noopener noreferrer\"><picture class=\"aligncenter size-full wp-image-8826 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Harvest-Homepage.jpg.webp 802w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Harvest-Homepage-300x177.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Harvest-Homepage-768x452.jpg.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20802%20472'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 802px) 100vw, 802px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20802%20472'%3E%3C\/svg%3E\" alt=\"Web Harvest Homepage\" width=\"802\" height=\"472\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Harvest-Homepage.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Harvest-Homepage.jpg 802w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Harvest-Homepage-300x177.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Harvest-Homepage-768x452.jpg 768w\" data-sizes=\"(max-width: 802px) 100vw, 802px\" loading=\"lazy\" 
\/>\n<\/picture>\n<noscript><picture class=\"aligncenter size-full wp-image-8826\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Harvest-Homepage.jpg.webp 802w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Harvest-Homepage-300x177.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Harvest-Homepage-768x452.jpg.webp 768w\" sizes=\"(max-width: 802px) 100vw, 802px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Harvest-Homepage.jpg\" alt=\"Web Harvest Homepage\" width=\"802\" height=\"472\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Harvest-Homepage.jpg 802w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Harvest-Homepage-300x177.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Harvest-Homepage-768x452.jpg 768w\" sizes=\"(max-width: 802px) 100vw, 802px\"\/>\n<\/picture>\n<\/noscript><\/a><\/p>\n<p>The Web-Harvest library is a web extraction tool written in Java that Java developers can use to build web scrapers for collecting data from web pages. It is a complete tool, as it comes with an API for sending web requests and downloading web pages. It also supports parsing content from a downloaded HTML document.<\/p>\n<p>This tool comes with support for file handling, looping, HTML and XML handling, conditional operations, exception handling, and variable manipulation. 
It is open source and perfect for writing Java-based web scrapers.<\/p>\n<hr\/>\n<h2 id=\"mechanicalsoup\" class=\"ftwp-heading\" style=\"text-align: center;\"><span class=\"ez-toc-section\" id=\"MechanicalSoup\"><\/span><a href=\"https:\/\/mechanicalsoup.readthedocs.io\/en\/stable\/\"  rel=\"noopener noreferrer\"><strong>MechanicalSoup<\/strong><\/a><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><a href=\"https:\/\/mechanicalsoup.readthedocs.io\/en\/stable\/\"  rel=\"noopener noreferrer\"><picture class=\"aligncenter size-full wp-image-8827 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Mechanical-Soup-Homepage.jpg.webp 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Mechanical-Soup-Homepage-300x151.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Mechanical-Soup-Homepage-768x386.jpg.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%201000%20503'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 1000px) 100vw, 1000px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%201000%20503'%3E%3C\/svg%3E\" alt=\"Mechanical Soup Homepage\" width=\"1000\" height=\"503\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Mechanical-Soup-Homepage.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Mechanical-Soup-Homepage.jpg 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Mechanical-Soup-Homepage-300x151.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Mechanical-Soup-Homepage-768x386.jpg 768w\" data-sizes=\"(max-width: 1000px) 100vw, 1000px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter size-full wp-image-8827\"><source type=\"image\/webp\" 
srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Mechanical-Soup-Homepage.jpg.webp 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Mechanical-Soup-Homepage-300x151.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Mechanical-Soup-Homepage-768x386.jpg.webp 768w\" sizes=\"(max-width: 1000px) 100vw, 1000px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Mechanical-Soup-Homepage.jpg\" alt=\"Mechanical Soup Homepage\" width=\"1000\" height=\"503\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Mechanical-Soup-Homepage.jpg 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Mechanical-Soup-Homepage-300x151.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Mechanical-Soup-Homepage-768x386.jpg 768w\" sizes=\"(max-width: 1000px) 100vw, 1000px\"\/>\n<\/picture>\n<\/noscript><\/a><\/p>\n<p>The MechanicalSoup library is another Python-based tool for writing web scrapers. This tool can be used for automating tasks online, which makes it perfect for web scraping. Its major drawback is that it does not support JavaScript-based actions and, as such, is not suitable for scraping JavaScript-rich websites.<\/p>\n<p>If you have used the duo of Requests and BeautifulSoup before, then you will find the MechanicalSoup library easy to use, as it mimics their simple APIs. 
This tool comes with documentation that is easy to understand, making it easy for you to get started with the tool.<\/p>\n<hr\/>\n<h2 id=\"apify-sdk\" class=\"ftwp-heading\" style=\"text-align: center;\"><span class=\"ez-toc-section\" id=\"Apify_SDK\"><\/span><a href=\"https:\/\/sdk.apify.com\/\"  rel=\"noopener noreferrer\"><strong>Apify SDK<\/strong><\/a><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><a href=\"https:\/\/sdk.apify.com\/\"  rel=\"noopener noreferrer\"><picture class=\"aligncenter size-full wp-image-8828 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Apify-SDK-Homepage.jpg.webp 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Apify-SDK-Homepage-300x151.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Apify-SDK-Homepage-768x386.jpg.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%201000%20502'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 1000px) 100vw, 1000px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%201000%20502'%3E%3C\/svg%3E\" alt=\"Apify SDK Homepage\" width=\"1000\" height=\"502\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Apify-SDK-Homepage.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Apify-SDK-Homepage.jpg 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Apify-SDK-Homepage-300x151.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Apify-SDK-Homepage-768x386.jpg 768w\" data-sizes=\"(max-width: 1000px) 100vw, 1000px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter size-full wp-image-8828\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Apify-SDK-Homepage.jpg.webp 1000w, 
https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Apify-SDK-Homepage-300x151.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Apify-SDK-Homepage-768x386.jpg.webp 768w\" sizes=\"(max-width: 1000px) 100vw, 1000px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Apify-SDK-Homepage.jpg\" alt=\"Apify SDK Homepage\" width=\"1000\" height=\"502\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Apify-SDK-Homepage.jpg 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Apify-SDK-Homepage-300x151.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Apify-SDK-Homepage-768x386.jpg 768w\" sizes=\"(max-width: 1000px) 100vw, 1000px\"\/>\n<\/picture>\n<\/noscript><\/a><\/p>\n<p>The Apify SDK is a highly scalable web scraping library developed for the Node.js platform. JavaScript is the Internet&#8217;s language, and having a web scraper for it makes a lot of sense. Well, the Apify SDK fills the gap.<\/p>\n<p>This library builds on popular tools like Playwright, Puppeteer, and Cheerio to deliver large-scale, high-performance web scraping and crawling of any website. This library is not just a web scraper; it is a full-fledged automation tool that you can use to automate your actions on the Internet. You can run it on the Apify platform or have it integrated into your code. 
It is powerful and easy to use.<\/p>\n<hr\/>\n<h2 id=\"apache-nutch\" class=\"ftwp-heading\" style=\"text-align: center;\"><span class=\"ez-toc-section\" id=\"Apache_Nutch\"><\/span><a href=\"http:\/\/nutch.apache.org\/\"  rel=\"noopener noreferrer\"><strong>Apache Nutch<\/strong><\/a><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><a href=\"http:\/\/nutch.apache.org\/\"  rel=\"noopener noreferrer\"><picture class=\"aligncenter size-full wp-image-8829 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Apache-Nutch-Homepage.jpg.webp 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Apache-Nutch-Homepage-300x133.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Apache-Nutch-Homepage-768x341.jpg.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%201000%20444'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 1000px) 100vw, 1000px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%201000%20444'%3E%3C\/svg%3E\" alt=\"Apache Nutch Homepage\" width=\"1000\" height=\"444\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Apache-Nutch-Homepage.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Apache-Nutch-Homepage.jpg 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Apache-Nutch-Homepage-300x133.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Apache-Nutch-Homepage-768x341.jpg 768w\" data-sizes=\"(max-width: 1000px) 100vw, 1000px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter size-full wp-image-8829\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Apache-Nutch-Homepage.jpg.webp 1000w, 
https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Apache-Nutch-Homepage-300x133.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Apache-Nutch-Homepage-768x341.jpg.webp 768w\" sizes=\"(max-width: 1000px) 100vw, 1000px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Apache-Nutch-Homepage.jpg\" alt=\"Apache Nutch Homepage\" width=\"1000\" height=\"444\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Apache-Nutch-Homepage.jpg 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Apache-Nutch-Homepage-300x133.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Apache-Nutch-Homepage-768x341.jpg 768w\" sizes=\"(max-width: 1000px) 100vw, 1000px\"\/>\n<\/picture>\n<\/noscript><\/a><\/p>\n<p>Apache Nutch is a high-performing web scraper you can integrate into your project. If you are looking for a web scraper that gets updated regularly, then Apache Nutch is a great choice. This web crawler is production-ready, has been around for a while, and can be considered mature.<\/p>\n<p>Oregon State University is converting its searching infrastructure from Google to the open-source project Nutch. What makes this web scraper stand out is that it is from the Apache Software Foundation. 
It is completely free to use and open source.<\/p>\n<ul>\n<li><a href=\"https:\/\/royadata.io\/blog\/free-web-scrapers\/\">Free Web Scraping Software &#038; Extension for Non-programmers<\/a><\/li>\n<\/ul>\n<hr\/>\n<h2 id=\"crawler4j\" class=\"ftwp-heading\" style=\"text-align: center;\"><span class=\"ez-toc-section\" id=\"Crawler4j\"><\/span><a href=\"https:\/\/github.com\/yasserg\/crawler4j\"  rel=\"noopener noreferrer\"><strong>Crawler4j<\/strong><\/a><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><a href=\"https:\/\/github.com\/yasserg\/crawler4j\"  rel=\"noopener noreferrer\"><picture class=\"aligncenter size-full wp-image-8830 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Crawler4J-Homepage.jpg.webp 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Crawler4J-Homepage-300x118.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Crawler4J-Homepage-768x303.jpg.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%201000%20394'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 1000px) 100vw, 1000px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%201000%20394'%3E%3C\/svg%3E\" alt=\"Crawler4J Homepage\" width=\"1000\" height=\"394\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Crawler4J-Homepage.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Crawler4J-Homepage.jpg 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Crawler4J-Homepage-300x118.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Crawler4J-Homepage-768x303.jpg 768w\" data-sizes=\"(max-width: 1000px) 100vw, 1000px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter size-full wp-image-8830\"><source type=\"image\/webp\" 
srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Crawler4J-Homepage.jpg.webp 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Crawler4J-Homepage-300x118.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Crawler4J-Homepage-768x303.jpg.webp 768w\" sizes=\"(max-width: 1000px) 100vw, 1000px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Crawler4J-Homepage.jpg\" alt=\"Crawler4J Homepage\" width=\"1000\" height=\"394\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Crawler4J-Homepage.jpg 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Crawler4J-Homepage-300x118.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Crawler4J-Homepage-768x303.jpg 768w\" sizes=\"(max-width: 1000px) 100vw, 1000px\"\/>\n<\/picture>\n<\/noscript><\/a><\/p>\n<p>Crawler4j is an open-source Java library for crawling and scraping data from web pages. The tool is easy to use, thanks to its simple APIs that make it easy to set up. Within minutes, you can set up a multithreaded web scraper that you can use to carry out web data extraction.<\/p>\n<p>All you need is to extend the WebCrawler class, which decides which URLs should be crawled and handles the downloaded page. The project provides an easy-to-understand guide on how to use the library. You can check it out on GitHub. 
Because it is an open-source library, you could contribute to it if you feel the code base needs modification.<\/p>\n<hr\/>\n<h2 id=\"webmagic\" class=\"ftwp-heading\" style=\"text-align: center;\"><span class=\"ez-toc-section\" id=\"WebMagic\"><\/span><a href=\"https:\/\/webmagic.io\/en\/\"  rel=\"noopener noreferrer\"><strong>WebMagic<\/strong><\/a><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><a href=\"https:\/\/webmagic.io\/en\/\"  rel=\"noopener noreferrer\"><picture class=\"aligncenter size-full wp-image-8832 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Magic-Homepage.jpg.webp 854w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Magic-Homepage-300x162.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Magic-Homepage-768x415.jpg.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20854%20462'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 854px) 100vw, 854px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20854%20462'%3E%3C\/svg%3E\" alt=\"Web Magic Homepage\" width=\"854\" height=\"462\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Magic-Homepage.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Magic-Homepage.jpg 854w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Magic-Homepage-300x162.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Magic-Homepage-768x415.jpg 768w\" data-sizes=\"(max-width: 854px) 100vw, 854px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter size-full wp-image-8832\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Magic-Homepage.jpg.webp 854w, 
https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Magic-Homepage-300x162.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Magic-Homepage-768x415.jpg.webp 768w\" sizes=\"(max-width: 854px) 100vw, 854px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Magic-Homepage.jpg\" alt=\"Web Magic Homepage\" width=\"854\" height=\"462\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Magic-Homepage.jpg 854w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Magic-Homepage-300x162.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Magic-Homepage-768x415.jpg 768w\" sizes=\"(max-width: 854px) 100vw, 854px\"\/>\n<\/picture>\n<\/noscript><\/a><\/p>\n<p>WebMagic is a web scraper with a simple, flexible core. It is a Java-based scraping tool that you can add to your project using Maven. It works only for extracting data from HTML pages. If you want to scrape JavaScript-heavy websites, you will need to look elsewhere, as WebMagic does not support JavaScript rendering and is therefore not suitable for them.<\/p>\n<p>The library has a simple API that makes it easy to integrate into your project. 
It covers the whole lifecycle of web scraping and crawling, which includes downloading, URL management, content extraction, and persistence.<\/p>\n<ul>\n<li><a href=\"https:\/\/royadata.io\/blog\/how-to-extract-data-from-a-website\/\">How to Extract Data from a Website?<\/a><\/li>\n<li><a href=\"https:\/\/royadata.io\/blog\/web-scraping-api\/\">Web Scraping API to Help Scrape &#038; Extract Data<\/a><\/li>\n<\/ul>\n<hr\/>\n<h2 id=\"webcollector\" class=\"ftwp-heading\" style=\"text-align: center;\"><span class=\"ez-toc-section\" id=\"WebCollector\"><\/span><a href=\"https:\/\/github.com\/CrawlScript\/WebCollector\/blob\/master\/README.md\"  rel=\"noopener noreferrer\"><strong>WebCollector<\/strong><\/a><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><a href=\"https:\/\/github.com\/CrawlScript\/WebCollector\/blob\/master\/README.md\"  rel=\"noopener noreferrer\"><picture class=\"aligncenter size-full wp-image-8833 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/WebCollector-Homepage.jpg.webp 912w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/WebCollector-Homepage-300x123.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/WebCollector-Homepage-768x314.jpg.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20912%20373'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 912px) 100vw, 912px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20912%20373'%3E%3C\/svg%3E\" alt=\"WebCollector Homepage\" width=\"912\" height=\"373\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/WebCollector-Homepage.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/WebCollector-Homepage.jpg 912w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/WebCollector-Homepage-300x123.jpg 
300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/WebCollector-Homepage-768x314.jpg 768w\" data-sizes=\"(max-width: 912px) 100vw, 912px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter size-full wp-image-8833\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/WebCollector-Homepage.jpg.webp 912w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/WebCollector-Homepage-300x123.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/WebCollector-Homepage-768x314.jpg.webp 768w\" sizes=\"(max-width: 912px) 100vw, 912px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/WebCollector-Homepage.jpg\" alt=\"WebCollector Homepage\" width=\"912\" height=\"373\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/WebCollector-Homepage.jpg 912w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/WebCollector-Homepage-300x123.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/WebCollector-Homepage-768x314.jpg 768w\" sizes=\"(max-width: 912px) 100vw, 912px\"\/>\n<\/picture>\n<\/noscript><\/a><\/p>\n<p>WebCollector is a robust web scraper and crawler for Java programmers. You can use it to develop high-performing web scrapers that collect data from web pages. One thing you will come to like about this library is that it is extensible through plugins.<\/p>\n<p>WebCollector integrates CEPF, a well-designed, state-of-the-art web content extraction algorithm proposed by Wu et al. This library is easy to integrate into your custom projects. 
Because it is an open-source library, you can access it on GitHub and contribute to its development there.<\/p>\n<ul>\n<li><a href=\"https:\/\/royadata.io\/blog\/how-to-collect-big-data\/\">How to Collect Big Data<\/a><\/li>\n<\/ul>\n<hr\/>\n<h2 id=\"crawley\" class=\"ftwp-heading\" style=\"text-align: center;\"><span class=\"ez-toc-section\" id=\"Crawley\"><\/span><a href=\"https:\/\/github.com\/jmg\/crawley\"  rel=\"noopener noreferrer\"><strong>Crawley<\/strong><\/a><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><a href=\"https:\/\/github.com\/jmg\/crawley\"  rel=\"noopener noreferrer\"><picture class=\"aligncenter size-full wp-image-8834 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Crawley-Homepage.jpg.webp 948w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Crawley-Homepage-300x133.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Crawley-Homepage-768x339.jpg.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20948%20419'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 948px) 100vw, 948px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20948%20419'%3E%3C\/svg%3E\" alt=\"Crawley Homepage\" width=\"948\" height=\"419\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Crawley-Homepage.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Crawley-Homepage.jpg 948w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Crawley-Homepage-300x133.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Crawley-Homepage-768x339.jpg 768w\" data-sizes=\"(max-width: 948px) 100vw, 948px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter size-full wp-image-8834\"><source type=\"image\/webp\" 
srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Crawley-Homepage.jpg.webp 948w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Crawley-Homepage-300x133.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Crawley-Homepage-768x339.jpg.webp 768w\" sizes=\"(max-width: 948px) 100vw, 948px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Crawley-Homepage.jpg\" alt=\"Crawley Homepage\" width=\"948\" height=\"419\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Crawley-Homepage.jpg 948w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Crawley-Homepage-300x133.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Crawley-Homepage-768x339.jpg 768w\" sizes=\"(max-width: 948px) 100vw, 948px\"\/>\n<\/picture>\n<\/noscript><\/a><\/p>\n<p>Crawley is a framework for developing web scrapers in Python. It is based on non-blocking I\/O operations and built on Eventlet. The Crawley framework supports both relational databases and their non-relational counterparts. With this tool, you can extract data using XPath or PyQuery.<\/p>\n<p>PyQuery is a jQuery-like library for the Python programming language. 
Crawley comes with native support for cookie handling, which makes it a good scraping tool for websites that use cookies for session persistence, such as websites you need to log in to.<\/p>\n<hr\/>\n<h2 id=\"portia\" class=\"ftwp-heading\" style=\"text-align: center;\"><span class=\"ez-toc-section\" id=\"Portia\"><\/span><a href=\"https:\/\/github.com\/scrapinghub\/portia#:~:text=Portia%20is%20a%20tool%20that,scrape%20data%20from%20similar%20pages.\"  rel=\"noopener noreferrer\"><strong>Portia<\/strong><\/a><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><a href=\"https:\/\/github.com\/scrapinghub\/portia#:~:text=Portia%20is%20a%20tool%20that,scrape%20data%20from%20similar%20pages.\"  rel=\"noopener noreferrer\"><picture class=\"aligncenter size-full wp-image-8835 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Portia-Homepage.jpg.webp 910w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Portia-Homepage-300x128.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Portia-Homepage-768x327.jpg.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20910%20388'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 910px) 100vw, 910px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20910%20388'%3E%3C\/svg%3E\" alt=\"Portia Homepage\" width=\"910\" height=\"388\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Portia-Homepage.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Portia-Homepage.jpg 910w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Portia-Homepage-300x128.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Portia-Homepage-768x327.jpg 768w\" data-sizes=\"(max-width: 910px) 100vw, 910px\" loading=\"lazy\" 
\/>\n<\/picture>\n<noscript><picture class=\"aligncenter size-full wp-image-8835\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Portia-Homepage.jpg.webp 910w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Portia-Homepage-300x128.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Portia-Homepage-768x327.jpg.webp 768w\" sizes=\"(max-width: 910px) 100vw, 910px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Portia-Homepage.jpg\" alt=\"Portia Homepage\" width=\"910\" height=\"388\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Portia-Homepage.jpg 910w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Portia-Homepage-300x128.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Portia-Homepage-768x327.jpg 768w\" sizes=\"(max-width: 910px) 100vw, 910px\"\/>\n<\/picture>\n<\/noscript><\/a><\/p>\n<p>Portia is the second tool from Scrapinghub on this list. It is a different type of web scraper, developed for a different audience. While the other tools described in this article are built for developers, Portia has been developed for use even without coding skills.<\/p>\n<p>Portia is an open-source tool that allows you to visually scrape websites. 
With Portia, you can annotate a web page to identify the data you wish to extract, and Portia will understand, based on these annotations, how to scrape data from similar pages.<\/p>\n<hr\/>\n<h2 id=\"juant\" class=\"ftwp-heading\" style=\"text-align: center;\"><span class=\"ez-toc-section\" id=\"Juant\"><\/span><a href=\"https:\/\/jaunt-api.com\/\"  rel=\"noopener noreferrer\"><strong>Jaunt<\/strong><\/a><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><a href=\"https:\/\/jaunt-api.com\/\"  rel=\"noopener noreferrer\"><picture class=\"aligncenter size-full wp-image-8836 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Jaunt-Homepage.jpg.webp 944w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Jaunt-Homepage-300x166.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Jaunt-Homepage-768x425.jpg.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20944%20523'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 944px) 100vw, 944px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20944%20523'%3E%3C\/svg%3E\" alt=\"Jaunt Homepage\" width=\"944\" height=\"523\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Jaunt-Homepage.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Jaunt-Homepage.jpg 944w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Jaunt-Homepage-300x166.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Jaunt-Homepage-768x425.jpg 768w\" data-sizes=\"(max-width: 944px) 100vw, 944px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter size-full wp-image-8836\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Jaunt-Homepage.jpg.webp 944w, 
https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Jaunt-Homepage-300x166.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Jaunt-Homepage-768x425.jpg.webp 768w\" sizes=\"(max-width: 944px) 100vw, 944px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Jaunt-Homepage.jpg\" alt=\"Jaunt Homepage\" width=\"944\" height=\"523\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Jaunt-Homepage.jpg 944w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Jaunt-Homepage-300x166.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Jaunt-Homepage-768x425.jpg 768w\" sizes=\"(max-width: 944px) 100vw, 944px\"\/>\n<\/picture>\n<\/noscript><\/a><\/p>\n<p>Jaunt is an open-source project developed for Java programmers for the quick development of web automation tools. It comes with a headless browser that makes it possible to automate tasks without revealing itself as a non-browser.<\/p>\n<p>With this tool, you can carry out web scraping tasks easily. You can see this tool as a browser without a GUI that visits websites, downloads their content, and parses out the required data. 
One thing you will come to like about Jaunt is that it is built for the modern web and, as such, can be used for scraping JavaScript-rich pages as it can render and execute JavaScript.<\/p>\n<ul>\n<li><a href=\"https:\/\/royadata.io\/blog\/website-downloader-copier\/\">Best Website Downloaders to Copy website locally offline<\/a><\/li>\n<\/ul>\n<hr\/>\n<h2 id=\"node-crawler\" class=\"ftwp-heading\" style=\"text-align: center;\"><span class=\"ez-toc-section\" id=\"Node-Crawler\"><\/span><a href=\"https:\/\/node-crawler.readthedocs.io\/zh_CN\/latest\/\"  rel=\"noopener noreferrer\"><strong>Node-Crawler<\/strong><\/a><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><a href=\"https:\/\/node-crawler.readthedocs.io\/zh_CN\/latest\/\"  rel=\"noopener noreferrer\"><picture class=\"aligncenter size-full wp-image-8837 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Node-Crawler-Homepage.jpg.webp 864w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Node-Crawler-Homepage-300x166.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Node-Crawler-Homepage-768x426.jpg.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20864%20479'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 864px) 100vw, 864px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20864%20479'%3E%3C\/svg%3E\" alt=\"Node Crawler Homepage\" width=\"864\" height=\"479\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Node-Crawler-Homepage.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Node-Crawler-Homepage.jpg 864w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Node-Crawler-Homepage-300x166.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Node-Crawler-Homepage-768x426.jpg 768w\" 
data-sizes=\"(max-width: 864px) 100vw, 864px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter size-full wp-image-8837\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Node-Crawler-Homepage.jpg.webp 864w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Node-Crawler-Homepage-300x166.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Node-Crawler-Homepage-768x426.jpg.webp 768w\" sizes=\"(max-width: 864px) 100vw, 864px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Node-Crawler-Homepage.jpg\" alt=\"Node Crawler Homepage\" width=\"864\" height=\"479\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Node-Crawler-Homepage.jpg 864w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Node-Crawler-Homepage-300x166.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Node-Crawler-Homepage-768x426.jpg 768w\" sizes=\"(max-width: 864px) 100vw, 864px\"\/>\n<\/picture>\n<\/noscript><\/a><\/p>\n<p>Node-Crawler is another Node.js library for developing web crawlers and scrapers. This Node.js library can be seen as a lightweight library that comes packed with a lot of web scraping features.<\/p>\n<p>It is suitable for a distributed scraping architecture, supports hard coding, and is developed for non-blocking asynchronous IO, which provides great convenience for the scraper\u2019s pipeline operation mechanism. It uses Cheerio for querying DOM elements and parsing, but you can replace that with other DOM parsers. 
This tool is convenient, efficient, and easy to use.<\/p>\n<ul>\n<li><a href=\"https:\/\/royadata.io\/blog\/data-parsing\/\">What is Data Parsing and Parsing Techniques involved?<\/a><\/li>\n<\/ul>\n<hr\/>\n<h2 id=\"stormcrawler\" class=\"ftwp-heading\" style=\"text-align: center;\"><span class=\"ez-toc-section\" id=\"StormCrawler\"><\/span><a href=\"http:\/\/stormcrawler.net\/\"  rel=\"noopener noreferrer\"><strong>StormCrawler<\/strong><\/a><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><a href=\"http:\/\/stormcrawler.net\/\"  rel=\"noopener noreferrer\"><picture class=\"aligncenter size-full wp-image-8838 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Storm-Crawler-Homepage.jpg.webp 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Storm-Crawler-Homepage-300x163.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Storm-Crawler-Homepage-768x417.jpg.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%201000%20543'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 1000px) 100vw, 1000px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%201000%20543'%3E%3C\/svg%3E\" alt=\"Storm Crawler Homepage\" width=\"1000\" height=\"543\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Storm-Crawler-Homepage.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Storm-Crawler-Homepage.jpg 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Storm-Crawler-Homepage-300x163.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Storm-Crawler-Homepage-768x417.jpg 768w\" data-sizes=\"(max-width: 1000px) 100vw, 1000px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter size-full wp-image-8838\"><source type=\"image\/webp\" 
srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Storm-Crawler-Homepage.jpg.webp 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Storm-Crawler-Homepage-300x163.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Storm-Crawler-Homepage-768x417.jpg.webp 768w\" sizes=\"(max-width: 1000px) 100vw, 1000px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Storm-Crawler-Homepage.jpg\" alt=\"Storm Crawler Homepage\" width=\"1000\" height=\"543\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Storm-Crawler-Homepage.jpg 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Storm-Crawler-Homepage-300x163.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Storm-Crawler-Homepage-768x417.jpg 768w\" sizes=\"(max-width: 1000px) 100vw, 1000px\"\/>\n<\/picture>\n<\/noscript><\/a><\/p>\n<p>StormCrawler is a Software Development Kit (SDK) for building efficient, high-performance web scrapers and <a href=\"https:\/\/royadata.io\/blog\/web-crawler\/\">crawlers<\/a>. It is based on Apache Storm and built for distributed web scraper development.<\/p>\n<ul>\n<li><a href=\"https:\/\/royadata.io\/blog\/crawling-vs-scraping\/\">Web Crawling Vs. Web Scraping<\/a><\/li>\n<\/ul>\n<p>The SDK is battle-tested and has proven to be scalable, resilient, easy to extend, and efficient. While it has been built with a distributed architecture in mind, you can use it for a small-scale web scraping project, and it will work fine. Because of what it was designed to achieve, it is among the faster options when it comes to fetching data.<\/p>\n<pre style=\"text-align: center;\"><strong>Conclusion<\/strong><\/pre>\n<p>With open-source software, web scraping has been made easy, and you do not have to pay to make use of a library or framework. 
One thing you will come to like about this is that your workflow is improved.<\/p>\n<p>You will also have the chance to view the code that powers these web crawlers and scrapers and even add to the code base if you want to, provided the maintainers accept your changes.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Are you looking for open-source web scrapers to use for your next web scraping project? On this page, we list some of the best open-source web scrapers in the market. Web scraping is the automated means of using computer programs to extract data from web pages. It is incredibly important for gathering data available online, &#8230; <a title=\"15 Best Open-Source Web Scraper for 2022\" class=\"read-more\" href=\"http:\/\/royadata.io\/blog\/open-source-web-scraper\/\" aria-label=\"More on 15 Best Open-Source Web Scraper for 2022\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":595,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"_links":{"self":[{"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/posts\/6417"}],"collection":[{"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/comments?post=6417"}],"version-history":[{"count":0,"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/posts\/6417\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/media\/595"}],"wp:attachment":[{"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/media?parent=6417"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/categories?post=6417"},{"taxonomy":"post_tag","embeddable":true,"href"
:"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/tags?post=6417"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}