{"id":6273,"date":"2023-10-18T14:47:43","date_gmt":"2023-10-18T14:47:43","guid":{"rendered":"https:\/\/royadata.io\/blog\/?p=6273"},"modified":"2023-10-18T14:47:43","modified_gmt":"2023-10-18T14:47:43","slug":"python-web-scraper-tutorial","status":"publish","type":"post","link":"http:\/\/royadata.io\/blog\/python-web-scraper-tutorial\/","title":{"rendered":"How to Build a Simple Web Scraper with Python (Scrape SERP)"},"content":{"rendered":"<blockquote>\n<p>Do you want to learn how to build web scrapers using Python? Read on to see how to build a simple web scraper. The code is included, along with explanations.<\/p>\n<\/blockquote>\n<p><picture class=\"aligncenter size-full wp-image-3707 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Scraper-with-Python.jpg.webp 1009w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Scraper-with-Python-300x149.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Scraper-with-Python-768x381.jpg.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%201009%20501'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 1009px) 100vw, 1009px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%201009%20501'%3E%3C\/svg%3E\" alt=\"Web Scraper with Python\" width=\"1009\" height=\"501\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Scraper-with-Python.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Scraper-with-Python.jpg 1009w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Scraper-with-Python-300x149.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Scraper-with-Python-768x381.jpg 768w\" data-sizes=\"(max-width: 1009px) 100vw, 1009px\" loading=\"lazy\" 
\/>\n<\/picture>\n<noscript><picture class=\"aligncenter size-full wp-image-3707\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Scraper-with-Python.jpg.webp 1009w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Scraper-with-Python-300x149.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Scraper-with-Python-768x381.jpg.webp 768w\" sizes=\"(max-width: 1009px) 100vw, 1009px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Scraper-with-Python.jpg\" alt=\"Web Scraper with Python\" width=\"1009\" height=\"501\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Scraper-with-Python.jpg 1009w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Scraper-with-Python-300x149.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Scraper-with-Python-768x381.jpg 768w\" sizes=\"(max-width: 1009px) 100vw, 1009px\"\/>\n<\/picture>\n<\/noscript><\/p>\n<p>Have you ever wondered how programmers build web scrapers to extract data from websites? If so, this article was written for you.<\/p>\n<p>It is no longer news that we live in a data-driven world, and much of the data businesses need can be found online. With automation bots known as web scrapers, you can pull that data from websites at high speed.<\/p>\n<blockquote>\n<p>Google does it, and so do Yahoo, Semrush, Ahrefs, and many other data-driven websites.<\/p>\n<\/blockquote>\n<p>I am going to show you how to start building a web scraper. No, it is not going to be as sophisticated as Google\u2019s. It won\u2019t even compare with many production-ready web scrapers.<\/p>\n<p>But it is going to be a useful tool that you can use straight away. 
I chose to build this web scraper for this tutorial because it is something I can personally use, and it is simple to build. Let\u2019s start with the problem definition.<\/p>\n<hr\/>\n<h2 id=\"problem-definition\" class=\"ftwp-heading\"><span class=\"ez-toc-section\" id=\"Problem_Definition\"><\/span><strong>Problem Definition<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Build a web scraper that scrapes Google related keywords and writes them into a text file. In essence, we will be building an <a href=\"https:\/\/royadata.io\/blog\/how-to-find-proxies-to-use-with-seo-software\/\">SEO tool<\/a> that accepts a search keyword as input and then scrapes the related keywords for you. In case you do not know, Google related keywords are the keyword suggestions found below the search engine listings.<\/p>\n<p><picture class=\"aligncenter wp-image-3688 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Python-Problem-Definition.jpg.webp 816w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Python-Problem-Definition-300x114.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Python-Problem-Definition-768x292.jpg.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20900%20342'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 900px) 100vw, 900px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20900%20342'%3E%3C\/svg%3E\" alt=\"Python Problem Definition\" width=\"900\" height=\"342\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Python-Problem-Definition.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Python-Problem-Definition.jpg 816w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Python-Problem-Definition-300x114.jpg 300w, 
https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Python-Problem-Definition-768x292.jpg 768w\" data-sizes=\"(max-width: 900px) 100vw, 900px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter wp-image-3688\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Python-Problem-Definition.jpg.webp 816w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Python-Problem-Definition-300x114.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Python-Problem-Definition-768x292.jpg.webp 768w\" sizes=\"(max-width: 900px) 100vw, 900px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Python-Problem-Definition.jpg\" alt=\"Python Problem Definition\" width=\"900\" height=\"342\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Python-Problem-Definition.jpg 816w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Python-Problem-Definition-300x114.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Python-Problem-Definition-768x292.jpg 768w\" sizes=\"(max-width: 900px) 100vw, 900px\"\/>\n<\/picture>\n<\/noscript><\/p>\n<p>These keywords are related to the keyword you entered into Google search, and you can work them into an article on that topic for SEO purposes.<\/p>\n<p>Many paid tools on the market already do this, and they offer features that ours does not. Because this scraper was built for a tutorial, I stripped out all the complexities, which means there is no exception (error) handling. 
If you enter a keyword that has no related keywords, the program will raise an exception and crash.<\/p>\n<hr\/>\n<h2 id=\"requirements\" class=\"ftwp-heading\"><span class=\"ez-toc-section\" id=\"Requirements\"><\/span><strong>Requirements<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Even though this is a beginner-level tutorial, I expect you to know how to code a little in <a href=\"https:\/\/www.python.org\/about\/gettingstarted\/\"  rel=\"noopener noreferrer\">Python<\/a>. You should know the basic Python data types, such as integers, strings, lists, tuples, and dictionaries. You should also know how to loop through a list using a for-in loop.<\/p>\n<p>You should also know how to create functions and classes, as the code is written in the object-oriented programming (OOP) paradigm. Finally, you are expected to know how to read and write HTML well enough to inspect the data to be scraped.<\/p>\n<p>There are only two required dependencies: Requests and BeautifulSoup.<\/p>\n<ul>\n<li>\n<h3 id=\"requests\" class=\"ftwp-heading\"><span class=\"ez-toc-section\" id=\"Requests\"><\/span><a href=\"https:\/\/requests.readthedocs.io\/en\/master\/\"  rel=\"noopener noreferrer\"><strong>Requests<\/strong><\/a><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<\/li>\n<\/ul>\n<p><picture class=\"aligncenter wp-image-3693 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/HTTP-library-for-Python.jpg.webp 986w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/HTTP-library-for-Python-300x152.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/HTTP-library-for-Python-768x388.jpg.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20900%20455'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 900px) 100vw, 900px\" \/><img decoding=\"async\" 
src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20900%20455'%3E%3C\/svg%3E\" alt=\"HTTP library for Python\" width=\"900\" height=\"455\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/HTTP-library-for-Python.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/HTTP-library-for-Python.jpg 986w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/HTTP-library-for-Python-300x152.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/HTTP-library-for-Python-768x388.jpg 768w\" data-sizes=\"(max-width: 900px) 100vw, 900px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter wp-image-3693\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/HTTP-library-for-Python.jpg.webp 986w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/HTTP-library-for-Python-300x152.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/HTTP-library-for-Python-768x388.jpg.webp 768w\" sizes=\"(max-width: 900px) 100vw, 900px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/HTTP-library-for-Python.jpg\" alt=\"HTTP library for Python\" width=\"900\" height=\"455\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/HTTP-library-for-Python.jpg 986w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/HTTP-library-for-Python-300x152.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/HTTP-library-for-Python-768x388.jpg 768w\" sizes=\"(max-width: 900px) 100vw, 900px\"\/>\n<\/picture>\n<\/noscript><\/p>\n<p>Requests is an HTTP library for Python, used for sending HTTP requests. You could use the urllib module from the standard library instead, but Requests offers a simpler, more convenient API. 
Use the \u201cpip install requests\u201d command to install this library.<\/p>\n<ul>\n<li>\n<h3 id=\"beautifulsoup\" class=\"ftwp-heading\"><span class=\"ez-toc-section\" id=\"BeautifulSoup\"><\/span><a href=\"https:\/\/www.crummy.com\/software\/BeautifulSoup\/bs4\/doc\/\"  rel=\"noopener noreferrer\"><strong>BeautifulSoup<\/strong><\/a><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<\/li>\n<\/ul>\n<p><picture class=\"aligncenter wp-image-3695 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/BeautifulSoup-for-Python.jpg.webp 950w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/BeautifulSoup-for-Python-300x132.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/BeautifulSoup-for-Python-768x337.jpg.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20900%20395'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 900px) 100vw, 900px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20900%20395'%3E%3C\/svg%3E\" alt=\"BeautifulSoup for Python\" width=\"900\" height=\"395\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/BeautifulSoup-for-Python.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/BeautifulSoup-for-Python.jpg 950w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/BeautifulSoup-for-Python-300x132.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/BeautifulSoup-for-Python-768x337.jpg 768w\" data-sizes=\"(max-width: 900px) 100vw, 900px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter wp-image-3695\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/BeautifulSoup-for-Python.jpg.webp 950w, 
https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/BeautifulSoup-for-Python-300x132.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/BeautifulSoup-for-Python-768x337.jpg.webp 768w\" sizes=\"(max-width: 900px) 100vw, 900px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/BeautifulSoup-for-Python.jpg\" alt=\"BeautifulSoup for Python\" width=\"900\" height=\"395\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/BeautifulSoup-for-Python.jpg 950w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/BeautifulSoup-for-Python-300x132.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/BeautifulSoup-for-Python-768x337.jpg 768w\" sizes=\"(max-width: 900px) 100vw, 900px\"\/>\n<\/picture>\n<\/noscript><br \/>\nBeautifulSoup is an HTML and XML parser for Python. With this library, you can extract data from webpages. It is also easy to install: just run the \u201cpip install beautifulsoup4\u201d command in your command prompt.<\/p>\n<ul>\n<li><a href=\"https:\/\/royadata.io\/blog\/scrapy-vs-selenium-vs-beautifulsoup-for-web-scraping\/\">Scrapy Vs. Beautifulsoup Vs. Selenium for Web Scraping<\/a><\/li>\n<li><a href=\"https:\/\/royadata.io\/blog\/web-scraping-with-python\/\">Python Web Scraping Libraries and Framework<\/a><\/li>\n<\/ul>\n<p>Without the above two libraries installed, you won\u2019t be able to follow this tutorial. 
Install them before we continue.<\/p>\n<div class=\"su-youtube su-u-responsive-media-yes\">\n<div class=\"perfmatters-lazy-youtube\" data-src=\"https:\/\/www.youtube.com\/embed\/-z0WKmGE_Qs\" data-id=\"-z0WKmGE_Qs\" data-query onclick=\"if (!window.__cfRLUnblockHandlers) return false; perfmattersLazyLoadYouTube(this);\" data-cf-modified-9128fef0e15c92a7677bd3c5->\n<div><img loading=\"lazy\" decoding=\"async\" class=\"perfmatters-lazy\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20480%20360%3E%3C\/svg%3E\" data-src=\"https:\/\/i.ytimg.com\/vi\/-z0WKmGE_Qs\/hqdefault.jpg\" alt=\"YouTube video\" width=\"480\" height=\"360\" data-pin-nopin=\"true\"><\/p>\n<div class=\"play\"><\/div>\n<\/div>\n<\/div>\n<p><noscript><iframe loading=\"lazy\" width=\"600\" height=\"400\" src=\"https:\/\/www.youtube.com\/embed\/-z0WKmGE_Qs?\" frameborder=\"0\" allowfullscreen allow=\"autoplay; encrypted-media; picture-in-picture\" title=\"\"><\/iframe><\/noscript><\/div>\n<hr\/>\n<h2 id=\"step-by-step-python-web-scraping-tutorial\" class=\"ftwp-heading\"><span class=\"ez-toc-section\" id=\"Step_By_Step_%E2%80%93_Python_Web_Scraping_Tutorial\"><\/span>Step By Step \u2013 Python Web Scraping Tutorial<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<hr\/>\n<h3 id=\"step-1-inspect-html-of-google-search-engine-result-pages-serp\" class=\"ftwp-heading\"><span class=\"ez-toc-section\" id=\"Step_1_Inspect_HTML_of_Google_Search_Engine_Result_Pages_SERP\"><\/span><strong>Step 1: Inspect HTML of Google Search Engine Result Pages (SERP)<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>The first step in every web scraping exercise is to inspect the HTML of the page. This is because when you send an HTTP GET request to a page, the whole page will be downloaded. You need to know where to look for the data you are interested in. 
Only then can you extract the data.<\/p>\n<p><picture class=\"aligncenter wp-image-3698 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Inspect-HTML-of-SERP.jpg.webp 1216w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Inspect-HTML-of-SERP-300x146.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Inspect-HTML-of-SERP-1024x497.jpg.webp 1024w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Inspect-HTML-of-SERP-768x373.jpg.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20900%20437'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 900px) 100vw, 900px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20900%20437'%3E%3C\/svg%3E\" alt=\"Inspect HTML of SERP\" width=\"900\" height=\"437\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Inspect-HTML-of-SERP.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Inspect-HTML-of-SERP.jpg 1216w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Inspect-HTML-of-SERP-300x146.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Inspect-HTML-of-SERP-1024x497.jpg 1024w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Inspect-HTML-of-SERP-768x373.jpg 768w\" data-sizes=\"(max-width: 900px) 100vw, 900px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter wp-image-3698\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Inspect-HTML-of-SERP.jpg.webp 1216w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Inspect-HTML-of-SERP-300x146.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Inspect-HTML-of-SERP-1024x497.jpg.webp 1024w, 
https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Inspect-HTML-of-SERP-768x373.jpg.webp 768w\" sizes=\"(max-width: 900px) 100vw, 900px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Inspect-HTML-of-SERP.jpg\" alt=\"Inspect HTML of SERP\" width=\"900\" height=\"437\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Inspect-HTML-of-SERP.jpg 1216w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Inspect-HTML-of-SERP-300x146.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Inspect-HTML-of-SERP-1024x497.jpg 1024w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Inspect-HTML-of-SERP-768x373.jpg 768w\" sizes=\"(max-width: 900px) 100vw, 900px\"\/>\n<\/picture>\n<\/noscript><\/p>\n<ul>\n<li>Start by searching for the phrase \u201cpython tutorials\u201d and scroll down to the bottom of the page, where the list of related keywords is displayed.<\/li>\n<li>Right-click on the related keywords section and select \u201cInspect Element.\u201d<\/li>\n<li>You will see that the whole related keywords section is embedded within a div element with the class attribute <strong>card-section<\/strong>.<\/li>\n<li>There are usually eight (8) keywords in this section, divided into two (2) columns of four (4) keywords each. 
Each column is wrapped in its own div element with the class attribute <strong>brs-col<\/strong>.<\/li>\n<li>Within each column, the keywords are anchor elements (&lt;a&gt;) inside a paragraph element (&lt;p&gt;) with the class attribute <strong>nVcaUb<\/strong>.<\/li>\n<\/ul>\n<p>From the above, to reach any of the eight keywords, you need to follow this path: div (class: card-section) -> div (class: brs-col) -> p (class: nVcaUb) -> a. Google changes its markup from time to time, so confirm these class names in your own browser before relying on them.<\/p>\n<hr\/>\n<h3 id=\"step-2-import-required-libraries\" class=\"ftwp-heading\"><span class=\"ez-toc-section\" id=\"Step_2_Import_required_Libraries\"><\/span><strong>Step 2: Import required Libraries<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Launch your preferred IDE. PyCharm is my Python IDE of choice, but for this tutorial I used IDLE, which came with my Python installation. After launching IDLE, create a new Python file named \u201cKeywordScraper.py\u201d and import the required modules.<\/p>\n<p><picture class=\"aligncenter size-full wp-image-3700 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Import-required-Libraries.png.webp 900w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20900%20283'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 900px) 100vw, 900px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20900%20283'%3E%3C\/svg%3E\" alt=\"Import required Libraries\" width=\"900\" height=\"283\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Import-required-Libraries.png\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Import-required-Libraries.png 900w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Import-required-Libraries-300x94.png 
300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Import-required-Libraries-768x241.png 768w\" data-sizes=\"(max-width: 900px) 100vw, 900px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter size-full wp-image-3700\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Import-required-Libraries.png.webp 900w\" sizes=\"(max-width: 900px) 100vw, 900px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Import-required-Libraries.png\" alt=\"Import required Libraries\" width=\"900\" height=\"283\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Import-required-Libraries.png 900w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Import-required-Libraries-300x94.png 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Import-required-Libraries-768x241.png 768w\" sizes=\"(max-width: 900px) 100vw, 900px\"\/>\n<\/picture>\n<\/noscript><\/p>\n<hr\/>\n<h3 id=\"step-3-create-a-helper-function-for-adding-plus-to-keywords\" class=\"ftwp-heading\"><span class=\"ez-toc-section\" id=\"Step_3_Create_a_helper_function_for_adding_plus_to_keywords\"><\/span><strong>Step 3: Create a helper function for adding plus to keywords<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>The search URL for the keyword \u201cpython tutorials\u201d is <a href=\"https:\/\/www.google.com\/search?q=python+tutorials\"  rel=\"noopener noreferrer\">https:\/\/www.google.com\/search?q=python+tutorials<\/a>. Google builds this URL in a simple way. The search URL without a keyword is <a href=\"https:\/\/www.google.com\/search?q=\"  rel=\"noopener noreferrer\">https:\/\/www.google.com\/search?q=<\/a>. 
The keyword is appended to the string immediately after the q=.<\/p>\n<p>Before the keyword is appended, the spaces between words are replaced with plus signs (+), so \u201cpython tutorials\u201d becomes \u201cpython+tutorials\u201d, and the search URL becomes <a href=\"https:\/\/www.google.com\/search?q=python+tutorials\"  rel=\"noopener noreferrer\">https:\/\/www.google.com\/search?q=python+tutorials<\/a>. Below is the helper function that does this.<\/p>\n<div class=\"su-tabs su-tabs-style-default su-tabs-mobile-stack\" data-active=\"1\" data-scroll-offset=\"0\" data-anchor-in-url=\"no\">\n<div class=\"su-tabs-nav\"><span class data-url data-target=\"blank\" tabindex=\"0\" role=\"button\">Sample<\/span><span class data-url data-target=\"blank\" tabindex=\"0\" role=\"button\">Code<\/span><\/div>\n<div class=\"su-tabs-panes\">\n<div class=\"su-tabs-pane su-u-clearfix su-u-trim\" data-title=\"Sample\"><picture class=\"aligncenter size-full wp-image-3701 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/helper-function-for-adding-plus-to-keywords.jpg.webp 600w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/helper-function-for-adding-plus-to-keywords-300x127.jpg.webp 300w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20600%20253'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 600px) 100vw, 600px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20600%20253'%3E%3C\/svg%3E\" alt=\"helper function for adding plus to keywords\" width=\"600\" height=\"253\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/helper-function-for-adding-plus-to-keywords.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/helper-function-for-adding-plus-to-keywords.jpg 600w, 
https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/helper-function-for-adding-plus-to-keywords-300x127.jpg 300w\" data-sizes=\"(max-width: 600px) 100vw, 600px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter size-full wp-image-3701\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/helper-function-for-adding-plus-to-keywords.jpg.webp 600w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/helper-function-for-adding-plus-to-keywords-300x127.jpg.webp 300w\" sizes=\"(max-width: 600px) 100vw, 600px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/helper-function-for-adding-plus-to-keywords.jpg\" alt=\"helper function for adding plus to keywords\" width=\"600\" height=\"253\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/helper-function-for-adding-plus-to-keywords.jpg 600w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/helper-function-for-adding-plus-to-keywords-300x127.jpg 300w\" sizes=\"(max-width: 600px) 100vw, 600px\"\/>\n<\/picture>\n<\/noscript><\/div>\n<div class=\"su-tabs-pane su-u-clearfix su-u-trim\" data-title=\"Code\">\n<pre><code>import requests\nfrom bs4 import BeautifulSoup\n\n\ndef add_plus(keywords):\n    # Split the phrase on whitespace, e.g. \"python tutorials\" becomes [\"python\", \"tutorials\"]\n    keywords = keywords.split()\n    keyword_edited = \"\"\n    # Append each word followed by a plus sign\n    for i in keywords:\n        keyword_edited += i + \"+\"\n    # Drop the trailing plus sign, e.g. \"python+tutorials+\" becomes \"python+tutorials\"\n    keyword_edited = keyword_edited[:-1]\n    return keyword_edited<\/code><\/pre>\n<\/div>\n<\/div>\n<\/div>\n<hr\/>\n<h3 id=\"step-4-create-a-keywordscraper-class-and-initialize-it\" class=\"ftwp-heading\"><span class=\"ez-toc-section\" id=\"Step_4_Create_a_KeywordScraper_Class_and_initialize_it\"><\/span><strong>Step 4: Create a KeywordScraper Class and initialize it<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Create a KeywordScraper class that accepts only one parameter: the keyword. 
After creating the class, initialize it with the following variables.<\/p>\n<div class=\"su-tabs su-tabs-style-default su-tabs-mobile-stack\" data-active=\"1\" data-scroll-offset=\"0\" data-anchor-in-url=\"no\">\n<div class=\"su-tabs-nav\"><span class data-url data-target=\"blank\" tabindex=\"0\" role=\"button\">Sample<\/span><span class data-url data-target=\"blank\" tabindex=\"0\" role=\"button\">Code<\/span><\/div>\n<div class=\"su-tabs-panes\">\n<div class=\"su-tabs-pane su-u-clearfix su-u-trim\" data-title=\"Sample\"><picture class=\"aligncenter size-full wp-image-3702 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Create-a-KeywordScraper-Class-.jpg.webp 811w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Create-a-KeywordScraper-Class--300x108.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Create-a-KeywordScraper-Class--768x277.jpg.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20811%20293'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 811px) 100vw, 811px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20811%20293'%3E%3C\/svg%3E\" alt=\"Create a KeywordScraper Class\" width=\"811\" height=\"293\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Create-a-KeywordScraper-Class-.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Create-a-KeywordScraper-Class-.jpg 811w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Create-a-KeywordScraper-Class--300x108.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Create-a-KeywordScraper-Class--768x277.jpg 768w\" data-sizes=\"(max-width: 811px) 100vw, 811px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter size-full wp-image-3702\"><source type=\"image\/webp\" 
srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Create-a-KeywordScraper-Class-.jpg.webp 811w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Create-a-KeywordScraper-Class--300x108.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Create-a-KeywordScraper-Class--768x277.jpg.webp 768w\" sizes=\"(max-width: 811px) 100vw, 811px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Create-a-KeywordScraper-Class-.jpg\" alt=\"Create a KeywordScraper Class\" width=\"811\" height=\"293\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Create-a-KeywordScraper-Class-.jpg 811w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Create-a-KeywordScraper-Class--300x108.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Create-a-KeywordScraper-Class--768x277.jpg 768w\" sizes=\"(max-width: 811px) 100vw, 811px\"\/>\n<\/picture>\n<\/noscript><\/div>\n<div class=\"su-tabs-pane su-u-clearfix su-u-trim\" data-title=\"Code\">\n<pre><code>class KeywordScraper:\n    def __init__(self, keyword):\n        # The keyword exactly as entered\n        self.keyword = keyword\n        # The same keyword with spaces replaced by plus signs\n        plusified_keyword = add_plus(keyword)\n        # Will hold the related keywords scraped from the results page\n        self.keywords_scraped = []\n        # Full Google Search URL for the keyword\n        self.search_string = \"https:\/\/www.google.com\/search?q=\" + plusified_keyword\n<\/code><\/pre>\n<\/div>\n<\/div>\n<\/div>\n<ul>\n<li><strong>keyword<\/strong> \u2013 for storing the keyword to be searched<\/li>\n<li><strong>plusified_keyword<\/strong> \u2013 for storing the keyword above, but with the spaces between words converted to plus signs (+). As the screenshot above shows, the add_plus helper function performs the conversion.<\/li>\n<li><strong>keywords_scraped<\/strong> \u2013 an empty list meant for holding the scraped keywords. 
Initialize as an empty list ([]).<\/li>\n<li><strong>search_string<\/strong> \u2013 holds the Google Search URL for your keyword. Note how the \u201cplusified\u201d keyword is appended to form the full URL.<\/li>\n<\/ul>\n<hr\/>\n<h3 id=\"step-5-create-method-for-scraping-serp-within-the-keywordscraper-class\" class=\"ftwp-heading\"><span class=\"ez-toc-section\" id=\"Step_5_Create_Method_for_Scraping_SERP_within_the_KeywordScraper_Class\"><\/span><strong>Step 5: Create Method for Scraping SERP within the KeywordScraper Class<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>The method in the class is named <strong>scrape_SERP<\/strong>. As you can see below, the first variable is a dictionary named <strong>headers<\/strong>. <a href=\"https:\/\/royadata.io\/blog\/user-agent\/\">The string passed as the value for User-Agent<\/a> is my browser\u2019s user agent. This is very important, because Google serves different versions of its pages depending on the user agent.<\/p>\n<p>I tried running the same code on my mobile IDE without the user agent, and it failed because the HTML document returned wasn\u2019t the same as the one I had used to build the parser. 
You can experiment with different headers to see which ones work with this code and which don\u2019t.<\/p>\n<div class=\"su-tabs su-tabs-style-default su-tabs-mobile-stack\" data-active=\"1\" data-scroll-offset=\"0\" data-anchor-in-url=\"no\">\n<div class=\"su-tabs-nav\"><span class data-url data-target=\"blank\" tabindex=\"0\" role=\"button\">Sample<\/span><span class data-url data-target=\"blank\" tabindex=\"0\" role=\"button\">Code<\/span><\/div>\n<div class=\"su-tabs-panes\">\n<div class=\"su-tabs-pane su-u-clearfix su-u-trim\" data-title=\"Sample\"><picture class=\"aligncenter wp-image-3703 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Create-Method-for-Scraping-SERP.jpg.webp 1218w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Create-Method-for-Scraping-SERP-300x118.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Create-Method-for-Scraping-SERP-1024x403.jpg.webp 1024w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Create-Method-for-Scraping-SERP-768x302.jpg.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20900%20354'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 900px) 100vw, 900px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20900%20354'%3E%3C\/svg%3E\" alt=\"Create Method for Scraping SERP\" width=\"900\" height=\"354\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Create-Method-for-Scraping-SERP.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Create-Method-for-Scraping-SERP.jpg 1218w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Create-Method-for-Scraping-SERP-300x118.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Create-Method-for-Scraping-SERP-1024x403.jpg 1024w, 
https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Create-Method-for-Scraping-SERP-768x302.jpg 768w\" data-sizes=\"(max-width: 900px) 100vw, 900px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter wp-image-3703\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Create-Method-for-Scraping-SERP.jpg.webp 1218w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Create-Method-for-Scraping-SERP-300x118.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Create-Method-for-Scraping-SERP-1024x403.jpg.webp 1024w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Create-Method-for-Scraping-SERP-768x302.jpg.webp 768w\" sizes=\"(max-width: 900px) 100vw, 900px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Create-Method-for-Scraping-SERP.jpg\" alt=\"Create Method for Scraping SERP\" width=\"900\" height=\"354\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Create-Method-for-Scraping-SERP.jpg 1218w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Create-Method-for-Scraping-SERP-300x118.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Create-Method-for-Scraping-SERP-1024x403.jpg 1024w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Create-Method-for-Scraping-SERP-768x302.jpg 768w\" sizes=\"(max-width: 900px) 100vw, 900px\"\/>\n<\/picture>\n<\/noscript><\/div>\n<div class=\"su-tabs-pane su-u-clearfix su-u-trim\" data-title=\"Code\">\n<pre><code>    def scrape_SERP(self):\n        # A desktop browser user agent, so Google returns the HTML layout this parser expects\n        headers = {'User-Agent': 'Mozilla\/5.0 (Windows NT 10.0) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/80.0.3987.132 Safari\/537.36'}\n\n        # Download the SERP; the timeout stops the request from hanging indefinitely\n        content = requests.get(self.search_string, headers=headers, timeout=10).text\n\n        # Parse the HTML with the built-in html.parser engine\n        soup = BeautifulSoup(content, \"html.parser\")\n\n        # The div that holds the related searches\n        related_keyword_section = soup.find(\"div\", {\"class\": \"card-section\"})\n\n        # Each column of related keywords is a div with class brs_col\n        keywords_cols = related_keyword_section.find_all(\"div\", {\"class\": \"brs_col\"})\n\n        for col in keywords_cols:\n            # Each keyword sits in a p element with class nVcaUb\n            list_of_keywords = col.find_all(\"p\", {\"class\": \"nVcaUb\"})\n            for p_tag in list_of_keywords:\n                # The keyword is the text of the link inside the p element\n                self.keywords_scraped.append(p_tag.find(\"a\").text)<\/code><\/pre>\n<\/div>\n<\/div>\n<\/div>\n<p>The content variable holds the full HTML of the Google SERP for the keyword (\u201cPython tutorials\u201d in this example). It is downloaded with the get function of the requests library. Note that the headers variable is passed as the headers argument of requests.get(). At this point, the page has been downloaded and stored in the content variable. What remains is <a href=\"https:\/\/royadata.io\/blog\/data-parsing\/\">parsing<\/a>.<\/p>\n<p>BeautifulSoup is used to parse the downloaded page. To learn how to use it, visit <a href=\"https:\/\/www.crummy.com\/software\/BeautifulSoup\/bs4\/doc\/\"  rel=\"noopener noreferrer\">the BeautifulSoup documentation website<\/a>. In the code, BeautifulSoup takes two arguments: the content to be parsed and the parser to use (here, Python\u2019s built-in \u201chtml.parser\u201d). Once it is initialized, you can start searching it for the required data.<\/p>\n<p>As you can see, the code first searches for the related keywords container (a div element with the class <strong>card-section<\/strong>). It then searches that container for the two div elements with the class name <strong>brs_col<\/strong>, each representing a column of four keywords. Keep in mind that these class names reflect Google\u2019s markup when this tutorial was written. Google changes its page structure from time to time, so check them against the live page.<\/p>\n<p>The code then loops through the two columns, searching each one for p elements with the class name <strong>nVcaUb<\/strong>. Each of these contains an anchor element (link) whose text is the keyword. 
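<\/p>\n<p>Because Google\u2019s live markup changes over time, it is useful to test the parsing steps against a fixed piece of HTML. The sketch below is not part of the downloadable script. It runs the same find and find_all calls as scrape_SERP on a small, made-up HTML snippet that only mimics the structure described above. It also returns an empty list instead of raising an AttributeError when the card-section container is missing. The function name extract_related_keywords and the sample markup are hypothetical.<\/p>\n<pre><code>from bs4 import BeautifulSoup\n\n# Made-up markup that mimics the structure described above\nSAMPLE_HTML = \"\"\"\n&lt;div class=\"card-section\"&gt;\n  &lt;div class=\"brs_col\"&gt;\n    &lt;p class=\"nVcaUb\"&gt;&lt;a&gt;python tutorial for beginners&lt;\/a&gt;&lt;\/p&gt;\n    &lt;p class=\"nVcaUb\"&gt;&lt;a&gt;python tutorial pdf&lt;\/a&gt;&lt;\/p&gt;\n  &lt;\/div&gt;\n  &lt;div class=\"brs_col\"&gt;\n    &lt;p class=\"nVcaUb\"&gt;&lt;a&gt;python projects&lt;\/a&gt;&lt;\/p&gt;\n  &lt;\/div&gt;\n&lt;\/div&gt;\n\"\"\"\n\n\ndef extract_related_keywords(html):\n    # Same search steps as scrape_SERP, but tolerant of a missing container\n    soup = BeautifulSoup(html, \"html.parser\")\n    section = soup.find(\"div\", {\"class\": \"card-section\"})\n    if section is None:  # layout changed, or no related searches for this keyword\n        return []\n    keywords = []\n    for col in section.find_all(\"div\", {\"class\": \"brs_col\"}):\n        for p_tag in col.find_all(\"p\", {\"class\": \"nVcaUb\"}):\n            link = p_tag.find(\"a\")\n            if link is not None:\n                keywords.append(link.get_text(strip=True))\n    return keywords\n\n\nprint(extract_related_keywords(SAMPLE_HTML))\nprint(extract_related_keywords(\"&lt;html&gt;&lt;\/html&gt;\"))\n<\/code><\/pre>\n<p>Running it prints the three sample keywords, followed by an empty list for the page that has no card-section container.<\/p>\n<p>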
In scrape_SERP, each keyword is then appended to the self.keywords_scraped list.<\/p>\n<hr\/>\n<h3 id=\"step-6-create-a-database-writing-method\" class=\"ftwp-heading\"><span class=\"ez-toc-section\" id=\"Step_6_Create_a_Database_Writing_Method\"><\/span><strong>Step 6: Create a Database Writing Method<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>At this point, some would argue that you have successfully scraped the required data. However, until you save it to persistent storage, the tutorial is not complete. Where should you save your data? There are many options: a CSV file, or a database system such as SQLite or MySQL. In this simple tutorial, we will save our data in a .txt file.<\/p>\n<div class=\"su-tabs su-tabs-style-default su-tabs-mobile-stack\" data-active=\"1\" data-scroll-offset=\"0\" data-anchor-in-url=\"no\">\n<div class=\"su-tabs-nav\"><span class data-url data-target=\"blank\" tabindex=\"0\" role=\"button\">Sample<\/span><span class data-url data-target=\"blank\" tabindex=\"0\" role=\"button\">Code<\/span><\/div>\n<div class=\"su-tabs-panes\">\n<div class=\"su-tabs-pane su-u-clearfix su-u-trim\" data-title=\"Sample\"><picture class=\"aligncenter wp-image-3704 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Database-Writing-Method.jpg.webp 1238w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Database-Writing-Method-300x141.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Database-Writing-Method-1024x480.jpg.webp 1024w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Database-Writing-Method-768x360.jpg.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20862%20404'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 862px) 100vw, 862px\" \/><img decoding=\"async\" 
src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20862%20404'%3E%3C\/svg%3E\" alt=\"Database Writing Method\" width=\"862\" height=\"404\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Database-Writing-Method.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Database-Writing-Method.jpg 1238w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Database-Writing-Method-300x141.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Database-Writing-Method-1024x480.jpg 1024w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Database-Writing-Method-768x360.jpg 768w\" data-sizes=\"(max-width: 862px) 100vw, 862px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter wp-image-3704\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Database-Writing-Method.jpg.webp 1238w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Database-Writing-Method-300x141.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Database-Writing-Method-1024x480.jpg.webp 1024w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Database-Writing-Method-768x360.jpg.webp 768w\" sizes=\"(max-width: 862px) 100vw, 862px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Database-Writing-Method.jpg\" alt=\"Database Writing Method\" width=\"862\" height=\"404\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Database-Writing-Method.jpg 1238w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Database-Writing-Method-300x141.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Database-Writing-Method-1024x480.jpg 1024w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Database-Writing-Method-768x360.jpg 768w\" sizes=\"(max-width: 862px) 100vw, 
862px\"\/>\n<\/picture>\n<\/noscript><\/div>\n<div class=\"su-tabs-pane su-u-clearfix su-u-trim\" data-title=\"Code\">\n<pre><code>    def write_to_file(self):\n        # Open the file once in append mode (\"a\"): it is created if missing, and new keywords are added to the end\n        with open(\"scraped keywords.txt\", \"a\", encoding=\"utf-8\") as f:\n            for keyword in self.keywords_scraped:\n                # Write one keyword per line\n                f.write(keyword + \"\\n\")\n        print(\"keywords related to \" + self.keyword + \" scraped successfully\")<\/code><\/pre>\n<\/div>\n<\/div>\n<\/div>\n<p>In the method above, the code opens a file with the <strong>open<\/strong> function, passing \u201cscraped keywords.txt\u201d as the file name and \u201ca\u201d (append) as the mode. If the file does not exist, Python creates it. If it already exists, the new keywords are added to the end of it. Each keyword is written on a separate line.<\/p>\n<hr\/>\n<h3 id=\"step-7-running-the-code\" class=\"ftwp-heading\"><span class=\"ez-toc-section\" id=\"Step_7_Running_the_Code\"><\/span><strong>Step 7: Running the Code<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>To run the script, create an instance of the KeywordScraper class. I named the variable \u201c<strong>s<\/strong>\u201d and passed a keyword to it as an argument. 
You can pass any meaningful keyword, such as \u201cpython tutorials\u201d or \u201cBest gaming pc\u201d (used in the code below), and the script will scrape the related keywords for it.<\/p>\n<div class=\"su-tabs su-tabs-style-default su-tabs-mobile-stack\" data-active=\"1\" data-scroll-offset=\"0\" data-anchor-in-url=\"no\">\n<div class=\"su-tabs-nav\"><span class data-url data-target=\"blank\" tabindex=\"0\" role=\"button\">Sample<\/span><span class data-url data-target=\"blank\" tabindex=\"0\" role=\"button\">Code<\/span><\/div>\n<div class=\"su-tabs-panes\">\n<div class=\"su-tabs-pane su-u-clearfix su-u-trim\" data-title=\"Sample\"><picture class=\"aligncenter wp-image-3705 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Running-the-Code.jpg.webp 1241w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Running-the-Code-300x160.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Running-the-Code-1024x545.jpg.webp 1024w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Running-the-Code-768x408.jpg.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20900%20479'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 900px) 100vw, 900px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%20900%20479'%3E%3C\/svg%3E\" alt=\"Running the Code\" width=\"900\" height=\"479\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Running-the-Code.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Running-the-Code.jpg 1241w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Running-the-Code-300x160.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Running-the-Code-1024x545.jpg 1024w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Running-the-Code-768x408.jpg 768w\" 
data-sizes=\"(max-width: 900px) 100vw, 900px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter wp-image-3705\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Running-the-Code.jpg.webp 1241w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Running-the-Code-300x160.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Running-the-Code-1024x545.jpg.webp 1024w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Running-the-Code-768x408.jpg.webp 768w\" sizes=\"(max-width: 900px) 100vw, 900px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Running-the-Code.jpg\" alt=\"Running the Code\" width=\"900\" height=\"479\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Running-the-Code.jpg 1241w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Running-the-Code-300x160.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Running-the-Code-1024x545.jpg 1024w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Running-the-Code-768x408.jpg 768w\" sizes=\"(max-width: 900px) 100vw, 900px\"\/>\n<\/picture>\n<\/noscript><\/div>\n<div class=\"su-tabs-pane su-u-clearfix su-u-trim\" data-title=\"Code\">\n<pre><code>if __name__ == \"__main__\":\n    s = KeywordScraper(\"Best gaming pc\")\n    s.scrape_SERP()     # download and parse the SERP\n    s.write_to_file()   # append the keywords to scraped keywords.txt\n<\/code><\/pre>\n<\/div>\n<\/div>\n<\/div>\n<p>After creating an instance of the class, call the <strong>scrape_SERP<\/strong> method and then the <strong>write_to_file<\/strong> method. With that, the code is complete, and it is time to run it. 
Run it now. If everything goes well, check the folder where your script is located, and you will see a new text file named \u201cscraped keywords.txt\u201d. Open the file to see the scraped keywords.<\/p>\n<blockquote>\n<p><strong><a href=\"https:\/\/www.bestproxyreviews.com\/wp-content\/uploads\/KeywordScraper.py\"  rel=\"noopener noreferrer\">Click Here<\/a> Now to Download the Full KeywordScraper.py Script.<\/strong><\/p>\n<\/blockquote>\n<hr\/>\n<h2 id=\"how-to-improve-this-web-scraper\" class=\"ftwp-heading\"><span class=\"ez-toc-section\" id=\"How_to_Improve_this_Web_Scraper\"><\/span><strong>How to Improve this Web Scraper<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Unlike the dummy scrapers you see in most tutorials, this web scraper can actually be useful for SEO. However, there\u2019s a lot of room for improvement.<\/p>\n<p>As I stated earlier, it does not handle exceptions. Adding exception handling should be your first improvement, so that the script copes with errors such as a keyword that has no related keywords to scrape. You can go further and scrape related questions in addition to keywords. Making the scraper multitask, so that it scrapes several pages at a time, will also make it better.<\/p>\n<p>The truth is, you cannot use this tool to scrape thousands of keywords, as Google will detect that you are using a bot and <strong>will block you<\/strong>. 
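<\/p>\n<p>As a starting point for the exception handling mentioned above, and to slow the bot down, here is a minimal sketch that is not part of the downloadable script. It scrapes a list of keywords one after another, logs and skips any keyword that raises an error, and waits a random number of seconds between requests. The names scrape_many and scrape_one and the default delays are illustrative assumptions. Pausing only lowers your request rate and does not guarantee that Google will not block you.<\/p>\n<pre><code>import logging\nimport random\nimport time\n\n\ndef scrape_many(keywords, scrape_one, min_delay=5.0, max_delay=15.0):\n    \"\"\"Scrape several keywords, skipping failures and pausing between requests.\n\n    scrape_one is any callable that takes a keyword and returns its related keywords.\n    \"\"\"\n    results = {}\n    for index, keyword in enumerate(keywords):\n        if index:\n            # Wait a random number of seconds before every request except the first\n            time.sleep(random.uniform(min_delay, max_delay))\n        try:\n            results[keyword] = scrape_one(keyword)\n        except Exception as exc:  # e.g. network errors or a changed page layout\n            logging.warning(\"Skipping %r: %s\", keyword, exc)\n    return results\n<\/code><\/pre>\n<p>With the tutorial\u2019s class, scrape_one could be a small function that creates a KeywordScraper, calls its scrape_SERP method, and returns its keywords_scraped list.<\/p>\n<p>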
To reduce the chance of being blocked, extend the bot to send its requests through proxies.<\/p>\n<p>For Google, I advise using <a href=\"https:\/\/royadata.io\/blog\/residential-proxies\/\"><strong>residential proxies<\/strong><\/a> such as <a class=\"thirstylink\" title=\"luminati-static-residential-proxies\" href=\"###luminati-static-residential-proxies\/\"  rel=\"nofollow noopener noreferrer\">Luminati<\/a>, <a href=\"###smartproxy\/\"  rel=\"noopener noreferrer\">Smartproxy<\/a>, and <a class=\"thirstylink\" title=\"stormproxies\" href=\"###stormproxies\/\"  rel=\"nofollow noopener noreferrer\">Stormproxies<\/a>. I also advise setting up an alert or logging system that notifies you when the structure of the page changes and, as a result, the code no longer works as expected. This is important because Google changes the structure of its pages every now and then.<\/p>\n<ul>\n<li><a href=\"https:\/\/royadata.io\/blog\/proxies-for-scraping-google\/\">Proxies for Preventing Bans and Captchas When Scraping Google<\/a><\/li>\n<li><a href=\"https:\/\/royadata.io\/blog\/seo-proxies\/\">SEO Proxies to Master Google \u2013 Scraping Search Engines without Block and Captchas!<\/a><\/li>\n<\/ul>\n<hr\/>\n<h2 id=\"conclusion\" class=\"ftwp-heading\"><span class=\"ez-toc-section\" id=\"Conclusion\"><\/span><strong>Conclusion <\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Building a simple web scraper isn\u2019t difficult, because you probably have only one website to scrape, and its pages are structured. A simple scraper also does not need multithreading, and you do not have to think about request limits if you are not sending a large number of requests per minute.<\/p>\n<p>The main problems come when you develop a complex web scraper. 
Even then, with proper planning and learning, the problems can be overcome.<\/p>\n<hr\/>\n<ul>\n<li><a href=\"https:\/\/royadata.io\/blog\/serp-api\/\">15 Best SERP (Search Engine Results Page) API<\/a><\/li>\n<li><a href=\"https:\/\/royadata.io\/blog\/use-chrome-headless-and-dedicated-proxies-to-scrape-any-website\/\"  rel=\"noopener noreferrer\">Use Chrome Headless and Dedicated Proxies to Scrape Any Website<\/a><\/li>\n<li><a href=\"https:\/\/royadata.io\/blog\/how-to-scrape-linkedin-using-proxies\/\">How to Scrape Data from Linkedin Using Proxies<\/a><\/li>\n<li><a href=\"https:\/\/royadata.io\/blog\/scraping-craigslist\/\">The Ultimate Guide to Scraping Craigslist Data with Software<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Do you want to learn how to build web scrapers using Python? Come in now and read our article on how to build a simple web scraper. Codes, together with explanations included. Have you ever wondered how programmers build web scrapers for extracting data from websites? 
If you have, then this article has been written &#8230; <a title=\"How to Build a Simple Web Scraper with Python (Scrape SERP)\" class=\"read-more\" href=\"http:\/\/royadata.io\/blog\/python-web-scraper-tutorial\/\" aria-label=\"More on How to Build a Simple Web Scraper with Python (Scrape SERP)\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":452,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"_links":{"self":[{"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/posts\/6273"}],"collection":[{"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/comments?post=6273"}],"version-history":[{"count":0,"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/posts\/6273\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/media\/452"}],"wp:attachment":[{"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/media?parent=6273"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/categories?post=6273"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/tags?post=6273"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}