{"id":6218,"date":"2023-10-18T14:47:43","date_gmt":"2023-10-18T14:47:43","guid":{"rendered":"https:\/\/royadata.io\/blog\/?p=6218"},"modified":"2023-10-18T14:47:43","modified_gmt":"2023-10-18T14:47:43","slug":"selenium-web-scraping-python","status":"publish","type":"post","link":"http:\/\/royadata.io\/blog\/selenium-web-scraping-python\/","title":{"rendered":"Web Scraping Using Selenium and Python: The Step-By-Step Guide for Beginner (2023)"},"content":{"rendered":"<blockquote>\n<p>For dynamic sites richly built with JavaScript, Selenium is the tool of choice for extracting data from them. Come in now and read this article to learn how to extract data from web pages using Selenium.<\/p>\n<\/blockquote>\n<p><picture class=\"aligncenter size-full wp-image-7290 perfmatters-lazy\" loading=\"lazy\"><source type=\"image\/webp\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Scraping-Using-Selenium-and-Python.jpg.webp 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Scraping-Using-Selenium-and-Python-300x167.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Scraping-Using-Selenium-and-Python-768x426.jpg.webp 768w\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%201000%20555'%3E%3C\/svg%3E\" data-sizes=\"(max-width: 1000px) 100vw, 1000px\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns='http:\/\/www.w3.org\/2000\/svg'%20viewBox='0%200%201000%20555'%3E%3C\/svg%3E\" alt=\"Web Scraping Using Selenium and Python\" width=\"1000\" height=\"555\" data-src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Scraping-Using-Selenium-and-Python.jpg\" data-srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Scraping-Using-Selenium-and-Python.jpg 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Scraping-Using-Selenium-and-Python-300x167.jpg 300w, 
https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Scraping-Using-Selenium-and-Python-768x426.jpg 768w\" data-sizes=\"(max-width: 1000px) 100vw, 1000px\" loading=\"lazy\" \/>\n<\/picture>\n<noscript><picture class=\"aligncenter size-full wp-image-7290\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Scraping-Using-Selenium-and-Python.jpg.webp 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Scraping-Using-Selenium-and-Python-300x167.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Scraping-Using-Selenium-and-Python-768x426.jpg.webp 768w\" sizes=\"(max-width: 1000px) 100vw, 1000px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Scraping-Using-Selenium-and-Python.jpg\" alt=\"Web Scraping Using Selenium and Python\" width=\"1000\" height=\"555\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Scraping-Using-Selenium-and-Python.jpg 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Scraping-Using-Selenium-and-Python-300x167.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Web-Scraping-Using-Selenium-and-Python-768x426.jpg 768w\" sizes=\"(max-width: 1000px) 100vw, 1000px\"\/>\n<\/picture>\n<\/noscript><\/p>\n<p>The easiest websites to scrape data from are static pages that all content is downloaded upon request. Sadly, these types of sites are gradually fading out, and dynamic websites are gradually taking over.<\/p>\n<p>With dynamic sites, all content on a page is not provided upon loading a page \u2013 the content is dynamically added after specific JavaScript events, which pose a different problem to <a href=\"https:\/\/royadata.io\/blog\/web-scraping-tools\/\">scraping tools<\/a> designed for static websites. 
Fortunately, with tools like <a href=\"https:\/\/www.selenium.dev\/projects\/\"  rel=\"noopener noreferrer\">Selenium<\/a>, you can trigger JavaScript events and scrape any page you want, no matter how JavaScript-rich it is.<\/p>\n<p>With Selenium, you are not tied to a single language the way you are with some other tools. Selenium supports <a href=\"https:\/\/royadata.io\/blog\/web-scraping-with-python\/\">Python<\/a>, Ruby, Java, C#, and <a href=\"https:\/\/royadata.io\/blog\/web-scraping-javascript-tutorials\/\">JavaScript<\/a>. In this article, we will use Selenium and Python to <a href=\"https:\/\/royadata.io\/blog\/how-to-extract-data-from-a-website\/\">extract web data<\/a>. Before we go into the details, let\u2019s take a look at what Selenium is and when you should use it.<\/p>\n<hr\/>\n<h2 id=\"selenium-webdriver-an-overview\" class=\"ftwp-heading\" style=\"text-align: center;\"><span class=\"ez-toc-section\" id=\"Selenium_WebDriver_%E2%80%93_an_Overview\"><\/span><strong>Selenium WebDriver \u2013 an Overview <\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<hr\/>\n<p>Selenium was not initially developed for <a href=\"https:\/\/royadata.io\/blog\/web-scraping\/\">web scraping<\/a> \u2013 it was built for testing web applications and only later found its way into web scraping. In technical terms, Selenium \u2013 or, more precisely, Selenium WebDriver \u2013 is a portable framework for testing web applications.<\/p>\n<p>In simple terms, all Selenium does is automate web browsers. And as the team behind Selenium rightfully puts it, what you do with that power is up to you! Selenium supports Windows, macOS, and Linux. In terms of browsers, you can use it to automate Chrome, Firefox, Internet Explorer, Edge, and Safari. 
Also important is the fact that Selenium can be extended with third-party plugins.<\/p>\n<p><picture class=\"aligncenter size-full wp-image-7293\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Selenium-WebDriver.jpg.webp 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Selenium-WebDriver-300x129.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Selenium-WebDriver-768x330.jpg.webp 768w\" sizes=\"(max-width: 1000px) 100vw, 1000px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Selenium-WebDriver.jpg\" alt=\"Selenium WebDriver\" width=\"1000\" height=\"430\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Selenium-WebDriver.jpg 1000w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Selenium-WebDriver-300x129.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Selenium-WebDriver-768x330.jpg 768w\" sizes=\"(max-width: 1000px) 100vw, 1000px\"\/><\/picture><\/p>\n<p>With Selenium, you can automate filling in forms, clicking buttons, taking snapshots of a page, and other specific tasks online. One of those tasks is web data extraction. While you can use it for web scraping, it is certainly not a Swiss Army knife of web scraping; it has downsides that will make you avoid it for certain use cases.<\/p>\n<p>The most notable downside is its slow speed. If you have tried <a href=\"https:\/\/royadata.io\/blog\/scrapy-proxy\/\">Scrapy<\/a> or the combo of <a href=\"https:\/\/pypi.org\/project\/requests\/\"  rel=\"noopener noreferrer\">Requests<\/a> and <a href=\"https:\/\/www.crummy.com\/software\/BeautifulSoup\/bs4\/doc\/\"  rel=\"noopener noreferrer\">Beautifulsoup<\/a>, you have a benchmark against which Selenium will rank as slow. This is because it drives a real browser, and every page has to be fully rendered.<\/p>\n<p>For this reason, developers only reach for Selenium when dealing with JavaScript-rich sites whose underlying APIs are difficult to call directly. With Selenium, all you do is automate the process, and every event is triggered just as it would be for a real user.<\/p>\n<p>For static sites, where you can quickly replicate API requests and all content is downloaded upon loading, you will want the better option, which is Scrapy or the duo of Requests and Beautifulsoup.<\/p>\n<ul>\n<li><a href=\"https:\/\/royadata.io\/blog\/scrapy-vs-selenium-vs-beautifulsoup-for-web-scraping\/\">Scrapy Vs. Beautifulsoup Vs. 
Selenium for Web Scraping<\/a><\/li>\n<\/ul>\n<hr\/>\n<h2 id=\"installation-guide\" class=\"ftwp-heading\" style=\"text-align: center;\"><span class=\"ez-toc-section\" id=\"Installation_Guide\"><\/span><strong>Installation Guide<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<hr\/>\n<p><picture class=\"aligncenter size-full wp-image-7296\"><source type=\"image\/webp\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Selenium-Installation-Guide.jpg.webp 942w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Selenium-Installation-Guide-300x174.jpg.webp 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Selenium-Installation-Guide-768x445.jpg.webp 768w\" sizes=\"(max-width: 942px) 100vw, 942px\"\/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Selenium-Installation-Guide.jpg\" alt=\"Selenium Installation Guide\" width=\"942\" height=\"546\" srcset=\"https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Selenium-Installation-Guide.jpg 942w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Selenium-Installation-Guide-300x174.jpg 300w, https:\/\/royadata.io\/blog\/wp-content\/uploads\/2023\/10\/Selenium-Installation-Guide-768x445.jpg 768w\" sizes=\"(max-width: 942px) 100vw, 942px\"\/><\/picture><\/p>\n<p>Selenium is a third-party library, so you will need to install it before you can use it. Before installing Selenium, make sure you already have Python installed; if you don\u2019t, <a href=\"https:\/\/www.python.org\/downloads\/\"  rel=\"noopener noreferrer\">visit the Python official download page<\/a>. For Selenium to work, you need to install the Selenium package and then the driver for the specific browser you want to automate. You can install the library using pip.<\/p>\n<pre>pip install selenium<\/pre>\n<p>Browser drivers are available for Chrome, Firefox, and many others. Our focus in this article is Chrome. If you don\u2019t have Chrome installed on your computer, you can <a href=\"https:\/\/www.google.com\/chrome\/\"  rel=\"noopener noreferrer\">download it from the official Google Chrome page<\/a>. With Chrome installed, you can then go ahead and <a href=\"https:\/\/sites.google.com\/a\/chromium.org\/chromedriver\/downloads\"  rel=\"noopener noreferrer\">download the ChromeDriver binary here<\/a>.<\/p>\n<p>Make sure you download the driver that matches the version of Chrome you have installed. The file is a zip archive with the actual driver inside of it. 
Extract the actual Chrome driver (chromedriver.exe) and place it in the same folder as the Selenium script you are writing.<\/p>\n<hr\/>\n<h2 id=\"selenium-hello-world\" class=\"ftwp-heading\" style=\"text-align: center;\"><span class=\"ez-toc-section\" id=\"Selenium_Hello_World\"><\/span><strong>Selenium Hello World <\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<hr\/>\n<p>As is the tradition in coding tutorials, we start this Selenium guide with the classic hello world program. The code does not scrape any data at this point; all it does is attempt to log into an imaginary Twitter account. Let\u2019s take a look at the code below.<\/p>\n<pre>import time\n\nfrom selenium import webdriver\nfrom selenium.webdriver.common.by import By\nfrom selenium.webdriver.common.keys import Keys\n\nusername = \"concanated\"\npassword = \"djhhfhfhjdghsd\"\n\ndriver = webdriver.Chrome()\ndriver.get(\"https:\/\/twitter.com\/login\")\nname_form = driver.find_element(By.NAME, \"session[username_or_email]\")\nname_form.send_keys(username)\npass_form = driver.find_element(By.NAME, \"session[password]\")\npass_form.send_keys(password)\npass_form.send_keys(Keys.RETURN)\ntime.sleep(5)\ndriver.quit()<\/pre>\n<p>The username and password variables\u2019 values are dummies. When you run the code above, it will launch Chrome and open the Twitter login page. The username and password are typed in and submitted.<\/p>\n<p>Because the username and password are not correct, an error message is displayed, and after 5 seconds, the browser is closed. As you can see, you need to specify the browser to automate, which we did with webdriver.Chrome(). The get method loads a URL, sending a GET request. After the page has loaded successfully, we use the<\/p>\n<pre>driver.find_element(By.NAME, ...)<\/pre>\n<p>method to find the username and password input elements and then use<\/p>\n<pre>.send_keys<\/pre>\n<p>to fill the input fields with the appropriate data.<\/p>\n<hr\/>\n<h2 id=\"sending-web-requests\" class=\"ftwp-heading\" style=\"text-align: center;\"><span class=\"ez-toc-section\" id=\"Sending_Web_Requests\"><\/span><strong>Sending Web Requests <\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<hr\/>\n<p>Sending web requests with Selenium is one of the easiest tasks there is. Unlike other tools, which differentiate between POST and GET requests, Selenium handles them the same way: all that\u2019s required is to call the get method on the driver, passing the URL as an argument. Let\u2019s see how that is done in action below.<\/p>\n<pre>from selenium import webdriver\n\ndriver = webdriver.Chrome()\n\n# visit the Twitter homepage\ndriver.get(\"https:\/\/twitter.com\/\")\n\n# print the page source\nprint(driver.page_source)\n\ndriver.quit()<\/pre>\n<p>Running the code above will launch Chrome in automation mode, visit the Twitter homepage, and print the HTML source code of the page using<\/p>\n<pre>driver.page_source<\/pre>\n<p>. You will see a notification below the address bar telling you that Chrome is being controlled by automated test software.<\/p>\n<ul>\n<li><a href=\"https:\/\/royadata.io\/blog\/playwright-vs-puppeteer-vs-selenium\/\">Playwright Vs. Puppeteer Vs. Selenium: What are the differences?<\/a><\/li>\n<\/ul>\n<hr\/>\n<h2 id=\"chrome-in-headless-mode\" class=\"ftwp-heading\" style=\"text-align: center;\"><span class=\"ez-toc-section\" id=\"Chrome_in_Headless_Mode\"><\/span><strong>Chrome in Headless Mode<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<hr\/>\n<p>In the examples above, Chrome is visibly launched \u2013 this is the headful approach, used mainly for debugging. If you are ready to run your script on a server or in a production environment, you won\u2019t want a Chrome window popping up \u2013 you will want the browser to work in the background. Running Chrome without a visible window is known as headless mode. Below is how to run Selenium Chrome in headless mode.<\/p>\n<pre>from selenium import webdriver\nfrom selenium.webdriver.chrome.options import Options\n\n# Pay attention to the options below\noptions = Options()\noptions.add_argument(\"--headless=new\")\ndriver = webdriver.Chrome(options=options)\n\n# visit the Twitter homepage\ndriver.get(\"https:\/\/twitter.com\/\")\n\n# print the page source\nprint(driver.page_source)\n\ndriver.quit()<\/pre>\n<p>Running the code above will not launch a visible Chrome window \u2013 all you see is the source code of the page visited. The only difference between this code and the previous one is that this one runs in headless mode.<\/p>\n<ul>\n<li><a href=\"https:\/\/royadata.io\/blog\/headless-browser\/\">Headless Browser 101: Chrome Headless Vs. Firefox Vs. PhantomJS<\/a><\/li>\n<li><a href=\"https:\/\/royadata.io\/blog\/use-chrome-headless-and-dedicated-proxies-to-scrape-any-website\/\">Use Chrome Headless and Dedicated Proxies to Scrape Any Website<\/a><\/li>\n<\/ul>\n<hr\/>\n<h2 id=\"accessing-elements-on-a-page\" class=\"ftwp-heading\" style=\"text-align: center;\"><span class=\"ez-toc-section\" id=\"Accessing_Elements_on_a_Page\"><\/span><strong>Accessing Elements on a Page<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<hr\/>\n<p>There are basically three things involved in web scraping \u2013 sending web requests, parsing the page source, and then processing or saving the <a href=\"https:\/\/royadata.io\/blog\/data-parsing\/\">parsed data<\/a>. The first two are usually the focus, as they present more challenges.<\/p>\n<p>You have already learned how to send web requests. 
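<\/p>
<p>One refinement worth knowing before moving on: the hello world example paused with a fixed time.sleep, which is either wasteful or too short. The more robust pattern is to poll until the page is ready. The helper below is a plain-Python sketch of that idea; wait_until is a made-up name used here for illustration, and with Selenium installed you would reach for its built-in WebDriverWait and expected_conditions helpers, shown in the comment.<\/p>

```python
import time

def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value,
    or raise TimeoutError once the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = condition()
        if result:
            return result
        time.sleep(poll)
    raise TimeoutError(f"condition not met within {timeout} seconds")

# Selenium ships this idea as WebDriverWait; with a live driver you would
# write something like:
#   from selenium.webdriver.support.ui import WebDriverWait
#   from selenium.webdriver.support import expected_conditions as EC
#   WebDriverWait(driver, 10).until(
#       EC.presence_of_element_located((By.NAME, "session[username_or_email]")))

# Demonstration with a condition that becomes true on its third call:
calls = {"n": 0}
def ready():
    calls["n"] += 1
    return calls["n"] >= 3

print(wait_until(ready, timeout=5, poll=0.01))  # prints True
```

<p>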
Now let me show you how to access elements in order to parse data out of them or carry out tasks with them. In the code above, we used the<\/p>\n<pre>page_source<\/pre>\n<p>property to access the page source. That is only useful when you want to parse with Beautifulsoup or another parsing library. If you are parsing with Selenium itself, you do not need<\/p>\n<pre>page_source<\/pre>\n<p>at all. Below are the options available to you (these use the Selenium 4 By locators).<\/p>\n<ul>\n<li><pre>driver.title<\/pre>\n<p>retrieves the page title<\/p><\/li>\n<li><pre>driver.current_url<\/pre>\n<p>retrieves the URL of the page in view<\/p><\/li>\n<li><pre>driver.find_element(By.NAME, ...)<\/pre>\n<p>retrieves an element by its name attribute, e.g., a password input named password<\/p><\/li>\n<li><pre>driver.find_element(By.TAG_NAME, ...)<\/pre>\n<p>retrieves an element by tag name, such as a, div, span, body, h1, etc.<\/p><\/li>\n<li><pre>driver.find_element(By.CLASS_NAME, ...)<\/pre>\n<p>retrieves an element by class name<\/p><\/li>\n<li><pre>driver.find_element(By.ID, ...)<\/pre>\n<p>retrieves an element by id<\/p><\/li>\n<\/ul>\n<p>For the<\/p>\n<pre>find_element<\/pre>\n<p>method, there is a corresponding<\/p>\n<pre>find_elements<\/pre>\n<p>method that retrieves a list of matching elements instead of a single one. For instance, to retrieve all elements with the \u201cthin-long\u201d class, use<\/p>\n<pre>driver.find_elements(By.CLASS_NAME, \"thin-long\")<\/pre>\n<p>instead of<\/p>\n<pre>driver.find_element(By.CLASS_NAME, \"thin-long\")<\/pre>\n<p>. The difference is the plural element keyword in the method name.<\/p>\n<hr\/>\n<h2 id=\"interacting-with-elements-on-a-page\" class=\"ftwp-heading\" style=\"text-align: center;\"><span class=\"ez-toc-section\" id=\"Interacting_with_Elements_on_a_Page\"><\/span><strong>Interacting with Elements on a Page<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<hr\/>\n<p>With the above, you can find specific elements on a page. However, you rarely find elements just for the sake of it; you will need to interact with them, either to trigger certain events or to retrieve data from them. Let\u2019s take a look at some of the interactions you can have with elements on a page using Selenium and Python.<\/p>\n<ul>\n<li><pre>element.text<\/pre>\n<p>retrieves the text attached to an element<\/p><\/li>\n<li><pre>element.click()<\/pre>\n<p>triggers the click action and the events that follow it<\/p><\/li>\n<li><pre>element.send_keys(\"test text\")<\/pre>\n<p>fills input forms<\/p><\/li>\n<li><pre>element.is_displayed()<\/pre>\n<p>detects whether an element is visible to real users \u2013 perfect for honeypot detection<\/p><\/li>\n<li><pre>element.get_attribute(\"class\")<\/pre>\n<p>retrieves the value of an element\u2019s attribute; you can swap \u201cclass\u201d for any other attribute<\/p><\/li>\n<\/ul>\n<p>With the above, you have what is required to start scraping data from web pages. I will use it to scrape the <a href=\"https:\/\/www.britannica.com\/topic\/list-of-state-capitals-in-the-United-States-2119210\"  rel=\"noopener noreferrer\">list of US states, their capitals, population (census), and estimated population from the Britannica website<\/a>. Take a look at the code below.<\/p>\n<pre>from selenium import webdriver\nfrom selenium.webdriver.chrome.options import Options\nfrom selenium.webdriver.common.by import By\n\noptions = Options()\noptions.add_argument(\"--headless=new\")\ndriver = webdriver.Chrome(options=options)\n\ndriver.get(\"https:\/\/www.britannica.com\/topic\/list-of-state-capitals-in-the-United-States-2119210\")\nlist_states = []\ntrs = driver.find_element(By.TAG_NAME, \"tbody\").find_elements(By.TAG_NAME, \"tr\")\nfor i in trs:\n    tds = i.find_elements(By.TAG_NAME, \"td\")\n    tr_data = []\n    for x in tds:\n        tr_data.append(x.text)\n    list_states.append(tr_data)\nprint(list_states)\ndriver.quit()<\/pre>\n<p>The code above puts into practice almost everything we discussed. Pay attention to the trs variable. If you look at the source code of the page, you will discover that the list of states and the associated information is contained in a table. Neither the table nor its body has a class.<\/p>\n<p>Interestingly, it is the only table on the page, so we can call find_element(By.TAG_NAME, \"tbody\") to retrieve the tbody element. Each row (tr) in the tbody represents a state, with each piece of information in a td element, so we call find_elements(By.TAG_NAME, \"td\") to retrieve the td elements.<\/p>\n<p>The first loop iterates through the tr elements. The second iterates through the td elements of each tr. 
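<\/p>
<p>Once list_states is populated, the remaining step in the scraping workflow, processing or saving the parsed data, is straightforward. Below is a minimal sketch that writes rows of the same shape to a CSV file. The file name, the column headers, and the two sample rows are illustrative assumptions of mine, not values taken from the Britannica page.<\/p>

```python
import csv

# Two sample rows in the same shape the scraping loop produces
# (the numbers are placeholders, not real census figures).
list_states = [
    ["Alabama", "Montgomery", "205764", "198525"],
    ["Alaska", "Juneau", "31275", "32255"],
]

# Assumed column names matching the four td cells per table row.
with open("state_capitals.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["state", "capital", "population_census", "population_estimate"])
    writer.writerows(list_states)
```

<p>The newline=\"\" argument matters: without it, the csv module writes extra blank lines on Windows.<\/p>
<p>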
Element.text is used to retrieve the text attached to each element.<\/p>\n<hr\/>\n<h2 id=\"you-have-learnt-the-basics-now-what\" class=\"ftwp-heading\" style=\"text-align: center;\"><span class=\"ez-toc-section\" id=\"You_Have_Learnt_the_Basics_Now_What\"><\/span><strong>You Have Learnt the Basics: Now What?<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Above, we have shown you how to scrape a page using Selenium and Python. However, you should know that what you have learned is just the basics. There is more to learn \u2013 for example, how to carry out mouse movements and other keyboard actions.<\/p>\n<p>Sometimes, filling out a form field with the whole text string at once will reveal that the traffic is <a href=\"https:\/\/royadata.io\/blog\/bot-traffic\/\">bot-originating<\/a>. In instances like that, you will have to mimic human typing by sending one letter after another. With Selenium, you can even take a snapshot of a page, execute custom JavaScript, and carry out many other automation tasks. I advise you to learn more about Selenium WebDriver on the <a href=\"https:\/\/www.selenium.dev\/documentation\/en\/webdriver\/\"  rel=\"noopener noreferrer\">official Selenium website<\/a>.<\/p>\n<hr\/>\n<ul>\n<li><a href=\"https:\/\/royadata.io\/blog\/selenium-proxy\/\">How to Setup Proxies on Selenium<\/a><\/li>\n<li><a href=\"https:\/\/royadata.io\/blog\/how-to-build-a-web-crawler-using-selenium-proxies\/\">Building a Web Crawler Using Selenium and Proxies<\/a><\/li>\n<\/ul>\n<h2 class=\"ftwp-heading\" style=\"text-align: center;\"><strong>Conclusion<\/strong><\/h2>\n<p>Selenium has its own setback in terms of slow speed. 
However, it has proven to be the best option when you need to scrape data from a JavaScript-rich website.<\/p>\n<p>One thing you will come to like about Selenium is that it makes the whole scraping process easy, as you do not have to deal with <a href=\"https:\/\/royadata.io\/blog\/http-cookies\/\">cookies<\/a> or replicate hard-to-replicate web requests. Interestingly, it is also easy to use.<\/p>\n<hr\/>\n<ul>\n<li><a href=\"https:\/\/royadata.io\/blog\/http-headers\/\">What is HTTP Header &#038; How to Inspect HTTP Headers?<\/a><\/li>\n<li><a href=\"https:\/\/royadata.io\/blog\/web-scraping-proxies\/\">Proxy API, Datacenter, Residential Proxies for Scraping<\/a><\/li>\n<li><a href=\"https:\/\/royadata.io\/blog\/scrape-a-website-never-get-blacklisted\/\">How to Scrape a Website and Never Get Blacklisted<\/a><\/li>\n<li><a href=\"https:\/\/royadata.io\/blog\/web-scraping-api\/\"><strong>Web Scraping API to Help Scrape &#038; Extract Data<\/strong><\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>For dynamic sites richly built with JavaScript, Selenium is the tool of choice for extracting data from them. Come in now and read this article to learn how to extract data from web pages using Selenium. The easiest websites to scrape data from are static pages that all content is downloaded upon request. Sadly, these 
Sadly, these &#8230; <a title=\"Web Scraping Using Selenium and Python: The Step-By-Step Guide for Beginner (2023)\" class=\"read-more\" href=\"http:\/\/royadata.io\/blog\/selenium-web-scraping-python\/\" aria-label=\"More on Web Scraping Using Selenium and Python: The Step-By-Step Guide for Beginner (2023)\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":397,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"_links":{"self":[{"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/posts\/6218"}],"collection":[{"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/comments?post=6218"}],"version-history":[{"count":0,"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/posts\/6218\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/media\/397"}],"wp:attachment":[{"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/media?parent=6218"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/categories?post=6218"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/royadata.io\/blog\/wp-json\/wp\/v2\/tags?post=6218"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}