Python - Auto-login to a Web Page Using a Web Crawler
Posted in linux, services, web
Recently our children's school grades have dropped because they watch too much television, so the TV needs to be restricted without affecting the elderly family members at home.
After thinking it over, I plan to use crawler techniques to log in to the managed network switch automatically and throttle the set-top box while the children are home after school, then lift the limit while they are at school.
Children today are young but very smart. They can turn on the TV themselves and change channels to find their favorite programs. In the past I used the brute-force method of unplugging the network cable to stop them from watching TV, but I often forgot to plug it back in, which left the elderly unable to watch TV.
Then I noticed that the IPTV box is connected to a Netgear managed switch. We can limit the speed of the IPTV port on the switch so that the TV cannot be watched.
When it comes to crawlers, the first thing that comes to mind is Python. After some googling, I decided to use Selenium with Firefox or Chrome to implement the crawler.
What is Selenium
Github-Selenium
Selenium is a testing tool for web applications. It drives a real browser directly, just as a real user would. It supports IE (7, 8, 9, 10, 11), Mozilla Firefox, Safari, Google Chrome, Opera, HtmlUnit, PhantomJS, Android (requires Selendroid or Appium), iOS (requires ios-driver or Appium), etc.
Selenium supports the C#, JavaScript, Java, Python and Ruby programming languages. It uses WebDriver to operate the browser for web testing.
In crawlers, Selenium is mainly used to handle pages whose content is rendered by JavaScript.
What is WebDriver
WebDriver is a programming interface for interacting with the browser. It can be used to open or close the browser, send mouse clicks, simulate keyboard input, and so on.
The W3C defines the WebDriver specification. The most popular WebDriver implementation today is the open source Selenium WebDriver.
WebDriver contains several modules:
Bindings for multiple programming languages.
An automation framework that provides automated functions such as element search, click, and input for web pages, reducing duplicate coding.
A JSON protocol, the middle layer between the automation framework and the browser driver, which provides cross-platform, cross-language capabilities.
A browser driver, through which the browser is called.
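To make the JSON middle layer a little more concrete, here is a rough sketch of the kind of HTTP requests a language binding sends to the browser driver. The endpoint paths and payload keys follow the W3C WebDriver specification; the helper names and the session ID "abc123" are made up for illustration, and real bindings such as Selenium build these requests for you.

```python
import json


def navigate_command(session_id, url):
    """Build the request that asks the driver to open a URL."""
    return "POST", f"/session/{session_id}/url", json.dumps({"url": url})


def find_element_command(session_id, using, value):
    """Build the request that asks the driver to locate one element."""
    body = json.dumps({"using": using, "value": value})
    return "POST", f"/session/{session_id}/element", body


# "abc123" is a hypothetical session ID; the driver issues a real one
# when a session is created.
method, path, body = navigate_command("abc123", "https://www.shixuen.com")
print(method, path, body)
```

Because every command is plain JSON over HTTP, the same browser driver can serve bindings written in any language.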
Use driver.get("https://www.shixuen.com") to open a URL.
To close the browser window, use driver.close(). To end the whole browser session, use driver.quit().
Simulate a mouse click
Let's add a new feature: after opening https://www.shixuen.com, click the article VIM Plugin - YouCompleteMe.
Look at the code first.
    #!/usr/bin/env python3
    # coding=utf-8

    import time
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    print("Initialize ChromeDriver and open Chrome")
    driver = webdriver.Chrome()
    print("Open shixuen.com")
    driver.get("https://www.shixuen.com")
    print("Search the link text")
    # Selenium 4.3 removed find_element_by_link_text(); use find_element(By.LINK_TEXT, ...)
    article = driver.find_element(By.LINK_TEXT, "VIM Plugin - YouCompleteMe")
    print("Click the link")
    article.click()
    # Keep the page open for a few seconds so the result is visible
    time.sleep(5)
    print("Close Browser")
    driver.close()

The key line is driver.find_element(By.LINK_TEXT, "VIM Plugin - YouCompleteMe"), written as driver.find_element_by_link_text("VIM Plugin - YouCompleteMe") in Selenium releases before 4.3. It searches for the link whose text is VIM Plugin - YouCompleteMe and returns the object for that node when found.
Search the specified element
Web node code example: <a id="btn_apply" class="btn_class">Apply</a>
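Search by ID: driver.find_element(By.ID, "btn_apply")
Search by link text: driver.find_element(By.LINK_TEXT, "Apply")
Search by class: driver.find_element(By.CLASS_NAME, "btn_class")
Search by XPath: driver.find_element(By.XPATH, "//a[@id='btn_apply' and @class='btn_class']")
These calls need from selenium.webdriver.common.by import By. Selenium releases before 4.3 also offered the equivalent helpers find_element_by_id, find_element_by_link_text, find_element_by_class_name and find_element_by_xpath, which have since been removed.
In an XPath expression:
/: search from the root node
//: search all nodes at any depth
./: search the child nodes of the current node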
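The difference between ./ and // is easy to try out without a browser. The sketch below uses Python's standard xml.etree.ElementTree, which implements only a small subset of XPath (it has no `and` operator, so two predicates are chained instead); Selenium hands the expression to the browser's full XPath engine. The page markup around the Apply link is invented for the demonstration.

```python
import xml.etree.ElementTree as ET

# A made-up page containing the article's example node plus one more link.
page = ET.fromstring(
    "<html><body>"
    "<div id='menu'><a id='btn_apply' class='btn_class'>Apply</a></div>"
    "<a id='btn_cancel' class='btn_class'>Cancel</a>"
    "</body></html>"
)
body = page.find("body")

# "./a" looks only at direct children of <body>.
direct = [a.get("id") for a in body.findall("./a")]

# ".//a" looks at every descendant of <body>, at any depth.
anywhere = [a.get("id") for a in body.findall(".//a")]

# Chained predicates stand in for "@id='btn_apply' and @class='btn_class'".
match = page.findall(".//a[@id='btn_apply'][@class='btn_class']")

print(direct, anywhere, [a.text for a in match])
```

Only the Cancel link is a direct child of the body, while the descendant search also finds the Apply link nested inside the div.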
Click the web element
Use article.click() to simulate a mouse click.
Finding the HTML code of a web node
Open the page in a browser first.
Press [F12] to bring up the web developer tools.
Click the button in the upper left corner of the tools to pick the element on the page.
Log in and configure the Netgear managed switch
Next comes the main topic of this article: logging in to and configuring the Netgear managed switch. Again, code first.
    import time, sys, getopt, os
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.select import Select

The code above logs in to the Netgear managed switch and automatically limits the speed of the TV port. Each step is straightforward; there are just a few more functions and some jumps between pages.
Now let us analyze the code:
1. Use driver.get("url") to open the switch's web page.
2. Then enter the password and click the login button. Here we use WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, "password"))). It polls the page through driver for up to 10 seconds until an element whose ID is password is present. On success the element object is returned; on timeout a TimeoutException is raised. Note that By, EC and WebDriverWait must be imported before use. By.ID searches by ID; similarly there are By.NAME, By.XPATH, By.CLASS_NAME, By.LINK_TEXT, and so on.
Why do we use it? Because if the page has not yet loaded the element whose ID is password when we search for it, our code will report an error. Alternatively, the code can be changed to wait a fixed time before searching:
    print("Wait for loading iframe")
    time.sleep(10)
    # Selenium 4.3+ spelling; older releases used driver.find_element_by_id("password")
    passwd_input = driver.find_element(By.ID, "password")
To simulate typing characters on the keyboard, use passwd_input.send_keys(gs105e_conf["password"]).
3. Click the menus in order QoS --> Rate Limit. See below.
4. Because the rate limit page is loaded in an iframe, the driver must first switch from the current page into the iframe, using driver.switch_to.frame(iframe). See below.
5. Click the checkbox and then modify the ingress and egress rates. Because the ingress and egress rates are drop-down lists, we need the Select class from selenium.webdriver.support.select to choose them. Select(btn_select).select_by_index(3) selects the fourth option in the drop-down list, since the first option has index 0.
6. Switch from the iframe back to the main page and click the Apply button.
7. Finally, click the logout button.
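Steps 3 to 6 can be sketched roughly as follows. This is not the author's listing: every element ID in the constants is a placeholder that must be replaced with the values found in the switch's pages through the developer tools, and treating the QoS and Rate Limit menus as links, or the rate-limit frame as the first iframe on the page, is likewise an assumption. Only the Selenium calls themselves (WebDriverWait, expected_conditions, Select, switch_to) are real API.

```python
# Placeholder IDs, NOT taken from the real switch page.
PORT_CHECKBOX_ID = "port_checkbox"
INGRESS_SELECT_ID = "ingress_rate"
EGRESS_SELECT_ID = "egress_rate"
APPLY_BUTTON_ID = "btn_apply"


def set_port_rate(driver, option_index=3, timeout=10):
    """Open QoS > Rate Limit, pick a rate inside the iframe, then apply."""
    # Imported here so the sketch can be loaded without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.select import Select
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, timeout)

    # Step 3: click the menus in order (assumed to be links with this text).
    wait.until(EC.element_to_be_clickable((By.LINK_TEXT, "QoS"))).click()
    wait.until(EC.element_to_be_clickable((By.LINK_TEXT, "Rate Limit"))).click()

    # Step 4: wait for the iframe and switch into it (assumes the first iframe).
    wait.until(EC.frame_to_be_available_and_switch_to_it((By.TAG_NAME, "iframe")))

    # Step 5: tick the port's checkbox, then choose a rate in each drop-down.
    wait.until(EC.element_to_be_clickable((By.ID, PORT_CHECKBOX_ID))).click()
    for select_id in (INGRESS_SELECT_ID, EGRESS_SELECT_ID):
        Select(driver.find_element(By.ID, select_id)).select_by_index(option_index)

    # Step 6: leave the iframe and click Apply on the main page.
    driver.switch_to.default_content()
    driver.find_element(By.ID, APPLY_BUTTON_ID).click()
```

Using explicit waits before each click avoids the fixed sleeps, at the cost of a TimeoutException if a locator is wrong.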
Final code
The previous code must run in a graphical environment because it opens a browser window, and it reports an error on our server, which has no graphical interface. So here we use the --headless option to keep the browser from opening a window, so that the script can run from a terminal.
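As a minimal sketch of how the --headless option can be passed, assuming Chrome is the browser in use: the function name and the HEADLESS_ARGS list are illustrative, while ChromeOptions, add_argument and the options keyword are regular Selenium API.

```python
# Command-line switches handed to Chrome; only --headless comes from the article.
HEADLESS_ARGS = ["--headless"]


def make_headless_chrome():
    """Start Chrome without a window so the script can run on a server."""
    # Imported here so the sketch can be loaded without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in HEADLESS_ARGS:
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

The returned driver is used exactly like the windowed one, for example driver = make_headless_chrome() followed by driver.get("url").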
The following is the final code. We optimized the previous code and added an option to limit or unlimit the speed of a port on the Netgear managed switch.
    import time, sys, getopt, os
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.select import Select
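Since getopt is among the imports, the limit/unlimit option was presumably parsed with it. The sketch below shows one way that could look; the option names --limit, --unlimit and --port, and the function name parse_args, are hypothetical and not taken from the author's script.

```python
import getopt


def parse_args(argv):
    """Return (action, port) parsed from a list of command-line arguments."""
    action, port = None, None
    # "p:" means -p takes a value; "port=" does the same for the long form.
    opts, _ = getopt.getopt(argv, "p:", ["port=", "limit", "unlimit"])
    for opt, value in opts:
        if opt in ("-p", "--port"):
            port = int(value)
        elif opt in ("--limit", "--unlimit"):
            action = opt.lstrip("-")
    if action is None or port is None:
        raise SystemExit("usage: (--limit | --unlimit) --port N")
    return action, port


# Example: throttle port 3 of the switch.
print(parse_args(["--limit", "--port", "3"]))
```

In a real script the list passed in would be sys.argv[1:], and the returned action would decide which rate to select on the switch.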
From Monday to Friday, the speed is limited from 12:00 to 13:00 at noon and from 19:00 to 20:40 in the evening. On Saturday and Sunday the TV belongs to the children.
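The article does not say how this schedule is triggered (a scheduler such as cron would be a natural fit). Purely as an illustration, the rule itself can be written as a small function; the names LIMIT_WINDOWS and should_limit are hypothetical, and treating the end of each window as exclusive is an assumption.

```python
from datetime import datetime, time

# Weekday windows from the article; end times are treated as exclusive.
LIMIT_WINDOWS = [
    (time(12, 0), time(13, 0)),
    (time(19, 0), time(20, 40)),
]


def should_limit(now):
    """Return True if the TV port should be throttled at the given moment."""
    if now.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return False
    return any(start <= now.time() < end for start, end in LIMIT_WINDOWS)


print(should_limit(datetime(2024, 1, 1, 12, 30)))  # a Monday at 12:30
print(should_limit(datetime(2024, 1, 6, 12, 30)))  # a Saturday at 12:30
```

A script run every few minutes could call should_limit(datetime.now()) and apply or lift the limit accordingly.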