With Python 3.6 and the requests module, it is easy to read data from the web. Here are a couple of basic things we can do with the requests module.
Getting a Status code
import requests

r = requests.get('http://www.bluegalaxy.info')
print(r.status_code)
Which yields:
200
The meanings of the HTTP status codes are documented on many websites. For example:
http://www.restapitutorial.com/httpstatuscodes.html
A status code of 200 (OK) means that the server handled the request successfully and returned the page.
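If we want to look up what a status code means without leaving Python, the standard library's http.HTTPStatus enum knows the standard reason phrases. The following is only a small sketch: the helper names describe_status and is_success are made up for this example and are not part of the requests module.

```python
from http import HTTPStatus


def describe_status(code):
    """Return the code with its standard reason phrase (hypothetical helper)."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Codes that are not in the standard enum end up here
        phrase = "Unknown"
    return "{0} {1}".format(code, phrase)


def is_success(code):
    """2xx codes mean the request was handled successfully."""
    return 200 <= code < 300


for code in (200, 400, 404):
    print(describe_status(code), "success" if is_success(code) else "not success")
```

In the same spirit, a value such as r.status_code could be passed to these helpers after a call to requests.get.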
Getting page length
import requests

r = requests.get('http://www.bluegalaxy.info')
print(len(r.content))
Which yields:
586
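The length of the content here is the number of bytes in the body of the response, because r.content is a bytes object. For a page made up only of ASCII characters, this is the same as the number of characters in the source of the page. To count characters in general, use len(r.text), which measures the page source after it has been decoded into a string.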
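The difference between the two kinds of length can be seen without any network access. This sketch uses a made-up sample string (not taken from any of the pages above) and encodes it as UTF-8, which is one common encoding for web pages:

```python
# A sample string with two non-ASCII characters (e with acute accent, coffee cup)
text = "caf\u00e9 \u2615"

# Encoding gives bytes, comparable to r.content in requests
data = text.encode("utf-8")

char_count = len(text)   # counts characters, like len(r.text)
byte_count = len(data)   # counts bytes, like len(r.content)

print("characters:", char_count)
print("bytes:", byte_count)
```

The byte count is larger here because each non-ASCII character takes more than one byte in UTF-8.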
Here is a bigger example:
import requests

sites = [
    'http://www.python.org',
    'http://www.jython.org',
    'http://www.pypy.org',
    'http://www.drudgereport.com',
    'http://www.youtube.com',
    'http://www.phys.org',
    'http://www.bluegalaxy.info',
    'http://www.bluegalaxy.info/codewalk'
]

for url in sites:
    r = requests.get(url)
    print("URL: {0:40} Status code: {1:4} \t Page length: {2:8}"
          .format(url, r.status_code, len(r.content)))
Which yields:
URL: http://www.python.org                Status code: 200    Page length: 48761
URL: http://www.jython.org                Status code: 200    Page length: 19210
URL: http://www.pypy.org                  Status code: 200    Page length: 6133
URL: http://www.drudgereport.com          Status code: 200    Page length: 31485
URL: http://www.youtube.com               Status code: 200    Page length: 516073
URL: http://www.phys.org                  Status code: 400    Page length: 91
URL: http://www.bluegalaxy.info           Status code: 200    Page length: 586
URL: http://www.bluegalaxy.info/codewalk  Status code: 200    Page length: 74648
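The column layout comes from the width numbers in the format string: {0:40} pads the URL to 40 characters, while {1:4} and {2:8} right-align the two numbers in fields of 4 and 8 characters. Here is a small sketch that applies the same format string to fixed values, so it can be tried without fetching anything. The function name format_row is made up for this example, and the values are copied from the pypy.org line of the output above.

```python
def format_row(url, status_code, page_length):
    """Build one report line using the same format string as the loop (hypothetical helper)."""
    return ("URL: {0:40} Status code: {1:4} \t Page length: {2:8}"
            .format(url, status_code, page_length))


# Strings are left-aligned by default, integers are right-aligned
row = format_row('http://www.pypy.org', 200, 6133)
print(row)
```

Because strings are padded on the right and integers on the left, the URLs line up at the start of their column and the numbers line up at the end of theirs.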
For more information about the requests module see:
http://docs.python-requests.org/en/master/