With Python 3.6 and the requests module, it is easy to read data from the web. Here are a couple of basic things we can do with the requests module.
Getting a status code
import requests
# Fetch the page and print the HTTP status code of the response
r = requests.get('http://www.bluegalaxy.info')
print(r.status_code)
Which yields:
200
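Checking for exactly 200 by hand misses other success codes, and it does not cover requests that never get a response at all. Here is a minimal sketch of a more forgiving check. It assumes a hypothetical helper named is_up and an arbitrary 10-second timeout. It relies on the requests Response.ok property, which is False only when the status code is a 4xx or 5xx error. It also catches requests.exceptions.RequestException, the base class for connection errors, timeouts and malformed URLs.

```python
import requests


def is_up(url, timeout=10.0):
    """Hypothetical helper: True if a GET request to url gets a non-error response."""
    try:
        # timeout (seconds) keeps the call from waiting forever on a silent server
        r = requests.get(url, timeout=timeout)
    except requests.exceptions.RequestException:
        # No usable response: malformed URL, DNS failure, refused connection, timeout
        return False
    # Response.ok is False only for 4xx and 5xx status codes
    return r.ok
```

For example, print(is_up('http://www.bluegalaxy.info')) prints True when the site answers normally. It prints False for an error status, or when no connection can be made.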
The meaning of every HTTP status code is documented on many websites, for example:
http://www.restapitutorial.com/httpstatuscodes.html
A status code of 200 ("OK") means that the request succeeded: the server is up and returned the page. Codes in the 400s indicate a client error, and codes in the 500s a server error.
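The meaning of a code can also be looked up without leaving Python. Here is a small sketch that assumes a hypothetical helper named describe_status. It uses the standard library's http.HTTPStatus enum, which carries the standard reason phrase for each registered code, and groups codes by their hundreds digit.

```python
from http import HTTPStatus

# General meaning of each hundreds "class" of HTTP status codes
CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe_status(code):
    """Hypothetical helper: turn an integer status code into e.g. '200 OK (success)'."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # The code is not registered in the standard library's HTTPStatus enum
        phrase = "Unknown"
    category = CATEGORIES.get(code // 100, "non-standard")
    return "{0} {1} ({2})".format(code, phrase, category)


for code in (200, 400, 404):
    print(describe_status(code))
# 200 OK (success)
# 400 Bad Request (client error)
# 404 Not Found (client error)
```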
Getting page length
import requests
# r.content is the body of the response as bytes
r = requests.get('http://www.bluegalaxy.info')
print(len(r.content))
Which yields:
586
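The length here is the size of the page source in bytes, because r.content holds the body of the response as bytes. For a page made up only of ASCII characters, this equals the number of characters in the source, as if you had placed the whole page source into a string and measured its length. For a page containing multi-byte characters (for example, accented letters encoded as UTF-8), the byte count is larger. In that case, len(r.text), which counts decoded characters, gives the character count instead.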
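The difference between bytes and characters can be seen with a plain string, no network needed. This sketch uses a made-up snippet of HTML as a hypothetical stand-in for r.text. It contains one accented character ("é", written as the escape \u00e9). Encoding it to UTF-8, roughly what r.content holds for a UTF-8 page, gives one more byte than there are characters.

```python
# Hypothetical page source standing in for r.text (decoded characters)
page_text = "<p>Caf\u00e9</p>"  # "<p>Café</p>"
# The same source encoded to UTF-8, as it would arrive in r.content (raw bytes)
page_bytes = page_text.encode("utf-8")

print(len(page_text))   # 11 characters
print(len(page_bytes))  # 12 bytes: "é" takes two bytes in UTF-8
```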
Here is a bigger example:
import requests
sites = [
    'http://www.python.org',
    'http://www.jython.org',
    'http://www.pypy.org',
    'http://www.drudgereport.com',
    'http://www.youtube.com',
    'http://www.phys.org',
    'http://www.bluegalaxy.info',
    'http://www.bluegalaxy.info/codewalk'
]
# Print the status code and page length of each site in aligned columns
for url in sites:
    r = requests.get(url)
    print("URL: {0:40} Status code: {1:4} \t Page length: {2:8}"
          .format(url, r.status_code, len(r.content)))
Which yields:
URL: http://www.python.org Status code: 200 Page length: 48761
URL: http://www.jython.org Status code: 200 Page length: 19210
URL: http://www.pypy.org Status code: 200 Page length: 6133
URL: http://www.drudgereport.com Status code: 200 Page length: 31485
URL: http://www.youtube.com Status code: 200 Page length: 516073
URL: http://www.phys.org Status code: 400 Page length: 91
URL: http://www.bluegalaxy.info Status code: 200 Page length: 586
URL: http://www.bluegalaxy.info/codewalk Status code: 200 Page length: 74648
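As written, the loop stops at the first site that cannot be reached at all, because requests.get raises an exception instead of returning a response. Without a timeout, it can also wait indefinitely on a server that never answers. Here is a sketch of a more defensive version. It assumes a hypothetical generator named check_sites and an arbitrary 10-second timeout. For a failed site, it reports the exception's class name and moves on to the next one.

```python
import requests


def check_sites(urls, timeout=10.0):
    """Hypothetical helper: yield one report line per URL, even when a request fails."""
    for url in urls:
        try:
            r = requests.get(url, timeout=timeout)
        except requests.exceptions.RequestException as exc:
            # No response at all: bad URL, DNS failure, refused connection, timeout
            yield "URL: {0:40} Error: {1}".format(url, type(exc).__name__)
            continue
        # Same columns as the original loop
        yield "URL: {0:40} Status code: {1:4} \t Page length: {2:8}".format(
            url, r.status_code, len(r.content))
```

With the sites list above, for line in check_sites(sites): print(line) prints one line per site, as before. An unreachable site gets an error line instead of ending the program.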
For more information about the requests module see:
http://docs.python-requests.org/en/master/