{"id":445,"date":"2017-09-21T12:55:45","date_gmt":"2017-09-21T17:55:45","guid":{"rendered":"http:\/\/bluegalaxy.info\/codewalk\/?p=445"},"modified":"2019-07-02T14:09:33","modified_gmt":"2019-07-02T19:09:33","slug":"python-using-requests-to-get-web-page-lengths-and-status-codes","status":"publish","type":"post","link":"https:\/\/bluegalaxy.info\/codewalk\/2017\/09\/21\/python-using-requests-to-get-web-page-lengths-and-status-codes\/","title":{"rendered":"Python: Using requests to get web page lengths and status codes"},"content":{"rendered":"<p>With Python 3.6 and the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"no-highlight\">requests<\/code> module, it is easy to read data from the web. Here are a couple of basic things we can do with the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"no-highlight\">requests<\/code> module.<\/p>\n<h4>Getting a Status code<\/h4>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">import requests\n\nr = requests.get('http:\/\/www.bluegalaxy.info')\nprint( r.status_code )<\/pre>\n<p>Which yields:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"no-highlight\">200<\/pre>\n<p>Many websites document the meaning of each HTTP status code. For example:<br \/>\n<a href=\"http:\/\/www.restapitutorial.com\/httpstatuscodes.html\">http:\/\/www.restapitutorial.com\/httpstatuscodes.html<\/a><\/p>\n<p>A status code of &#8216;200&#8217; (OK) means that the request succeeded and the server returned the page.<\/p>\n<h4>Getting page length<\/h4>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">import requests\n\nr = requests.get('http:\/\/www.bluegalaxy.info')\nprint( len(r.content) )<\/pre>\n<p>Which yields:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"no-highlight\">586<\/pre>\n<p>The length here is the number of bytes in the response body, because <code class=\"EnlighterJSRAW\" data-enlighter-language=\"no-highlight\">r.content<\/code> holds the raw bytes of the page source rather than a decoded string. 
To count characters instead, use <code class=\"EnlighterJSRAW\" data-enlighter-language=\"no-highlight\">len(r.text)<\/code>, which measures the decoded string; the two numbers differ whenever the page contains characters that are encoded as more than one byte.<\/p>\n<p>Here is a bigger example:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">import requests\n\nsites = [\n    'http:\/\/www.python.org',\n    'http:\/\/www.jython.org',\n    'http:\/\/www.pypy.org',\n    'http:\/\/www.drudgereport.com',\n    'http:\/\/www.youtube.com',\n    'http:\/\/www.phys.org',\n    'http:\/\/www.bluegalaxy.info',\n    'http:\/\/www.bluegalaxy.info\/codewalk'\n]\n\n# Request each site and report its status code and page length in bytes\nfor url in sites:\n    r = requests.get(url)\n    print(\"URL: {0:40} Status code: {1:4} \\t Page length: {2:8}\"\n          .format(url, r.status_code, len(r.content)))<\/pre>\n<p>Which yields:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"no-highlight\">URL: http:\/\/www.python.org                    Status code:  200          Page length:    48761\nURL: http:\/\/www.jython.org                    Status code:  200          Page length:    19210\nURL: http:\/\/www.pypy.org                      Status code:  200          Page length:     6133\nURL: http:\/\/www.drudgereport.com              Status code:  200          Page length:    31485\nURL: http:\/\/www.youtube.com                   Status code:  200          Page length:   516073\nURL: http:\/\/www.phys.org                      Status code:  400          Page length:       91\nURL: http:\/\/www.bluegalaxy.info               Status code:  200          Page length:      586\nURL: http:\/\/www.bluegalaxy.info\/codewalk      Status code:  200          Page length:    74648<\/pre>\n<p>For more information about the requests module, see:<br \/>\n<a href=\"http:\/\/docs.python-requests.org\/en\/master\/\">http:\/\/docs.python-requests.org\/en\/master\/<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>With Python 3.6 and the requests module, it is easy to read data from the web. Here are a couple of basic things we can do with the requests module. 
Getting a Status code import requests r = requests.get(&#8216;http:\/\/www.bluegalaxy.info&#8217;) print( r.status_code ) Which yields: 200 For more information about HTTP status codes, there are multiple &hellip; <a href=\"https:\/\/bluegalaxy.info\/codewalk\/2017\/09\/21\/python-using-requests-to-get-web-page-lengths-and-status-codes\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Python: Using requests to get web page lengths and status codes<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[22,33],"tags":[4,34],"class_list":["post-445","post","type-post","status-publish","format-standard","hentry","category-python-language","category-python-web-scraping","tag-python","tag-requests"],"_links":{"self":[{"href":"https:\/\/bluegalaxy.info\/codewalk\/wp-json\/wp\/v2\/posts\/445","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/bluegalaxy.info\/codewalk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/bluegalaxy.info\/codewalk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/bluegalaxy.info\/codewalk\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/bluegalaxy.info\/codewalk\/wp-json\/wp\/v2\/comments?post=445"}],"version-history":[{"count":7,"href":"https:\/\/bluegalaxy.info\/codewalk\/wp-json\/wp\/v2\/posts\/445\/revisions"}],"predecessor-version":[{"id":2882,"href":"https:\/\/bluegalaxy.info\/codewalk\/wp-json\/wp\/v2\/posts\/445\/revisions\/2882"}],"wp:attachment":[{"href":"https:\/\/bluegalaxy.info\/codewalk\/wp-json\/wp\/v2\/media?parent=445"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/bluegalaxy.info\/codewalk\/wp-json\/wp\/v2\/categories?post=445"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/bluegalaxy.info\/codewalk\/wp-json\/wp\/v2\/tags?post=445"}],"curies"
:[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}