{"id":1,"date":"2017-06-02T03:01:21","date_gmt":"2017-06-02T08:01:21","guid":{"rendered":"http:\/\/1uslchriston.ad.here.com:8080\/blog\/?p=1"},"modified":"2017-08-19T22:04:58","modified_gmt":"2017-08-20T03:04:58","slug":"random_sample","status":"publish","type":"post","link":"https:\/\/bluegalaxy.info\/codewalk\/2017\/06\/02\/random_sample\/","title":{"rendered":"Pandas: How to get a random sample DataFrame of x length using .sample( )"},"content":{"rendered":"<p>Let&#8217;s say you have a Pandas DataFrame called &#8216;df&#8217; that has 50 thousand rows in it and you want to get a random sample out of it that contains 300 records. Pandas has a built-in method called <code>sample()<\/code> that makes this very easy.<\/p>\n<p>All you have to do is decide on the name of the new DataFrame that will contain your random sample (in this case I called it &#8216;sample_df&#8217;), and then use the following syntax, where n is the number of rows randomly selected from df and placed into sample_df. By default, rows are drawn without replacement, so n cannot be larger than the number of rows in df.<\/p>\n<p>Note: The line below creates the new sample DataFrame; df itself is not modified.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"enlighter\">sample_df = df.sample(n=300)<\/pre>\n<p>It is also possible to get a sample that is a fraction of a DataFrame, rather than a fixed number of records. In this case, instead of using n=300, you would use frac=0.20 (for 20% of the rows). 
For example:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">sample_df = df.sample(frac=0.20)  # return a random 20% of the rows<\/pre>\n<p>For more information, see: <a href=\"http:\/\/pandas.pydata.org\/pandas-docs\/stable\/generated\/pandas.DataFrame.sample.html?highlight=sample#pandas.DataFrame.sample\" target=\"_blank\" rel=\"noopener noreferrer\">pandas.DataFrame.sample<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Let&#8217;s say you have a Pandas DataFrame called &#8216;df&#8217; that has 50 thousand rows in it and you want to get a random sample out of it that contains 300 records. Pandas has a built-in method called sample() that makes this very easy. All you have to do is decide on the name of &hellip; <a href=\"https:\/\/bluegalaxy.info\/codewalk\/2017\/06\/02\/random_sample\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Pandas: How to get a random sample DataFrame of x length using .sample( )<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[21],"tags":[6,3,4,5,7],"class_list":["post-1","post","type-post","status-publish","format-standard","hentry","category-pandas","tag-dataframe","tag-pandas","tag-python","tag-random","tag-sample"],"_links":{"self":[{"href":"https:\/\/bluegalaxy.info\/codewalk\/wp-json\/wp\/v2\/posts\/1","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/bluegalaxy.info\/codewalk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/bluegalaxy.info\/codewalk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/bluegalaxy.info\/codewalk\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/bluegalaxy.info\/codewalk\/wp-json\/wp\/v2\/comments?post=1"}],"version-history":[{"count":15,"href":"https:\/\/bluegalaxy.info\/codewalk\/wp-json\/wp\/v2\/posts\/1\
/revisions"}],"predecessor-version":[{"id":371,"href":"https:\/\/bluegalaxy.info\/codewalk\/wp-json\/wp\/v2\/posts\/1\/revisions\/371"}],"wp:attachment":[{"href":"https:\/\/bluegalaxy.info\/codewalk\/wp-json\/wp\/v2\/media?parent=1"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/bluegalaxy.info\/codewalk\/wp-json\/wp\/v2\/categories?post=1"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/bluegalaxy.info\/codewalk\/wp-json\/wp\/v2\/tags?post=1"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}