Pandas: How to get a random sample DataFrame of x length using .sample()

Let’s say you have a Pandas DataFrame called ‘df’ with 50 thousand rows, and you want a random sample of 300 records from it. Pandas has a built-in DataFrame method called sample() that makes this very easy.

All you have to do is pick a name for the new DataFrame that will hold your random sample (here I called it ‘sample_df’) and then use the following syntax, where n is the number of rows randomly selected from df and placed into sample_df.

Note: sample() returns a new DataFrame and leaves df unchanged, so the line below creates sample_df without modifying the original.

sample_df = df.sample(n=300)
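To try this end to end, here is a minimal sketch. The data in df is made up purely for illustration (the column names ‘id’ and ‘value’ are my own placeholders), and I have added the optional random_state argument, which is not required but makes the sample repeatable from run to run.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df: 50,000 rows of made-up data
df = pd.DataFrame({
    "id": np.arange(50_000),
    "value": np.arange(50_000) * 2,
})

# Draw 300 rows at random; random_state fixes the seed so the
# same rows are selected every time the script runs
sample_df = df.sample(n=300, random_state=42)

print(len(sample_df))  # number of rows in the sample
print(len(df))         # the original DataFrame is untouched
```

By default the rows are drawn without replacement, so no row of df appears twice in sample_df, and the sampled rows keep their original index labels.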

It is also possible to get a sample that is a fraction of a DataFrame, rather than a fixed number of records. In this case, instead of using n=300, you would use frac=0.20 (for 20%). For example:

sample_df = df.sample(frac=0.20)  # return a random 20%
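Here is a small sketch of the fraction version, again on made-up data (a hypothetical 1,000-row df with a single ‘value’ column). It also shows one optional follow-up step that is my own addition: resetting the index so the sample is numbered from 0 instead of keeping the original row labels.

```python
import pandas as pd

# Hypothetical stand-in for df: 1,000 rows of made-up data
df = pd.DataFrame({"value": range(1000)})

# Take a random 20% of the rows; random_state is optional and
# only there to make the result repeatable
sample_df = df.sample(frac=0.20, random_state=0)

# Optional: discard the original row labels and renumber from 0
sample_df = sample_df.reset_index(drop=True)

print(len(sample_df))  # 20% of 1,000 rows
```

Use either n or frac, not both. Passing both in the same call raises a ValueError.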

For more information see: pandas.DataFrame.sample

 
