Data Mining Tools: Column to Array

I have created some Python tools that make data mining easier. For example, when  moving lots of data around one thing that comes up often is getting information from a spreadsheet into a Python data container such as a list. This can be done by adding some code in a Python script to open an Excel file, parse the columns or rows, and then use a loop to capture the data into a list.

I thought there must be a faster and easier way to accomplish this task, without adding coding overhead to every project. So I created a web based data mining tool that makes it easier to get columns of spreadsheet data into a Python script as an array.

Instead of having to parse spreadsheets from within your script, all you have to do is select a column of data in the spreadsheet , then copy and paste the data into the web form. With the click of a button, the script will produce a Pythonic list (array) that can be copied into your script.

The full screen version of this tool can be seen here:

This is a simple CGI script, in that all of the HTML that the script uses is saved in triple quoted string variables inside the script and values are substituted in these strings. For example:

output_box = """
&nbsp;<b>Here is the array, ready to be copied into your script: (Ctrl + A) then (Ctrl + C) </b><BR>
		<td><TEXTAREA NAME="Output" COLS=108 ROWS=10">%s</TEXTAREA></td>
print output_box % (s)

All of the HTML is generated by the Python CGI script, and the form uses only two variables. For example:

# Import modules for CGI handling
import cgi, cgitb

# Create instance of FieldStorage
form = cgi.FieldStorage()

# Get data from fields
column_list = form.getvalue('input')
array_data_type = form.getvalue('array_data_type')

The main logic of the script entails removing the end of line characters from every row of the column and then converting the resulting string into a list using .split(). For example:

# Convert the data into a Pythonic list:
s = column_list.rstrip("\n")
s = s.rstrip("\r")
s = s.rstrip("\r\n")
s = s.rstrip(" ")
 # Convert the column into an array using split
s = s.split("\r\n")

# If not a list of strings, remove the single quotes
if array_data_type != "strings":
    s = str(s).replace("'", "")

Here is the tool, loaded into an iframe:

Leave a Reply