Wednesday, July 31, 2013

Parsing in Python





Let's make sense of this pile of data.

Reading and parsing data are the first two building blocks of a larger process, compiling, so it's always useful to know a couple of good methods.

A quick Google search of "python reading data" got me to http://docs.python.org/2/tutorial/inputoutput.html, for some simple reference material.

Now, testing is always a fun way to find out what's going on behind the code, so let's see if we can just pull out a bit of the important data here so we don't overwhelm ourselves.

Now, continuing from before, let's convert the data object into something more useful: a list of strings.

>> theData = str(html.read()).split(',')

Now let's see what this looks like.

>> for i in range(5):
         print(i, theData[i])


This smaller data structure will come in handy when setting up our larger program design.

Here's the result:

0 b'{"Key":"100-og"

1 "Name":"$100 OG"

2 "Category":"Hybrid"

3 "Symbol":"100"

4 "Abstract":"$100 OG is a hybrid cannabis strain that originally became popular in Southern California. It produces large flowers and a potent high."


Now, wait, you're probably thinking, what's that b'{ doing there?
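The prefix is what Python 3 gives you when str() is called on a bytes object: you get the object's printed representation, b'...' wrapper and all, rather than the text inside it. Here's a minimal sketch of the difference. The raw value is a made-up stand-in for whatever html.read() returns, and UTF-8 is an assumed encoding.

```python
# Hypothetical stand-in for the bytes that html.read() returns in Python 3.
raw = b'{"Key":"100-og","Name":"$100 OG"}'

# str() on bytes returns the repr, so the b' prefix becomes part of the string.
as_repr = str(raw)

# decode() turns the bytes into real text (UTF-8 is an assumption here).
as_text = raw.decode("utf-8")

print(as_repr[:3])
print(as_text[:3])
```

Decoding first means the leading b' never makes it into the string.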

This data is JSON, so it's meant to be parsed with Python's json module, not bare-bones split calls, and that should massively simplify the process.
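Here's a rough sketch of what the json route could look like. The raw value is a hand-made stand-in built from the fields printed above, and it assumes the response is UTF-8 text holding a single JSON object. The real feed may well have more fields or a different shape.

```python
import json

# Hypothetical stand-in for the bytes returned by html.read(),
# built from the fields shown in the output above.
raw = b'{"Key":"100-og","Name":"$100 OG","Category":"Hybrid","Symbol":"100"}'

# Decode the bytes to text (UTF-8 assumed), then let json build a dict.
record = json.loads(raw.decode("utf-8"))

for field in ("Key", "Name", "Category", "Symbol"):
    print(field, record[field])
```

The keys and values come back without quotes, braces, or the b' prefix, and a comma inside a value no longer splits it in two.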


You may, however, have to redesign your parsing algorithm to completely remove all the stray characters and strings.
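If you do stay with split, that cleanup might look something like the sketch below. The pieces list and the clean_pair helper are made up for illustration, and it assumes the bytes were decoded first so there is no b' prefix to deal with.

```python
# Hypothetical pieces, as split(',') might leave them once the bytes are decoded.
pieces = ['{"Key":"100-og"', '"Name":"$100 OG"', '"Category":"Hybrid"}']

def clean_pair(piece):
    """Split one '"key":"value"' piece and strip leftover braces and quotes."""
    # partition() cuts at the first colon only, so colons inside the value survive.
    key, _, value = piece.partition(":")
    return key.strip('{}"'), value.strip('{}"')

record = dict(clean_pair(p) for p in pieces)
print(record)
```

This still falls apart as soon as a value contains a comma, which is a good reason to prefer the json module.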