Thursday, March 02, 2017

Learning with Data

In printed textbooks, the need to save paper dictates that data sets remain small, let's say under 1K of data, just to fit. Analysis has a different flavor when you get to look at the data and reason about it row by row, and we do that in Python using SQLite, which the Standard Library supports through its sqlite3 module.
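Here is a minimal sketch of what "row by row" can look like with sqlite3. The table name, the column names, and the choice of three sample rows are my assumptions for illustration; the values themselves are ordinary Periodic Table facts.

```python
import sqlite3

# Hypothetical sample: atomic number, symbol, name, standard atomic weight.
ELEMENTS = [
    (1, "H", "Hydrogen", 1.008),
    (2, "He", "Helium", 4.0026),
    (3, "Li", "Lithium", 6.94),
]


def build_db():
    """Create an in-memory SQLite database holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    # Table and column names are assumptions made for this sketch.
    conn.execute(
        "CREATE TABLE elements "
        "(protons INTEGER PRIMARY KEY, symbol TEXT, name TEXT, mass REAL)"
    )
    conn.executemany("INSERT INTO elements VALUES (?, ?, ?, ?)", ELEMENTS)
    conn.commit()
    return conn


conn = build_db()

# A query result is just a sequence of tuples, one per row.
rows = list(
    conn.execute("SELECT symbol, name, mass FROM elements ORDER BY protons")
)

for symbol, name, mass in rows:
    print(f"{symbol:<3}{name:<10}{mass}")
```

Using ":memory:" keeps the whole thing disposable, which suits classroom tinkering; pass a filename instead to keep the table around between sessions.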

However, let's skip over which flavor of SQL or NoSQL and refocus on data, and how it no longer needs to be small.  Nor must it be enormous just for the hell of it.  On edu-sig at Python.org, I've been yakking about "rich data structures" such as: the Periodic Table of the Elements; a Glossary of Terms; Assorted Polyhedra.  Recently added: a database on roller coasters.

Let me boast about the advantages:  we're taking traditional classroom poster data, stuff already hanging on the walls, and distilling it into reinforcing content that we interact with through our computers, as well as through our own senses.  We read these data sets as files, starting from maybe JSON or XML (both well-known data exchange formats).
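As a sketch of the JSON route, here is a tiny Glossary of Terms read with the Standard Library's json module. The three entries and the lookup helper are made up for illustration, and the text is inlined as a string so the example runs without a file on disk.

```python
import json

# Hypothetical glossary content; in class this would live in its own file
# and be read with json.load(open_file) instead of json.loads(text).
GLOSSARY_JSON = """
{
  "array": "a grid of values addressed by one or more indexes",
  "row": "one record in a table",
  "column": "one named field shared by every record in a table"
}
"""

# A JSON object becomes an ordinary Python dict.
glossary = json.loads(GLOSSARY_JSON)


def lookup(term):
    """Return the definition for a term, ignoring case (helper name is an assumption)."""
    return glossary.get(term.lower(), "(no entry)")


for term in sorted(glossary):
    print(f"{term}: {glossary[term]}")
```

Once the data is a dict, the usual Python tools apply: sorting the keys, testing membership, or looping over the items.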

Tables are nothing new.  Rows and columns, so-called arrays, became multi-dimensional in the guise of NumPy, or in the computer language J, which has them natively. The data languages all have them now, as a type.  As many axes of address as you like.  Such an array is a lot like a SQL or NoSQL store, a database in itself.

The complaint that switching to tables and SQL is some severe departure from the reading and writing of old stems from nostalgia for when we could afford to wait in a long line for our money. ATMs to the rescue. For those, you need the tables to exist electronically, but otherwise it's just like the ledger books of our colorful past.  Lots of people use paper today; nothing wrong with that, it has its advantages. It's not either / or.

The point being:  now that literacy within industry does require doing homework, the need to pay ourselves to keep learning seems obvious, and as we practice, it helps to have rich data structures to hack on. Many of us have those and are currently hacking away.  I'm not nearly as fast as some when it comes to drilling down into some XML file using ElementTree or whatever.
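For the curious, drilling down into XML with the Standard Library's xml.etree.ElementTree might look something like this. The tag names and attribute layout are assumptions of mine, not a published schema; the vertex, edge, and face counts are the familiar ones for these three shapes.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout for an "Assorted Polyhedra" file, inlined as a string.
POLYHEDRA_XML = """
<polyhedra>
  <polyhedron name="tetrahedron">
    <vertices>4</vertices><edges>6</edges><faces>4</faces>
  </polyhedron>
  <polyhedron name="cube">
    <vertices>8</vertices><edges>12</edges><faces>6</faces>
  </polyhedron>
  <polyhedron name="octahedron">
    <vertices>6</vertices><edges>12</edges><faces>8</faces>
  </polyhedron>
</polyhedra>
"""

root = ET.fromstring(POLYHEDRA_XML.strip())


def counts(poly):
    """Return (vertices, edges, faces) as integers for one <polyhedron> element."""
    return tuple(int(poly.findtext(tag)) for tag in ("vertices", "edges", "faces"))


# Walk every <polyhedron> element and collect its counts by name.
shapes = {poly.get("name"): counts(poly) for poly in root.iter("polyhedron")}

# Drill straight down to one value using ElementTree's limited XPath support.
cube_faces = int(root.find("polyhedron[@name='cube']/faces").text)

for name, (v, e, f) in shapes.items():
    print(f"{name}: V={v} E={e} F={f}")
```

The dictionary comprehension does the bulk extraction, while the find call shows the one-liner style of drilling down to a single value.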

Indeed, I'm the king of non sequiturs, some might say; however, I grew up on film and know about jump cuts and flashbacks, not just plodding linear narrative.  Norman O. Brown and all that.  Ad copy is likewise choppy.

J comes from a collaboration among Kenneth and Eric Iverson, and Roger Hui, not that I know the whole story. I immersed myself in J a while back, having first cut some teeth on Kenneth's APL (A Programming Language), which I really grooved on.  J did not disappoint, in terms of what it made possible.  I grabbed some of its group theory too, when I pitched my tent in nearby Python Nation.  Kenneth found a couple of typos in my "Jiving in J" essay (I was honored to have the attention of his genius).

Another double plus for using classroom poster data, as a topic in learning to code, is that we're not intruding on anyone's privacy.  Science has shared the Periodic Table with generations already. We don't need Hydrogen to sign a release.  I'm not in denial about security and privacy; on the contrary, I'm suggesting data that's the least sticky.  That stuff, at least, will be super easy to come by.