Reading discussion
Nissenbaum “Bias in Computer Systems”
- Technical bias results from “the attempt to make human constructs amenable to computers, when we quantify the qualitative, discretize the continuous, or formalize the nonformal.” Is there an expressive potential in this process?
- Nissenbaum discusses a “counterstrategy” for avoiding bias in MLSA (the time-sharing system). Can you think of other examples of “hacking” bias?
Can you think of any examples of how programs that deal with text contain bias? (Or how the underlying system of representing text contains biases?)
Word frequency analysis
- The Words They Used (NY Times visualization of the 2008 US political conventions)
- Wordle
- R. Luke DuBois, Hindsight Is Always 20/20
Glazier “Grep: A Grammar”
- What are the “fundamental ‘materials’ of writing” according to grep? How does this differ from the “materials” of writing in other kinds of practice?
- What kinds of texts does the grep procedure create? What are their aesthetics? Can they be read, and if so, how?
Chance operations
We’ll discuss here several varieties of “non-intentional” composition (which may or may not incorporate “chance” in the sense of “randomness”). These categories overlap a great deal, and there’s obviously a lack of clarity about where to draw the lines between them. Which of these can be reproduced algorithmically?
Aleatory. “Some element of the composition is left to chance, and/or some primary element of a composed work’s realization is left to the determination of its performer(s). The term is most often associated with procedures in which the chance element involves a relatively limited number of possibilities.” (from Wikipedia). Examples: Automatic writing, John Cage’s early mesostics.
asK Little autO Where it wantS To take You.
Deterministic. A non-intentional process that leads to the same result every time. (No chance or choice involved.) Examples: Jackson Mac Low’s diastics and asymmetries. Asymmetry 205, There are many ways to use Strayer’s Vegetable Soybeans:
To hours, enough. Remove enough
And. Remove enough
Minutes. And not Iowa
Water and Iowa simmer.
To or
Until simmer. Enough
Simmer. To. Remove and Iowa enough. Remove simmer.
Vegetable. Enough good enough to and buttered loaf, enough
Simmer. Or Iowa buttered enough and not simmer.
Tomatoes, hot egg. Roll egg.
Added. Roll egg.
Minutes. Added, nutty in.
Wash added, in soak
Tomatoes, overnight,
Until soak egg,
Soak tomatoes. Roll added, in egg. Roll soak
Vitamins—egg, giving egg, tomatoes, added, beans, largest egg,
Soak overnight, in beans, egg, added, nutty soak
Stochastic. Existing parts are re-arranged and juxtaposed using chance. Example: Raymond Queneau’s Cent mille milliards de poèmes.
Random numbers in Python
Python makes it easy to work with random numbers. The random module includes several functions for generating random numbers and choosing random items from lists. Here’s a sample transcript from the interactive interpreter:
>>> import random
>>> random.random() # random number between 0 and 1
0.90685046351757992
>>> random.randrange(1, 10) # random integer from 1 to 9 (10 is excluded)
2
>>> random.gauss(0, 1) # gaussian random, mean 0, stddev 1
-0.15235026257945011
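These functions draw from a pseudo-random generator, so a chance procedure can be made repeatable by seeding the generator first. The following is only a minimal sketch of that idea: the seed value 42 and the variable names are arbitrary choices for illustration, and the parenthesized print is used so the snippet runs under both Python 2 and Python 3.

```python
import random

random.seed(42)            # fix the starting point of the generator
first = random.random()

random.seed(42)            # same seed again, so the sequence restarts
second = random.random()

print(first == second)     # True: the "random" draw is repeated exactly
```

Leaving out the call to random.seed gives different results on each run, which is usually what you want for chance operations.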
Python: Simple models of text
So far, we’ve been working with programs that examined just one line of a file at a time. During this session, we’ll be expanding our scope a little bit: we want to make programs that can build a picture of how an entire text looks, seems and behaves. In order to facilitate that, we’ll be looking at a few simple data structures.
Lists
Lists in Python are a kind of object that stores other objects. (They’re a lot like arrays in Processing, but more powerful, as you’ll see.) Once you’ve created a list, or put objects into a list, you can retrieve them using the same syntax we used last week to get individual characters out of strings. You can also get slices of a list, using the same syntax we used to get slices of strings. Here’s some example code:
>>> parts = ['led', 'resistor', 'capacitor']
>>> len(parts) # how many elements are in the list?
3
>>> type(parts) # what kind of object is this?
<type 'list'>
>>> parts[1]
'resistor'
>>> parts.append('ultrasonic range finder') # adds a new element to list
>>> parts
['led', 'resistor', 'capacitor', 'ultrasonic range finder']
>>> parts[2:]
['capacitor', 'ultrasonic range finder']
>>> parts.sort() # sorts the list in-place (i.e., changes the list)
>>> parts
['capacitor', 'led', 'resistor', 'ultrasonic range finder']
>>> parts.reverse() # reverses the list in-place
>>> parts
['ultrasonic range finder', 'resistor', 'led', 'capacitor']
>>> 'led' in parts # is the string 'led' in the list?
True
>>> 'flex sensor' in parts
False
>>> more_parts = [] # create an empty list
>>> more_parts = list() # same thing
As you can see, list literals are made by surrounding a comma-separated list of objects with square brackets. You can store any kind of object in a list: strings, integers, floats, even other lists (or sets or dictionaries)!
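Since a list can hold other lists, one simple way to model a text is as a list of lines, where each line is itself a list of words. The snippet below is a small sketch of that idea, not one of the course’s example programs; the variable names are made up for illustration, and the words are typed in by hand rather than read from a file.

```python
# A tiny text modeled as a list of lines; each line is a list of words.
poem = [
    ['This', 'is', 'just', 'to', 'say'],
    ['I', 'have', 'eaten'],
    ['the', 'plums'],
]

line_count = len(poem)     # number of inner lists (lines)
second_line = poem[1]      # the second line, itself a list
word = poem[1][1]          # second word of the second line

print(line_count)          # 3
print(word)                # have
```

Indexing twice, as in poem[1][1], first picks out a line and then picks out a word within that line.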
You can iterate over the elements of a list with for, just like you iterate over the individual characters of a string. Here’s a transcript to demonstrate, from the interactive interpreter:
>>> materials = ['poplar', '8-segment LED', 'photoresistor', 'felt', 'lard']
>>> for material in materials:
...     print material
...
poplar
8-segment LED
photoresistor
felt
lard
What if I just want to count from one to ten?
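Use Python’s built-in range() function, which returns a list containing numbers in the desired range. The list starts at the first number and stops just before the second, so counting from one to ten means asking for range(1,11):
>>> range(1,11)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]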
This script:
for i in range(1,6):
    print i
Will output:
1
2
3
4
5
Randomness with lists
Python’s random library provides two helpful functions for performing chance operations on lists. The first is shuffle, which takes a list and randomly shuffles its contents; the second is choice, which returns a random element from the list.
>>> import random
>>> cards = ['two of cups', 'four of swords', 'the empress', 'the fool']
>>> random.shuffle(cards)
>>> cards
['two of cups', 'the fool', 'four of swords', 'the empress']
>>> random.choice(cards)
'the fool'
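Note that shuffle changes the list itself. If you want to draw several different items without disturbing the original list, the random module also has a sample function. The following is a brief sketch of that alternative; the hand variable is just an illustrative name, and the parenthesized print is used so the snippet runs under both Python 2 and Python 3.

```python
import random

cards = ['two of cups', 'four of swords', 'the empress', 'the fool']

# Draw two distinct cards; the cards list is left unchanged.
hand = random.sample(cards, 2)

print(hand)
print(cards)   # still in the original order
```

Unlike calling choice twice, sample never returns the same position in the list more than once.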
Data structures to store text: randomize_lines.py
This brings us to our first full-fledged example program, randomize_lines.py. Instead of operating on one line at a time, this program stores all of the lines from standard input into a list, then uses the random.shuffle function to put the lines in random order before printing them out. Here’s the code:
import sys
import random

all_lines = list()

for line in sys.stdin:
    line = line.strip()
    all_lines.append(line)

random.shuffle(all_lines)

for line in all_lines:
    print line
The all_lines variable points to a list. Inside the first for loop, we add each line that comes in from standard input to the list. After calling shuffle to re-order the list, we then print the list back out. If you pass in the William Carlos Williams poem, you’ll get back a delightful re-imagining:
for breakfast
the icebox
so sweet
Forgive me
the plums
I have eaten
and so cold
This is just to say
that were in
saving
and which
they were delicious
you were probably
Strings and lists: split and join
String objects in Python provide two helpful methods: one to break a string up into a list of strings, and one to join a list of strings back into a single string. The split method “splits” a string into a list of strings, using the parameter that you pass to the method as a delimiter. The join method uses whatever string you call it on to join together the list of strings passed in as a parameter, creating a single string. These are a little tricky, so it’s helpful to see them in action. Here’s a transcript from the interactive interpreter:
>>> foo = "mother said there'd be days like these"
>>> foo.split(" ") # split on each space character
['mother', 'said', "there'd", 'be', 'days', 'like', 'these']
>>> foo.split("e") # split on the letter "e"
['moth', 'r said th', 'r', "'d b", ' days lik', ' th', 's', '']
>>> wordlist = ['this', 'is', 'a', 'test']
>>> separator = " "
>>> separator.join(wordlist)
'this is a test'
>>> " ".join(wordlist) # same thing
'this is a test'
We’ll most often be using the split method as a shorthand for “split this string into a list of words.” (We’ll find more robust solutions for the problem of parsing words from a string when we discuss regular expressions.) Here’s a program that shuffles the order of the words on each line of standard input (available in the examples as randomize_words.py):
import sys
import random

for line in sys.stdin:
    line = line.strip()
    words = line.split(" ")
    random.shuffle(words)
    output = " ".join(words)
    print output
Here’s the result from passing in our favorite Robert Frost poem:
Stopping Woods On Snowy A By Evening
I think Whose I these woods know. are
is village the house in though; His
see He stopping here not me will
snow. fill woods up watch with To his
queer think My it horse little must
without farmhouse stop To a near
and the lake woods Between frozen
darkest the of year. evening The
bells his a harness shake He gives
To some mistake. if is there ask
sound's only The sweep the other
flake. easy and wind downy Of
lovely, woods dark and The are deep.
to keep, have I But promises
to go And sleep, miles before I
sleep. to before I go miles And
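One detail worth knowing about split: passing " " as the delimiter splits on every single space character, so two spaces in a row produce an empty string in the resulting list. Calling split with no argument at all (as unique_words.py does later in these notes) splits on any run of whitespace instead. Here is a short sketch of the difference, using a made-up line of text:

```python
line = "so  much depends "     # note the double space and the trailing space

on_spaces = line.split(" ")    # splits at every single space
on_whitespace = line.split()   # splits on runs of whitespace

print(on_spaces)       # ['so', '', 'much', 'depends', '']
print(on_whitespace)   # ['so', 'much', 'depends']
```

For well-behaved input with exactly one space between words, the two calls give the same result.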
sys.argv: Python’s important built-in list
Last week we learned how to run our Python scripts from the command line, as though they were UNIX text munging utilities. Most UNIX utilities take arguments on the command line: grep takes a pattern to search for, for example. We can read command-line parameters from Python as well, using the sys.argv list. This list contains all of the parameters passed on the command line, including the name of the script itself.
For example, take the following script, called argv_reader.py:
import sys

for arg in sys.argv:
    print arg
If you ran it on the command line like so:
$ python argv_reader.py cat wallaby armadillo
You’d get the following output:
argv_reader.py
cat
wallaby
armadillo
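Because the script’s own name occupies the first slot, the first real argument lives at index 1, and every argument arrives as a string. The function below is a hedged sketch of one way to read an optional numeric argument; parse_count, its default of 3, and the script name repeat.py are all hypothetical names invented for this example. It takes the argument list as a parameter so that it can be tried out with hand-made lists; in a real script you would import sys and pass it sys.argv.

```python
def parse_count(argv, default=3):
    """Return the first command-line argument as an integer, or a default."""
    if len(argv) > 1:
        return int(argv[1])   # arguments are strings, so convert explicitly
    return default

# Simulated argument lists, shaped the way sys.argv would be.
print(parse_count(['repeat.py']))        # 3
print(parse_count(['repeat.py', '5']))   # 5
```

If the argument isn’t a valid number, int will raise an error; this sketch doesn’t try to handle that case.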
Sets
The set is our second important data structure. You can think of a set as a kind of list, but with the following caveats:
- Sets don’t maintain the order of objects after you’ve put them in.
- You can’t add an object to a set if it already has an identical object.
Objects can be added to a set by calling its add method (as opposed to the append method used for lists).
A corollary to the first caveat above is that you can’t use the square bracket notation to access a particular element in a set. Once you’ve added an object, the operations we’ll rely on are checking to see if an object is in the set (with the in operator), and iterating over all objects in the set (with, for example, for). Here’s a transcript of an interactive interpreter session that demonstrates these basic features:
>>> foo = set()
>>> foo.add(1)
>>> foo.add(2)
>>> foo.add(3)
>>> foo
set([1, 2, 3])
>>> foo.add(1) # will be ignored---only one of any identical object can be in set
>>> foo
set([1, 2, 3])
>>> 1 in foo
True
>>> 5 in foo
False
>>> for elem in foo:
...     print elem
...
2
1
3
An additional aspect of sets to note from the transcript above: because sets don’t maintain the order of objects, you’ll get the objects back in (seemingly) random order when you iterate over the set. For most applications, this isn’t a problem, but it’s something to keep in mind.
Sets are great when you want to store data, but you want to ignore duplicates in the data. One classic example is to create a list of unique words in a text file. Here’s a program that does just that, available in this session’s example programs as unique_words.py:
import sys

words = set()

for line in sys.stdin:
    line = line.strip()
    line_words = line.split()
    for word in line_words:
        words.add(word)

for word in words:
    print word
The important lines in this program are lines 8 and 9, in which we loop over every word from the current line and add them to the set. Because sets ignore any attempt to add in an object that is already in the set, once we’ve inserted one word, the set will only ever contain one copy of that word.
On lines 11 and 12, we loop over the contents of the set and print them out. (Note here again that the words in the set won’t appear in any particular order.) Here’s some sample output, obtained by running the program with this_is_just.txt as input:
and
saving
they
just
sweet
is
say
have
in
breakfast
cold
for
to
Forgive
This
delicious
which
probably
you
plums
icebox
that
I
eaten
me
so
were
the
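If you’d rather see the unique words in a predictable order, one option is to pass the set to Python’s built-in sorted function, which returns a new list in sorted order. The snippet below is a minimal sketch of that variation; it uses a short hand-typed list of lines (text_lines, a name made up for this example) in place of standard input, and parenthesized print so it runs under both Python 2 and Python 3.

```python
text_lines = ["so sweet", "and so cold"]   # stand-in for lines from standard input

words = set()
for line in text_lines:
    for word in line.split():
        words.add(word)        # the second "so" is ignored

ordered = sorted(words)        # a new list; the set itself is unchanged
for word in ordered:
    print(word)
```

Keep in mind that sorting strings this way puts all capitalized words ahead of lowercase ones.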
Dictionaries
The “dictionary” is a very powerful data structure. You can think of it as an array whose indices are strings (or other objects) instead of numbers. In PHP, they’re known as “associative arrays” and in Perl they’re “hashes”; in Java, there’s a class called “Map” that does the same thing. Dictionary literals in Python consist of comma-separated key/value pairs, with a colon between the key and the value (see the transcript below for an example). Keys can be almost any object (the main exception is mutable objects such as lists, which can’t be used as keys); values can be any object. You can access values of a dictionary with square brackets, much like the list indexing syntax. Some sample code for the interactive interpreter:
>>> assoc = {'butter': 'flies', 'cheese': 'wheel', 'milk': 'expensive'}
>>> assoc['butter'] # access value at a key
'flies'
>>> assoc['gelato'] = 'delicious' # assign to a key
>>> assoc.keys()
['butter', 'cheese', 'milk', 'gelato']
>>> assoc.values()
['flies', 'wheel', 'expensive', 'delicious']
>>> 'milk' in assoc
True
>>> 'yogurt' in assoc
False
>>> foo = {} # create an empty dictionary
>>> foo = dict() # same thing
The concordance
So what are dictionaries good for? One classic application is to build a simple concordance: a list of words that occur in a text, and how many times those words occur. Here’s the source listing for just such a program (concordance.py):
import sys

words = dict()

for line in sys.stdin:
    line = line.strip()
    line_words = line.split(" ")
    for word in line_words:
        if word in words:
            words[word] += 1
        else:
            words[word] = 1

for word in words.keys():
    print word + ": " + str(words[word])
This program illustrates several important idioms for working with dictionaries:
- It’s okay to assign to a key that doesn’t already exist in a dictionary, but it’s an error in Python to try to access a non-existent key. That’s what the code in lines 9-12 is for: first we check to see if the current word is already in the dictionary; if it is, then we increment its value by one. Otherwise, we assign a value to that key.
- There are many ways to iterate over the contents of a dictionary. One is to iterate over the list of keys returned from the dictionary’s keys method, then access the corresponding value by using that key as an index.
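The check-then-increment idiom can also be written with the dictionary’s get method, which returns a fallback value instead of raising an error when the key is missing. What follows is a sketch of that alternative, not a replacement for concordance.py: count_words is a hypothetical helper name, and the two sample lines are typed in by hand rather than read from standard input.

```python
def count_words(lines):
    """Return a dictionary mapping each word to the number of times it occurs."""
    counts = dict()
    for line in lines:
        for word in line.strip().split():
            # get() returns 0 when the word hasn't been seen yet
            counts[word] = counts.get(word, 0) + 1
    return counts

counts = count_words(["to be or not to be", "that is the question"])

# Iterate over the keys in sorted order so the output is predictable.
for word in sorted(counts.keys()):
    print(word + ": " + str(counts[word]))
```

Whether this reads better than the explicit if/else is a matter of taste; both versions build the same dictionary.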
Exercises
We’ll do some of these in class.
1. How would you write a program that randomizes the words in each line of standard input, then prints out the randomized lines in random order? Can this be done without writing any new code at all?
2. Let’s say that we wanted our concordance script to store not just how many times a particular word occurred, but on which lines. How would we go about doing that? What kind of data structure would we need? What additional information would we need that we aren’t already tracking in concordance.py?
3. The alpha_replace.py script reads in a text file as a source file, then replaces the words in standard input with words in the source file that begin with the same letter. Write a version of this script that replaces words not according to their first letter, but according to the number of letters in the word.
4. Make any of the example scripts insensitive to case.
Reading for next week
- Six Selections from the Oulipo (from The New Media Reader)
- Burroughs, W. The Cut-Up Method of Brion Gysin
- Dean, J. Tag Clouds and the Decline of Symbolic Efficiency