Last week we learned about the UNIX command line: specifically, programs for the UNIX command line that cut up, filter, and mangle text. This week’s goal is to learn how to make programs that behave like those UNIX programs. Our language of choice: Java.
Download the source code for this week’s examples here. I recommend going to your Terminal application, changing to the directory we created last week (a2z), and using curl to download the examples:
    $ cd a2z
    $ curl -O http://a2z.decontextualize.com/code/week02.zip
    $ unzip week02.zip
In our examples for this week, we’ll be using a Java library called TextFilter, a little project of mine that I’ve been working on for this course. The library is included in the examples file above. Find the complete documentation here.
Compiling Java source code
Our first example program is Echo.java. Here’s the listing:
    import com.decontextualize.a2z.TextFilter;

    public class Echo extends TextFilter {
      public static void main(String[] args) {
        Echo e = new Echo();
        e.run();
      }
      public void begin() {
        println("beginning");
      }
      public void eachLine(String line) {
        println(line);
      }
      public void end() {
        println("at end of file");
      }
    }
This program prints out the string beginning, then prints out each line of input as it comes in (from the keyboard or from a redirected file). When the end of the file is reached (or when you hit Ctrl+D), it prints out at end of file. It’s like our own version of cat, but with some extra stuff.
(There’s a lot of strange syntax here for those of you only familiar with Processing. All that will be explained in a second. Sit tight.)
Before we can run this program, we need to compile it. The compiler takes your source code and converts it to Java bytecode: a series of bytes that the Java virtual machine knows how to execute. If you’re running OS X, the compiler is already installed on your computer. It’s called javac, and you run it like this:
    $ export CLASSPATH=a2z.jar:.
    $ javac Echo.java
(The export CLASSPATH line only needs to be executed once per terminal session. It tells Java where to find any external libraries. Come see me if you want tips on how to make this automatic.)
After you run javac, there should be a new file in your directory: Echo.class. This is the compiled version of our code, which Java can actually execute. To run Java code, use the java command, passing it the name of the class you want to execute:
    $ java Echo
Have fun with the ensuing madness, and hit Ctrl+D to quit. (Try redirecting the output to a file, or piping it through another UNIX command.)
Defining and executing classes
If you’re only familiar with Processing, Echo.java might seem a little bewildering. What’s all that extra garbage? Here’s what you need to know:
(1) Everything in Java is part of a class. For our purposes, one .java file defines exactly one public class, and the name of the file (e.g., Echo.java) must match the name of the class defined in the file (e.g., class Echo).
(2) If a class defines a main method, then the java command can run that class as a program. When you run java, it searches for a class with the name you specified on the command line, and then executes the main method of that class. (The main method must be declared static, meaning that Java can use the method without actually creating an instance of the class.)
(3) Classes can extend other classes. When you’re extending a class, you gain access to all of its functionality, and have the option of overriding some of the behavior in the class with your own behavior.
(4) Variables and methods in a class can be defined as public, private, or protected. These are called “access modifiers” and they affect how other classes can use the data and methods in your class. Don’t worry too much about these guys right now. A good rule of thumb is that variables in your class should be declared private, and methods should be declared public.
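Here is a minimal sketch of those four points in one place. The class names (Greeter, LoudGreeter, ClassBasics) are made up for illustration and have nothing to do with this week's examples. All three classes share one file only to keep the sketch short, which works because none of them is declared public.

```java
// Hypothetical classes for illustration only; not part of the course library.
class Greeter {
    // private: only code inside Greeter can read or change this variable
    private String name;

    public Greeter(String name) {
        this.name = name;
    }

    // public: any other class may call this method
    public String greeting() {
        return "hello, " + name;
    }
}

// LoudGreeter extends Greeter: it inherits everything and overrides greeting()
class LoudGreeter extends Greeter {
    public LoudGreeter(String name) {
        super(name);
    }

    @Override
    public String greeting() {
        return super.greeting().toUpperCase() + "!";
    }
}

class ClassBasics {
    // static: Java can call main without first creating a ClassBasics object
    public static void main(String[] args) {
        Greeter plain = new Greeter("world");
        Greeter loud = new LoudGreeter("world");
        System.out.println(plain.greeting()); // prints hello, world
        System.out.println(loud.greeting());  // prints HELLO, WORLD!
    }
}
```

Save it as ClassBasics.java, compile it with javac ClassBasics.java, and run it with java ClassBasics.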
Processing’s dirty secrets
The Processing environment hides these details from you, but they’re still present in the code that Processing generates. In fact, whenever you press the “play” button in the Processing IDE (or when you export as an applet or application), Processing rewrites your code to follow the above conventions. You can see the “real” Java code by looking at the .java file that Processing produces when you export a sketch. Compare the Java source and Processing source of this applet, for example.
TextFilter, a PApplet for text
As you can see from the source code above, Processing sketches are, at heart, just Java classes that extend a class called PApplet. The PApplet class provides functionality (drawing lines and polygons and tracking mouse position), but allows you to specify behavior: specifically, what happens before the sketch begins (setup) and what happens whenever the sketch is supposed to draw something to the screen (draw). You get the tough stuff for free and the fun stuff easy.
Likewise, TextFilter is a Java class (of your instructor’s design) that hides the complexity of Java’s input/output operations, and lets you get down to the work of mungeing text. A class that extends TextFilter need only define any of the following:
begin(): This method will be called before any lines are read from input. (Analogous to Processing’s setup() method.)
eachLine(String line): This method will be called for each line read from input, in the order the lines occur in the input. The current line is passed to the method as a parameter.
end(): This method will be called after all lines have been read from input.
Other methods available to your classes that extend TextFilter:
println: prints a line to output; takes a String or a char as a parameter
print: prints a String or char to output (without terminating the current line)
Comprehensive documentation here. We’ll be going over more advanced uses of the library as the course progresses.
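If you're curious how a class can call your begin, eachLine and end methods at the right moments, here is a rough sketch of the idea. To be clear, MiniFilter is a hypothetical stand-in written for this note, not the actual TextFilter source. It reads from a String instead of standard input so that it runs on its own, and the real library may work quite differently inside.

```java
import java.util.Scanner;

// MiniFilter is a hypothetical stand-in, NOT the real TextFilter implementation.
abstract class MiniFilter {
    private final StringBuilder out = new StringBuilder();

    // The default hooks do nothing, so subclasses override only what they need.
    public void begin() {}
    public void eachLine(String line) {}
    public void end() {}

    // Collects output in memory (the real library writes to standard output).
    protected void println(String s) {
        out.append(s).append('\n');
    }

    // Drives the lifecycle: begin(), then eachLine() once per line, then end().
    public String run(String input) {
        Scanner scanner = new Scanner(input);
        begin();
        while (scanner.hasNextLine()) {
            eachLine(scanner.nextLine());
        }
        end();
        return out.toString();
    }
}

// Same behavior as Echo.java, built on the stand-in class.
class MiniEcho extends MiniFilter {
    public void begin() { println("beginning"); }
    public void eachLine(String line) { println(line); }
    public void end() { println("at end of file"); }
}

class MiniFilterDemo {
    public static void main(String[] args) {
        // Prints: beginning, one, two, at end of file (each on its own line)
        System.out.print(new MiniEcho().run("one\ntwo\n"));
    }
}
```

The point of the design is that the base class owns the loop and the subclass only fills in the hooks, which is the same bargain PApplet offers with setup and draw.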
Using TextFilter
In order to use the TextFilter library, the a2z.jar file must be visible to your compiler (that’s what the export CLASSPATH=a2z.jar:. command that you executed earlier does). You must also put the following line at the top of your Java file:
import com.decontextualize.a2z.TextFilter;
This tells the Java compiler that you want to be able to use the TextFilter class in your program.
(Remember: If you want the java command to be able to run your TextFilter class as a standalone program, you need to define a static method called main in your class. This method should create an object of the class that you’ve defined, and call the run method on that object. See any of this week’s examples for an idea of how it works.)
SimpleFilter.java
The example programs this week are all simple examples of programs that filter or analyze text. They’re also intended to exploit a number of features of Java’s String class. Here’s SimpleFilter.java, which you can think of as a rudimentary form of grep:
    import com.decontextualize.a2z.TextFilter;

    public class SimpleFilter extends TextFilter {
      public static void main(String[] args) {
        SimpleFilter s = new SimpleFilter();
        s.run();
      }
      public void eachLine(String line) {
        if (line.indexOf('a') != -1) {
          println(line);
        }
      }
    }
This program reads from input, printing out only those lines that contain the character a. (Notice that we don’t have to define begin and end functions if we don’t want the program to do anything special when it begins or ends.) The important bit of the code is this:
    line.indexOf('a') != -1
The indexOf method of the String class (official documentation here) tells you where a particular substring or character occurs within the string object that you call it on. If the substring is present, indexOf returns the index of the substring, i.e., where that substring begins. If the substring is not present, indexOf returns -1. In this case, we don’t care where the character a appears in the string. We just want to know whether or not it’s there, so the program checks the return value of indexOf and prints out the line only if it isn’t -1.
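A few more calls to indexOf, as a quick sketch you can run outside of TextFilter. The class name IndexOfDemo and the helper containsText are invented for this example.

```java
class IndexOfDemo {
    // Hypothetical helper: the same test SimpleFilter uses, for any substring.
    static boolean containsText(String line, String target) {
        return line.indexOf(target) != -1;
    }

    public static void main(String[] args) {
        String line = "banana bread";
        System.out.println(line.indexOf('a'));          // prints 1 (first a)
        System.out.println(line.indexOf('a', 2));       // prints 3 (first a at or after index 2)
        System.out.println(line.indexOf("bread"));      // prints 7
        System.out.println(line.indexOf('z'));          // prints -1 (not present)
        System.out.println(containsText(line, "nan"));  // prints true
    }
}
```

The String class also has a contains method that returns a boolean directly, so line.contains("nan") gives the same answer as the helper above.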
Reverse.java
Next up, Reverse.java:
    import com.decontextualize.a2z.TextFilter;

    public class Reverse extends TextFilter {
      public static void main(String[] args) {
        Reverse r = new Reverse();
        r.run();
      }
      public void eachLine(String line) {
        for (int i = line.length() - 1; i >= 0; i--) {
          print(line.charAt(i));
        }
        println();
      }
    }
This program demonstrates two important methods of the String class, namely length and charAt. The length method returns the length of the string (its number of characters). The charAt method returns the character that occurs at a particular index of the string: the index of the first character is 0, the second character is 1, and so forth. Think of it as a way to use a String as an array of characters. For example:
    String foo = "hello";
    println(foo.length()); // prints 5
    println(foo.charAt(0)); // prints h
    println(foo.charAt(4)); // prints o
In Reverse.java, we’re using these two methods of the String object to print out each line of input in reverse. The for loop starts at the last index of the string (line.length() - 1), then counts backwards to 0, printing out the character at each index. After the for loop completes, the program prints a new line character by calling println with no parameters.
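The same loop can also build a new string instead of printing as it goes, which is handy if you want to do something else with the reversed text. This is only a sketch; ReverseDemo and its reverse method are names I made up for it.

```java
class ReverseDemo {
    // Hypothetical helper: walks the string backwards, like Reverse.java,
    // but collects the characters instead of printing them one by one.
    static String reverse(String line) {
        StringBuilder result = new StringBuilder();
        for (int i = line.length() - 1; i >= 0; i--) {
            result.append(line.charAt(i));
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(reverse("hello"));          // prints olleh
        System.out.println(reverse("").length());      // prints 0 (empty line stays empty)
        // StringBuilder has a built-in reverse method that agrees for plain text:
        System.out.println(new StringBuilder("hello").reverse().toString()); // prints olleh
    }
}
```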
AverageWordLength.java
This program is a simple example of text analysis. Instead of printing out or mangling the input as it comes in, AverageWordLength.java looks at each line and tries to extract some statistical information about it: in this case, the average length of the words in the input. When all lines have been read, that statistical information is printed out.
    import com.decontextualize.a2z.TextFilter;

    public class AverageWordLength extends TextFilter {
      public static void main(String[] args) {
        new AverageWordLength().run();
      }
      private int wordCount = 0;
      private int wordLengthTotal = 0;
      public void eachLine(String line) {
        String[] components = line.split(" ");
        for (int i = 0; i < components.length; i++) {
          wordCount++;
          wordLengthTotal += components[i].length();
        }
      }
      public void end() {
        println(String.valueOf(wordLengthTotal / (float)wordCount));
      }
    }
The first new bit of syntax in this program is new AverageWordLength().run();. All we're doing here is calling a method on the AverageWordLength object without having assigned that object to a variable. This code does exactly the same thing as the following:
    AverageWordLength a = new AverageWordLength();
    a.run();
The shorter form saves us a few keystrokes, and also cuts down on repetition (and therefore on chances to introduce a typo).
Another new thing we've done in this program is introduce instance variables: wordCount and wordLengthTotal. The object uses these variables to keep track of how many words have occurred in the text so far, and the total length of all words in the text. (Remember, your TextFilter classes are just like any other class: you can define your own variables and methods, in addition to using those that the TextFilter class defines for you.)
Inside the eachLine method, we use the split method of the String class to "split" the incoming text into an array of strings. The split method takes one argument: the delimiter, i.e., the string that separates the pieces we want to retrieve. It's sort of like the UNIX cut command. For example:
    String foo = "hello there you";
    String[] fooWords = foo.split(" ");
    // fooWords now contains "hello", "there", "you"

    String bar = "comma,separated,values";
    String[] barValues = bar.split(",");
    // barValues now contains "comma", "separated", "values"
Using a space character as a delimiter isn't the best way to extract "words" from a text (we'll be counting strings like hello! and said, as words), but it's an easy implementation, and it's often good enough. (The parameter passed to split is actually a regular expression, which we'll talk about next week. For now, simple patterns like " " and "," should work like you expect them to.)
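One thing worth knowing before you lean on split(" "): it can hand back empty strings. The sketch below (SplitDemo is just a name for this example) prints what split does with doubled, leading, and trailing spaces.

```java
import java.util.Arrays;

class SplitDemo {
    public static void main(String[] args) {
        // Two spaces in a row produce an empty string between them.
        System.out.println(Arrays.toString("a  b".split(" ")));  // prints [a, , b]
        // A leading space produces an empty first element.
        System.out.println(Arrays.toString(" a b".split(" ")));  // prints [, a, b]
        // Trailing empty strings are dropped.
        System.out.println(Arrays.toString("a b ".split(" ")));  // prints [a, b]
        // An empty line still yields one element: the empty string itself.
        System.out.println("".split(" ").length);                 // prints 1
    }
}
```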
After splitting the line into words, the program loops over the resulting array of strings, incrementing the wordCount variable and adding the length of each string to wordLengthTotal. In the end method, the program prints wordLengthTotal divided by wordCount, i.e., the average number of characters per word.
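Because split(" ") returns an empty string for a blank line (and for doubled spaces), the program as written counts those empty strings as zero-length words, which pulls the average down. Here is one possible variation that skips them. It's a standalone sketch, not one of this week's example files: AverageDemo and averageWordLength are hypothetical names, and it takes an array of lines rather than reading input through TextFilter.

```java
class AverageDemo {
    // Hypothetical helper: average word length over some lines,
    // ignoring the empty strings that split(" ") can produce.
    static float averageWordLength(String[] lines) {
        int wordCount = 0;
        int wordLengthTotal = 0;
        for (int i = 0; i < lines.length; i++) {
            String[] components = lines[i].split(" ");
            for (int j = 0; j < components.length; j++) {
                if (components[j].length() == 0) {
                    continue; // blank line or doubled space: not a word
                }
                wordCount++;
                wordLengthTotal += components[j].length();
            }
        }
        if (wordCount == 0) {
            return 0f; // assumption: report 0 for empty input rather than dividing by zero
        }
        return wordLengthTotal / (float) wordCount;
    }

    public static void main(String[] args) {
        String[] lines = { "hello there you", "", "a  bb" };
        // 5 words (hello, there, you, a, bb) with 16 characters in total
        System.out.println(String.valueOf(averageWordLength(lines))); // prints 3.2
    }
}
```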
Processing vs. Java: continued
Two things to notice in the end method of AverageWordLength.java. The first is that we used this strange syntax to convert wordCount to a float:
    (float)wordCount
... instead of what we might use in Processing (i.e., float(wordCount)). The other is that we used String.valueOf() to convert that float value into a string, instead of just passing the value in to println. (Documentation of String.valueOf() begins here.)
These are examples of functions that are either present in Processing but not in barebones Java (the float function) or that work differently in Processing than they do in Java (println). In fact, the vast majority of Processing's built-in functions (see here) are Processing-specific. Most of them, like the ones that deal with drawing stuff, you won't need for this class. Others, like data conversion and string manipulation functions, you will need. Fortunately, Java provides most of this functionality for you. You just have to dig a bit into the Java Standard Library.
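To make the cast concrete, here is a small sketch of why (float) matters when dividing two ints, along with a couple of other plain-Java conversions that stand in for Processing's helpers. ConversionDemo and its average method are names invented for this example.

```java
class ConversionDemo {
    // Hypothetical helper: casting one operand to float forces float division.
    static float average(int total, int count) {
        return total / (float) count;
    }

    public static void main(String[] args) {
        int total = 7;
        int count = 2;
        System.out.println(total / count);                     // prints 3 (int divided by int drops the fraction)
        System.out.println(average(total, count));             // prints 3.5
        String asText = String.valueOf(average(total, count)); // float to String
        System.out.println(asText.length());                   // prints 3 (the characters 3 . 5)
        System.out.println(Integer.parseInt("42") + 1);        // prints 43 (String to int)
        System.out.println((int) 3.9f);                        // prints 3 (casting to int truncates)
    }
}
```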
Java API
The Java API is a set of classes that any Java program can use. They're all available by default and already present on your computer (and on the computer of anyone you might give your program to). Before you try to program something complex, it's best to look in the standard library to see if Java already has a class that you can use. Browse the library here.
The API is huge, though, so it may be hard to find what you need. The homework this week involves getting familiar with the String class. You might also look at the Math class for implementations of basic math functions (e.g., sin, floor, random...). The Collection classes (ArrayList, HashMap...) will loom large for the remainder of the course, so poke at the documentation for those too.
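As a small taste of those classes, here is a sketch that touches Math, ArrayList, and HashMap. We haven't covered the collection classes yet, so treat it as a preview rather than something you need to understand now. ApiDemo and countWords are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;

class ApiDemo {
    // Hypothetical helper: counts how often each space-separated word occurs.
    static HashMap<String, Integer> countWords(String line) {
        HashMap<String, Integer> counts = new HashMap<String, Integer>();
        String[] words = line.split(" ");
        for (int i = 0; i < words.length; i++) {
            Integer seen = counts.get(words[i]); // null if the word is new
            counts.put(words[i], seen == null ? 1 : seen + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(Math.floor(3.7));  // prints 3.0

        // An ArrayList grows as you add to it, unlike a plain array.
        ArrayList<String> lines = new ArrayList<String>();
        lines.add("first");
        lines.add("second");
        System.out.println(lines.size());     // prints 2
        System.out.println(lines.get(0));     // prints first

        // A HashMap associates keys (here, words) with values (here, counts).
        System.out.println(countWords("the cat and the hat").get("the")); // prints 2
    }
}
```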
Reading for next week
- Burroughs, S. The Cut-Up Method of Brion Gysin.
- Rubenstein, R. A Brief History of Appropriative Writing.
- Hartley, G. "Listen" and "Relate": Notes Towards a Reading of Jackson Mac Low
Assignment #2
Create a program (using, e.g., the tools presented in class) that behaves like a UNIX text processing program (such as cat, grep, tr, etc.). Your program should take text as input (any text, or a particular text of your choosing) and output a version of the text that has been filtered and/or munged. Your program should use at least one method of Java's String class that we didn't discuss in class.
Be creative, insightful, or intentionally banal. Optional: Use the program that you created in tandem with another UNIX command line utility.