Thursday, September 10, 2009

File Operations

The Path class offers a rich set of methods for reading, writing, and manipulating files and directories. Before proceeding to the remaining sections, you should familiarize yourself with the following common concepts:

Catching Exceptions

With file I/O, unexpected conditions are a fact of life: a file exists (or doesn't exist) when you expect the opposite, the program doesn't have access to the file system, the default file system implementation doesn't support a particular function, and so on.

All methods that access the file system can throw an IOException. It is best practice to call these methods inside a try block and to handle any exceptions in a catch block. Most of the examples in this lesson follow this protocol. Also, if your code has opened any streams or channels, you should close them in a finally block.

In addition to IOException, many specific exceptions extend FileSystemException. This class has some useful methods that return the file involved (getFile), the detailed message string (getMessage), the reason why the file system operation failed (getReason), and the "other" file involved, if any (getOtherFile).

The following code snippet shows how the getFile method might be used:

try {
    ...
} catch (NoSuchFileException x) {
    System.err.format("%s does not exist\n", x.getFile());
}

For more information, see Catching and Handling Exceptions.
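
The catch ordering described above can be sketched as follows. This is a minimal illustration, not the lesson's own code: it assumes the java.nio.file API as released in Java 7 and later, where operations such as delete are static methods on the Files class. The class name ExceptionSketch, the method describeDelete, and the file name are hypothetical.

```java
import java.io.IOException;
import java.nio.file.FileSystemException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical class used only for illustration.
class ExceptionSketch {
    // Tries to delete a file and describes the outcome instead of
    // letting the exception propagate.
    static String describeDelete(Path file) {
        try {
            Files.delete(file);
            return "deleted";
        } catch (NoSuchFileException x) {
            // Most specific case first: the file is not there.
            return x.getFile() + " does not exist";
        } catch (FileSystemException x) {
            // Other file system failures carry a file and a reason.
            return x.getFile() + ": " + x.getReason();
        } catch (IOException x) {
            // Anything else that went wrong during I/O.
            return "I/O error: " + x.getMessage();
        }
    }

    public static void main(String[] args) {
        // Assumes no file with this name exists in the working directory.
        System.out.println(describeDelete(Paths.get("no-such-file-for-sketch.tmp")));
    }
}
```

The catch blocks go from the most specific exception to the most general, because NoSuchFileException extends FileSystemException, which in turn extends IOException.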

Varargs

Several Path methods accept an arbitrary number of arguments when flags are specified. For example, in the following method signature, the ellipsis notation after the CopyOption argument indicates that the method accepts a variable number of arguments, or varargs, as they are typically called:

Path moveTo(Path, CopyOption...)

When a method accepts a varargs argument, you can pass it a comma-separated list of values or an array (CopyOption[]) of values.

In the moveTo example, the method can be invoked as follows:

import static java.nio.file.StandardCopyOption.*;

Path orig = ...;
Path dest = ...;
orig.moveTo(dest, REPLACE_EXISTING, ATOMIC_MOVE);

For more information about varargs syntax, see Arbitrary Number of Arguments.
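
To make the two calling styles concrete, here is a small runnable sketch. It assumes the released java.nio.file API, in which the move operation is the static method Files.move(Path, Path, CopyOption...). The class name VarargsSketch, the method moveThereAndBack, and the scratch file names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.CopyOption;
import java.nio.file.Files;
import java.nio.file.Path;
import static java.nio.file.StandardCopyOption.ATOMIC_MOVE;
import static java.nio.file.StandardCopyOption.REPLACE_EXISTING;

// Hypothetical class used only for illustration.
class VarargsSketch {
    // Moves a scratch file to a new name and back again, passing the
    // options once as a comma-separated list and once as an array.
    static boolean moveThereAndBack() {
        try {
            Path dir = Files.createTempDirectory("varargs-sketch");
            Path orig = Files.createFile(dir.resolve("orig.txt"));
            Path dest = dir.resolve("dest.txt");

            // Style 1: comma-separated list of options.
            Files.move(orig, dest, REPLACE_EXISTING, ATOMIC_MOVE);

            // Style 2: the same options packaged as an array.
            CopyOption[] options = { REPLACE_EXISTING, ATOMIC_MOVE };
            Files.move(dest, orig, options);

            // The file should be back under its original name.
            boolean ok = Files.exists(orig) && Files.notExists(dest);

            // Clean up the scratch file and directory.
            Files.delete(orig);
            Files.delete(dir);
            return ok;
        } catch (IOException x) {
            // Rethrown unchecked only to keep the sketch short.
            throw new UncheckedIOException(x);
        }
    }

    public static void main(String[] args) {
        System.out.println(moveThereAndBack());
    }
}
```

Both calls compile to the same thing: the compiler wraps the comma-separated values in a CopyOption[] before the method is invoked.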

Atomic Operations

Several Path methods, such as moveTo, can perform certain operations atomically on some file systems.

An atomic file operation is an operation that cannot be interrupted or "partially" performed. Either the entire operation is performed or the operation fails. This is important when you have multiple processes operating on the same area of the file system, and you need to guarantee that each process accesses a complete file.
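
One common use of an atomic move is to write new content to a temporary file and then rename it into place, so that other processes never see a half-written file. The following is a sketch under stated assumptions: it uses Files.move with ATOMIC_MOVE from the released java.nio.file API, and the class name AtomicPublishSketch, the methods publish and demo, and the ".tmp" suffix are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical class used only for illustration.
class AtomicPublishSketch {
    // Writes the content to a temporary sibling file, then moves it
    // into place in a single step where the file system allows it.
    static void publish(Path target, String content) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
        try {
            // If the target already exists, whether it is replaced is
            // implementation specific; here the target is new.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (AtomicMoveNotSupportedException x) {
            // Fallback for file systems without atomic moves.
            // This second move is NOT atomic.
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    // Publishes a file in a scratch directory and reads it back.
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("atomic-sketch");
            Path target = dir.resolve("report.txt");
            publish(target, "complete contents");
            String readBack = new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
            Files.delete(target);
            Files.delete(dir);
            return readBack;
        } catch (IOException x) {
            // Rethrown unchecked only to keep the sketch short.
            throw new UncheckedIOException(x);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The temporary file is created next to the target on purpose: a move within one directory stays on one file system, which is where an atomic rename is most likely to be supported.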

Method Chaining

Many of the file I/O methods support the concept of method chaining.

You first invoke a method that returns an object. You then immediately invoke a method on that object, which returns yet another object, and so on. Many of the I/O examples use the following technique:

String value = Charset.defaultCharset().decode(buf).toString();
UserPrincipal group = file.getFileSystem().getUserPrincipalLookupService().lookupPrincipalByName("me");

This technique produces compact code and enables you to avoid declaring temporary variables that you don't need.
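
For comparison, the sketch below writes the same decoding step twice, once with temporary variables and once as a chain. It uses StandardCharsets.UTF_8 in place of the default charset so that the result does not depend on the platform; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical class used only for illustration.
class ChainingSketch {
    // Each intermediate result gets its own temporary variable.
    static String decodeStepByStep(ByteBuffer buf) {
        Charset charset = StandardCharsets.UTF_8;
        CharBuffer chars = charset.decode(buf);
        return chars.toString();
    }

    // The same work as a single chained expression.
    static String decodeChained(ByteBuffer buf) {
        return StandardCharsets.UTF_8.decode(buf).toString();
    }

    public static void main(String[] args) {
        byte[] bytes = "hello".getBytes(StandardCharsets.UTF_8);
        // decode() consumes the buffer, so each call gets a fresh one.
        System.out.println(decodeStepByStep(ByteBuffer.wrap(bytes)));
        System.out.println(decodeChained(ByteBuffer.wrap(bytes)));
    }
}
```

The trade-off is that a long chain gives you nowhere to inspect an intermediate value, and an exception's stack trace points at one line for the whole chain.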

FileRef Interface

The Path class implements the FileRef interface. The FileRef interface includes methods for locating a file and accessing that file. Coverage for the FileRef methods is woven into the lesson where relevant.

What Is a Glob?

Two methods in the Path class accept a glob argument, but what is a glob?

You can use glob syntax to specify pattern-matching behavior.

A glob pattern is specified as a string and is matched against other strings, such as directory or file names. Glob syntax follows several simple rules:

  • An asterisk, *, matches any number of characters (including none).
  • Two asterisks, **, work like * but cross directory boundaries. This syntax is generally used for matching complete paths.
  • A question mark, ?, matches exactly one character.
  • Braces specify a collection of subpatterns. For example:
    • {sun,moon,stars} matches "sun", "moon", or "stars".
    • {temp*,tmp*} matches all strings beginning with "temp" or "tmp".
  • Square brackets convey a set of single characters or, when the hyphen character (-) is used, a range of characters. For example:
    • [aeiou] matches any lowercase vowel.
    • [0-9] matches any digit.
    • [A-Z] matches any uppercase letter.
    • [a-zA-Z] matches any uppercase or lowercase letter.
    Within the square brackets, *, ?, and \ match themselves.
  • All other characters match themselves.
  • To match *, ?, or the other special characters, you can escape them by using the backslash character, \. For example: \\ matches a single backslash, and \? matches the question mark.

Here are some examples of glob syntax:

  • *.html – Matches all strings that end in .html
  • ??? – Matches all strings with exactly three characters
  • *[0-9]* – Matches all strings containing at least one digit
  • *.{htm,html,pdf} – Matches any string ending with .htm, .html or .pdf
  • a?*.java – Matches any string beginning with a, followed by at least one character, and ending with .java
  • {foo*,*[0-9]*} – Matches any string beginning with foo or any string containing at least one digit
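
These patterns can be tried out with a path matcher. The sketch below assumes the getPathMatcher method of the FileSystem class with the "glob:" syntax prefix, as found in the released java.nio.file API; the class name GlobSketch, the helper matches, and the sample file names are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Hypothetical class used only for illustration.
class GlobSketch {
    // Returns true if the glob pattern matches the given path string.
    static boolean matches(String glob, String path) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(path));
    }

    public static void main(String[] args) {
        System.out.println(matches("*.html", "index.html"));           // true
        System.out.println(matches("???", "a.1"));                     // true: any three characters
        System.out.println(matches("*.{htm,html,pdf}", "report.txt")); // false
        System.out.println(matches("a?*.java", "a.java"));             // false: nothing between "a" and ".java"
        System.out.println(matches("*.java", "src/Main.java"));        // false: * stops at the directory boundary
        System.out.println(matches("**.java", "src/Main.java"));       // true: ** crosses it
    }
}
```

The matcher compares the pattern against the whole path it is given, so to test only a file's name, pass the result of getFileName() rather than the full path.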


Note: If you are typing the glob pattern at the keyboard and it contains one of the special characters, you must put the pattern in quotes ("*"), use the backslash (\*), or use whatever escape mechanism is supported at the command line.

The glob syntax is powerful and easy to use. However, if it is not sufficient for your needs, you can also use a regular expression. For more information, see the Regular Expressions lesson.

For more information about the glob syntax, see the API specification for the getPathMatcher method in the FileSystem class.
