INTRODUCTION TO PROGRAMMING IN JAVA: THE CHARACTER TYPE

NOTE: This set of www pages is not the set of www pages for the current version of COMP101. The pages are from a previous version that, at the request of students, I have kept on line.


CONTENTS

1. Introduction
2. ASCII Code
3. Alternative codes
4. The Character class
5. Character input
6. Example problem - lower to upper case conversion
 
6.1. Requirements
6.2. Analysis
6.3. Design
6.4. Implementation
6.5. Testing
7. The System.out.flush method

The example combines the two-statement input data declaration used up until now into a single statement. The example also introduces the concepts of Boundary Value Analysis (BVA) and limit testing.


1. INTRODUCTION

The Java type char is used for handling single characters such as letters, digits and special symbols (e.g. question mark, full stop, colon etc.), or non-printable control characters (e.g. tab, newline etc.). In Java (as in many other programming languages) characters are written by enclosing them in single quotes. Examples:

'a'     'A'     '2'     '+'     '\''



2. ASCII CODE

In the early days of computing, characters were usually stored in a computer using a group of 8 bits, i.e. a byte. Originally, only seven of these bits were used. The eighth (most significant) bit, referred to as the parity bit, was used for error checking. Using only seven bits there are 128 (2^7) different character codes available. There is a generally accepted standard, called the ASCII standard, which determines which characters can be encoded using the seven available bits, and which character code represents which character. ASCII (pronounced "ass-key") is an acronym for American Standard Code for Information Interchange.
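
As a small illustration of the idea that each character is stored as a numeric code, the following sketch prints the codes of a few characters by casting between char and int. The class name AsciiCodeDemo and the helper methods codeOf and charOf are made up for this illustration; they are not part of the Java API.

```java
// Illustrative sketch: AsciiCodeDemo, codeOf and charOf are hypothetical names.
class AsciiCodeDemo {

    // Widen a char to an int to obtain its character code
    static int codeOf(char c) {
        return (int) c;
    }

    // Narrow an int back to the char that has that code
    static char charOf(int code) {
        return (char) code;
    }

    public static void main(String[] args) {
        System.out.println("'A' has code " + codeOf('A'));     // 65
        System.out.println("'a' has code " + codeOf('a'));     // 97
        System.out.println("'2' has code " + codeOf('2'));     // 50
        System.out.println("Code 43 is '" + charOf(43) + "'"); // '+'
    }
}
```

All four codes lie below 128, i.e. within the seven-bit ASCII range described above.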




3. ALTERNATIVE CODES

The ASCII standard was developed on the assumption that all computer usage would be in English. The English alphabet has 26 letters derived from the Latin alphabet. This set of letters is sufficient for only a small group of languages, e.g. English, Swahili and Hawaiian! All other living languages use either the Latin alphabet plus other characters, or other non-Latin alphabets, or syllabaries. Use of the ASCII standard therefore presents a problem in many countries.



3.1 LATIN-1 CODE

One solution to the above problem is to drop the use of the parity bit so that 256 character codes are available. There are a number of "8 bit" character standards available. Some languages (for example Ada) use what is commonly referred to as the LATIN-1 standard (ISO 8859-1). In this standard the first 128 codes (0 to 127) adhere to the ASCII standard, while the remaining codes provide for additional characters.



3.2 Unicode Worldwide Character Standard

The Unicode Worldwide Character Standard is a character coding system whereby characters are stored in two bytes of memory (i.e. 16 bits as opposed to 8 bits). "At time of writing" the Unicode standard contained 34,168 distinct coded characters. Java uses the Unicode Standard.

Provided that we have an editor that supports the Unicode character set we can include any of the Unicode characters in our Java programs.
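
Where an editor or keyboard does not make a character easy to type, Java source code can also name it by its Unicode value, using an escape consisting of a backslash, the letter u and four hexadecimal digits. The sketch below relies only on the standard Character class; the class name UnicodeEscapeDemo and the helper codeOf are made up for illustration. It prints integer codes rather than the characters themselves, since whether a console displays non-ASCII characters correctly depends on its encoding.

```java
// Illustrative sketch: UnicodeEscapeDemo and codeOf are hypothetical names.
class UnicodeEscapeDemo {

    // Return the 16-bit Unicode value stored in a char
    static int codeOf(char c) {
        return (int) c;
    }

    public static void main(String[] args) {
        char eAcute = '\u00e9';   // e with an acute accent (within the LATIN-1 range)
        char omega  = '\u03a9';   // Greek capital letter omega (needs more than 8 bits)

        System.out.println("e-acute has code " + codeOf(eAcute));  // 233
        System.out.println("omega has code   " + codeOf(omega));   // 937
        System.out.println("Bits in a char:  " + Character.SIZE);  // 16
    }
}
```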



4. THE Character CLASS

The Character class contains many useful methods for manipulating and testing characters. A fragment of this class is presented in Figure 1. This fragment includes the following:

  • Character constructor, which creates an instance of the class Character representing the primitive value given as its argument.
  • charValue returns the value of an instance of the class Character.
  • getNumericValue returns the numeric value that the character represents (for example 7 for '7') as a non-negative integer, or a negative value if the character has no such value.
  • isDigit determines if the specified character is a digit (a number).
  • isLetter determines if the specified character is a letter.
  • toLowerCase maps the given character to its lowercase equivalent; if the character has no lowercase equivalent, the character itself is returned.
  • toUpperCase converts the character argument to uppercase.
 
THE CHARACTER CLASS

Figure 1: Class diagram for Character class

Note: the last five methods listed above are all class methods, so they are invoked by linking the desired method to the class name Character, e.g.:

Character.isLetter(n);

where n is a data item of type char. Note also that the Character class contains many methods of the form is... for carrying out various tests on values of type char.
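
The following sketch exercises the methods listed above on a few sample characters. The class name CharacterMethodsDemo and the helper describe are made up for illustration. Character.valueOf is used in place of the constructor as an assumption about current practice, since the constructor is deprecated in recent versions of Java.

```java
// Illustrative sketch: CharacterMethodsDemo and describe are hypothetical names.
class CharacterMethodsDemo {

    // Build a one-line summary of what the Character class methods report for c
    static String describe(char c) {
        return "'" + c + "': letter=" + Character.isLetter(c)
            + " digit=" + Character.isDigit(c)
            + " numeric=" + Character.getNumericValue(c)
            + " upper=" + Character.toUpperCase(c)
            + " lower=" + Character.toLowerCase(c);
    }

    public static void main(String[] args) {
        System.out.println(describe('q'));   // a letter
        System.out.println(describe('7'));   // a digit
        System.out.println(describe('+'));   // neither a letter nor a digit

        // Wrap a primitive char in a Character instance and unwrap it again
        Character boxed = Character.valueOf('x');
        char unboxed = boxed.charValue();
        System.out.println("charValue gives: " + unboxed);
    }
}
```

For the digit '7' the summary line reads: '7': letter=false digit=true numeric=7 upper=7 lower=7. Note that toUpperCase and toLowerCase return the character unchanged when there is no equivalent in the other case.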




5. CHARACTER INPUT

Input using the next method in the Scanner class is always in the form of a string. If, for example, we want integers or doubles we use the nextInt or nextDouble methods respectively. However, there is no "nextChar" method. There are mechanisms for getting a single "char" from the input stream but at present we do not have sufficient knowledge to do this. What we can do, however, is input a character as an integer character code (e.g. its ASCII code) and convert it to a "char" using a cast. Thus:

int  inputInt  = input.nextInt();
char inputChar = (char) inputInt;

where input (in input.nextInt()) is an instance of the Scanner class. Of course we can run the two statements together as follows:

char inputChar = (char) input.nextInt();

The code example presented in Table 1 indicates how two characters may be input.

// CHARACTER INPUT APPLICATION
// Frans Coenen
// Thursday 3 August 2000
// Revised: Wednesday 30 June 2005 to be compatible with Java 1.5
// The University of Liverpool, UK   

import java.util.*;

class CharacterInputApp {

    // ------------------- FIELDS ------------------------ 
        
    // Create Scanner class instance

    private static Scanner input = new Scanner(System.in);  
		
    // ------------------ METHODS ------------------------  
    
    public static void main(String[] args)  {

        // Invite input

        System.out.println("Input two character codes separated by a " +
                        "carriage return:");

        // Read in each input as an integer code and cast it to a char

        char inputChar1 = (char) input.nextInt();
        char inputChar2 = (char) input.nextInt();

        // Output the result

        System.out.println("input 1 = " + inputChar1 + " input 2 = " +
                        inputChar2);
    }
}

Table 1: Character input code example
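
One of the mechanisms alluded to above for reading a character directly, shown here only as a hedged sketch since it relies on String methods not yet covered, is to read a token with next and take its first character with charAt(0). The class name CharAtInputDemo and the helper readChar are hypothetical; the Scanner is built over a fixed string rather than System.in so that the example runs without keyboard input.

```java
import java.util.Scanner;

// Illustrative sketch: CharAtInputDemo and readChar are hypothetical names.
class CharAtInputDemo {

    // Read the next whitespace-delimited token and return its first character
    static char readChar(Scanner in) {
        return in.next().charAt(0);
    }

    public static void main(String[] args) {
        // A Scanner over a string stands in for keyboard input here
        Scanner input = new Scanner("a\nQ\n");

        char inputChar1 = readChar(input);
        char inputChar2 = readChar(input);

        System.out.println("input 1 = " + inputChar1 + " input 2 = " +
                        inputChar2);
    }
}
```

With this approach the user types the characters themselves rather than their integer codes.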



6. EXAMPLE PROBLEM - LOWER TO UPPER CASE CONVERSION


6.1 Requirements

To produce a program that converts lower case alphabetic characters to upper case alphabetic characters (Figure 2). Note that lower case letters a..z have Unicode values 97..122, and upper case letters A..Z have Unicode values 65..90. Therefore to convert from lower case to upper case we must subtract 32 from the Unicode value of the input character.

LOWER TO UPPER CASE CHARACTER CONVERSION

Figure 2: Lower to uppercase character conversion
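
The arithmetic behind this conversion can be checked with a short sketch. The class name CaseOffsetDemo, the constant CASE_OFFSET and the method toUpperBySubtraction are made up for illustration.

```java
// Illustrative sketch: CaseOffsetDemo and its members are hypothetical names.
class CaseOffsetDemo {

    // Distance between a lower case letter and its upper case equivalent
    static final int CASE_OFFSET = 'a' - 'A';   // 97 - 65 = 32

    // Convert by subtracting the offset; only meaningful for 'a'..'z'
    static char toUpperBySubtraction(char lowerCaseChar) {
        return (char) (lowerCaseChar - CASE_OFFSET);
    }

    public static void main(String[] args) {
        System.out.println("Offset = " + CASE_OFFSET);             // 32
        System.out.println("a -> " + toUpperBySubtraction('a'));   // A
        System.out.println("z -> " + toUpperBySubtraction('z'));   // Z
    }
}
```

The cast back to char is needed because subtracting an int from a char yields an int.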


6.2 Analysis

Using "noun extraction" the class diagram presented in Figure 3 is proposed.


6.3 Design

From Figure 3 the analysis indicates that we need to design a single class, Lower2UpperApp; all other methods used are contained in existing classes that come with the Java API.

LOWER TO UPPER CASE CLASS DIAGRAM

Figure 3: Lower to Upper case class diagram


6.3.1 Lower2UpperApp Class

Field Summary
private static Scanner input
           A class instance field to facilitate input from the input stream.

Method Summary
public static void main(String[] args)
           Main method to read in a character from the keyboard as a Unicode value, output this value (i.e. "echo" to the screen), and then convert to upper case equivalent by subtracting 32. Output this new Unicode value and the associated character.

A Nassi-Shneiderman chart for the main method is given in Figure 4.

NASSI-SHNEIDERMAN CHART FOR LOWER 2 UPPER APPLICATION METHOD

Figure 4: Nassi-Shneiderman charts for Lower2UpperApp class method


6.4 Implementation


6.4.1 Lower2UpperApp Class

The implementation for the Lower2UpperApp Class is given in Table 2. Points to note:

  1. We use the nextInt method contained in the Scanner class to input a Unicode value as an integer.
  2. To convert a Unicode value into its character we use a cast:
    character = (char) unicodeValue;
    
// LOWER 2 UPPER APPLICATION
// Frans Coenen
// Tuesday 2 March 1999
// Revised: Wednesday 30 June 2005
// The University of Liverpool, UK   

import java.util.*;

class Lower2UpperApp {

    // ------------------- FIELDS ------------------------ 
        
    // Create Scanner class instance

    private static Scanner input = new Scanner(System.in);  
		
    // ------------------ METHODS ------------------------  
    
    public static void main(String[] args) {
        char upperCaseChar;
        int  uniCodeValue;

        // Input a Unicode value and output the associated character

        System.out.print("Input a Unicode value:     ");
        uniCodeValue = input.nextInt();
        System.out.println("Character equivalent is :           " +
                (char) uniCodeValue);

        // Subtract 32 to find uppercase equivalent and output.

        uniCodeValue = uniCodeValue-32;
        System.out.println("Unicode upper case equivalent is: " +
                uniCodeValue);
        upperCaseChar = (char) uniCodeValue;
        System.out.println("Upper case charactere is:         " +
                upperCaseChar);
    }
}

Table 2: Lower to upper case conversion application (Version 1)

Of course, to be in tune with the spirit of OOP we should not write code where appropriate pre-defined methods already exist (code reuse). Inspection of the Character class indicates that there is a method toUpperCase already available. Thus an alternative implementation of the above might be as follows:

// LOWER 2 UPPER APPLICATION VERSION 2
// Frans Coenen
// Tuesday 2 March 1999
// Revised: Wednesday 30 June 2005
// The University of Liverpool, UK   

import java.util.*;

class Lower2UpperApp2 {

    // ------------------- FIELDS ------------------------ 
        
    // Create Scanner class instance

    private static Scanner input = new Scanner(System.in);   
		
    // ------------------ METHODS ------------------------  
    
    public static void main(String[] args) {
        char lowerCaseChar, upperCaseChar;

        // Input a Unicode value and convert it to the associated character

        System.out.print("Input a Unicode value:     ");
        lowerCaseChar = (char) input.nextInt();

        // Convert to uppercase equivalent and output.

        upperCaseChar = Character.toUpperCase(lowerCaseChar);
        System.out.println("Upper case charactere is:         " +
                upperCaseChar);
    }
}

Table 3: Lower to upper case conversion application (Version 2)


6.5 Testing

Boundary Value Analysis (BVA) Testing: When using input variables that can only take a particular "range" of values it has been demonstrated that errors often occur at the boundaries of the input domain. It is for this reason that Boundary Value Analysis (BVA) has been developed as a testing technique. Boundary value analysis leads to a selection of test cases that exercise bounding values for data items. At its simplest this involves the derivation of test cases with values just above and just below the bounding values. Thus suitable boundary values for the above application will be '`', 'b', 'y' and '{' (the Unicode character code for the symbol '`' is 96, and that for the symbol '{' is 123).

Limit testing is related to BVA testing, and is concerned with the generation of test cases to exercise the program when maximum and minimum input values are supplied. In some cases this may be the maxima/minima for the type, in others this may be the limits of a particular range that we are interested in ('a' to 'z' in the above case).

An appropriate set of BVA and limit test cases is given in the table below. These test cases will also serve to test the arithmetic operation of the code with the inclusion of a sample input value near the middle of the prescribed range (e.g. 'm'). We should also carry out some random data validation testing.

TEST CASE                                EXPECTED RESULT
Unicode number ("char" equivalent)       Output
96  ('`')                                '@'
97  ('a')                                'A'
98  ('b')                                'B'
109 ('m')                                'M'
121 ('y')                                'Y'
122 ('z')                                'Z'
123 ('{')                                '['
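
As a hedged sketch of how these checks could be automated, the following program applies the subtract-32 conversion of Version 1 to the character codes of the seven test characters ('m' has code 109) and counts how many results match the expected output. It uses arrays, a loop and a selection statement, which go beyond the constructs introduced so far; the class name BoundaryValueCheck and its members are hypothetical.

```java
// Illustrative sketch: BoundaryValueCheck and its members are hypothetical names.
class BoundaryValueCheck {

    // Character codes of the test inputs and the characters expected as output
    static final int[]  INPUTS   = {96, 97, 98, 109, 121, 122, 123};
    static final char[] EXPECTED = {'@', 'A', 'B', 'M', 'Y', 'Z', '['};

    // The conversion used in Version 1: subtract 32 from the character code
    static char convert(int uniCodeValue) {
        return (char) (uniCodeValue - 32);
    }

    // Count the test cases whose actual output matches the expected output
    static int countPasses() {
        int passes = 0;
        for (int i = 0; i < INPUTS.length; i++) {
            if (convert(INPUTS[i]) == EXPECTED[i]) {
                passes++;
            }
        }
        return passes;
    }

    public static void main(String[] args) {
        System.out.println(countPasses() + " of " + INPUTS.length +
                        " test cases passed");
    }
}
```

Version 2 would give different results for the two out-of-range cases: Character.toUpperCase returns the character itself when it has no upper case equivalent, so the inputs 96 and 123 would come back unchanged.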

Some sample output using the above test cases is given in Table 4.

$java Lower2UpperApp
Input a Unicode value:     96
Character equivalent is :           `
Unicode upper case equivalent is: 64
Upper case charactere is:         @

$java Lower2UpperApp
Input a Unicode value:     97
Character equivalent is :           a
Unicode upper case equivalent is: 65
Upper case charactere is:         A

$java Lower2UpperApp
Input a Unicode value:     98
Character equivalent is :           b
Unicode upper case equivalent is: 66
Upper case charactere is:         B

$java Lower2UpperApp
Input a Unicode value:     109
Character equivalent is :           m
Unicode upper case equivalent is: 77
Upper case charactere is:         M

$java Lower2UpperApp
Input a Unicode value:     121
Character equivalent is :           y
Unicode upper case equivalent is: 89
Upper case charactere is:         Y

$java Lower2UpperApp
Input a Unicode value:     122
Character equivalent is :           z
Unicode upper case equivalent is: 90
Upper case charactere is:         Z

$java Lower2UpperApp
Input a Unicode value:     123
Character equivalent is :           {
Unicode upper case equivalent is: 91
Upper case charactere is:         [
  

Table 4: Sample output

Note that at present, given our current knowledge, we are still not in a position to prevent undesired inputs!

Further examples of character manipulation are available.




7. THE System.out.flush METHOD

When using System.out.print() to output data, the data is first passed to a temporary storage area called a buffer, from where it is output to (say) the screen. This arrangement is known as output buffering and is designed to save processing time; however, it may cause code to appear to behave in a strange manner. This is because output is not always passed from the buffer to the screen immediately; the Java interpreter might process some further lines of code before doing this. To force the buffer to be flushed we can use the method:

System.out.flush();
 

contained in the PrintStream and PrintWriter classes. For example we might write:

System.out.print("Answer = ");
System.out.flush();
System.out.print(100/5);

This will cause the string "Answer = " to be output before the calculation is undertaken.

The buffer is always flushed whenever a "new line" character is encountered. Therefore when using System.out.println() the above is not a problem.
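
The effect of buffering is hard to observe on the screen, and how System.out itself behaves may depend on how the stream has been set up. The sketch below therefore builds its own buffered PrintStream, with automatic flushing switched off, over an in-memory destination and reports how many bytes have arrived with and without a call to flush. The class name FlushDemo and the method bytesDelivered are hypothetical.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

// Illustrative sketch: FlushDemo and bytesDelivered are hypothetical names.
class FlushDemo {

    // Print a prompt to a buffered stream (automatic flushing switched off)
    // and report how many bytes have reached the destination.
    static int bytesDelivered(boolean callFlush) {
        ByteArrayOutputStream destination = new ByteArrayOutputStream();
        PrintStream out =
            new PrintStream(new BufferedOutputStream(destination), false);

        out.print("Answer = ");
        if (callFlush) {
            out.flush();          // force the buffer contents through
        }
        return destination.size();
    }

    public static void main(String[] args) {
        System.out.println("Without flush: " + bytesDelivered(false) + " bytes");
        System.out.println("With flush:    " + bytesDelivered(true) + " bytes");
    }
}
```

Without the call to flush the nine characters of the prompt are still held in the buffer, so nothing has reached the destination; after the call all nine bytes have been delivered.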




Created and maintained by Frans Coenen. Last updated 10 February 2015