Jean LeLoup & Bob Ponterio
SUNY Cortland © 2003, 2009
One bit can have 2 possible states: 2^1 = 2 (0 or 1).
Two bits can have 4 possible states: 2^2 = 4 (00, 01, 10, 11, i.e. 0-3).
Four bits can have 16 possible states: 2^4 = 16 (0000, 0001, 0010, 0011, etc., i.e. 0-15).
Seven bits can have 128 possible states: 2^7 = 128 (0000000, 0000001, 0000010, etc., i.e. 0-127).
Eight bits can have 256 possible states: 2^8 = 256 (00000000, 00000001, 00000010, etc., i.e. 0-255).
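The counting above can be checked with a short Python sketch. The helper name bit_states is only an illustration, not something from the original page; it lists every pattern of a given width so that the number of states can be compared with 2 raised to the number of bits.

```python
from itertools import product

def bit_states(n_bits):
    """Return every n-bit pattern as a string, in counting order."""
    return ["".join(bits) for bits in product("01", repeat=n_bits)]

for n in (1, 2, 4, 7, 8):
    states = bit_states(n)
    # There are 2**n patterns, so the values run from 0 to 2**n - 1.
    print(n, "bits:", len(states), "states, values 0 to", int(states[-1], 2))
```

Each extra bit doubles the number of patterns, which is why going from seven bits to eight doubles the size of a character set.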
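Eight bits are called a byte. One-byte character sets can contain at most 256 characters. The newer standard, though, is Unicode, which assigns a number to every character in all the writing systems of the world. Unicode was originally designed around two-byte (16-bit) codes; today its characters are stored using encodings such as UTF-8 and UTF-16, which take between one and four bytes per character.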
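As a rough illustration of how many bytes a character occupies, the sketch below uses Python's built-in codecs. The helper name bytes_needed and the sample characters are assumptions chosen for the example.

```python
def bytes_needed(char, encoding):
    """Number of bytes used to store one character in the given encoding."""
    return len(char.encode(encoding))

for char in ("A", "é", "日"):
    # utf-16-be is used so that no byte-order mark is added to the count.
    print(char, ord(char), bytes_needed(char, "utf-8"), bytes_needed(char, "utf-16-be"))

# A one-byte character set such as ISO-8859-1 stores é in a single byte,
# but it has no code at all for a character like 日.
print(bytes_needed("é", "iso-8859-1"))
```

The one-byte set is compact but limited to 256 characters; the Unicode encodings spend more bytes where needed in exchange for covering every writing system.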
The original ASCII was a 7-bit character set (128 possible characters) with no accented letters. This was used in teletype machines. (The eighth bit was originally used to check parity, a way to look for errors.) IBM and Apple both created extended character sets for their personal computers, using the eighth bit to double the number of characters. Naturally, they didn't use the same characters in the same positions in their sets. Thus 8-bit character sets and incompatibility were born. For example, the IBM PC character set (code page 437, whose numbers Windows still accepts as Alt+number keyboard codes) uses character 130 for é, but the Mac uses character 142. Character 130 on the Mac is Ç.
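The mismatch can be reproduced in Python, assuming its cp437 codec stands in for the IBM PC set and its mac_roman codec for the Mac set. This is a small sketch of the problem, not a description of how either system worked internally.

```python
byte_130 = bytes([130])

# The same byte value is a different letter depending on the character set.
as_pc = byte_130.decode("cp437")        # IBM PC code page 437
as_mac = byte_130.decode("mac_roman")   # classic Mac character set
print(as_pc, as_mac)

# Going the other way, the same letter gets a different number on each machine.
pc_code = "é".encode("cp437")[0]
mac_code = "é".encode("mac_roman")[0]
print(pc_code, mac_code)
```

A file written on one machine and read on the other without any translation would therefore show the wrong accented letters.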
Many early Internet systems, email in particular, were designed to transmit only 7-bit codes. In order to send more complex data, encoding schemes were developed to translate that data (e.g. 8-bit text or binary files such as graphics) into something that could fit through a 7-bit pipeline. One such family of schemes is MIME (Multipurpose Internet Mail Extensions), which actually covers many different encodings. In order for MIME to work, two elements need to be defined for the content: a content format or character set (what characters or other content are to be represented) and an encoding scheme (what codes will represent these characters).
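To make the idea of a 7-bit pipeline concrete, here is a minimal sketch using Base64, one of the encodings MIME defines for binary data. Base64 is not named in the text above; it is brought in here only as an example, and the sample bytes are arbitrary.

```python
import base64

raw = bytes([0, 130, 142, 233, 255])   # arbitrary 8-bit data, e.g. part of an image

encoded = base64.b64encode(raw)
# Every byte of the encoded form is below 128, so it fits in seven bits.
fits_in_7_bits = all(b < 128 for b in encoded)
print(encoded, fits_in_7_bits)

# The receiver reverses the encoding to get the original bytes back.
decoded = base64.b64decode(encoded)
print(decoded == raw)
```

The encoded form is longer than the original, which is the price paid for passing safely through systems that only handle 7-bit text.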
A common encoding used for accented characters is Quoted-Printable. Any extended character (higher than 127) is encoded using a string of three symbols: an equals sign followed by the character's two-digit hexadecimal code. For example, é (character 233, or E9 in hexadecimal, in ISO-8859-1) is =E9. 8BIT (essentially unencoded 8-bit character data) is also a valid MIME encoding and is the most common way to send characters with accents.
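Python's standard quopri module implements Quoted-Printable, so the =E9 example can be tried directly. The sketch assumes the text is first turned into ISO-8859-1 bytes; the sample word is only an illustration.

```python
import quopri

latin1_bytes = "café".encode("iso-8859-1")   # é becomes the single byte 233 (hex E9)

qp = quopri.encodestring(latin1_bytes)
print(qp)   # plain letters pass through, the accented one becomes three symbols

# Decoding reverses the process; the character set is still needed to read the bytes.
restored = quopri.decodestring(qp).decode("iso-8859-1")
print(restored)
```

Because ordinary ASCII letters are left alone, a Quoted-Printable message in a western European language stays mostly readable even before it is decoded.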
To allow a code to function on two different kinds of machines with different operating systems and different built-in character sets, we all have to have a standard character set into which we will translate. The International Organization for Standardization (ISO) has established such standards. The standard character set for Western European languages is ISO Latin-1 (ISO-8859-1). As long as a computer knows which character set is being used, it can be programmed to display those characters, no matter what the computer's native character set may be. The é is character 233 (hexadecimal E9) in ISO Latin-1.
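Translating through a shared standard set can be sketched as a two-step conversion: decode the bytes using the sender's character set, then encode them in the agreed one. The function name transcode is hypothetical, and the cp437 and mac_roman codecs are assumed to stand in for the native PC and Mac sets.

```python
def transcode(data, source_charset, target_charset="iso-8859-1"):
    """Re-express bytes from one character set in another."""
    # Decoding is only possible because the source character set is known.
    return data.decode(source_charset).encode(target_charset)

from_mac = transcode(bytes([142]), "mac_roman")   # é as stored on a Mac
from_pc = transcode(bytes([130]), "cp437")        # é as stored on an IBM PC
print(from_mac[0], from_pc[0])
```

Both machines end up sending the same number for é, so the receiver only has to understand one standard set rather than every sender's native one.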
A MIME-compliant email program will use the email headers to keep track of which character set and which encoding scheme are applied to each email message. A web browser will do the same. This allows the program to know how to display the characters on any given machine, so the entire encoding system is transparent to the user. For MIME Quoted-Printable in western European languages, these headers might look like:
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
The same kind of character set information is also sent with web pages, in the HTTP Content-Type header or in a meta tag inside the page, so that a web browser such as Internet Explorer or Firefox knows how to display each page, no matter where it was created. As long as the computer knows which character set is represented, it knows which character to display.
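The header bookkeeping described for email can be observed with Python's standard email package. This is a sketch of one mail library's behavior, not of any particular email program; the message text is invented for the example.

```python
from email.mime.text import MIMEText

# Build a text message and declare its character set as ISO-8859-1.
msg = MIMEText("café", "plain", "iso-8859-1")

# The library records the character set and the transfer encoding in the headers.
for name in ("MIME-Version", "Content-Type", "Content-Transfer-Encoding"):
    print(f"{name}: {msg[name]}")

# The body as it would travel, and the text recovered from it using the headers.
print(msg.get_payload())
recovered = msg.get_payload(decode=True).decode(msg.get_content_charset())
print(recovered)
```

The receiving program needs nothing beyond those headers to undo the transfer encoding and pick the right character set for display.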
Keyboards
You also have to get the characters into the computer in the first place. Windows and Mac have long allowed this to be done using keyboard shortcuts. The best way to enter characters in Windows is to select a keyboard layout that includes the characters that you want to type. For typing western European languages on American keyboards, the safest bet is the US-International keyboard. In Windows XP look for keyboard configuration under the Regional and Language Options Control Panel.
For online help with keyboards see: