Web File Formats

Understanding Formats and Conversion Issues

3/4/98
J. L. Mohler

A couple of years ago, surfing the web would have revealed a myriad of file formats floating around. If you were part of the “in-crowd” who found a funny little piece of software called Mosaic and installed it, you would likely have found many, many file formats that couldn’t be viewed without the aid of a helper application. Today’s web is a little more “civilized.” The majority of graphics files on the web are GIF and JPEG, with PNG quickly catching up. Yet you may still find a straggling TIFF, BMP, PICT, or EPS.

At this point you may be asking yourself, “What are all these file format acronyms?” That's what this article is for.

Using GIF and JPEG Formats
As most people are aware, the Graphics Interchange Format (GIF) and the Joint Photographic Experts Group (JPG or JPEG) format are among the most commonly used graphics formats found on the web. As you saw in the last chapter, “Managing Color Differences,” the biggest difference between these two formats is the amount of color data that they can contain. In addition, each format provides some unique features of its own.

Although the web is designed to be open to all graphics formats, note that the formats that can be delivered and received depend upon the client as well as the server. An even bigger issue is whether a particular format is an open standard or a commercial standard. Open standards (formats) can be used without regard to copyright or patent considerations. Commercial standards, such as GIF and some varieties of the JPEG format, have patent or copyright considerations that must be taken into account. As the developer of a page, program, or application, you may become liable for fees associated with commercial standards. This is the main reason for the development of open standards such as the new Portable Network Graphics (PNG) format.

The GIF format can be interpreted directly by the browser. This file format can support up to 256 colors (8-bit color data) as well as a special layer of data called the transparency layer. Realize, however, that the GIF format was developed by CompuServe as a means of distributing display-quality images over its online service. GIF files were never actually intended for print, and this is the main reason that they only support 256-color image data. They were designed to be lightweight files that could be exchanged electronically.

Aside: Remember at the time the GIF format was developed, 300 BPS was the common modem transfer speed and even the GIF format taxed this extremely underdeveloped technology.

However, the GIF format uses a proprietary compression scheme which has caused quite a patent skirmish over the past couple of years. In short, Unisys Corporation, holder of the patent on the LZW compression scheme used in the GIF format (which was developed by CompuServe Information Services), announced that it would seek patent fees from software developers using the format. This “problem” has led to the development of a new file format, Portable Network Graphics (PNG), which is aimed at quickly replacing both the GIF and JPEG formats. Later in this chapter you’ll read more about the PNG format.

Yet even though the GIF format is pretty common, there are actually two different flavors of this image format: 87a and 89a, presumably named after the years in which they were developed. The one of most interest to web developers is the GIF 89a format which supports transparency.

Much like the masking feature of Photoshop, the transparent GIF format designates a single color in the image as transparent. Therefore, when the image is displayed in the browser with the <IMG> tag, background elements appear in place of the assigned transparent color value.
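To make the idea concrete, here is a minimal sketch of what a browser conceptually does with a transparent GIF. The function name, the tiny palette, and the background values are all hypothetical and chosen only for illustration; a real browser works from the decoded GIF data rather than from Python lists.

```python
def composite_over_background(pixels, palette, transparent_index, background):
    """Resolve indexed pixels to RGB, letting the background show through."""
    result = []
    for y, row in enumerate(pixels):
        out_row = []
        for x, index in enumerate(row):
            if index == transparent_index:
                # The designated color is never drawn; the page shows instead.
                out_row.append(background[y][x])
            else:
                out_row.append(palette[index])
        result.append(out_row)
    return result


if __name__ == "__main__":
    # Hypothetical 3-color palette; index 0 is the color marked transparent.
    palette = [(255, 0, 255), (0, 0, 0), (255, 255, 255)]
    pixels = [[0, 1, 0],
              [1, 2, 1]]
    gray = (192, 192, 192)  # assumed page background color
    background = [[gray] * 3 for _ in range(2)]
    for row in composite_over_background(pixels, palette, 0, background):
        print(row)
```

Notice that transparency here is all or nothing: each pixel either shows its palette color or shows the background.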

As the alternative to the low color limit of the GIF format, the JPEG format presents a standard that excels at delivering photorealistic images. Note, however, that JPEG delivers rasterized vector images poorly. Only use the JPEG format for high color raster images that have not been palettized or reduced in color depth. Probably the biggest advantage to JPEG is its lossy compression scheme, which allows high quality images to be delivered with close to a 10:1 space savings compared to GIF.

Progressive Images & Interlacing
A newer variety of JPEG file, called the Progressive JPEG, allows the browser to create a low-resolution representation of the graphic, which becomes clearer and higher in quality as the browser downloads more of the file. Much like focusing the lens on a camera, the image begins blurry and then becomes clear when the image is completely downloaded, as shown in Figure 1. JPEG files themselves can contain both 8- and 24-bit information, which makes them a good candidate for delivering graphics with more than 256 colors.

Figure 1. Interlacing the file as it is saved allows the image to be progressively viewed.

To enable a progressive download, the image must be saved in a special way. The interlace option in a graphic file format causes the data to be saved non-sequentially. Rather than storing each line of pixels as it appears from top to bottom in the image, interlacing stores widely spaced lines first and fills in the gaps on later passes. An interlaced GIF, for example, first stores every 8th line: line 1, line 9, line 17 and so on. It then returns to the top of the image and stores line 5, line 13 and so on, followed by passes that fill in the remaining lines, as shown in Figure 2. This way the image can be progressively drawn as it is downloaded.

Figure 2. Interlacing saves the file a special way.
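As a rough sketch of the idea, the following Python function lists the order in which an interlaced GIF stores its lines. The function name is hypothetical. The four (start, step) pairs follow the interlacing scheme described in the GIF 89a specification; other formats that offer interlacing use different patterns.

```python
def gif_interlace_order(height):
    """Return 0-based row indices in the order an interlaced GIF stores them."""
    # (first row, step) for each of the four GIF interlace passes
    passes = [(0, 8), (4, 8), (2, 4), (1, 2)]
    order = []
    for start, step in passes:
        order.extend(range(start, height, step))
    return order


if __name__ == "__main__":
    # Show 1-based line numbers to match the numbering used in the text.
    print([row + 1 for row in gif_interlace_order(16)])
    # prints [1, 9, 5, 13, 3, 7, 11, 15, 2, 4, 6, 8, 10, 12, 14, 16]
```

After only the first pass, one line in eight has arrived, which is enough for the browser to draw a coarse preview of the whole image.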

Many of the latest imaging applications, such as Adobe Photoshop 4.0, support this new rendition of the original JPEG format. The progressive JPEG is quite impressive, but data loss can still be a negative since it uses lossy compression.

Creating GIF and JPEG Graphics
To create a GIF graphic from Photoshop 4.0:

Save the image as a high resolution TIFF or PSD file first. This allows you to go back to the original image if modifications are necessary.
Set the image color depth to 8-bit, using the Image | Mode menu option. Remember the GIF format can only contain 256-color image data, and Photoshop will not show GIF as a Save As option unless the image is in 8-bit mode.
Choose the Save As option from the File menu and set the Save As drop-down menu to GIF format. When prompted, choose to save the image as a Normal GIF or as an Interlaced GIF.

To create a JPEG image from Photoshop 4.0:
Save the image as a high resolution TIFF or PSD file first. Remember that JPEG’s lossy compression will lose some data. Saving a high quality version allows you to go back to the original image if modifications are necessary.
Verify that the image you are saving is a 24-bit image using the Image | Mode menu. If the image is an 8-bit image, use the GIF or PNG format instead of the JPEG format.
Choose the Save As or Save option from the File menu. Set the Save As drop-down menu to JPEG format.
When prompted with the JPEG Options dialog box (shown in Figure 3) accept the default compression settings. You can test several lossy settings to get a desirable file size. However, remember that the “smaller” the file, the more data is lost. If you want the image to be progressive, select that option in the dialog box as well.

Figure 3. The JPEG options dialog box in Photoshop 4.0.

Using the Portable Network Graphics (PNG) Format
One of the most noteworthy recent developments in the world of web graphics is this new graphics format. The Portable Network Graphics (PNG, pronounced “ping”) format has some very distinct advantages over both the JPEG and GIF formats. It seeks to better standardize the graphics found on the web and to keep them legally open.

The PNG format supports indexed color up to 256 colors, true color images, progressive display, transparency, and automatic lossless compression. An additional feature is the use of pre-compression filters, which prepare the image data for optimal compression. There are five filter types that can be applied to data within the image.

Note that the filters used with the PNG format are applied to the bytes that make up the image, and not the pixels or their colors. In addition, the filter works across individual scanlines that make up the image.
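To see why filtering bytes before compression can help, consider this minimal sketch of one of the five filter types, the one the PNG specification calls Sub. It replaces each byte with its difference from the corresponding byte of the pixel to its left. The function names and the gradient test data are hypothetical, and the sketch leaves out the filter-type byte that a real PNG file places in front of every scanline. Python's zlib module stands in for the compressor.

```python
import zlib


def sub_filter(scanline, bpp):
    """PNG-style Sub filter: each byte minus the byte bpp positions to its left."""
    out = bytearray(len(scanline))
    for i, value in enumerate(scanline):
        left = scanline[i - bpp] if i >= bpp else 0
        out[i] = (value - left) % 256
    return bytes(out)


def sub_unfilter(filtered, bpp):
    """Reverse the Sub filter, rebuilding the original bytes left to right."""
    out = bytearray(len(filtered))
    for i, value in enumerate(filtered):
        left = out[i - bpp] if i >= bpp else 0
        out[i] = (value + left) % 256
    return bytes(out)


if __name__ == "__main__":
    # Hypothetical scanline: a smooth gray gradient, one byte per pixel.
    scanline = bytes(range(256))
    filtered = sub_filter(scanline, bpp=1)
    assert sub_unfilter(filtered, bpp=1) == scanline  # filtering loses nothing
    print("unfiltered:", len(zlib.compress(scanline)), "bytes")
    print("filtered:  ", len(zlib.compress(filtered)), "bytes")
```

The gradient contains no repeated values, so the compressor finds little to work with. After filtering, almost every byte is the same small difference, which compresses very well. Which filter works best depends on the image, which is why trying more than one can pay off.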

Keep in mind that the biggest reason for the introduction of the PNG format is the need for an open standard (format), one free of the problems associated with the LZW compression scheme. In addition, certain optional characteristics of JPEG could also lead to similar proceedings.

In addition the PNG format includes several features that also make it more advantageous than JPEG or GIF. The PNG format supports RGB color images up to 48 bits, full masking (alpha channels), and image gamma information. It will be interesting to see how quickly this format catches on. The big two (Netscape and Explorer) currently support the new format. In addition, many of the latest image editors allow the developer to generate files in the new format.

Note that if you are using an older browser, you can probably get a plugin that will allow your browser to view PNG images.

Creating a PNG File
To create a PNG graphic in Photoshop 4.0:

Choose the Save As or Save option from the File menu. Set the Save As drop-down menu to PNG.
When prompted with the PNG options dialog box (shown in Figure 4), determine if you want a progressive image and select the appropriate option.
In the filter section, select one of the available filters. Since the filters affect the image’s bytes, rather than pixels, you may want to try several of the filters to obtain better compression results.

Figure 4. The PNG options dialog box in Photoshop 4.0.

Using Alternative Formats
Although GIF and JPEG images are the most common formats found on the web, there are many others that you can run across as you are surfing. You’ll find that each has its own quirks and advantages. Yet keep in mind that to use these formats you must make sure the client and server support them.

TIFF
Tagged Image File Format or TIFF (pronounced “tif”) is a special computer graphic file format that was designed to support the output of high-resolution raster images. This particular file format will allow up to 32-bit data and is a very robust format. It is not uncommon for TIFF files to be quite large. Since it is designed to hold image data for printing, the TIFF format can use a special internal compression scheme called Lempel-Ziv-Welch (LZW) compression.

PICT
Most of the formats mentioned thus far are raster formats -- holding only pixel data. The PICT format, used predominantly on the Macintosh, is a special type of file format called a metafile format. Metafile formats allow either raster or vector data within them. In fact, they can store both simultaneously. PICT files are very popular in both vector and raster imaging on the Macintosh due to their very small file size. However, take PICT to the PC and you’ll be lucky if you can find software that’ll be able to open it. If you are working cross platform, PICT is not a good choice for a file format.

Compression
As you work with various graphics file formats you’ll find that there are a variety of compression schemes that can be used to reduce the size of a file. Actually, you’ll find that for web-based graphics and files, there are two main types of compression that can be used: lossy and lossless. These terms describe how the compression scheme works.

As you may have read in other articles, one of the biggest problems with raster images is their size. To overcome this hurdle, compression schemes have been developed to help reduce the file size of raster images. Realize that almost every raster image has redundant data. For example, an image with many blue hues in it has redundant data due to the repeated definition of the blue pixels in the image. Compression schemes take the redundant or repeating data and substitute tokens or representative characters for the repeating data, thus reducing the file size. Most compression schemes, such as the ones used in BMP and TIFF files, are transparent to the user. Many times you don’t even know that the compression is occurring, but the compression can significantly reduce the size of the file.
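The simplest illustration of substituting tokens for repeating data is run-length encoding. The sketch below is a generic version written for this article, not the exact byte layout that any particular file format uses, and the function names are hypothetical.

```python
def rle_encode(data):
    """Collapse runs of identical bytes into (count, value) pairs."""
    runs = []
    for value in data:
        if runs and runs[-1][1] == value:
            # Same value as the previous byte: lengthen the current run.
            runs[-1] = (runs[-1][0] + 1, value)
        else:
            runs.append((1, value))
    return runs


def rle_decode(runs):
    """Expand (count, value) pairs back into the original bytes."""
    return bytes(value for count, value in runs for _ in range(count))


if __name__ == "__main__":
    # One row of pixels: B for a blue pixel, W for a white one.
    row = b"BBBBBBBBWWBBBB"
    runs = rle_encode(row)
    print(runs)  # prints [(8, 66), (2, 87), (4, 66)]
    assert rle_decode(runs) == row
```

Fourteen pixels shrink to three pairs. On a row where every pixel differs from its neighbor, however, the same scheme would produce one pair per pixel and make the data larger.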

Compression schemes use an algorithm, or codec, to compress and decompress the image file. Codec stands for compressor/decompressor. However, the compressibility of a file depends upon how much redundant data there actually is in the file. An image with a lot of similar hues will compress more than an image with a wide variety of colors.

Compression schemes are judged by the amount that they compress the file, described by the compression ratio. The compression ratio is the ratio of the uncompressed file’s size to the compressed file’s size. Many compression schemes claim a ratio of 2:1, while others can only achieve 1.25:1.
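The following sketch measures compression ratios for two made-up blocks of data, one highly redundant and one with almost no redundancy. It uses Python's zlib module as a convenient lossless codec. The function name and the test data are assumptions made for illustration, and the ratios you see will differ from those of any particular image format.

```python
import random
import zlib


def compression_ratio(data):
    """Uncompressed size divided by compressed size (2.0 means 2:1)."""
    return len(data) / len(zlib.compress(data))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    flat = bytes([200]) * 10_000                              # one repeated value
    noisy = bytes(rng.randrange(256) for _ in range(10_000))  # little redundancy
    print(f"flat:  {compression_ratio(flat):.1f}:1")
    print(f"noisy: {compression_ratio(noisy):.2f}:1")
```

The flat block compresses enormously, while the noisy block barely compresses at all and may even grow slightly. The same codec produces very different ratios depending on the data it is given, so a quoted ratio means little without knowing what was compressed.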

You must be careful of companies that claim significantly high compression ratios. You must make sure you are comparing two lossy or two lossless compression codecs. Comparing lossy to lossless is like comparing apples to oranges. To understand this, let’s look at the difference between lossy and lossless compression.

Lossy Compression
When files are compressed, not all codecs reproduce an exact copy of the original file when they are uncompressed. Some data is sacrificed to attain smaller file sizes. This is the case with lossy compression. Lossy compression is a compression scheme in which certain amounts of data are omitted to attain smaller file sizes.

Lossy compression schemes, such as those used with JPEG images and many of the video formats, do not create an exact replica of the original file after decompression. They lose some of the original data. This may alarm you at first, but lossy compression schemes are usually used when the files that are being compressed don’t need the extra data.

For example, an image that you display on screen requires less data than a file that you’re going to print. Therefore you can sacrifice some of the data for the sake of a smaller file size. This is also true in the digital video realm. Again, a certain amount of data can be sacrificed without significantly hurting the playback performance.

If you decide to use a lossy compression scheme you do have a choice concerning how much data is lost. Most of these schemes allow you to choose a loss rate. For example when you create a JPEG file you can adjust how much data is lost as shown in Figure 5. The same is true if you are creating video snippets. Of the file formats you have read about in this chapter, only JPEG uses lossy compression.

Figure 5. Adjusting the data loss in a JPEG image.
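The trade-off between file size and data loss can be illustrated with a deliberately simplified stand-in for a lossy codec. This is not how JPEG works internally. The sketch simply rounds each 8-bit value down to a multiple of a chosen step and then compresses the result with Python's zlib module. The function names, the step values, and the sample data are all hypothetical.

```python
import zlib


def quantize(samples, step):
    """Round each 8-bit sample down to a multiple of step (the lossy part)."""
    return bytes((value // step) * step for value in samples)


def max_error(original, degraded):
    """Largest difference between any original value and its degraded copy."""
    return max(abs(a - b) for a, b in zip(original, degraded))


if __name__ == "__main__":
    # Hypothetical data: a slowly rising ramp with a little irregularity.
    samples = bytes((i * 7 // 10 + (i * i) % 5) % 256 for i in range(4096))
    for step in (1, 4, 16, 64):
        degraded = quantize(samples, step)
        size = len(zlib.compress(degraded))
        error = max_error(samples, degraded)
        print(f"step {step:2d}: {size:5d} bytes, max error {error}")
```

A step of 1 changes nothing, so no data is lost. Larger steps throw away more detail, which leaves more repetition for the compressor to exploit, and the discarded detail cannot be recovered afterward.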

If you decide to use JPEG images, keep two things in mind. First, after compression, if you ever try to print the JPEG file, more than likely it will look bad. Second, you should keep a backup of your JPEG images in a format that either doesn’t use compression or uses lossless compression.

Lossless Compression
As its name implies, lossless compression can be used for image files that you want to print and in other situations where loss of data is detrimental. Lossless compression is a compression scheme in which decompressing a file produces an exact replica of the original file.

Lossless compression schemes do not sacrifice data. In fact, they create an exact copy of the original file when they are decompressed. Lossless compression schemes are often used with files that need to maintain the highest level of data. Often they are used in the desktop publishing field for printing purposes, where loss of data would be unacceptable. Lossless compression schemes include the LZW (Lempel-Ziv-Welch) compression used by TIFF and GIF and the RLE (Run-Length Encoding) compression used by BMP. Additionally, the compression used in the new PNG format is lossless.
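The defining property of a lossless scheme is easy to check for yourself. The short sketch below compresses some made-up pixel data with Python's zlib module, which implements the same deflate method that PNG uses, and confirms that decompression returns every byte unchanged. The function name and the sample data are hypothetical.

```python
import zlib


def round_trips_exactly(data):
    """True if decompressing the compressed data reproduces it byte for byte."""
    return zlib.decompress(zlib.compress(data)) == data


if __name__ == "__main__":
    # Hypothetical pixel data: 1000 blue pixels followed by 1000 white ones.
    image_bytes = bytes([0, 0, 255] * 1000 + [255, 255, 255] * 1000)
    print(round_trips_exactly(image_bytes))  # prints True
```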