Visual Effects

1. VISUAL PHYSIOLOGY

Basic Optics

Human Eye

Rods and Cones

Evolution of Vision

Color Sensitivity of the Eye

Vision and the Brain

Psycho-Visual Effects

2. COLOR SPACES

Introduction

Tri-Stimulus Color Space

CIE Chromaticity Diagram

RGB Color Space

CRT Phosphors

sRGB Standard

YUV Standard

xvYCC Color Space

HSV Color Space

Subtractive Colors

CMYK Color Space

Practical Color Space Conversion Caveats

3. VIDEO CORRECTIONS

Color Temperature

Flicker

Time Base Correction (Jitter Removal)

Gamma Correction

1.     VISUAL PHYSIOLOGY

1.1  BASIC OPTICS

To review the subject of “optics” from freshman Physics class, please note that a lens can be characterized by a standard parameter called the “focal length” as illustrated below:

The “image distance” is a characteristic of the lens which in the human eye is changed by muscles making the lens either thinner or wider.  As the image distance gets BIGGER,

a)     the focal length also gets longer, i.e. increases in value or gets bigger

b)    the field of view as expressed as a solid angle gets smaller

c)     the effective magnification becomes greater

d)    and finally the amount of light entering the lens is reduced which is said to make the camera or the eye a SLOWER lens because it takes longer to get enough photons of light to make a visible exposure.

Note that for objects which are very far away as in astronomy, the focal length is equal to the image distance.

A very closely related parameter is the “F-Stop,” which is the focal length divided by the aperture width (so a wider aperture gives a smaller F-Stop), as follows:

a)     Most cameras come with standard or fixed values for F-Stop choices which are approximately powers of the square root of two, [i.e. 1, 1.4, 2, 2.8, 4, 5.6, 8, 11, 16, 22, 32, 45, 64, 90, 128, etc].

b)    Each smaller value allows TWICE as much light as the preceding value, requiring an exposure time one-half of the prior value to get the same exposure.

c)     As the F-Stop gets bigger, the “depth of field” which is the range of “object distances” for which the image is effectively in focus, gets bigger also.

d)    Objects which are very close to the camera or the eye have a very shallow depth of field, and require a very small aperture (i.e. a large F-Stop) in order to bring things into focus.
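The F-Stop arithmetic can be checked with a short Python sketch.  It assumes the F-Stop is the focal length divided by the aperture width, which is consistent with the 22 mm and 7 mm figures quoted for the eye in section 1.2.  The function names are illustrative only and do not come from any camera library.

```python
import math


def f_number(focal_length_mm, aperture_mm):
    """F-Stop taken as focal length divided by aperture width."""
    return focal_length_mm / aperture_mm


def standard_stops(count):
    """First `count` full stops: successive powers of the square root of two."""
    return [math.sqrt(2.0) ** n for n in range(count)]


def relative_light(f_stop):
    """Light gathered relative to f/1; it falls off as 1 / (F-Stop squared)."""
    return 1.0 / (f_stop ** 2)


# The eye, using the figures quoted in section 1.2.
print(round(f_number(22.0, 7.0), 2))

# Opening up by one full stop (f/2.8 to f/2) should double the light.
one_stop_ratio = relative_light(2.0) / relative_light(2.0 * math.sqrt(2.0))
print(round(one_stop_ratio, 3))
```

The marked values on a real lens (1.4, 2.8, 5.6, 11, ...) are rounded versions of these powers, so the computed sequence will not match the engraved numbers exactly.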

1.2  HUMAN EYE

The lens is adjusted by muscles in the eye to focus an external image on the retina.  Some other features are:

a)     External objects are focused upside down on the back of our eye.

b)    Our sharpest vision is at the “fovea centralis.”  A Stanford student damaged that in a laser experiment.  He had trouble reading and couldn’t look through a gun sight so he got a deferment from the military.

c)     We have a blind spot at the junction of the optic nerve.  There is an interesting optical illusion that makes objects at a special location to the side disappear while we look straight ahead.

d)    The human eye is almost identical to that of the octopus.   Our focal length is roughly 22 mm and the aperture is about 7 mm, resulting in an F-Stop of roughly 3.2 to 3.5.

e)     The human eye was designed to see large objects about 15 feet away in bright light.  We are people of the “daytime meadow” and NOT the “nighttime forest.”

1.3  RODS AND CONES


Some features of rods include:

a)     Rods are more numerous than cones totaling about 120 million.

b)    Rods are sensitive to almost all colors and are the only part of the eye sensitive to low light intensities.

c)     Rods give us our “night” or scotopic vision.

Some features of cones include:

a)     Humans only have 6 to 7 million cones.

b)    We have 3 different types of cones (64% red, 31% green, and 5% blue).

c)     Each different type of cone is most sensitive to a different color [i.e. light frequency].

d)    ALL cones are much less sensitive to the intensity of light than rods and are very adaptable to bright light.

e)     Cones give us our “day” or photopic vision.

The relative sensitivity of rods and cones is shown below:

Although we have a very small number of BLUE cones, these are much more sensitive than the red and green; and thus we can see blue about as well as other colors although with reduced resolution.  Nevertheless, because of this peculiarity of the eye, CCDs can take advantage and are generally built using the “Bayer” cell as shown below:

Some perceptual consequences of our physiology are:

a)     We can’t see color at night.  Stars in the night sky have very distinct colors but appear white to our eyes.

b)    At night you can see better off to the side rather than straight ahead but close to the center.   In the day, you can see best straight ahead.

c)     It takes about 30 minutes of darkness for the rods to become active so we can see.  After that, the rods will integrate an image for about 15 seconds.

d)    Blue cones are fewer but much more sensitive so we can see as well as in the other colors but with less detail.

e)     In daylight we see yellow best.  This is the same color as the sun.  At night we see blue-green best.

f)     Our eyes adjust very rapidly to differences in brightness during the day because cones are about 600 times less sensitive than rods.  Humans can distinguish objects over a brightness range of about 10 million to 1.

g)    A photoreceptor chemical, especially for rods, is Vitamin A which is found in carrots and other foods and helps night vision.

1.4  EVOLUTION OF VISION

Humans have three different types of cones or “color receptors.”    The evolution proceeded as follows:

a)     Early fish had three sets of cones as did early insects (bees) and vertebrates (lizards, alligators, crocodiles, etc).

b)    Dinosaurs evolved into birds with four sets of cones.

c)     Mammals evolved from vertebrates but lost one set of cones.  So lions, bears, dogs, cows, and whatnot only have two sets of cones.  This gives a slightly better distinction between khaki colors in bright sunlight.

d)    Mammals which returned to the sea (walruses, whales, seals, etc.) have better vision than land animals.

e)     Modern fish have four or five and more sets of cones.

f)     Old world monkeys, but not most new world monkeys, and humans re-invented an extra cone and have three sets.  But the third set is not as well positioned as the one we lost earlier.

g)    There is recent research indicating that some women, but not men, may have four sets (i.e. tetrachromatic because of small variations in the “red” cone).  About 7% of men only have two cones and are “red-green” color-blind.

1.5  COLOR SENSITIVITY OF THE EYE

For our purposes, light is simply an electro-magnetic wave and part of the total spectrum as shown below:

On earth, most of the sun’s light fortuitously occurs in a frequency band to which the atmosphere is transparent as shown below:

Interestingly, the human eye has become adapted to the available light.  Indeed, the eye is most sensitive to that frequency of light radiated by the sun.

1.6  VISION IN THE BRAIN

Pre-processing in the eye is not limited to recording the pixels of an image or even to simple processing such as the identification of colors.  Rather two more sophisticated tasks are also performed, namely:

a)     Edge Detection is performed mostly in the center of the retina (fovea).  The luma or black-and-white intensity is high-pass filtered and then used to detect the edge.  The overshoots and undershoots of the filter step response can cause the well-known “Mach bands” effect.  Chroma is also used, especially to verify continuity of the detected edges.  Poor color depth can cause the well-known “false contours” effect.

b)    Motion Estimation is performed mostly in the periphery of the retina by our peripheral vision.
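The high-pass filtering described under Edge Detection can be imitated on a single row of luma samples.  The sketch below is only an engineering analogy, not a model of the retina: it assumes a 3-tap box blur as the low-pass stage and subtracts it from the signal to get the high-pass part.  The overshoot and undershoot that appear next to the step are the numerical counterpart of Mach bands.  All function names are hypothetical.

```python
def box_blur(row):
    """3-tap moving average, with the end samples replicated at the borders."""
    padded = [row[0]] + list(row) + [row[-1]]
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3.0
            for i in range(len(row))]


def high_pass(row):
    """High-pass component: the signal minus its blurred (low-pass) copy."""
    return [value - low for value, low in zip(row, box_blur(row))]


def enhance_edges(row, gain=1.0):
    """Add the high-pass component back, which exaggerates edges."""
    return [value + gain * high for value, high in zip(row, high_pass(row))]


def find_edges(row, threshold=10.0):
    """Indices where the high-pass response exceeds the threshold."""
    return [i for i, high in enumerate(high_pass(row)) if abs(high) > threshold]


# A dark field next to a bright field: one clean step edge.
step = [50.0] * 8 + [200.0] * 8
enhanced = enhance_edges(step)
print(find_edges(step))
print(min(enhanced), max(enhanced))
```

The enhanced row dips below 50 on the dark side of the edge and rises above 200 on the bright side, while the flat regions are left unchanged.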

But the eye is not the only “hard-wired” biological hardware for video processing.  Processing in the brain is not only complex and subject to a variety of interesting quirks, but also not usually subject to our conscious intervention.  The visual cortex, especially the V1 through V4 regions, is even more important than our eye and optic nerve, as illustrated below:

Some features include:

a)     The visual cortex needs visual stimulation early in life to organize itself.  There is a critical period for newborn babies lasting until about 6 months of age.  If during this period the eyes are opaque, perhaps because of cataracts, the babies remain forever blind even though their eyes are later perfectly repaired.  Individuals who have optic nerve damage from birth, which is repaired only when they are adults, remain blind.

b)    Small groups of neurons become wired in neural nets for very specific motion [location in visual field, speed of motion, direction of motion, increasing brightness, panning, zooming, etc.]  Probes were placed in the brain of an anesthetized cat and specific neurons only “fired” under extremely specific images.

1.7  PSYCHO-VISUAL EFFECTS

Some psycho-visual effects which are more or less important in video processing and compression include:

a)     The brain shuts down for an instant when you blink.  Even if light gets through your eyelid, the brain doesn’t process it.

b)    The human brain is able to recognize horizontal motion better than vertical motion.  Our response time and acuity are 3-6% better horizontally.

c)     The human brain is unable to see as much detail in moving objects.

d)    The human brain is much more sensitive to edges than to solid fields.

e)     The eye-brain-perception mechanism has a “persistence of vision” that effectively sees an object for a fraction of a second after it has disappeared.  Some frame rates are:

i.    The earliest silent movies before 1920 operated at 12 frames/second which is at the threshold of appearing to have continuous motion.  Later silent movies just before sound was introduced about 1934 operated at 16 frames/second and appeared jerky.

ii.    Modern movies are shown at 24 frames/second but each frame is shown twice or even three times in succession to eliminate flicker.  Conversion to 30 fps is by 3:2 PULLDOWN.

iii.    Broadcast TV is at 50 (PAL/SECAM) or 60/1.001 (NTSC) interlaced fields/second.  The 1.001 factor is to prevent interference with audio in color TV.

iv.    Computer monitors operate at 60-120 frames/second progressive.  Beyond about 85 Hz, much less than 1% of people can discern flicker.
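The 3:2 pulldown mentioned for film-to-video conversion can be sketched in a few lines of Python.  The sketch assumes the usual cadence in which film frames are alternately held for three fields and two fields, and it ignores the top/bottom field parity that a real converter must track.  The function name is hypothetical.

```python
def pulldown_32(film_frames):
    """Spread film frames over interlaced fields in a 3, 2, 3, 2, ... cadence,
    then pair the fields up into video frames."""
    fields = []
    for index, frame in enumerate(film_frames):
        repeats = 3 if index % 2 == 0 else 2
        fields.extend([frame] * repeats)
    # Two consecutive fields make one interlaced video frame;
    # a leftover single field at the end is dropped.
    return [tuple(fields[i:i + 2]) for i in range(0, len(fields) - 1, 2)]


# Four film frames become five video frames (24 fps -> 30 fps).
video = pulldown_32(["A", "B", "C", "D"])
print(video)
```

Note that two of every five video frames mix fields from different film frames, which is why pulled-down material can show combing on progressive displays.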

2.     COLOR SPACES

2.1  INTRODUCTION

There are two basic “color-space” classification schemes in general usage:

a)     Additive systems which mimic human visual response and are tuned to human anatomy.

i.    The TV and PC monitor systems, including the tri-stimulus {x,y,z}, {L*a*b*}, and the various flavors of RGB from different manufacturers such as sRGB.

ii.    The Standard Definition analog broadcast TV systems as well as the newer HDTV systems, including {Y, B-Y, R-Y}, {YUV and its phase-shifted analog YIQ}, {YCbCr aka YPbPr}, and finally {xvYCC}.

iii.    The scientific systems which represent the actual spectrum to include HSV, HSL, and Munsell.

b)    Subtractive systems which are necessary for printer inks to include CMYK.

2.3  TRI-STIMULUS COLOR SPACE

Because humans have three color receptors, any “apparent” color can be made to appear in our minds by a careful mixture of the three basic colors, red, green, and blue.  A broad enough “yellow” color will excite our red and green “cones” just as much as two pure colors of red and green separately.

In particular, the nerve signal is given by

Diffuse Color Nerve Signal   = Integral { Cone(freq) * DiffuseColor(freq)   d(freq) }

Discrete Color Nerve Signal = Integral { Cone(freq) * DiscreteColor(freq) d(freq) }

as shown below.

These are the three “Gaussian” distributions showing how the human optic nerve is sensitive to light as a function of frequency.  Each bump represents the separate response from a different type of cone: red, green, or blue.

These are the frequencies from two different light sources which enter the eye.  The first graph shows a “diffuse” set of frequencies and the second shows a set of two relatively discrete colors with narrow frequency bands.

Note that the more “diffuse” the frequency spectrum is, the more “white” it appears. Note that the more “discrete” a frequency spectrum is, and the fewer individual frequencies, the more LIKELY it is that the color will be perceived as pure, i.e. monochromatic.

Each frequency from the above light sources has to be multiplied by the response of the “cones” at the back of the eye on the retina as shown below.

These graphs represent how much the optic nerve is stimulated as a function of frequencies present in the input light source.   The total integrated area under each curve provides the information necessary to estimate the color of the input light source.

These nerve signals for each of the three different cones give the tri-stimulus values

X = Integral { Red Cone(freq)    * DiffuseColor(freq)   d(red cone frequency range)     }

Y = Integral { Green Cone(freq) * DiffuseColor(freq)   d(green cone frequency range) }

Z = Integral { Blue Cone(freq)    * DiffuseColor(freq)   d(blue cone frequency range)   }

And these can be combined into the signals

x = X/(X+Y+Z),  y = Y/(X+Y+Z),  and z = 1-(x+y) = Z/(X+Y+Z)

There are really only two independent variables, x and y, because we have “normalized” by the sum of all the values.
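The integrals and the normalization above can be approximated numerically.  The following Python sketch is illustrative only: the three cone responses are modeled as Gaussians with made-up centers and widths (assumptions, not measured data), wavelength in nanometers is used as the integration variable instead of frequency, and the integral is replaced by a simple sum.

```python
import math

# HYPOTHETICAL cone responses: (center wavelength in nm, width in nm).
CONES = {"red": (600.0, 40.0), "green": (550.0, 40.0), "blue": (450.0, 40.0)}


def gaussian(x, center, width):
    """Unnormalized Gaussian bump with peak value 1."""
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))


def tristimulus(spectrum, start=380.0, stop=780.0, step=1.0):
    """Approximate X, Y, Z as sums of cone response times source spectrum."""
    totals = {"red": 0.0, "green": 0.0, "blue": 0.0}
    samples = int(round((stop - start) / step))
    for k in range(samples + 1):
        wavelength = start + k * step
        power = spectrum(wavelength)
        for name, (center, width) in CONES.items():
            totals[name] += gaussian(wavelength, center, width) * power * step
    return totals["red"], totals["green"], totals["blue"]


def chromaticity(X, Y, Z):
    """Normalize so that x + y + z = 1 (assumes X + Y + Z is not zero)."""
    total = X + Y + Z
    return X / total, Y / total, Z / total


# A "diffuse" (flat) source and a narrow-band source near 600 nm.
diffuse = chromaticity(*tristimulus(lambda wl: 1.0))
narrow = chromaticity(*tristimulus(lambda wl: gaussian(wl, 600.0, 5.0)))
print(diffuse)
print(narrow)
```

With these assumed curves, the flat source lands near the middle of the diagram, while the narrow-band source is pushed strongly toward x.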

2.4  CIE CHROMATICITY DIAGRAM

Some items to note on the 1931 Commission Internationale de l'Eclairage (CIE) Chromaticity Diagram are the following:

a)     The human visual system can’t see colors outside of those indicated.  Each of the visible colors could have been made by MANY, MANY different frequency spectrums.

b)    The center is roughly white, or grey with varying degrees of brightness depending on the value of the “z” axis because the red and blue cones at opposite ends of the visual spectrum have about the same signals, i.e. x = y roughly.  This implies a very broad and diffuse frequency spectrum.

c)     Those colors on the edges could have been made by discrete frequencies.

By creating a single monochromatic color and changing the wavelength between 400 and 700 nm, we get a “spectral locus” of all single colors as shown below.

Now that we have this locus, we can also see that there is no triangle, and therefore no primaries, that encompasses the whole space, and therefore there is no set of colours that can be additively combined to form all other colours.

There is only one way around this problem, and that is to use primaries that can't be found in the spectrum.  The CIE chose three primaries called X,Y and Z which are theoretically defined super-saturated colours, which lie outside the bounds of the spectral locus, and because of this fact the XYZ system never has to use negative values.

2.5  RGB COLOR SPACE

[Hearn 1997, Fig.15.5, p. 568] Amount of RGB primaries needed to display spectral colors, notice negative quantities of red

The experimental evidence creates color matching functions which show how much of each primary is needed to produce a given spectral color.  The negative values for red mean that subtractive matching is required to match light at that wavelength with the RGB primaries.  It is interesting to note that because computer monitors use additive mixtures of red, green, and blue, it is impossible for them to produce wavelengths around 500nm.

We can transform the tri-stimulus values to a linear combination of three nearly pure colors each of which are at discrete frequencies.  The new color is given by:

New Color = a1 * Red + a2 * Green + a3 * Blue

where the constants a1, a2, and a3 are positive values.

As a consequence, we can transform between an RGB color space and the Tri-stimulus space using:

| X |   | 0.4124  0.3576  0.1805 |   | R |
| Y | = | 0.2126  0.7152  0.0722 |   | G |
| Z |   | 0.0193  0.1192  0.9505 |   | B |
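The matrix above can be applied directly in code.  This Python sketch assumes the R, G, and B inputs are linear light values (no gamma encoding) scaled from 0 to 1, and it uses exactly the coefficients printed above.  The function names are illustrative.

```python
# Coefficients copied from the matrix above (rows give X, Y, Z).
RGB_TO_XYZ = (
    (0.4124, 0.3576, 0.1805),
    (0.2126, 0.7152, 0.0722),
    (0.0193, 0.1192, 0.9505),
)


def rgb_to_xyz(r, g, b):
    """Multiply a linear RGB triple (each 0..1) by the 3x3 matrix."""
    return tuple(row[0] * r + row[1] * g + row[2] * b for row in RGB_TO_XYZ)


def xy_chromaticity(X, Y, Z):
    """Project tri-stimulus values onto the chromaticity plane."""
    total = X + Y + Z
    return X / total, Y / total


# Equal amounts of R, G, and B give the white point of this RGB space.
white_xyz = rgb_to_xyz(1.0, 1.0, 1.0)
white_xy = xy_chromaticity(*white_xyz)
print(white_xyz)
print(white_xy)
```

The middle row sums to 1, so Y carries the luminance, and the white point comes out close to x = 0.3127, y = 0.3290, the D65 white mentioned later for sRGB.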

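Not all the colors which we can see can be expressed as a LINEAR combination, with positive coefficients, of the three pure monochromatic colors red, green, and blue.  Those that can lie inside a triangle on the chromaticity diagram whose corners are the three primaries.  This triangle is called the “gamut.”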

2.6  CRT PHOSPHORS

The three discrete central wavelengths are 440 nm for blue, 555 nm for green, and 690 nm for red.  Note that the “red” phosphor also has “subtractive” qualities.

The coefficients that correspond to the "NTSC" red, green and blue CRT phosphors of 1953 are standardized in ITU-R Recommendation BT.601-2 (Standard Definition TV).  Contemporary CRT phosphors are standardized in Rec. 709 (High Definition TV).


[Foley et al 1993 Plate 15] CRT Monitor Gamut, Slide Film Gamut and Offset Printer Gamut are shown on a CIE chromaticity diagram

2.7  sRGB STANDARD

Recently, Hewlett-Packard and Microsoft combined to create a slight variation of the normal RGB color space that was defined for Broadcast SD TV (NTSC, SECAM, and PAL).  This “standard-RGB” (sRGB) was intended for use over the Internet and has found a large following.  The white point is D65 (a color temperature of about 6500 K) but the gamma correction is slightly different.

2.8  YUV STANDARD

The YUV is a color space used for the transmission of both Standard Definition and High Definition TV.  It is directly related to the color signals (Y,I,Q color space) needed to be modulated onto a NTSC or PAL analog broadcast TV signal.  Although it is a linear mapping from RGB, this color space is more efficient needing fewer bits to accurately specify a color.  It is very closely related to YCbCr, the main difference being that YCbCr is restricted to the 8-bit range of 16 to 235/240; the upper limit being larger for chroma.  The YPbPr standard is IDENTICAL to YCbCr except it is not restricted to integer values but rather is expressed as real numbers.

Basically the luma or Y value is the weighted sum of the intensities of all the RGB components and is dominated by the green value.  The luma most closely corresponds to the brightness on a black and white picture.  The chroma components, Cb and Cr, represent the difference between the luma and the blue and red components respectively.  The conversion matrix is:

Y     = (77/256)R´ + (150/256)G´ + (29/256)B´                        Intensity (Mostly Green)

Cb  = -(44/256)R´ - (87/256)G´ + (131/256)B´ + 128            Blue – (Red+Green)

Cr   = (131/256)R´ - (110/256)G´ - (21/256)B´ + 128            Red  - (Green+Blue)

R´   = Y + 1.371(Cr - 128)

G´   = Y - 0.698(Cr - 128) - 0.336(Cb - 128)

B´   = Y + 1.732(Cb - 128)
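The six equations above translate directly into code.  The Python sketch below uses the coefficients exactly as printed and keeps everything in floating point so that the round trip can be inspected.  A real pipeline would also round and clamp to its integer range, which is left out here.  The function names are illustrative.

```python
def rgb_to_ycbcr(r, g, b):
    """Forward conversion using the /256 coefficients given above."""
    y = (77.0 * r + 150.0 * g + 29.0 * b) / 256.0
    cb = (-44.0 * r - 87.0 * g + 131.0 * b) / 256.0 + 128.0
    cr = (131.0 * r - 110.0 * g - 21.0 * b) / 256.0 + 128.0
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Inverse conversion using the decimal coefficients given above."""
    r = y + 1.371 * (cr - 128.0)
    g = y - 0.698 * (cr - 128.0) - 0.336 * (cb - 128.0)
    b = y + 1.732 * (cb - 128.0)
    return r, g, b


# A neutral gray carries no chroma: Cb and Cr sit at the 128 midpoint.
print(rgb_to_ycbcr(128.0, 128.0, 128.0))

# An orange pixel survives the round trip to within a fraction of a code value.
original = (200.0, 100.0, 50.0)
restored = ycbcr_to_rgb(*rgb_to_ycbcr(*original))
print(restored)
```

The luma weights sum to 256/256 and each chroma row sums to zero, which is why any gray maps to Cb = Cr = 128.  The small round-trip error comes from the inverse coefficients being quoted to only three decimals.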

2.9  xvYCC COLOR SPACE

The xvYCC is a standard released in October 2005 based on an expanded YUV color system.  This was recently approved in IEC 61966-2-4 and is used in the new HDMI 1.3 specification.  It can display about 1.8 times as many colors as sRGB.

Figure 3 below illustrates xvYCC's increased color gamut over conventional color spaces such as sRGB.

The white triangle gives the limits of the RGB space while the black arrows indicate the extensions allowed by xvYCC.

2.10  HSV COLOR SPACE

The conical representation of the HSV model is well-suited to visualizing the entire HSV color space in a single object.

The HSV Color space has three components, to wit Hue, Saturation and Value.  These components are better described as

a)     Hue – this is the actual color, i.e. red, green, blue or anything in between.  It is expressed as an angle from 0 to 360 degrees around the white point on the CIE Chromaticity Diagram.

b)    Saturation – this is the width of the frequency spectrum.  A diffuse or wide spectrum with lots of colors has a low saturation whereas a monochromatic color has a high saturation.  This is expressed as a percentage from 0 to 100% and represents the distance from the white point on the CIE Chromaticity Diagram.

c)     Value is the brightness of the color and is normalized from 0 to 100% in arbitrary units.
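For experimenting with the three components, Python's standard library module colorsys provides an HSV to RGB conversion.  The wrapper below is a small sketch that accepts the units used in this section (degrees and percentages) and rescales them to the 0 to 1 range that colorsys expects.  The wrapper name is hypothetical; only colorsys.hsv_to_rgb is a library call.

```python
import colorsys


def hsv_degrees_to_rgb(hue_deg, saturation_pct, value_pct):
    """Convert hue in degrees and saturation/value in percent to RGB (0..1)."""
    return colorsys.hsv_to_rgb((hue_deg % 360.0) / 360.0,
                               saturation_pct / 100.0,
                               value_pct / 100.0)


# Fully saturated red, a pale (half-saturated) red, and a mid gray.
pure_red = hsv_degrees_to_rgb(0.0, 100.0, 100.0)
pale_red = hsv_degrees_to_rgb(0.0, 50.0, 100.0)
mid_gray = hsv_degrees_to_rgb(0.0, 0.0, 50.0)
print(pure_red, pale_red, mid_gray)
```

Lowering the saturation mixes white into the color, and at zero saturation the hue no longer matters.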

The translation to RGB space is highly non-linear.  Nevertheless, many find HSV a more natural and logical system for describing color.  The values are illustrated in the graph below:

Note that the HSV effectively fits the frequency spectrum to a “Gaussian” as described below:

P(x) = exp{-(x-xave)^2/(2*s^2)} * 1/(s*sqrt(2*pi))

where xave is the average, s is the standard deviation, and pi is the constant 3.14159.

2.11  SUBTRACTIVE COLORS

All of the color ideas previously presented represent SOURCES of light like we might get from light emission from a Cathode Ray Tube (CRT) or a Flat Panel Display (FPD).  But inks on a paper or spotlights with filters on the stage absorb light rather than creating it.  For these colors

C = 1 - Red

M = 1 - Green

Y = 1 - Blue

and K is a separate pure black ink, added because a true black is hard to get exactly right by mixing the other three.
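The subtractive relations can be tried out in Python.  The C, M, and Y lines follow the equations above.  The way K is extracted here (taking the smallest of C, M, and Y as black and rescaling the rest) is a common simple convention and an assumption of this sketch; real printing workflows use device-specific profiles instead.

```python
def rgb_to_cmyk(r, g, b):
    """Naive RGB (0..1) to CMYK (0..1) with simple black extraction."""
    c, m, y = 1.0 - r, 1.0 - g, 1.0 - b
    k = min(c, m, y)          # ASSUMPTION: the shared gray part is printed as black
    if k >= 1.0:              # pure black: no colored ink is needed
        return 0.0, 0.0, 0.0, 1.0
    scale = 1.0 - k
    return (c - k) / scale, (m - k) / scale, (y - k) / scale, k


def cmyk_to_rgb(c, m, y, k):
    """Inverse of the naive conversion above."""
    return (1.0 - c) * (1.0 - k), (1.0 - m) * (1.0 - k), (1.0 - y) * (1.0 - k)


print(rgb_to_cmyk(1.0, 0.0, 0.0))   # red needs magenta and yellow ink
print(rgb_to_cmyk(0.0, 0.0, 0.0))   # black needs only K
round_trip = cmyk_to_rgb(*rgb_to_cmyk(0.2, 0.4, 0.6))
print(round_trip)
```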

The CMYK color space can be transformed to the RGB or Tri-stimulus color spaces.

Figure: RGB Color Space and CMYK Color Space (Destination Space)

2.12  PRACTICAL COLOR SPACE CONVERSION CAVEATS

Some practical considerations when converting between different color spaces are:

a)     Note that the same color space may be specified either in the digital domain as YCbCr or as a set of voltages in the analog domain as YPbPr.  Note that YCbCr is often loosely referred to as YUV, although strictly the two differ in scaling and offset.

b)    How many digital bits does one have?  Broadcast studio standards use 10 bits while consumer equipment for the home user uses 8 bits.  How do you round between the two?

c)     Is the 8-bit digital range from 0 to 255 as for PC applications, or is it the Standard Definition Broadcast range of 16 to 235 for the brightness or luma and 16 to 240 for the color-difference or chroma values?

d)    Is the value for the luma in exactly the same spatial (x,y) location as for the chroma value or is there ½ or ¼ pixel offset?

e)     Has the Gamma correction been included in the conversion or not?

f)     What is the white balance?  Note that the color temperature for RGB for Standard Definition is slightly different than for High Definition.

g)    If you are writing a program in “C” to handle video pixel data, PLEASE use the data type “unsigned char” or “int,” because plain “char” may be signed on many compilers, so pixel values above 127 would be read as negative numbers.
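Two of the caveats above, the bit depth and the digital range, can be made concrete with a short Python sketch.  It assumes the 16 to 235 luma range quoted in section 2.8 and a round-to-nearest policy for dropping from 10 bits to 8 bits.  Other policies (truncation, dithering) are equally possible, and the function names are hypothetical.

```python
def clamp(value, low, high):
    """Keep a value inside the closed range [low, high]."""
    return max(low, min(high, value))


def full_to_studio_luma(value):
    """Map PC-style 0..255 luma onto the 16..235 broadcast range."""
    return clamp(int(round(16.0 + value * 219.0 / 255.0)), 16, 235)


def studio_to_full_luma(value):
    """Map 16..235 broadcast luma back onto 0..255, clamping excursions."""
    return clamp(int(round((value - 16.0) * 255.0 / 219.0)), 0, 255)


def ten_bit_to_eight_bit(value):
    """Round to nearest by adding half of the discarded step before shifting."""
    return min(255, (value + 2) >> 2)


def eight_bit_to_ten_bit(value):
    """Widen by shifting; the two new low bits are zero."""
    return value << 2


print(full_to_studio_luma(0), full_to_studio_luma(255))
print(ten_bit_to_eight_bit(1023), eight_bit_to_ten_bit(128))
```

The clamp in the 10-bit to 8-bit step matters: without it, the largest 10-bit values would round up to 256 and overflow an 8-bit pixel.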

3.     VIDEO CORRECTIONS

Note that the color white is really just a very diffuse (broad frequency spectrum) signal which has nearly equal parts of all colors present.   But this is not the whole story because not all whites are equal.   In fact, a particular “white” color is chosen based on the “Color Temperature” of a “Black Body.”

A black body is theoretically one which doesn’t REFLECT any light so that it appears black.  In practice, we create black bodies using large cavities with very small openings.  Not much light can get into a small opening so not much is reflected out.  Almost all the light coming out has to have been EMITTED by the insides of the cavity.   The diagram above shows the radiation for a black body near the temperature of the sun.

The equation of the curve below is the “Planck” equation.

3.2  COLOR TEMPERATURE

As the temperature goes up, the peak of the black body radiation shifts to higher frequencies, the black body gets “red-hot”, and then hotter to “white-hot” and eventually hotter still to “blue-hot” and on into the ultraviolet.  Note the heating elements on a stove go from dark to infra-red and then to red.  They aren’t allowed to get hotter because they would start to melt.   Note that the curve is very diffuse.
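The shift of the black body peak with temperature can be explored numerically.  The Python sketch below uses the standard wavelength form of the Planck equation and Wien's displacement law; the text above speaks in terms of frequency, so working in wavelength is a choice made for this sketch.  The physical constants are the SI values, and the function names are illustrative.

```python
import math

H = 6.62607015e-34        # Planck constant, J s
C = 2.99792458e8          # speed of light, m/s
K_B = 1.380649e-23        # Boltzmann constant, J/K
WIEN_B = 2.897771955e-3   # Wien displacement constant, m K


def planck_radiance(wavelength_m, temperature_k):
    """Spectral radiance B(lambda, T) of a black body, wavelength form."""
    exponent = H * C / (wavelength_m * K_B * temperature_k)
    return (2.0 * H * C ** 2) / (wavelength_m ** 5) / math.expm1(exponent)


def peak_wavelength_nm(temperature_k):
    """Wavelength of maximum radiance from Wien's displacement law."""
    return WIEN_B / temperature_k * 1e9


# Hotter bodies peak at shorter wavelengths (higher frequencies).
for kelvin in (2800.0, 5770.0, 6500.0):
    print(kelvin, round(peak_wavelength_nm(kelvin), 1))
```

At a tungsten-lamp temperature the peak lies in the infrared, which is why such lamps look reddish and run hot, while at the effective temperature of the sun the peak falls inside the visible band.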

Some common examples of color temperatures for various settings are:

1700 K: light of matches

1850 K: a candle - romantic dinner captured by Kodak film

2800 K: tungsten lamp (ordinary household bulb whatever its power)

3350 K: studio "CP" light

3400 K: studio lamps, photofloods

5000 K: Daylight

5500 K: average daylight, electronic flash (can vary between manufacturers)

5770 K: effective sun temperature

6420 K: Xenon arc lamp

6500 K: Daylight

9300 K: TV screen (analog) - Microsoft window screen

28000 - 30000 K: a lightning bolt

General computer users and users of digital cameras should set their color temperature to “D65” or 6500 K.  Standard Definition TV (NTSC, SECAM, and PAL) specifies 6,504 K, but manufacturers typically ship their TVs with color temperatures ranging from about 7,000 K to 12,000 K, on the blue side of the color spectrum, to make sets as bright as possible so they stand out on a brightly lit showroom sales floor.  High Definition TV (1080i, 720p, etc.) specifies a slightly hotter color temperature than Standard Definition.

3.3  FLICKER

The flicker problem was well known: the jerky silent movies of the day ran at a frame rate of 16 frames/second, whereas the new talkies with their frame rate of 24 frames/second gave virtually flicker-free movement.  In Broadcast TV, each frame is transmitted as two half-frames (fields) at twice the frame rate.  This interlacing also reduces flicker.
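Interlacing itself is simple to sketch: the even-numbered scan lines form one field and the odd-numbered lines form the other.  The Python fragment below treats a frame as a list of rows and shows the split and the matching “weave” that puts the two fields back together.  The function names are hypothetical.

```python
def split_fields(frame):
    """Split a frame (list of rows) into top (even rows) and bottom (odd rows) fields."""
    return frame[0::2], frame[1::2]


def weave(top_field, bottom_field):
    """Interleave two fields back into a full frame."""
    frame = []
    for i, row in enumerate(top_field):
        frame.append(row)
        if i < len(bottom_field):
            frame.append(bottom_field[i])
    return frame


# A tiny 6-line frame in which each row is filled with its own line number.
frame = [[line] * 4 for line in range(6)]
top, bottom = split_fields(frame)
print(len(top), len(bottom))
print(weave(top, bottom) == frame)
```

Weaving is only lossless when nothing moves between the two fields.  For moving scenes, the two fields are captured at different instants, which is what produces combing artifacts.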

As the level of luminance increases, the refresh rate must also increase to prevent flicker. Increasing the field of view increases the probability that an observer will perceive flicker since the peripheral visual system is more sensitive to flicker than the fovea. There is a wide range of sensitivities to flicker between individuals, and also a daily variation within individuals. The threshold for detection of flicker tends to fall at night; see Boff and Lincoln, 1988.

3.4  TIME BASE CORRECTION (Jitter Removal)

This compensates for stretching VCR tape or other effects that change the time between two successive lines.

3.5  GAMMA CORRECTION

Plot of the sRGB intensities versus sRGB numerical values (red), and this function's slope in log-log space (blue), which is the effective gamma at that point.
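The two curves in that plot can be reproduced with the published sRGB transfer function, which is a short linear segment near black joined to a power law with exponent 2.4.  The Python sketch below implements both directions and estimates the effective gamma as the slope in log-log space using a small finite difference.  The function names and the step size are choices made for this sketch.

```python
import math


def srgb_to_linear(value):
    """Decode an sRGB-encoded value (0..1) to linear intensity (0..1)."""
    if value <= 0.04045:
        return value / 12.92
    return ((value + 0.055) / 1.055) ** 2.4


def linear_to_srgb(intensity):
    """Encode a linear intensity (0..1) as an sRGB value (0..1)."""
    if intensity <= 0.0031308:
        return 12.92 * intensity
    return 1.055 * intensity ** (1.0 / 2.4) - 0.055


def effective_gamma(value, delta=1e-4):
    """Slope of log(intensity) versus log(value); requires value > delta."""
    low, high = value - delta, value + delta
    rise = math.log(srgb_to_linear(high)) - math.log(srgb_to_linear(low))
    return rise / (math.log(high) - math.log(low))


for code in (0.02, 0.1, 0.5, 1.0):
    print(code, round(srgb_to_linear(code), 4), round(effective_gamma(code), 2))
```

The effective gamma is 1 on the linear segment near black and rises toward, but never reaches, 2.4 at full scale, which is why sRGB is often summarized as an overall gamma of about 2.2.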
