I. What is Intelligence?
Long story short: there is no universally accepted definition of intelligence. Intuitively, we might say that someone who is intelligent makes logical, reasonable decisions, sizes up situations quickly and well, has read extensively, comes up with good ideas, and is an expert on a particular subject. But how do we measure intelligence objectively? There is controversy over how well tests capture intelligence, how closely those pen-and-paper measurements apply to success in everyday life, and how much genetics and experience each contribute to intelligence. Given just those questions, you can see why it's not simple to come up with a definition of intelligence that will satisfy everyone. For our purposes, though, we'll use a very basic, simple and general definition:
Intelligence is the capacity to learn from experience and adapt
successfully to the environment.
II. The Tests: Individually Administered
One thing to keep in mind is that few of us have ever actually taken a formal intelligence test. Sure, we may have taken a test in a magazine or online, or the SAT, and perhaps gotten some sort of feedback about how smart we are, maybe even with a number attached. But those weren't formal intelligence tests. Formal intelligence tests are administered individually, never in groups, by a trained professional. Why and how did these formal tests arise? Let's look first at the history of intelligence testing and why it began in the first place.
A. Binet
The first test was the Binet-Simon test and intelligence scale, developed by the French psychologists Alfred Binet and Theodore Simon and first published in 1905. They created it at the commission of the French Ministry of Education, which wanted to identify students who needed special help in coping with the school curriculum and to place them in the appropriate grade level with the proper instructional help. The need arose in the aftermath of the 1882 laws requiring mandatory public education for all children under the age of 15. Before this, most schoolchildren came from upper-class families. With education required for all children, schools had to educate a much more diverse group, some with little to no prior education regardless of age and some who appeared retarded or otherwise incapable of benefiting from education. Teachers had no way of knowing which of the students experiencing difficulty had true retardation and which simply had behavioral problems or poor prior education. The Binet-Simon test was meant to help assign children to the appropriate class.
Binet did not coin the term I.Q. (intelligence quotient). Binet and Simon noted that children follow the same course of intellectual development but develop at different rates. They therefore employed the concept of mental age, which was independent of chronological age. If a 9-year-old and a 12-year-old both displayed mastery of knowledge, problem-solving, and other skills that were age-appropriate for a 10-year-old, then both would be assigned a mental age of 10. Although the test was a breakthrough in psychometrics at the time, Binet cautioned that its scores should not be taken too literally or relied upon too heavily, because of the plastic nature of intelligence and the inherent margin of error in the test.
B. Stanford-Binet
The Stanford-Binet test was developed in 1916 at Stanford University (hence, Stanford-Binet) by the American psychologist Lewis Terman. It used the Binet test as its basis but was not a mere translation: it was substantially revised, with some items dropped and new items added. The result was an "Americanized" version of the test, with questions that were culturally relevant to American children.
It was Terman who adopted and popularized the term I.Q. (intelligence quotient), first proposed by the German psychologist William Stern. The I.Q. was calculated by dividing the mental age by the chronological age and multiplying the quotient by 100 (Mental Age/Chronological Age x 100 = IQ). However, this original definition led to problems of interpretation. For instance, since the original versions of the Binet-Simon test and intelligence scale were only intended to be administered to French schoolchildren for scholastic purposes, a 30-year-old could in theory take the test and score the maximum mental age of 15. According to Terman's formula, 15/30 x 100 = 50, which would be the IQ of a person with severe mental retardation. Eventually, the IQ definition was changed to reflect the comparison of a person's test score with the mean score of other people his or her own age.
Over the years the Stanford-Binet lost favor to other tests. The previous version, the S-B 4, had 15 subtests and was intended for use with ages 2-23. The current standard version, the Stanford-Binet 5, has been substantially revised to 10 subtests with an applicable age range of 2-85+. This degree of reorganization and revision was likely inspired by its chief competitor, the more widely used Wechsler intelligence test.
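The ratio formula described in this section (Mental Age/Chronological Age x 100) can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not part of any published scoring procedure; the function name ratio_iq is an assumption made for this example.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ: mental age divided by chronological age, times 100.

    Hypothetical helper for illustration; it only reproduces the
    arithmetic given in the text.
    """
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return mental_age / chronological_age * 100


# The two children from the mental-age example: both have a mental age of 10.
print(round(ratio_iq(10, 9)))   # 9-year-old with a mental age of 10
print(round(ratio_iq(10, 12)))  # 12-year-old with a mental age of 10

# The interpretation problem: a 30-year-old who reaches the test ceiling of 15.
print(ratio_iq(15, 30))
```

The last call shows why the ratio definition breaks down for adults: chronological age keeps growing while the maximum mental age the test can assign stays fixed at 15.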
C. Wechsler
David Wechsler devised the Wechsler-Bellevue test and intelligence scale in 1939 to facilitate the assessment of his patients, for whom he thought the Stanford-Binet test was yielding unsatisfactory results. That initial test was eventually developed into multiple specialized IQ tests for targeted subpopulations: the Wechsler Adult Intelligence Scale (WAIS; ages 16 and above), the Wechsler Intelligence Scale for Children (WISC; ages 6-16), and the Wechsler Preschool and Primary Scale of Intelligence (WPPSI; ages 2.5 to 7.25 years). The WAIS has two major scales, verbal and performance (non-verbal), with 14 subtests (7 verbal and 7 performance). The test yields a verbal IQ, a performance IQ and a combined IQ score. The advantages of the Wechsler tests over the early Stanford-Binet were their performance scale (which was independent of language skills, reading and writing, thus enabling the testing and assessment of the illiterate, verbally impaired and otherwise verbally noncommunicative) and their different versions, which were tailored to specific age groups.
As mentioned earlier, IQ scores are calculated by comparing an individual's test score with the mean score of the cohort of other people his or her own age. The test performance of the cohort is graphed and analyzed statistically (see Chapter Two, section VI, for a refresher). The group mean is labeled an IQ of 100 and represents average intelligence. One standard deviation is divided into 15 units, so that plus one standard deviation is an IQ of 115 and minus one standard deviation is an IQ of 85. This range, 85-115, is considered the normal range (if you remember from the section on statistics, it includes about two-thirds of all subjects taking the test), with 85-99 being low average and 101-115 being high average. Scores that are 2 or more standard deviations above or below the mean are considered exceptional, for obviously different reasons. Scores from 116 to 129 are considered superior, and scores of 130 and above are considered gifted, or "genius" in everyday language. Scores from 71 to 84 are considered borderline, and scores of 70 and below are labeled retarded.
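The scaling described above (cohort mean = 100, one standard deviation = 15 points) can be sketched as follows. This is a minimal illustration, assuming cohort scores are normally distributed; the function name deviation_iq and the cohort figures (mean raw score 50, standard deviation 10) are hypothetical values chosen only to show the arithmetic.

```python
from statistics import NormalDist

IQ_MEAN = 100  # the cohort mean is labeled an IQ of 100
IQ_SD = 15     # one standard deviation is worth 15 IQ points


def deviation_iq(raw_score: float, cohort_mean: float, cohort_sd: float) -> float:
    """Convert a raw test score to an IQ relative to the person's age cohort."""
    if cohort_sd <= 0:
        raise ValueError("cohort standard deviation must be positive")
    z = (raw_score - cohort_mean) / cohort_sd  # distance from the mean in SD units
    return IQ_MEAN + IQ_SD * z


# Hypothetical cohort: mean raw score 50, standard deviation 10.
print(deviation_iq(60, 50, 10))  # one SD above the cohort mean
print(deviation_iq(30, 50, 10))  # two SDs below the cohort mean

# Share of a normally distributed cohort that falls between 85 and 115.
iq_scale = NormalDist(mu=IQ_MEAN, sigma=IQ_SD)
share_normal_range = iq_scale.cdf(115) - iq_scale.cdf(85)
print(round(share_normal_range, 2))  # roughly two-thirds
```

Because the score is expressed relative to the person's own age cohort, this definition avoids the ceiling problem of the older mental-age ratio.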
III. The Tests: Group Aptitude
The tests that most of us are actually familiar with taking are not, as we mentioned previously, intelligence tests. The large, group-administered tests are aptitude tests. They are more restricted than IQ tests in the range of skills and abilities that they attempt to assess psychometrically.
A. US Army Alpha & Beta Tests
The Alpha and Beta tests were the first aptitude tests. Inspired by the success of the Stanford-Binet, they were developed by Robert Yerkes for quick, large-scale assessment of US Army recruits during World War I. The Alpha was a written test; the Beta was a nonverbal, picture-based version for those who could not read. The tests were successful within their limited intended goal of improving selection, placement, and training for specific occupations within the army, and they proved the viability of large-group psychometric testing to measure certain task- or endeavor-specific skills.
B. All The Rest
The Alpha and Beta tests were therefore the forerunners of all the aptitude tests we see today, from the SAT to the MCAT, the LSAT, the GRE and other tests that measure the likelihood of successful performance in a variety of endeavors. The first versions of the SAT, which started in 1901, were nothing like an aptitude test. They were administered by a small number of elite colleges to keep out "undesirable" students (minorities and foreigners) by testing general knowledge, and they were never widely taken in those days. But in 1926 a new version based on the Army Alpha test was developed by Carl Brigham, a Princeton professor and former assistant of Robert Yerkes. Brigham adapted the Alpha test (mainly by making it more difficult) with the intention of widespread use as a college admissions test. It was first administered on a trial basis to a few thousand college applicants in 1926. This was the beginning of the popularity of aptitude testing, but the SAT did not gain wide acceptance until the 1950s. SAT scores by themselves are better correlated with high school performance than with college performance. However, when SAT scores, high school grades and extracurricular activities are combined, there is a better correlation with college performance.
There is some question as to how effective paid test preparation services are. Evidence suggests that by themselves they have a minimal impact on test performance (30-50 points on the SAT). Even that gain is hard to attribute to the classes alone: many people who take them are more highly self-motivated, practice on their own, and were likely to retake the test anyway, and retaking the test tends to raise scores even in the absence of a test-preparation class.
IV. What Goes Into A Test?
To make any sense of a test, we have to look at its construction (are the results consistent, and does it seem to measure what it's supposed to?) and at the statistical rationale and framework for comparing and interpreting an individual score. A great deal of work by psychologists specializing in psychometrics goes into the design and development of an intelligence or aptitude test.
A. Standardization
The first thing we'll cover is standardization and norms. To understand norms and statistical assessment, one first needs to understand standardization. Standardization is the systematic process of developing, administering, and scoring tests. It consists of testing a group of people to determine the range of scores that are typically attained, so that an individual participant's score can then be compared with the group's performance. The group must be a representative sample, reflecting the population for which the test was designed. These procedures are supposed to ensure that all participants are tested under the same conditions, that they are all given equal opportunity to determine the correct answer, and that all scores are established and interpreted using appropriate criteria. The group's performance is the basis for the test's norms, which are the scores and statistical values obtained from the representative group.
B. Reliability
Reliability is a measure of the consistency and dependability of a test's ability to represent a participant's knowledge or ability. Assessing it requires analyzing a participant's scores across such factors as different administrations of the same test over time (test-retest reliability), different tasks or questions that measure the same skill (split-half or alternate-forms reliability), or different raters scoring the same performance (inter-rater reliability). To understand the importance of reliability, think of a bathroom scale that gave you drastically different readings every time you stepped on it, regardless of whether you had gained or lost weight. That scale would be unreliable and therefore useless.
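One common way to put a number on test-retest reliability is to correlate the scores from two administrations of the same test. The sketch below assumes the Pearson correlation coefficient as the reliability index, which is a choice made for this illustration rather than something specified in the text, and all of the scores are invented.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 scores)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_x * var_y)


# Invented scores for five participants on a first administration.
first = [95, 100, 105, 110, 120]

# A reliable test: the retest puts people in nearly the same order.
retest_consistent = [96, 99, 107, 109, 121]

# An unreliable test (the erratic bathroom scale): same numbers, scrambled.
retest_erratic = [110, 96, 121, 99, 107]

print(pearson_r(first, retest_consistent))  # close to 1
print(pearson_r(first, retest_erratic))     # close to 0
```

A coefficient near 1 means people keep roughly the same standing from one administration to the next; a coefficient near 0 means the second score tells you almost nothing about the first.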
C. Validity
Validity refers to the degree to which an individual's test score accurately reflects the knowledge and/or skills that a test is intended to measure (called content or construct validity) or predicts the performance it is designed to forecast (criterion or predictive validity). In practical terms, validity comes first: if a test is invalid, there is little point in discussing its reliability, because a consistent measurement of the wrong thing is of no use. Remember the bathroom scale. Let's say that the reason for taking a weight measurement is to assess your cardiovascular health. But both skinny and fat people can be in poor cardiovascular shape, so weight is invalid as a measure of your cardiovascular health. Therefore, the issue of whether or not the scale is reliable becomes pointless.
However, if a test is not reliable, it is also not valid. Change the example to weighing yourself to predict whether you can fit into an old pair of jeans. Now the basic concept may be valid, but if the scale is unreliable, that specific weighing tells you nothing about whether the jeans will fit, so the measurement is invalid in practice. As soon as you get a reliable scale, the test becomes valid again.
V. Theories Of Intelligence: Three Examples
There are several different theories of intelligence. We'll highlight three that clearly differ in how they view intelligence and in the number and types of intelligence they propose.
A. Spearman's (1927) General Intelligence
Charles Spearman was an early psychometric psychologist who believed that there was a single, dominant, broad intellectual ability factor, a general intelligence that he called G. He came to that position because he found that the grades of schoolchildren across a wide variety of seemingly unrelated subjects were strongly positively correlated (associated). He believed that G interacted with a factor specific to each individual mental task, S, which was the individual ability that would make a person more or less skilled at that task. In other words, Spearman's idea was based on the observation that if a person has a good vocabulary, there is a better than 50-50 chance that they also have a good memory and are good at math. Likewise, a person who is good at math is likely to have a good vocabulary and memory. These associations aren't perfect, but they hold more often than not. General intelligence, G, was the conceptual explanation for why people's scores tend to correlate across subjects, and the specific abilities or skills, S, explained the differences among the individual scores. So there's one S for math, a different S for vocabulary, a different S for memory, and so forth for each type of cognitive task.
B. Gardner's Frames of Mind
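Howard Gardner (1983) proposed a theory of independent multiple intelligences, originally seven of them: linguistic, logical-mathematical, spatial, musical, bodily-kinesthetic, interpersonal, and intrapersonal.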
Gardner came to his point of view because he had come to consider standard tests and other assessments used to measure IQ to be inconclusive. He argued that the IQ number did not predict or reflect school outcomes or success in life. Gardner holds that each individual has varying levels of these different intelligences, and that this accounts for each person's unique cognitive profile. In a sense, comparing his point of view to Spearman's, Gardner would say there is no G, only S's, and that those S's are more than just skills or abilities: each is an independent form of intelligence unto itself.
C. Sternberg's Triarchic Theory
Robert Sternberg (1985) proposed in his Triarchic theory that there are three forms of intelligence: analytical, creative and practical. In Sternberg's view, current intelligence testing does not test all three forms.
He holds that current psychometric tests only appreciably tap analytical intelligence, which allows an individual to quickly break down problems and see solutions. This form of intelligence consists of numerous subcomponents that enable this analytical ability, but the key is that they all serve the process of analyzing problems. While people high in this form of intelligence can break down problems, they do so on the basis of their acquired knowledge. They may not necessarily be good at creating new ideas or knowledge.
Creative intelligence involves synthetic thinking, the ability to put together knowledge and understanding in new and intuitive ways. Often, individuals with the highest conventionally measured IQs are not good at this form of thinking. And people with high levels of creative intelligence, such as artists, often go unidentified by conventional IQ tests, because there are not currently any tests that can sufficiently measure the attributes involved in creating new ideas and solving new problems.
Practical intelligence is basically street smarts or common sense. It involves the ability to apply creative and analytical intelligence to everyday situations. Those high in practical intelligence are superb in their ability to succeed in any setting. Even if they are limited in their creative and analytical intelligence, they are able to use the skills they have to their best advantage. In the end, Sternberg reminds us that an individual is not necessarily restricted to excellence in only one of these three intelligences. Many people may have integrated all three very well and may even have high levels of all three.