I. What is Intelligence?
Long story short: there is no universally accepted definition of intelligence. In our intuitive understanding of it, we might say that an intelligent person makes logical, reasonable decisions, sizes up situations quickly and well, is well read, comes up with good ideas, and is an expert on a particular subject. But how do we measure intelligence objectively? There is controversy over how well tests capture intelligence, how closely those pen-and-paper measurements apply to success in everyday life, and how much genetics and experience each contribute to intelligence. Given just those questions, you can see why it is not a simple thing to come up with a definition of intelligence that will satisfy everyone. For our purposes, though, we'll use a very basic, simple, and general definition: intelligence is the capacity to learn from experience and adapt successfully to the environment.
II. The Tests: Individual
One thing to keep in mind is that few of us have ever actually taken a formal intelligence test. Sure, we may have taken a test in a magazine or online, or the SAT, and perhaps gotten some sort of feedback about how smart we are, maybe even with a number attached. But those really weren't formal intelligence tests. Formal intelligence tests are administered individually, never in groups, by a trained professional. Why and how did these formal tests arise? To answer that, let's look at the history of intelligence testing and why it was begun in the first place.
The first test was the Binet-Simon test, developed by the French psychologists Alfred Binet and Theodore Simon, who published it, together with the Binet-Simon intelligence scale, in 1905. They created it at the commission of the French Ministry of Education to identify students who needed special help coping with the school curriculum and to place them in the appropriate grade level with the proper instructional support, in the aftermath of the 1882 laws requiring mandatory public education for all children under the age of 15. Before this, most schoolchildren came from upper-class families. With education required for all children, schools had to educate a much more diverse group, some with little to no prior schooling regardless of age and some who appeared retarded or otherwise incapable of benefiting from education. Teachers had no way of knowing which of the struggling students had true retardation and which simply had behavioral problems or poor prior education. The Binet-Simon test was designed to help assign children to the appropriate class. Note that Binet did not coin the term I.Q. (intelligence quotient). Binet and Simon observed that children follow the same course of intellectual development but develop at different rates. They therefore employed the concept of mental age, which was independent of chronological age. If a 9-year-old and a 12-year-old both displayed mastery of knowledge, problem-solving, and other skills that were age-appropriate for a 10-year-old, then both would be assigned a mental age of 10. Though this test was a breakthrough in psychometrics at the time, Binet cautioned that its scores should not be taken too literally or relied upon too heavily, because of the plastic nature of intelligence and the inherent margin of error in the test.
The Stanford-Binet test was developed in 1916 at Stanford University (hence, Stanford-Binet) by the American psychologist Lewis Terman. It used the Binet test as its basis but was substantially revised rather than merely translated, with some items dropped and new items added. The result was an Americanized version of the test, with questions that were culturally relevant to American children. It was Terman who coined the term I.Q. (intelligence quotient), calculated by dividing the mental age by the chronological age and multiplying the quotient by 100 (Mental Age / Chronological Age x 100 = IQ). However, this original definition led to potential problems of interpretation. For instance, since the original versions of the Binet-Simon test and intelligence scale were intended only for French schoolchildren for scholastic purposes, a 30-year-old could in theory take the test, score a perfect maximum mental age of 15, and according to Terman's formula receive 15/30 x 100 = 50, the IQ of a person with severe mental retardation. Eventually, the IQ definition was changed to reflect the comparison of a person's test score with the mean score of other people his or her own age. Over the years the test lost favor to other tests. The previous version of the Stanford-Binet, the S-B 4, had 15 subtests and was intended for use with ages 2-23. The current version, the Stanford-Binet 5, has been substantially revised to 10 subtests with an applicable age range of 2-85+. This degree of reorganization and revision was likely inspired by its chief competitor and the more widely used intelligence test, the Wechsler.
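Terman's ratio formula is simple enough to sketch in a few lines of Python (`ratio_iq` is a hypothetical helper for illustration, not any test's actual scoring routine):

```python
# Terman's original "ratio IQ": mental age divided by chronological age, times 100.
def ratio_iq(mental_age, chronological_age):
    return round(mental_age / chronological_age * 100)

# A 12-year-old performing at the level typical for 12-year-olds:
print(ratio_iq(12, 12))   # 100

# The interpretation problem described above: a 30-year-old who reaches the
# test's ceiling mental age of 15 comes out looking severely impaired.
print(ratio_iq(15, 30))   # 50
```

This ceiling artifact is exactly why the ratio definition was eventually abandoned in favor of comparing a person's score against age peers.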
David Wechsler devised the Wechsler-Bellevue test and intelligence scale in 1939 to facilitate the assessment of his patients, for whom he thought the Stanford-Binet test was yielding unsatisfactory results. That initial test eventually developed into multiple specialized IQ tests for targeted subpopulations, such as the Wechsler Adult Intelligence Scale (WAIS; ages 16 and above), the Wechsler Intelligence Scale for Children (WISC; ages 6-16), and the Wechsler Preschool and Primary Scale of Intelligence (WPPSI; ages 2.5 to 7.25). The WAIS has two major scales, verbal and performance (non-verbal), with 14 subtests (7 verbal and 7 performance). The test yields a verbal IQ, a performance IQ, and a combined IQ score. The advantages of the Wechsler tests over the early Stanford-Binet were the performance scale (which was independent of language skills, reading, and writing, thus enabling the testing and assessment of the illiterate, the verbally impaired, and the otherwise verbally noncommunicative) and the different test versions tailored to specific age groups.
The assignment of IQ scores, as mentioned earlier, is calculated from the comparison of an individual's test score with the mean score of a cohort of other people his or her own age. The test performance of the cohort is graphed and analyzed statistically (see Chapter Two, section VI, for a refresher). The group mean is labeled an IQ of 100, or average intelligence. One standard deviation is set at 15 IQ points, so that plus one standard deviation is an IQ of 115 and minus one standard deviation is an IQ of 85. The range 85-115 is considered the normal range (if you remember from the section on statistics, this includes about two-thirds of all subjects taking the test), with 85-99 being low average and 101-115 being high average. Scores more than 2 standard deviations from the mean are considered exceptional, for obviously different reasons at either end. Scores from 116 to 129 are considered superior, with scores of 130 and above considered gifted, or "genius" in everyday language. Scores from 71 to 84 are considered borderline, and scores of 70 and below are labeled retarded.
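The scoring bands above can be captured in a small Python sketch (the cutoffs come from the text; `classify_iq` is a hypothetical helper, not an official scoring procedure):

```python
def classify_iq(score):
    """Map a deviation IQ score (mean 100, SD 15) to the labels in the text."""
    if score >= 130:
        return "gifted"
    if score >= 116:
        return "superior"
    if score > 100:
        return "high average"
    if score == 100:
        return "average (the mean)"
    if score >= 85:
        return "low average"
    if score >= 71:
        return "borderline"
    return "retarded (the era's clinical label)"

# One standard deviation above the mean (100 + 15):
print(classify_iq(115))   # high average
# Two standard deviations above (100 + 2 * 15):
print(classify_iq(130))   # gifted
```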
III. The Tests: Group
The tests that most of us are actually familiar with taking are not, as we mentioned previously, intelligence tests. These large, group-administered tests are aptitude tests. They are more restricted than IQ tests in the range of skills and abilities that they attempt to assess psychometrically.
A. US Army Alpha & Beta
The Alpha and Beta tests were the first aptitude tests. Inspired by the success of the Stanford-Binet, they were developed by Robert Yerkes for quick, large-scale assessment of US Army recruits during World War I. The Alpha was a written test; the Beta was an oral version for those who could not read. The tests succeeded within their limited intended goal of improving selection, placement, and training for specific occupations within the army, and they proved the viability of large-group psychometric testing to measure certain task- or endeavor-specific skills.
B. All The Rest
The Alpha and Beta tests, therefore, were the forerunners of all the aptitude tests we see today, from the SAT to the MCAT, the LSAT, the GRE, and other tests that measure the likelihood of successful performance in a variety of endeavors. The first version of the SAT, which began in 1901, was nothing like an aptitude test: it was administered by a small number of elite colleges to keep out students they deemed undesirable (minorities and foreigners) by testing general knowledge, and it was never widely taken in those days. But in 1926 a new version based on the Army Alpha test was developed by Carl Brigham, a former assistant of Robert Yerkes who taught at Princeton. Brigham adapted the Alpha test (mainly by making it more difficult) with the intention of widespread use as a college admissions test, and it was first administered on a trial basis to a few thousand college applicants in 1926. This was the beginning of the popularity of aptitude testing, though the SAT did not begin to gain wide acceptance until the 1950s. SAT scores by themselves correlate better with high school performance than with college performance. However, when SAT scores, high school grades, and extracurricular activities are combined, the correlation with college performance improves.
There is some question as to how effective paid test-preparation services are. Evidence suggests that by themselves they have a minimal impact on test performance (30-50 points on the SAT). Moreover, many people who take them are more highly self-motivated, are likely to retake the test anyway (and retaking the test also tends to raise scores in the absence of a test-preparation class), and practice on their own.
IV. What Goes Into A Test
In order to make any sense of a test, we have to look at the construction of the test (are the results consistent? does it seem to measure what it's supposed to?) and at the statistical rationale and framework for comparing and interpreting an individual score. A great deal of work by psychologists specializing in psychometrics goes into the design and development of an intelligence or aptitude test.
The first things we'll cover are standardization and norms. To understand norms and statistical assessment, one first needs to understand standardization. Standardization is the systematic process of developing, administering, and scoring tests. It consists of testing a group of people to determine the range of scores that are typically attained; where an individual participant's score falls can then be compared to the group's performance. For standardization to work, the group must be a representative sample, reflecting the population for which the test was designed. These procedures are supposed to ensure that all participants are tested under the same conditions, that they are all given an equal opportunity to determine the correct answer, and that all scores are established and interpreted using appropriate criteria. The group's performance is the basis for the test's norms, which are the scores and statistical values derived from the representative group.
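Norming can be illustrated with a toy exercise in Python. The norm sample below is invented purely for illustration; real norm groups are far larger and carefully stratified to be representative:

```python
from statistics import mean, stdev

# Hypothetical raw scores from a (tiny) representative norm group.
norm_sample = [38, 41, 44, 45, 47, 48, 50, 52, 53, 55, 57, 60]
mu = mean(norm_sample)
sigma = stdev(norm_sample)

def deviation_iq(raw_score):
    """Locate a raw score within the norm distribution, re-expressed
    on the familiar IQ metric (mean 100, SD 15)."""
    z = (raw_score - mu) / sigma   # how many SDs above/below the group mean
    return round(100 + 15 * z)

# A raw score exactly at the group mean maps to the definitional IQ of 100.
print(deviation_iq(mu))            # 100
# A raw score one standard deviation above the mean maps to 115.
print(deviation_iq(mu + sigma))    # 115
```

The point of the sketch: the individual's raw score means nothing by itself; it acquires meaning only relative to the norm group's distribution.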
Reliability is a measure of the consistency and dependability of a test's ability to represent a participant's knowledge or ability. Assessing it requires analyzing participants' scores across such factors as time, different administrations of the same test (test-retest reliability), different tasks or questions that measure the same skill (split-half or alternate-forms reliability), or different raters scoring the same performance question. To understand the basics of test reliability, think of a bathroom scale that gave you drastically different readings every time you stepped on it, regardless of whether you had gained or lost weight. That scale would be unreliable and therefore useless.
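Test-retest reliability is typically quantified as the correlation between two administrations of the same test. A minimal sketch, with invented scores for six examinees:

```python
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two paired lists of scores."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical scores from two sittings of the same test.
first_sitting  = [95, 102, 88, 110, 99, 105]
second_sitting = [97, 100, 90, 112, 98, 106]  # a reliable test: scores barely move

print(round(pearson_r(first_sitting, second_sitting), 2))  # close to 1.0 -> reliable
```

The unreliable bathroom scale corresponds to a coefficient near zero: readings from one sitting would tell you almost nothing about the next.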
Validity refers to the degree to which an individual's test score accurately reflects the knowledge and/or skills the test is intended to measure (content or construct validity) or predicts the performance it is designed to forecast (criterion or predictive validity). If a test is invalid, there is no point in discussing its reliability: consistency means little when the test is not measuring what we care about in the first place. Remember the bathroom scale. Suppose the reason for taking a weight measurement is to assess your cardiovascular health. Both skinny and fat people can be in poor cardiovascular shape, so weight is invalid as a measure of cardiovascular health, and the question of whether the scale is reliable becomes pointless.
Conversely, if a test is not reliable, it cannot be valid. Change the example to weighing yourself to predict whether you can still fit into an old pair of jeans. The basic concept is valid, but if the scale is unreliable, then that specific instance of weighing yourself cannot predict whether the jeans will fit, and the measurement is invalid in practice. As soon as you get a reliable scale, the test becomes valid again.
V. Theories Of Intelligence: Three Examples
There are several different theories of intelligence. We'll highlight three that differ clearly in how they view intelligence and in the number and types of intelligence they propose.
A. Spearman's (1927) General Intelligence
Charles Spearman was an early psychometric psychologist who believed that there was a single basic general intelligence which he called G, a single dominant broad intellectual ability factor. He came to that position because he found that the grades of schoolchildren across a wide variety of seemingly unrelated subjects were strongly positively correlated (associated). He believed that G interacted with a factor specific to each individual mental task, S, which was the individual ability that would make a person more or less skilled at a given mental task. In other words, Spearman's idea was based on the observation that if a person has a good vocabulary, there is a better than 50-50 chance that they have a good memory and that they are also good at math. Likewise, if a person is good at math, they are also probably likely to have a good vocabulary or memory. These associations aren't perfect, but they are usually true. General intelligence, G, was the conceptual explanation for why people's scores generally tend to correlate across subjects, and the specific abilities or skills, S, explained the differences in the individual scores. So, there's one S for math, a different S for vocabulary, a different S for memory, and so forth for each type of cognitive task.
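Spearman's observation can be demonstrated with the same kind of correlation he computed, here on invented grades for six pupils (the data and subject names are hypothetical, chosen only to show the pattern):

```python
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two paired lists of grades."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sum((x - mx) ** 2 for x in xs)
                  * sum((y - my) ** 2 for y in ys)) ** 0.5

# Hypothetical grades in three seemingly unrelated subjects.
grades = {
    "vocabulary": [60, 72, 55, 81, 67, 90],
    "math":       [58, 70, 60, 85, 65, 88],
    "memory":     [62, 68, 54, 80, 70, 86],
}

# Every pairwise correlation comes out positive: pupils who do well in one
# subject tend to do well in the others -- Spearman's evidence for G.
subjects = list(grades)
for i, a in enumerate(subjects):
    for b in subjects[i + 1:]:
        print(a, b, round(pearson_r(grades[a], grades[b]), 2))
```

Spearman's insight was that this pattern of uniformly positive correlations is most simply explained by one broad factor (G) shared by all the subjects, plus a task-specific factor (S) for each.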
B. Gardner's Frames of Mind
Howard Gardner (1983) proposed a theory of independent multiple intelligences, originally seven of them: linguistic, logical-mathematical, spatial, musical, bodily-kinesthetic, interpersonal, and intrapersonal.
Gardner came to this point of view because he considered the standard tests and other assessments used to measure IQ inconclusive. He argued that the IQ number did not predict or reflect school outcomes or success in life. Gardner holds that each individual has varying levels of these different intelligences, and that this accounts for each person's unique cognitive profile. Comparing his view to Spearman's, Gardner would say there is no G, only S's, and those S's are more than just skills or abilities; each is an independent form of intelligence unto itself.
C. Sternberg's Triarchic Theory
Robert Sternberg (1985) proposed in his triarchic theory that there are three forms of intelligence: analytical, creative, and practical. In Sternberg's view, current intelligence testing does not test all three forms. He holds that current psychometric tests appreciably tap only analytical intelligence, which allows an individual to quickly break down problems and see solutions. This form of intelligence also consists of numerous subcomponents that enable this analytical ability, but the key is that they all serve the process of analyzing problems. While people high in this form of intelligence can break down problems, they do so from the basis of their acquired knowledge; they may not necessarily be good at creating new ideas or knowledge.

Creative intelligence involves synthetic thinking, the ability to put together knowledge and understanding in new and intuitive ways. Often, individuals with the highest conventionally measured IQs are not good at this form of thinking, and people with high levels of creative intelligence, such as artists, often go unidentified by conventional IQ tests because no current test can sufficiently measure the attributes involved in creating new ideas and solving new problems.

Practical intelligence is basically street smarts or common sense. It involves the ability to apply creative and analytical intelligence to everyday situations. Those high in practical intelligence are superb in their ability to succeed in almost any setting; even if they are limited in creative and analytical intelligence, they use the skills they do have to their best advantage. In the end, Sternberg reminds us that an individual is not necessarily restricted to excellence in only one of these three intelligences. Many people integrate all three very well and may even have high levels of all three.