Keio SFC 2003, Faculty of Policy Management, English, Question 1: Full Text (Correct Answers Filled In)

 You have probably taken many tests in your life. Perhaps some questions have occurred to you as you struggled through a test. How good is the test I am taking? Does it really work? These questions could and occasionally do result in long hours of useless discussion. Subjective opinions, hunches, and personal biases may lead either to extravagant claims regarding what a particular test can accomplish, or to its stubborn rejection. The only way questions such as these can be conclusively answered is by empirical trial. The objective evaluation of tests primarily involves the determination of the reliability and the validity of the test in specified situations.

 In terms of testing, reliability means consistency. Test reliability refers to the consistency of scores obtained by the same persons when retested with the same or equivalent test. For example, if a child receives an IQ of 110 on Monday and an IQ of 80 when retested on Friday, it is obvious that little confidence can be placed in either score. Likewise, if on one vocabulary test a student gets 40 words correct and on another test of similar difficulty the same student gets only 20 words correct, then neither test can be taken as a dependable measure of the student’s verbal ability. In both of these examples, it is possible that only one of the two scores is in error, but this could only be demonstrated by further retesting. Whether one of the scores or neither is an adequate measure of the individual’s ability cannot be established without additional information.

 Before a test is released for general use, a thorough, objective check of its reliability should be carried out. There are different types of test reliability, as well as different methods of measuring it. Reliability can be checked with reference to fluctuations over time, the particular selection of items or behavior sample constituting the test, the role of different examiners or scorers, and other aspects of the testing situation. It is essential to specify the type of reliability and the method employed to determine it, because the same test may vary in these different aspects. The number and nature of individuals on whom reliability was checked should likewise be considered. With such information, the test user should be able to predict how reliable the test would be for any given group.

 Undoubtedly the most important question to be asked about any test concerns its validity. Validity refers to the degree to which the test actually measures what it is intended to measure. Validity provides a direct check on how well a test fulfills its function. The determination of validity usually requires independent, external criteria of whatever the test is designed to measure. For example, if a medical aptitude test is to be used in selecting promising applicants for medical school, ultimate success in medical school would be a criterion. The process of determining the validity of such a test would begin by administering the test to a large group of students at the time of their admission to medical school. Later, some measure of performance in medical school would be obtained for each student on the basis of grades, ratings by instructors, success or failure in completing medical training, and similar criteria. Such a composite measure would constitute the criterion against which each student’s initial test score is then correlated. The measure of this correlation is called the validity coefficient. A high correlation between the initial test scores and the measure of each student’s performance would signify that those individuals who scored high on the test had been relatively successful in medical school. This would indicate a high validity coefficient. A low correlation would show little correspondence between test scores and the criterion measure and would indicate a poor validity coefficient for the test. The validity coefficient enables researchers to determine how closely any individual’s criterion performance could be predicted from that individual’s test score.
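
 As a rough illustration of the validity coefficient described above, the following sketch computes a correlation between admission test scores and a later composite measure of performance. It assumes the Pearson product-moment correlation, which the passage does not name explicitly. The function name validity_coefficient and all of the scores are hypothetical and serve only as an example.

```python
from math import sqrt


def validity_coefficient(test_scores, criterion_scores):
    """Pearson correlation between test scores and criterion measures.

    Assumption: the passage only says the scores are "correlated";
    Pearson's r is used here as one common choice.
    """
    if len(test_scores) != len(criterion_scores):
        raise ValueError("both lists must have the same length")
    n = len(test_scores)
    if n < 2:
        raise ValueError("at least two students are required")

    mean_x = sum(test_scores) / n
    mean_y = sum(criterion_scores) / n

    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(test_scores, criterion_scores))
    var_x = sum((x - mean_x) ** 2 for x in test_scores)
    var_y = sum((y - mean_y) ** 2 for y in criterion_scores)

    if var_x == 0 or var_y == 0:
        raise ValueError("scores must not all be identical")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: aptitude scores at admission and a later
# composite measure of performance for the same six students.
admission_scores = [52, 61, 70, 78, 85, 93]
school_performance = [2.1, 2.6, 2.9, 3.2, 3.6, 3.8]

r = validity_coefficient(admission_scores, school_performance)
print(f"validity coefficient: {r:.2f}")
```

 A value near 1 would correspond to the high validity coefficient described in the passage, and a value near 0 to a poor one.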

 In a similar manner, tests designed for other purposes can be validated against appropriate criteria. A vocational aptitude test, for example, can be validated against the on-the-job success of a trial group of new employees. A pilot aptitude test can be validated against achievement in flight training. Tests designed for broader and more varied uses are validated against a number of criteria and their validity can be established only by the gradual accumulation of data from many different kinds of investigations.

 There is an apparent paradox in the concept of test validity that needs to be addressed. If it is necessary to follow up the subjects of a test, or in other ways try to obtain an independent measure of what the test is trying to predict, then why not dispense with the test? The answer is to be found in the distinction between the validation control group and the groups on which the test will eventually be used for operational purposes. Before a particular test is ready for general use, its validity must be established on a representative sample of subjects. The scores of these persons are not used for operational purposes, but serve only in the process of testing the test. If the test proves valid on a control group, it can then be used on other groups without resorting to other criterion measures.

 It might be argued that tests themselves are not needed; that over time the criterion measures will indicate the same information that a given test is trying to predict. But such a procedure would be so wasteful of time and energy as to be prohibitive in most instances. Imagine the consequences, for example, if all of the applicants for a job were hired, or all of the students who wish to attend a school were admitted, and then a final decision was made only after time had determined which individuals were most likely to do the job well or satisfactorily finish the schooling. It is the very wastefulness of this procedure and its emotional impact on individuals that tests are designed to minimize. By means of a test, a person’s present level of required skills, knowledge, and other relevant characteristics can be assessed with a determinable margin of error. The more reliable and valid the test, the smaller will be this margin of error.
