

**//Chapter 2 – “Reliability of Assessment”//**

Reliability refers to the consistency with which a test measures whatever it is measuring.

Consistency appears in three varieties:

1. **Stability Reliability (Test-Retest)**
   * Consistency of results across different testing occasions.
   * No new knowledge is given between tests.
   * Correlation coefficient: reflects the degree of similarity between students' scores on the two tests (a coefficient near 1.0 signals a strong relationship; a coefficient near 0 signals essentially no relationship).
2. **Alternate Form Reliability**
   * Consistency of results across two or more forms of a test.
   * Typically employed in high school diploma tests and exams governing entry to a profession.
   * Also summarized with a correlation coefficient.
3. **Internal Consistency**
   * Consistency in the way an assessment's items function.
   * The variety encountered most frequently.
   * Different formulas exist for computing a test's internal consistency: the Kuder-Richardson procedure for multiple-choice tests and Cronbach's coefficient alpha for essay tests (both sketched below).
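For reference, the two internal-consistency indices just named have standard textbook formulas (they are not spelled out in these notes; the symbols below are the conventional ones). For a test of k items, let p_i be the proportion of students answering item i correctly, q_i = 1 - p_i, σ_i² the variance of item i, and σ_x² the variance of students' total scores:

```latex
% Kuder-Richardson formula 20, for dichotomously scored (e.g., multiple-choice) items
KR_{20} = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} p_i q_i}{\sigma_x^2}\right)

% Cronbach's coefficient alpha, for items scored along a scale (e.g., essays)
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma_i^2}{\sigma_x^2}\right)
```

Values near 1.0 indicate that the items function consistently with one another; values near 0 indicate they do not.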

__Standard Error of Measurement (SEM):__ a reflection of the consistency of an individual's scores if a given assessment procedure were administered again, and again, and again. The smaller the reliability coefficient, the larger the SEM.
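The standard SEM formula (again a textbook formula, not quoted from the chapter) makes that inverse relationship explicit. With σ_x the standard deviation of the observed scores and r_xx the test's reliability coefficient:

```latex
SEM = \sigma_x \sqrt{1 - r_{xx}}
```

As a purely illustrative example: if σ_x = 10 and r_xx = .91, then SEM = 10 × √.09 = 3, so a student who scored 50 would be expected to score between 47 and 53 (plus or minus one SEM) on roughly two administrations out of three.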

**Most Important Ideas:**

1. There are three types of reliability evidence (stability, alternate form, and internal consistency), and they are not interchangeable.
2. Learning about reliability matters because, as a teacher, you may be called upon to explain the meaning of a student's standardized test scores to parents, and you'll want to know how reliable the test is.
3. The standard error of measurement helps remind teachers that students' test scores are never exact.

**Discussion:**

1. What are the three ways to determine the reliability of an assessment?
2. In a group of 4, decide what you would do in this situation: One of your strongest students has recently received her scores on a nationally standardized achievement test used in your school district. Her subtest scores were at the 90th percentile in every subject (Language Arts, Science, and Social Studies) except Math, where she scored at the 80th percentile. Her test scores and class work in Math have been above average thus far. You have just received a phone call from her father asking for a conference. After researching test reliability online, he feels the Math subtest score is unreliable. How would you respond to his concerns in the conference?

**//Chapter 3//**

__Validity:__ The most significant concept in assessment.
**//3 Types of Validity Evidence//**

1. **Content-Related Evidence of Validity**
   * Refers to the extent to which an assessment procedure adequately represents the content of the curricular aim being measured.
   * A test should be viewed as representing the set of skills or knowledge embraced by the teacher's instructional objectives.
   * After identifying the content of the curricular aim, the teacher can create an assessment that represents that content properly.
2. **Criterion-Related Evidence of Validity**
   * The degree to which performance on an assessment procedure accurately predicts a student's performance on an external criterion; the prediction is summarized by a correlation, sketched after this list.
   * For example, the SAT: a high SAT score predicts a high GPA in college, and a low SAT score predicts a low GPA.
3. **Construct-Related Evidence of Validity**
   * Test results, on predictor tests as well as on any other educational assessment procedure, should always be used to make better educational decisions.
   * Educators make one or more hypotheses about students' performances on the test for which they are gathering construct-related evidence of validity, then gather experimental or observational evidence to see whether each hypothesis is supported.
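The correlation coefficients mentioned in Chapter 2 and the criterion-related validity coefficient above are typically Pearson product-moment correlations. The formula below is the conventional one, not a quotation from the chapter; for n students with predictor scores x_i (e.g., SAT) and criterion scores y_i (e.g., college GPA), with means x̄ and ȳ:

```latex
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

A coefficient near +1.0 means the test orders students almost exactly as the criterion does; a coefficient near 0 means it predicts the criterion hardly at all.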

Three types of strategies are used in construct-related evidence-of-validity studies:

1. **Intervention Studies** - hypothesize that students will respond differently to the assessment instrument after having received some type of treatment.
2. **Differential-Population Studies** - hypothesize that individuals representing distinctly different populations will score differently on the assessment procedure under consideration.
3. **Related-Measures Studies** - hypothesize that a given relationship will be present between students' scores on the assessment device being scrutinized and their scores on a related assessment device.

**Discussion:**

1. Why do you think it is important to have more validity evidence rather than less?
2. Why is it necessary to be familiar with all three forms of evidence, and what are the benefits of using content-related evidence in the classroom?

**//Chapter 11//**

__Most Important Points:__ When judging your own (or a colleague's) assessments, consider the following five review criteria: (1) adherence to item-specific guidelines and general item-writing commandments, (2) contribution to score-based inference, (3) accuracy of content, (4) absence of content lacunae (gaps), and (5) fairness.


**P value** (the item difficulty index) and the **item discrimination index**, along with **distractor analysis**, are empirically based item-improvement procedures that work best with norm-referenced measurement.
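A minimal sketch in Python of what these indices compute. Everything here is assumed for illustration: the 0/1 score matrix is invented, the upper and lower groups are simple halves (published procedures often use the top and bottom 27% instead), and the distractor tally covers a single hypothetical item.

```python
from collections import Counter

# Hypothetical scored responses: one row per student, one column per item;
# 1 = correct, 0 = incorrect.
scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

def p_value(item: int) -> float:
    """Item difficulty index: the proportion of students answering correctly."""
    return sum(row[item] for row in scores) / len(scores)

def discrimination(item: int) -> float:
    """Discrimination index D = p(upper group) - p(lower group), where the
    groups are the top and bottom halves of students ranked by total score."""
    ranked = sorted(scores, key=sum, reverse=True)
    half = len(ranked) // 2
    p_upper = sum(row[item] for row in ranked[:half]) / half
    p_lower = sum(row[item] for row in ranked[-half:]) / half
    return p_upper - p_lower

for i in range(len(scores[0])):
    print(f"item {i + 1}: p = {p_value(i):.2f}, D = {discrimination(i):+.2f}")

# Distractor analysis for one item: tally which option each student chose
# ("B" is keyed as correct here). A fuller analysis would split the tally by
# upper and lower groups to spot distractors that attract strong students.
choices_item_1 = ["B", "B", "B", "C", "B", "A"]  # hypothetical choices
print(Counter(choices_item_1))
```

Items with p values near 1.0 or near 0 spread students very little, and a near-zero or negative D flags an item on which weaker students do as well as stronger ones; those are the usual revision candidates under norm-referenced measurement.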

Item analysis for criterion-referenced measurement can occur with the same group of students (Dppd) or with two different groups of students (Duigd).
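Assuming the abbreviations carry their usual meanings, a pretest-to-posttest difference for a single group and an instructed-versus-uninstructed difference for two groups, each index reduces to a difference in proportions, where each p is the proportion of a group answering the item correctly:

```latex
D_{ppd} = p_{\text{posttest}} - p_{\text{pretest}}
\qquad
D_{uigd} = p_{\text{instructed}} - p_{\text{uninstructed}}
```

For an item that taps what instruction actually covered, both differences should come out clearly positive.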

__Reading Questions:__

1. Which approach to item improvement more realistically represents most classroom teachers? Why?
2. What are the benefits of improving teacher-developed assessments?
3. What improvement process discussed in this chapter best suits you as a teacher? Why?