Homogeneous_Grouping


**Heterogeneous Grouping or Homogeneous Grouping: Teaching Reading**

Team Research Design

EDC 529
Professor Adamy

Kathryn Kocab, Stephanie Spaziano, Lori Facer


**Research Project Question**

Is there a statistically significant difference in reading achievement, as measured by the New England Common Assessment Program (NECAP), between sixth grade students who are grouped homogeneously for reading (based on reading ability level) and students who are placed in a heterogeneously mixed classroom?


**Introduction and Literature Review**

Over the years there has been much debate over grouping students homogeneously or heterogeneously, and over how that grouping affects student learning. Many studies have sought to determine whether students learn more when they are grouped homogeneously or when they are grouped heterogeneously for specific subject areas. A 1933 study by West sought to answer the question: “To what extent does the practice of ability grouping reduce the variability in achievement of classes in elementary schools which attempt to adapt to each group the method and content of instruction” (West, 1933). The study found that the greater the number of ability groups, or “tracks,” a school had, the less variability in achievement those groups showed compared to unselected groups (West, 1933). Another study, entitled //The Effectiveness of Homogeneous and Heterogeneous Ability Grouping in Ninth Grade English with Slow, Average, and Superior Students//, conducted by Michigan State University in conjunction with the Cooperative Research Program of the Office of Education, U.S. Department of Health, Education, and Welfare, attempted to determine “whether or not the division of students into superior, average, and slow homogeneous classes would facilitate greater academic development and productivity of the individual student” (Drews, 1963). The study advocates the use of homogeneous grouping and suggests that any data collected to the contrary resulted from the achievement tests of the time not being an adequate measure of student learning. Because of such studies, schools throughout the United States have tracked students according to their math, science, and reading ability.

Despite studies in favor of homogeneous ability grouping, some believe that a heterogeneously mixed classroom can increase the learning ability and content knowledge of both the lower-level and higher-level students within the classroom (Saleh, Lazonder, & De Jong, 2000). One theorist, Lev Vygotsky, believed that children should actively construct their knowledge. Humans are social beings, and by providing students with social relations, their cognitive skills will increase within the classroom. Vygotsky believed that knowledge is collaborative; therefore, heterogeneously mixed classrooms for English, math, science, and other subjects will increase students’ cognitive learning. According to Vygotsky, “knowledge is distributed among people and environments, which include objects, artifacts, tools, books, and the communities in which people live” (Santrock, 2004). It was Vygotsky’s assertion that knowledge can be advanced through social cooperative groups, and heterogeneous grouping is one means to create such groups. A 1973 study done by Chicago State University, again in conjunction with the U.S. Department of Health, Education, and Welfare, entitled //Ability Grouping and Reading Achievement by Slow Learners//, stated that there was “no significant difference in reading achievement” between homogeneously and heterogeneously grouped first graders (Bennett & Ogletree, 1973), contradicting the assertions of the similar study done by Michigan State University (Drews, 1963). Other studies have shown that students placed in a homogeneously grouped lower-level math class can increase their math ability when placed in a class with students of a higher math ability level (Reed, 2004).

As a result, many schools across the country have opted to create heterogeneously mixed classes, which contain students with a wide range of math, science, and reading abilities and skills.

This study seeks to determine which method of grouping is more beneficial to student learning and performance in reading. Based on our review of the literature on this ongoing debate, our team thought it important to create a research project that tests whether there is a statistically significant difference in reading achievement, as measured by the NECAP, between sixth grade students who are grouped in homogeneous reading classes based on reading ability level and students who are heterogeneously mixed. In teacher education courses, pre-service teachers have learned the value of differentiated instruction and the pros and cons of tracking. We felt it was important to research, study, and experiment with sixth grade classrooms to see if there is a statistically significant difference in reading achievement between students placed in a homogeneous class and those placed in a heterogeneous class.


**Definitions**

//Homogeneously grouped// reading classroom refers to a classroom that contains a group of students at the same, or very nearly the same, ability level (reading ability, mental ability, or both) (Flor, 1980).

//Heterogeneously mixed// reading classroom refers to a group of students with a wide range of reading abilities, from above proficiency to below proficiency.

//Reading achievement// is defined as being able to read and comprehend text.


**Null Hypothesis**

H0 - There will be no statistically significant difference in reading achievement, as measured by the NECAP, between sixth grade students who are grouped in homogeneous reading classes based on ability level and students who are heterogeneously mixed.


**Directional Hypothesis**

H1 - There will be a statistically significant increase in reading achievement, as measured by the NECAP, for sixth graders who are grouped heterogeneously for reading versus sixth grade students who are grouped in homogeneous classes.


**Dependent Variable** - Reading achievement


**Independent Variable** - Type of classroom grouping (homogeneous vs. heterogeneous)


**Sample**

Colby Middle School has 200 sixth grade students. From those 200 students, 100 will be sampled. Four classes will be observed, each containing twenty-five students. There will be two treatment groups: X1, which consists of high ability level students, and X2, which consists of low ability level students. There will also be two control groups made up of heterogeneously grouped students. Colby Middle School currently implements a tracking program in which students are placed in reading classes by ability; each teacher has two high ability groups and two low ability groups. Our team seeks to integrate one high level group and one low level group into two heterogeneous groups that will serve as our control groups. Although this is a convenience sample, all groups will have the same instructor in order to minimize variables. The classes will be on a rotating schedule so that the homogeneous classes are not always the first or last classes of the day. The students will be learning the same reading content in each class, but the classes will be either homogeneously grouped or heterogeneously mixed.

O X1 O (treatment: homogeneous high ability)
O C O (control: heterogeneous)
O X2 O (treatment: homogeneous low ability)
O C O (control: heterogeneous)

(O = pre-test/post-test observation)


**Data Collection**

Our experiment is a quantitative study that will examine 100 sixth grade students, using one teacher from a Rhode Island public school to facilitate instruction for both the treatment and control groups.

We will measure student reading achievement through the statewide assessment, the NECAP test. Scoring for the New England Common Assessment Program is based on four levels of achievement:

1. Proficient with distinction
2. Proficient
3. Partially proficient
4. Substantially below proficient

The NECAP exam has been adopted as the new statewide assessment for reading, writing, and mathematics for grades three through eight. The test was developed for the New England Compact. Founded in 2002, the New England Compact (Educational Development Center, 2003) consists of Maine, New Hampshire, Rhode Island, and Vermont. The purpose of the Compact is to “provide a forum for states to address issues arising from the federal No Child Left Behind legislation” (EDC, 2003). The New England Compact “focuses on the development and implementation of grade level expectations (GLEs), and state assessments based on those expectations” (EDC, 2003). Currently, only Vermont, New Hampshire, and Rhode Island administer the test (Rhode Island Department of Elementary and Secondary Education, 2005).

The test is administered every October, and for grades 5 through 8 the NECAP includes a writing performance section. For this study, we will look only at the results for the reading section of the test.

The NECAP test will be given during the first week of October as a pretest. Each student's score will be based on the four levels of achievement detailed above. At the end of the year, the reading section of the NECAP will be administered again as a post-test.

The class averages for the homogeneously grouped high level readers, the homogeneously grouped low level readers, and the heterogeneously mixed classes will be examined for both the pre-test and the post-test. Each class's pre-test average will be compared to its post-test average, and the heterogeneously mixed class averages will then be compared to the homogeneously grouped class averages.

The NECAP exam will serve as the standard for determining whether there is a significant difference in reading achievement between students grouped in homogeneous classrooms and students in heterogeneously mixed classrooms. The average score of each class will be computed to see how much the class score rose or fell from pre-test to post-test.
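As a rough sketch of this pre/post comparison, the following Python snippet averages a class's achievement levels (1-4) on both administrations and reports how far the class average moved. The scores are hypothetical illustrations, not real NECAP data.

```python
# Minimal sketch of the class-average comparison described above.
# Scores are hypothetical NECAP achievement levels (1-4), not study data.

def class_average(scores):
    """Mean NECAP achievement level (1-4) for one class."""
    return sum(scores) / len(scores)

# Hypothetical achievement levels for one class of five students
pretest = [2, 3, 1, 2, 3]
posttest = [3, 3, 2, 3, 4]

# Positive values mean the class average rose from pre-test to post-test
change = class_average(posttest) - class_average(pretest)
print(round(change, 2))
```

The same calculation would be repeated for each of the four classes, and the changes for the heterogeneous classes compared against those for the homogeneous classes.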

Students will be given the exam on the same day, at the same time, in the same room (for example, the auditorium or cafeteria) in order to decrease certain threats to the validity of the performance results. Each class will stay within its own group in the testing area, so as not to mix students up.

Although the NECAP has not yet been evaluated by the Buros //Mental Measurements Yearbook//, the instrument is deemed valid and reliable by the three states participating in the New England Compact. "Prior to development of the NECAP tests, significant attention was paid to test length because of concerns raised about this issue [test length] and the Vermont State Board of Education. As [a] result, NECAP test specifications were based on the minimum number of items needed to meet No Child Left Behind Act requirements and to maintain test validity and reliability" (State of Vermont, 2005).


**Data Analysis**

Class scores will be averaged at the end of both the pre-test and the post-test for the NECAP. Each class's pre-test average will be compared to its post-test average to find out whether there is any significant difference in reading achievement between the homogeneously grouped students and the heterogeneously grouped students. The two homogeneous groups' scores will be averaged separately and compared both to the average scores of the heterogeneous classes as a whole and to the average scores of the high and low level students within those heterogeneous classes. For example, we would compare the average scores for the homogeneous low level class to the average scores of the heterogeneous class, as well as to the average scores obtained by the low level students within the heterogeneous class. We will be looking at the average increase of each group, based on the four levels of achievement from the NECAP test.

Three groups will be compared: the control (heterogeneous) group; the homogeneous group containing low ability level students; and the homogeneous group containing high ability level students. Because we are making multiple comparisons, an //analysis of variance (ANOVA)// will be used. In this case, an F-test will determine whether there is a significant difference in reading achievement among the groups. A p-value threshold of .05 will be used to decide whether to reject or fail to reject the null hypothesis.
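The one-way ANOVA comparison can be sketched as follows. The score gains below are hypothetical, not study data; the F statistic compares between-group variability to within-group variability, and in practice the corresponding p-value would be obtained from statistical software or an F-distribution table.

```python
# Minimal sketch of the one-way ANOVA F statistic for three groups.
# Data are hypothetical gains (post-test level minus pre-test level)
# on the 1-4 NECAP scale, not real study results.

def f_statistic(*groups):
    """Between-group mean square divided by within-group mean square."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total observations
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares (df = k - 1)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    # Within-group sum of squares (df = n - k)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                    for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical gains for ten students in each group
heterogeneous_control = [1, 0, 1, 1, 0, 1, 2, 0, 1, 1]
homogeneous_low = [0, 1, 0, 0, 1, 0, 1, 0, 0, 1]
homogeneous_high = [1, 1, 2, 1, 1, 0, 1, 2, 1, 1]

f = f_statistic(heterogeneous_control, homogeneous_low, homogeneous_high)
print(round(f, 2))
```

The resulting F value would then be checked against the F distribution with (2, 27) degrees of freedom; if the associated p-value falls below .05, the null hypothesis is rejected.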

**Validity and Reliability**

The research team is aware of several threats to the internal validity of our study. One such threat is //sampling error//: due to the nature of the study and our limited control over the public school system, our team is forced to take a convenience sample. Another possible threat is //history//: events outside of our study that occur before the post-test could account for any difference we find. Another threat is //testing//: using the same test at the beginning of the year and again at the end may influence the later results. //Maturation// may also threaten the validity of our study. The students in our study are sixth graders; they may be experiencing growth or change that is unrelated to our treatment, and this could affect the measured effect of our study.

Our research team has reduced these risks as much as possible by having a control group. The threat of //testing// has been minimized by having students take the test at the beginning of the school year and then at the end; it is unlikely that students will remember what is on the test over such a long span of time. Even if some do remember, we have a control group, so any knowledge of the test that students retain is likely to be reflected in the scores of both the control and the treatment groups. Having a control group minimizes the risks of //history// and //maturation// as well: any developmental changes or outside occurrences that affect students in one group are just as likely to affect all the groups in our study.

Our research team has also considered several threats to external validity. //Obtrusiveness// and //reactivity// were examined, and we anticipate some difficulty with the parents of some students (or even the students themselves) wanting to be placed in the treatment group. A //rivalry// between the experimental and control groups may occur. One of the biggest problems we are concerned with is //demoralization// of the lower level homogeneous group: we would not want members of that group, or other students in the school, to know that those students have been 'labeled' as lower ability. These threats may affect the behavior and performance of our groups.

The research team has done its best to reduce those threats by using unobtrusive procedures as much as possible. The test that students in our groups take (the NECAP) is taken in October by all students, regardless of whether they are participants in our study; we are only asking our participants to take the test one extra time, and we do not anticipate this being a major concern. To address the rivalry and demoralization risks, we have tried to make sure that researchers and teachers do not label the groups in front of participants or other students and that they refrain from discussing confidential information.


**Strength of Design**

Our research team believes that we have a very strong study. Having a control group and using the same instructor for all groups (both treatment and control) minimizes our threats to internal validity. By using the NECAP assessment as our means of data collection, we have ensured that we are using a well-respected test that three states consider an accurate assessment of students' reading achievement. The test ranks students using scores from 1 to 4, and this uniform scoring system makes it easy for researchers to compare the change in each student's score. Researchers will compare the scores each student earns at the beginning of the year to those from the post-test at the end of the year. The two homogeneous groups' scores will be averaged separately and compared to the average scores of the high and low level students within the heterogeneous classes, and we will examine the average increase of each group. We will ensure that any difference we find is statistically significant by using an F-test with a p-value threshold of .05, the level of risk conventionally accepted in the social sciences.

Another strength of our study is that we will be using the same instructor for all of the groups. Some teachers may have more experience, more training, or a better relationship with their students, and this could affect student performance. By using only one teacher, we remove teacher quality as a variable that might affect our results. Using a school where the classes are taught on a rotating schedule reduces the risk that a particular time of day will affect our results. Lastly, we assigned the teacher to teach all groups the same content material, so that the curriculum itself will not affect our results.

The biggest weakness of our study is that a convenience sample was used. Random assignment to equivalent groups would have been the ideal design; unfortunately, because our experiment takes place in a real-world setting, we were unable to implement it. Another potential weakness is the instrument. The test has not been evaluated by an independent source, such as the Buros //Mental Measurements Yearbook//. Although all three states that implement the test claim to have researched its validity and reliability, none of the states cite where to find documentation of those evaluations. Other weaknesses of our study lie in its external validity. Our research team has fully considered the effects that //obtrusiveness// and //reactivity// may have on our participants; although we have done our best to use unobtrusive procedures as much as possible, it was not possible to remove all threats to validity entirely.

If our team had chosen to conduct this study as a qualitative study instead of a quantitative one, the study would have looked completely different. We would probably have used a much smaller student population to collect our data, and we would have spent more time in the classroom observing and encoding our observations. Instead of using a standardized test to measure student achievement in reading, we would most likely have interviewed both teachers and students to determine how they felt the structure of their classrooms, whether homogeneously or heterogeneously grouped, affected their learning and achievement in reading. Due to the emerging design of qualitative research, as researchers using a qualitative approach we would be able to "probe in different directions" (Orcher, 2005) should our questions to participants point to something we had not considered during the planning process. We would observe student reactions to the different learning environments of homogeneously and heterogeneously grouped classrooms while documenting what we observed through note-taking, paying particular attention to the demeanor, attentiveness, and group interaction of the students in question. //Demeanor//, //attentiveness//, and //group interaction// would serve as the first categories for the encoding of our study, which we would then build upon as the study progressed. After we had encoded the information and reached the point of saturation, we would analyze our data and draw conclusions based upon our interviews and observations.

Though we could have approached the study from a qualitative standpoint, the subject of student achievement lends itself to the quantitative method, since many standardized tests have been created to measure student achievement. In the end, our team decided that a quantitative design would best suit our needs.


**References**

Bennett, B., & Ogletree, E.J. (1973). //Ability Grouping and Reading Achievement by Slow Learners//. __ERIC Document__ 121902.

Drews, E.M. (1963). __Student Abilities, Grouping Patterns, and Classroom Interaction__. East Lansing, MI: Michigan State University.

Ediger, M. (2001). //Homogeneous Grouping and Heterogeneous Grouping//. __ERIC Document__ 455536.

Educational Development Center (2003). __New England__ __Compact__ [Online]. Available: http://www.necompact.org/.

Flor, R.A. (1980). //A Review of the Literature Concerning Grouping Plans for Elementary Reading Instruction//. __ERIC Document__ 186877.

Moller, K. (2004). Creating Zones of Possibility for Struggling Readers: A study of fourth graders' shifting roles in literature discussions. __Journal of Literacy Research__, 36(4), 419-460.

Orcher, L.T. (2005). __Conducting Research: Social and Behavioral Methods__. Glendale, CA: Pyrczak Publishing.

Reed, C. (2004). Mathematically Gifted in the Heterogeneously Grouped Mathematics Classroom. __The Journal of Secondary Gifted Education__, 15(3), 89-95.

Rhode Island Department of Elementary and Secondary Education (2005). __Rhode Island Department of Elementary and Secondary Education__ [Online]. Available: http://www.ridoe.net/assessment/NECAP/NECAP_Default.htm.

Saleh, M., Lazonder, A., & De Jong, T. (2000). Effects of within-class ability grouping on social interaction, achievement, and motivation. __Instructional Science__.

Santrock, J. (2004). __Educational Psychology__ (2nd ed.). McGraw-Hill.

State of Vermont (2005). __Department of Education__ [Online]. Available: www.state.vt.us/educ/new/pdfdoc/pgm_assessment/necap/evaluation_report_123005.pdf.

West, P. (1933). __A Study of Ability Grouping in the Elementary School__. New York: Teachers College, Columbia University.


**Distribution of Duties**

The team research project was a collaborative effort, with all three authors actively engaging in discussion of ideas and methodology. Although sections were divided and assigned to certain team members, everyone on the team had input regarding content. The //Research Question//, //Hypotheses//, and method of //Sampling// were developed by all three team members, with the //Introduction and Literature Review// written by Stephanie Spaziano and Kathryn Kocab. //Data Collection// was written by Kathryn Kocab, with research on the development, publication, validity, and reliability of the NECAP test written by Stephanie Spaziano. The //Data Analysis// section was written by Kathryn Kocab and then revised by Stephanie Spaziano after the group became aware of an alternate means of data analysis (the F-test). Lori Facer was responsible for writing the //Validity and Reliability// section. The //Strength of Design// section was written by both Lori Facer and Stephanie Spaziano.
