- employs systematic, empirical methods that draw on observation or experiment;
- involves rigorous data analyses that are adequate to test the stated hypotheses and justify the general conclusions drawn;
- relies on measurements or observational methods that provide reliable and valid data across evaluators and observers, across multiple measurements and observations, and across studies by the same or different investigators;
- is evaluated using experimental or quasi-experimental designs in which individuals, entities, programs or activities are assigned to different conditions and with appropriate controls to evaluate the effects of the condition of interest, with a preference for random-assignment experiments, or other designs to the extent that those designs contain within-condition or across-condition controls;
- ensures that experimental studies are presented in sufficient detail and clarity to allow for replication or, at a minimum, offer the opportunity to build systematically on their findings; and
- has been accepted by a peer-reviewed journal or approved by a panel of independent experts through a comparably rigorous, objective, and scientific review.
No Child Left Behind Act: Title IX – General Provisions: Part A – Definitions, Sec. 9101
1. The term “scientifically based research” includes research that employs systematic, empirical methods that draw on observation or experiment.
The defining principle of scientific evidence is systematic empiricism. Empiricism is "watching the world," relying on careful observation of events to make conclusions (Stanovich & Stanovich, 2003, p. 33). Systematic empiricism requires conducting those observations in a careful manner in order to answer a specific question. In the realm of educational research, systematic empiricism requires an exact definition of the intervention and program being studied and a careful measurement of its outcomes.
Example: Following a state referendum restricting "bilingual education" in California, the popular media reported a dramatic improvement in the academic achievement of nonnative English speakers in a school district that switched to an English-only curriculum. This result was touted as evidence for the superiority of English-only instruction over bilingual instruction. However, this report did not constitute scientific evidence, because no one had systematically compared the two differing curriculum approaches. In fact, upon careful analysis, it became known that the school did not even have a bilingual curriculum to start with. Subsequent studies that precisely defined the key features of bilingual education and evaluated specific educational outcomes can be called systematic and empirical (see Krashen, 2002).
This criterion requires quantitative research, the hallmark of which is the use of numerical measurement of student outcomes. In order to know if one method truly caused an improvement, it is necessary to quantify the improvement in student performance. For example, studies about the effectiveness of certain mathematics instructional practices measure the improvement in mathematics ability, perhaps by quantifying changes over time in the percentage of math problems that students are able to answer.
2. The term “scientifically based research” includes research that involves rigorous data analyses that are adequate to test the stated hypotheses and justify the general conclusions drawn.
It is necessary to analyze data from a study using appropriate statistical procedures that can support the conclusions. Failure to apply the appropriate statistical procedures calls the results into question. Reputable research does not issue strong claims for the effectiveness of a program or practice based on modest differences or gains in student achievement. It is necessary to use statistics to determine whether the results were significant and important.
Example: Research on the influence of class size on literacy achievement compared the reading ability of students in classrooms with 12 to 15 students to classrooms with 20 to 25 students. The students from the smaller classes scored higher on reading achievement tests. The researchers calculated the statistical significance of this difference to determine whether it was likely that such a result could have been possible by chance.
A great deal of technical expertise is necessary to judge whether statistical procedures have been performed and reported adequately. Fortunately, the publication of research in reputable sources and the replication of the results by different researchers give the layperson some degree of confidence that the research claims are sound. Even at a surface level, however, quality research reports basic statistical information such as the following:
- Sample size and representativeness. The sample refers to the selection of participants in the study. The sample must be representative of the population of people about whom the researchers wish to learn. If a researcher wishes to demonstrate an intervention for improving reading skills among youth in poverty, the sample must be drawn from youth in poverty.
- Statistical procedures to interpret data. Research that compares the effectiveness of an intervention almost always reports statistical tests such as t-tests or analyses of variance (ANOVAs). A study lacking such information is unlikely to provide convincing proof of effectiveness.
- Supplementary descriptive statistics. Quality research provides numbers that describe the results, such as means and standard deviations.
- Significance. Statistical significance is expressed as the probability that the observed differences could have happened by chance. When this probability is very low (conventionally, .05 or less), the results are deemed statistically significant.
- Effect size. The effect size is a description of how large an effect the treatment had. It should be reported in real-world terms, such as percentage of children reading at or above grade level (Coalition for Evidence-Based Policy, 2003). The size of an effect indicates its importance. Some effects can be statistically significant, but of such a small magnitude that they are unimportant.
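The two ideas at the end of the list above, statistical significance and effect size, can be illustrated with a short sketch. The reading scores below are invented for illustration; the permutation test estimates a p-value by asking how often randomly relabeling the pooled scores produces a mean difference at least as large as the one observed, and Cohen's d is one common standardized effect size.

```python
import random
import statistics

# Invented reading scores for illustration (not real study data)
small_class = [78, 85, 81, 90, 76, 88, 84, 79, 91, 83]   # classes of 12-15
large_class = [72, 80, 75, 83, 70, 78, 81, 74, 77, 79]   # classes of 20-25

def permutation_p_value(a, b, n_iter=10_000, seed=1):
    """Estimate the probability that a mean difference at least this large
    could arise by chance, by repeatedly shuffling the group labels."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:len(a)]) -
                   statistics.mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return hits / n_iter

def cohens_d(a, b):
    """Standardized effect size: mean difference over the pooled std. dev."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * statistics.variance(a) +
                  (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5

p = permutation_p_value(small_class, large_class)
d = cohens_d(small_class, large_class)
print(f"p-value ~ {p:.3f}, Cohen's d ~ {d:.2f}")
```

A low p-value together with a large d is the pattern a convincing study reports; a significant p paired with a tiny d would be the "statistically significant but unimportant" case noted above.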
3. The term “scientifically based research” includes research that relies on measurements or observational methods that provide reliable and valid data across evaluators and observers, across multiple measurements and observations, and across studies by the same or different investigators.
Scientific research needs to use reliable methods of collecting data. A reliable testing instrument will give the same result each time it is used on the same person or situation. Whenever a study evaluates students in a manner that relies on human judgment, as with assessments of writing ability, it is essential for the research to report interrater reliability, an index of how closely the different raters agree. Studies that rely on testing instruments typically establish test-retest reliability by administering the instrument to the same group of people twice. The main point is that SBR documents the reliability of its procedures for data collection.
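Test-retest reliability is commonly summarized as a correlation coefficient between the two administrations of the instrument. The scores below are invented for illustration; a coefficient near 1.0 indicates that the instrument ranks students consistently from one administration to the next.

```python
import statistics

# Invented scores from two administrations of the same test (illustrative)
first_administration  = [88, 72, 95, 60, 81, 77, 90, 68]
second_administration = [85, 75, 93, 63, 80, 74, 91, 70]

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

r = pearson_r(first_administration, second_administration)
print(f"test-retest reliability r ~ {r:.2f}")
```

Interrater agreement on subjective ratings is assessed in the same spirit, though indices such as Cohen's kappa are often preferred there because they correct for chance agreement.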
Data about a particular outcome (e.g., mathematics achievement) are valid if they truly reflect that outcome and not some unrelated factor.
Example: Research that examines the effect of art education on mathematics achievement should use a measure that reflects that outcome and is not influenced by unrelated outcomes. For example, if the test of this outcome contains questions that are difficult to understand, then the test may measure verbal ability as well as mathematics achievement. Its validity would then be in doubt.
4. The term “scientifically based research” includes research that is evaluated using experimental or quasi-experimental designs in which individuals, entities, programs or activities are assigned to different conditions and with appropriate controls to evaluate the effects of the condition of interest, with a preference for random-assignment experiments, or other designs to the extent that those designs contain within-condition or across-condition controls.
Experimental design. This criterion specifies that in order to be deemed scientific by the NCLB Act, research needs to conform to an experimental or quasi-experimental design. The reasoning is that it is difficult to understand the effectiveness of any educational approach without comparing it to a different approach. For this reason, this criterion states that evidence for the effectiveness of any practice needs to include a comparison group to show what would happen if that practice had not been used. An ideal comparison group is similar to the treatment group in every important way that could influence the outcome of interest. Because the comparison group allows researchers to control for the influence of external factors unrelated to the intervention, it is sometimes called a control group. By contrast, the group of people (or schools) that uses the practice under investigation is typically called the treatment or experimental group.
Example: Consider an educational program to decrease tobacco use among teenagers. Suppose that the promoters of the program tout its effectiveness by noting that a school that began using this program in 2002 reported a decrease in smoking from that point on. Is this convincing evidence? According to NCLB guidelines, no. Because there are so many other variables that affect the smoking rate, it is not possible to identify any one cause. After all, perhaps an increase in the cigarette tax—or a national advertising campaign to deter tobacco use—caused the decline. To make a claim about the effectiveness of the educational intervention, researchers would need to compare the students at the school that implemented the program to students at a similar school that did not implement it.
This criterion makes an additional statement about comparison groups and treatment groups: The best way to assign people to these groups is through a random process. Random assignment is the hallmark of the experimental design. When researchers randomly assign students (or classrooms or schools) to the experimental or control groups, any given participant in the study has an equal chance of ending up in the control group or the treatment group. The purpose of this procedure is to make sure that the two groups are as equivalent as possible in terms of the background characteristics that could influence the outcome variables. Any preexisting differences between the comparison and the treatment group can confound—that is, spoil—the results. Random assignment eliminates, for the most part, the concern that the control group comprises people (or schools) that are fundamentally different from the treatment group.
Example: Continuing the tobacco education example, if the students in the treatment school were from poorer families than students at the comparison school, they might be more affected by an increase in the tobacco tax. This preexisting difference alone could account for a decrease in the use of tobacco in the school where the anti-tobacco program is implemented. However, suppose the researchers randomly assigned 20 schools, all of which were similar in their major demographic traits, to either the treatment or control condition. If the treatment group reported a substantial decrease in student tobacco use in comparison to the control group, one could be highly confident that the education program worked.
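The random-assignment step in the example above can be sketched in a few lines. The school names are hypothetical; shuffling the pooled list and splitting it in half gives every school an equal chance of landing in either condition, which is what balances the background characteristics on average.

```python
import random

# Hypothetical pool of 20 demographically similar schools (illustrative names)
schools = [f"School {i}" for i in range(1, 21)]

rng = random.Random(42)  # fixed seed so the assignment is reproducible
rng.shuffle(schools)     # every ordering is equally likely

treatment = schools[:10]  # receive the tobacco-education program
control = schools[10:]    # continue with existing practice

print("Treatment group:", treatment)
print("Control group:  ", control)
```

With assignment decided by the shuffle rather than by volunteering or administrative choice, any remaining difference between the two groups at the end of the study is far more plausibly attributed to the program itself.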
Practical and ethical concerns with experimental design. Random assignment is not always possible, for both practical and ethical reasons. As a practical matter, the administration of a school district might insist on deciding which of its elementary schools adopt a new curriculum. It would therefore not be possible to randomly assign schools (or classrooms or students) to particular treatment or control groups to study the effectiveness of this curriculum. Other practical dilemmas abound but are beyond the scope of this discussion. As an ethical matter, random assignment often is not an appropriate way to determine which students in a school benefit from an experimental approach.
Quasi-experimental design. Because of these concerns, most educational research does not utilize a pure experimental design, but rather a quasi-experimental design. One such approach is to select a comparison group that closely matches the treatment group in all relevant factors. For example, a study of an intensive professional development program might select five schools to participate in the program, and five other similar schools to serve as comparison schools. Although this sounds very much like an experiment, it lacks the key factor of random assignment; the schools that received the program may have volunteered or been selected to participate. For this approach to be considered SBR by NCLB measures, the five comparison schools would need to closely match the treatment group in all of the factors that could influence the intended outcome of the program (e.g., demographic composition, academic achievement, and timing of evaluation).
It must be noted that this criterion has generated much controversy due to what some perceive as its exclusion of legitimate methods of scientific research such as qualitative designs and other nonexperimental approaches.
5. The term “scientifically based research” includes research that ensures that experimental studies are presented in sufficient detail and clarity to allow for replication or, at a minimum, offer the opportunity to build systematically on their findings.
Scientific research is open to the public. A person who claims to have discovered an effective teaching technique needs to submit evidence for its effectiveness to public scrutiny. If the results are sound, and the practice is truly effective, other people should be able to get the same results. For this reason, SBR must be reported in sufficient detail to allow for replication of the intervention and the scientific findings. One type of replication involves practitioners reproducing the educational intervention in their own schools. Another type of replication is more demanding; it involves another researcher attempting to replicate the original findings by following the same research procedures. This is an important process because it allows researchers to independently confirm the legitimacy of purported scientific evidence. For this reason, scientific research also needs to include all of the details about the educational intervention, participants, materials, outcome measures (e.g., tests and questionnaires), and the statistical procedures that were employed. Vague reporting of methods or results is a red flag, because it makes it seem as if the authors have something to hide. By the same token, successful replication of the research from a variety of sources ensures that the research is truly objective.
Example: In the early 1990s, psychology researchers published a study in which they claimed that listening to a Mozart sonata temporarily boosted the IQ of college students. The results of this small study were reported widely in the popular media, unleashing a torrent of marketing of classical music as a way to improve intelligence. Subsequent researchers have precisely replicated the methods of the original experiment, but have not replicated the findings of increased IQ. For this reason, the validity of the original findings is highly doubtful.
6. The term “scientifically based research” includes research that has been accepted by a peer-reviewed journal or approved by a panel of independent experts through a comparably rigorous, objective and scientific review.
The process of peer review is essential to SBR. Many journals of educational research, such as the American Educational Research Journal, accept their articles based on the review of other researchers who understand the research topic. The purpose of peer review is to submit research to public criticism—to shine the light of objectivity generated by independent minds. This process helps to screen out poor quality research, especially research that has serious problems in any of the areas discussed here. A variety of journals—with varying degrees of stringency of standards—exists, so peer review is a minimal standard. Yet because it is minimal, its absence is a sure sign that a particular method is lacking in quality (Stanovich & Stanovich, 2003). It is possible to determine whether a journal is peer reviewed by reading its editorial policy for acceptance of manuscripts.
In summary, SBR is submitted to public scrutiny through peer review, and is replicated by independent researchers. Educators should therefore be wary of programs or practices whose support comes only from unpublished "in-house" studies conducted by their commercial vendors.