Blogging for Development: Randomized Field Experiments: A Gold Standard in Research?

June 02, 2019
By Mangarai Zhou

Randomized Field Experiments (RFEs) are used in research, evaluations and assessments to explain a phenomenon. Other research designs include quasi-experimental designs, non-experimental designs, and natural experiments. RFEs have come to be so highly regarded that they are popularly known as the 'gold standard' in research. They are credited with supporting strong causal inference, which allows an outcome to be attributed to an intervention. RFEs have been in use since the 1920s, when they were introduced by Neyman and, in 1925, by Fisher in his controlled experiments to improve farming. The history of RFEs began in the psychology field and gained traction in medical clinical trials before taking the social science sector by storm. The 'gold standard' claim comes from the belief that RFEs provide more credible evidence than other research designs, and over the years there has been considerable debate among scholars and researchers over whether RFEs should be the ultimate standard design for research and evaluation.

The goal of RFEs is to determine if a specific therapy actually makes a positive difference to the people receiving it (Kennedy Institute). RFEs work by comparing groups: a treatment group, which receives the complete treatment, and a control group, which does not. The control group is the benchmark against which the effectiveness of the treatment is measured. To illustrate, this article uses a hypothetical education development programme in which one English literature class was taught using both film and books whilst the other was taught using books only. For the purposes of the programme, a single class was divided into two classes using random assignment. The classes had one teacher for the subject, and the intervention ran for one month before a test was administered to assess impact. The aim of the intervention was to decide whether film can be used as a teaching method to improve the pass rate of literature students. Before the intervention, the combined class had taken a pretest and obtained a 67% pass rate. In the posttest, the intervention class obtained a 92% pass rate whilst the control class obtained a 73% pass rate. The evaluation concluded that using film to teach literature was effective in increasing the pass rate, because the class that had film as an extra tool performed far better than the class that only had books.
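The comparison in this example comes down to simple arithmetic. The sketch below uses only the three pass rates quoted above; the function name `percentage_point_gap` and the variable names are illustrative choices, not part of any evaluation toolkit.

```python
# Pass rates quoted in the example (percent of students passing).
PRETEST_WHOLE_CLASS = 67
POSTTEST_FILM_CLASS = 92
POSTTEST_BOOKS_ONLY = 73


def percentage_point_gap(first: int, second: int) -> int:
    """Difference between two pass rates, in percentage points."""
    return first - second


# Post-test gap between the film class and the books-only class.
effect_estimate = percentage_point_gap(POSTTEST_FILM_CLASS, POSTTEST_BOOKS_ONLY)

# Change of each class relative to the shared pretest.
film_gain = percentage_point_gap(POSTTEST_FILM_CLASS, PRETEST_WHOLE_CLASS)
books_gain = percentage_point_gap(POSTTEST_BOOKS_ONLY, PRETEST_WHOLE_CLASS)

print(effect_estimate)        # 19
print(film_gain, books_gain)  # 25 6
```

Because both classes were drawn from one class with a single pretest figure, the difference in gains (25 minus 6) equals the post-test gap of 19 percentage points. The 6-point rise in the control class is the kind of change that might have occurred even without film, which is why the control class serves as the benchmark.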

The 'gold standard' status of RFEs must be understood largely in terms of linking outcome to intervention. In this regard they can provide rigorous conclusions whose credibility rests on evidence, showing that a result would probably not have happened without the intervention. The first part of this article attempts to justify why RFEs are known as the 'gold standard' of all research designs.

At the centre of RFEs is randomization. “The random assignment of subjects to one or another of two groups is the basis for measuring the marginal difference between these groups in the relevant outcome” (Kendall, 2003). Because participants are placed in the control or treatment group at random, no unobservable characteristics of the units are reflected in the assignment. Randomization also ensures that baseline characteristics, whether or not they are known to be related to the outcome of interest, are distributed equally across the groups on average. In the hypothetical example used in this article, the students were randomly assigned to the two literature classes, and each student had an equal chance of being placed in either class. This was important to ensure that any differences observed between the two groups can confidently be attributed to the treatment.
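As a minimal sketch of what random assignment looks like in practice, the snippet below shuffles a class roster and splits it in half. The roster of 40 students, the seed and the function name `randomly_assign` are all assumptions made for illustration; the article does not say how large the class was or how the draw was carried out.

```python
import random


def randomly_assign(students, seed=None):
    """Shuffle the roster and split it into two groups of (near) equal size."""
    rng = random.Random(seed)      # a seeded generator makes the draw reproducible
    shuffled = list(students)      # copy so the original roster is left untouched
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return shuffled[:midpoint], shuffled[midpoint:]


# Hypothetical roster of 40 students.
roster = [f"student_{i:02d}" for i in range(1, 41)]
film_class, books_class = randomly_assign(roster, seed=2019)

print(len(film_class), len(books_class))  # 20 20
```

Every student has the same chance of landing in either class, and neither the teacher nor the students influence the outcome of the draw.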

Randomized experimental design yields the most accurate analysis of the effect of an intervention (Gerber et al., 2003). By randomly assigning subjects to the group that receives the treatment (the class using film) or to the control group (the class without film), researchers can measure the effect of the film-aided teaching method regardless of other factors that may make some people or groups more likely to perform better in their final examinations.

To put this in perspective, had the students been allowed to choose their class rather than being randomly assigned, the result could not have been attributed to the intervention with confidence, because there are many reasons why a person may choose one thing over another. Students who chose the film class might have done so because they are artistic, or because they love and understand film more than those who chose the control class. The comparison would then not have reflected the true effect of the intervention. Because of randomization, the results of the film-aided teaching tool are difficult to dispute: the way the experiment was planned and executed left little room for participant characteristics to distort the findings. This is one of the major strengths of RFEs when used effectively.
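The contrast between self-selection and random assignment can be illustrated with a small simulation. Everything in this sketch is invented for illustration: the score model, the "aptitude" variable standing in for an unobserved love of film, the true effect of 10 points and the function name `simulate`. It is not a reconstruction of the film programme's data.

```python
import random


def simulate(n_students=20000, true_effect=10.0, seed=0):
    """Compare the film-vs-books score gap under random assignment and self-selection."""
    rng = random.Random(seed)
    randomized = {"film": [], "books": []}
    self_selected = {"film": [], "books": []}

    for _ in range(n_students):
        aptitude = rng.gauss(0, 1)  # unobserved affinity for film and literature
        base_score = 60 + 8 * aptitude + rng.gauss(0, 5)

        # Random assignment: a coin flip that is unrelated to aptitude.
        arm = "film" if rng.random() < 0.5 else "books"
        randomized[arm].append(base_score + (true_effect if arm == "film" else 0.0))

        # Self-selection: students with above-average aptitude pick the film class.
        arm = "film" if aptitude > 0 else "books"
        self_selected[arm].append(base_score + (true_effect if arm == "film" else 0.0))

    def gap(groups):
        mean_film = sum(groups["film"]) / len(groups["film"])
        mean_books = sum(groups["books"]) / len(groups["books"])
        return mean_film - mean_books

    return gap(randomized), gap(self_selected)


randomized_gap, self_selected_gap = simulate()
print(round(randomized_gap, 1), round(self_selected_gap, 1))
```

Under these made-up parameters the randomized gap should land close to the true effect of 10 points, while the self-selected gap should come out considerably larger, because it mixes the effect of film with the pre-existing advantage of the students who chose it.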

Although other research designs can also highlight the association between outcome and intervention, they are not designed to put the causal relationship beyond doubt. There is always a question when an intervention that had no credibility checks in place claims that an outcome was its direct result. RFEs address this doubt through randomization, which removes systematic differences between the groups at the point of assignment, strengthening their position as a gold standard in research design.

When performing experimental research, there are specific control set-ups as well as strict conditions to adhere to (Gerber et al., 2003). With these in place, better results can be achieved. Experiments of this kind can also be repeated and the results checked again. An interrupted time series design with multiple pre- and post-tests can be used here to increase the researcher's confidence in the findings.

Besides randomization, RFEs also have high internal validity. This is enhanced by the fact that all participants have an equal chance of being placed in either of the two groups, which minimizes bias. In the example, each student stood an equal chance of being placed in the film class or the control class. This reduced or eliminated any possible researcher selection bias, which may result from favoritism or bribery.

However, it is important not to take the 'gold standard' claim of RFEs at face value. Like any other research design, RFEs have their own limitations. These include questions of appropriateness, external validity, feasibility, ethics and cost, among others. Other approaches such as triangulation, regression discontinuity designs and interrupted time series designs can yield information on causation for far less cost and time than RFEs (Scriven, 2005).

McMillan (2007) emphasizes that randomized experiments may be misleading unless attention is paid to three important conditions: first, being sure that the design actually accomplishes the reason for using random assignment, namely achieving statistical equivalence of the experimental and control groups prior to, during, and after the intervention; second, evaluating internal validity on the basis of the many factors that are common in field studies; and third, recognizing that determining causality, which is why experiments are conducted, depends heavily on contextual factors peculiar to each study.

There are ethical issues involved with RFEs which weaken their 'gold standard' claims. These arise because RFEs systematically deny treatment to a deserving group for the sake of the experiment. For one, not every variable that can be manipulated should be (Behaghel and Zamora, 2012). For example, it would not be appropriate to use RFEs in research on new treatments for sexually transmitted infections like HIV, because it is not ethical to deny a group treatment which they obviously need and deserve just for the sake of experiment and control. Ethically, the film-aided education intervention was wrong to deny the control class the same treatment, because this could have affected overall examination grades in favor of those in the treatment class.

On the other hand, the above point should not be used to condemn RFEs on the basis of ethics. Random assignment in RFEs does not always exist purely to create a control group. There are other reasons which justify random selection and the use of a control group. Financial and administrative resource constraints often do not allow everyone who could benefit from a program to enroll simultaneously, so randomizing is often the fairest way to decide the order in which people receive treatment. All eligible beneficiaries have an equal chance of being selected first. This is especially important when participation in a program is highly desirable (Gertler et al., 2011).

Coming back to their weaknesses, RFEs are subject to experimenter effects, whereby the researcher knows which subject got which treatment, and this may influence how the subjects are treated as well as how the effects of the treatment are recorded (Bickman and Reich, 2014). Researchers who plan carefully before commencing an RFE can counter this threat. One of the best ways is to use blinding or, better, double-blinding, in which neither the researcher nor the participants know who is receiving which treatment. Double blinding ensures that the preconceived views of subjects and researchers cannot systematically bias the assessment of outcomes (Yale University, 2019). Intention-to-treat analysis, in which participants are analyzed in the groups to which they were assigned, maintains the advantages of random allocation, which may be lost if subjects are excluded from analysis through, for example, withdrawal or failure to comply. Failure to conceal random allocation and the absence of double blinding may yield exaggerated estimates of treatment effects.
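To make the intention-to-treat idea concrete, the sketch below compares it with a "per-protocol" analysis that drops participants who did not comply. The cohort of 20 participants is entirely made up, and the names `Participant`, `intention_to_treat` and `per_protocol` are hypothetical helpers written for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    assigned: str    # arm given by randomization: "treatment" or "control"
    received: bool   # whether the treatment was actually received
    passed: bool     # outcome of interest


def pass_rate(people):
    return sum(p.passed for p in people) / len(people)


def intention_to_treat(people):
    """Compare groups exactly as randomized, ignoring compliance."""
    treated = [p for p in people if p.assigned == "treatment"]
    control = [p for p in people if p.assigned == "control"]
    return pass_rate(treated) - pass_rate(control)


def per_protocol(people):
    """Compare only those who followed their assigned arm."""
    treated = [p for p in people if p.assigned == "treatment" and p.received]
    control = [p for p in people if p.assigned == "control" and not p.received]
    return pass_rate(treated) - pass_rate(control)


# Made-up cohort: two of the ten treatment-arm participants never took up the treatment.
cohort = (
    [Participant("treatment", True, True)] * 7
    + [Participant("treatment", True, False)] * 1
    + [Participant("treatment", False, False)] * 2
    + [Participant("control", False, True)] * 6
    + [Participant("control", False, False)] * 4
)

print(round(intention_to_treat(cohort), 3))  # 0.1
print(round(per_protocol(cohort), 3))        # 0.275
```

In this invented cohort, dropping the two non-compliers makes the treatment look much more effective than the comparison of the groups as randomized. Excluding them breaks the balance that randomization created, which is the loss that intention-to-treat analysis is meant to avoid.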

In education studies, however, it is often difficult to blind learners to their assigned group. Without blinding, students can react to the knowledge that they are being studied or assigned to a particular group. With or without blinding, it may be difficult to prove that the students in the film-aided class did not interact with those from the control class and discuss their advantage during extracurricular activities and other shared events. There is therefore a possibility of contamination, which reduces the credibility and validity of RFE results. Contamination of RFE groups often happens when the organization has not kept the promises it made to each group, particularly the control group. For example, if there are no strictly enforced rules in place, students from the control class can sneak into the treatment class and watch the literature film so that they also benefit.

In other instances, treatment and control groups may be assigned, but compliance to the group assignments may be imperfect. Some participants who were supposed to receive treatment may not get it, or participants in the control group may receive treatment. This can happen “if eligibility cut-offs are not strictly enforced, if selective migration takes place based on treatment status, if there are administrative or implementation errors, if some participants in the treatment group choose not to participate, or for many other reasons” (Gandhi, 2016).

The issue of blinding researchers and participants brings to the fore the question of whether RFEs are feasible in all research studies. To begin with, RFEs were originally derived from clinical research and are now used in education research, among other fields. In relation to the film education example used in this essay, the highly complex system of education may be a poor fit for the RFE model, which requires clear inclusion/exclusion criteria and interventions administered identically via multiple physicians (i.e., teachers) (Sullivan, 2011). In education studies, variables can rarely be controlled tightly, and blinding of subjects and study personnel may be unethical or even impossible.

RFEs also have shortcomings related to resentful demoralization. When the control group finds out that it is not getting the full treatment, it may become demoralized and under-perform. The results that researchers then find will not be a true reflection of the intervention's effects: the gap between the control group and the treatment group will partly reflect decreased performance by the control group rather than increased performance by the treatment group. For example, it is possible that the weaker performance of the control class in the final examinations was due to resentful demoralization at having been denied the same chance to watch film as the other class, and not because film enhanced the treatment class students’ understanding of English literature.

The opposite may also happen in RFEs where the control group knows it is being denied the full treatment. The control class can develop what is known as a compensatory attitude or effect. Instead of losing morale, its members are motivated to over-perform to prove to the treatment group that they can do better even with limited resources. What the trial concludes would then not be a true reflection of the effectiveness of the treatment but rather an outcome influenced by other factors. The film evaluation left room for control group students to over-perform, as they were very much aware of what was happening in the other class, and this possibly affected the outcome of the program.

Another major criticism of RFEs and their 'gold standard' claim is their reduced external validity, in contrast to their trusted internal validity. The voluntary nature of participation, together with random assignment, somewhat limits the generalizability of the findings. “Unless an experiment can be generalized at least a bit, time and resources have been wasted” (Berk, 2005). Although RFEs are regarded as the gold standard with regard to level of evidence, the extent to which their results can be extrapolated to the wider patient population is often questioned, because standardized and controlled study conditions do not adequately reflect clinical reality (Kabisch et al., 2011).

Norman (2003) argues that RFEs are most useful in examining relatively standardized interventions, such as web-based learning and, possibly, clinical simulation. He argues that one cannot ‘‘apply curriculum daily’’ in the same way that one can prescribe a medication, stressing that RFEs may not be suitable for education research. He recommends that randomization be considered when (1) prior observational studies support the hypothesis; (2) the mechanism of learning is understood; (3) the outcome of the intervention is easily measured and accepted as related to the intervention; (4) the subgroups likely to benefit from the intervention are also easily identified; (5) the effect size of the intervention is small; and (6) the results from the trial may have a large impact, to justify the costs of an RFE. In practice, many research studies and interventions do not satisfy these criteria, which leaves room to question the 'gold standard' claims of RFEs.

The use of non-randomized methods is common in education research, and some experts do not consider them inferior to RFEs. Sullivan (2011) recommends using longitudinal studies, which may use ongoing surveillance or repeated cross-sectional methods to measure change over time. In Sullivan's view, these designs are strengthened by including a comparison group, which may be a concurrent cohort or take a matched ‘‘case-control’’ format. This improves the credibility of their findings, a lack of which has been viewed as one of the major weaknesses of non-randomized research designs.

Moreover, credibility of findings is highly subjective even with RFEs. The quality of evidence cannot be determined without knowing which questions were asked and why, what evidence was gathered to answer these questions, who asked the questions and gathered the evidence, how the evidence was gathered and analyzed, and under which conditions the evaluation was undertaken (Bickman and Reich, 2014). This means that measuring credibility requires more than a study's design alone, contrary to what defenders of RFEs tend to suggest. In this regard, it can be argued that non-randomized designs can also be highly credible if they take all the necessary issues into consideration.

To arrive at the most appropriate and valid research design, it may be important to combine the strengths of each design, i.e., to draw on both randomized and non-randomized designs and triangulate them in a single research or assessment exercise. Clearly, on their own, RFEs are not always the gold standard in research. However, their random assignment of participants cannot be ignored, because it greatly strengthens one's attribution claims. Therefore, researchers should use RFEs but enhance the validity of their findings by taking into consideration all the possible threats to internal validity which have the potential to weaken an attribution claim.
