Randomised Controlled Trial

The Randomised Controlled Trial (RCT) is often cited as the gold standard in quantitative social science research (Robson, 2011; Shadish, Cook, and Campbell, 2001). Such a design involves the random assignment of participants to either a group that receives treatment or a control group, in order to best compare the effect of a treatment and eliminate confounding variables (Robson, 2011). This paper will first discuss the importance of randomisation within social research, focusing on RCTs, followed by a discussion of some quasi-experimental designs available to a researcher when randomisation is not possible. Although repeated treatment, matching, weighting, regression discontinuity, and differences-in-differences methods are often considered inferior to RCTs and, indeed, have many drawbacks to take into consideration, they can effectively reflect a treatment's influence and statistical significance (Robson, 2011).
Randomisation ensures that each participant has an equal probability of being placed in a treatment or control group, so historical, genetic, personality, and social factors are controlled for without the researcher having to equalise these factors deliberately (Webster and Sell, 2007). Validity, or having results that reflect the real phenomena, is always a significant concern when designing research, and randomisation is considered the best possible assurance of gaining valid results (Robson, 2011, p. 88). This concept extends to internal validity, the establishment of a causal relationship between the treatment and outcome, and external validity, the application of findings outside of study conditions (Robson, 2011, p. 88). It is of course possible that even a large random sample misrepresents the actual population, but as the sample grows, it becomes more probable that the sample will reflect the population's average characteristics (Huck, 1999). Randomisation also allows for easier replication by other parties, an important factor in making judgements about the social world (Robson, 2011; Webster and Sell, 2007; Huck, 1999). An experimental design like an RCT is particularly valuable since it includes both randomisation and a controlled environment (Webster and Sell, 2007). Without a random sample, results sacrifice the unbiased and efficient estimation of treatment effects, and it becomes difficult to generalise findings to the wider population (Alferes, 2012; de Vaus, 2001). Despite its gold-standard status, there are practical obstacles to randomisation, such as a problematic gatekeeper who wants to include their best people, time constraints, refusal of participants to take part, inconsistent interventions, or simply a small available population or sample (de Vaus, 2001; Bowen, Horvath, and Williams, 2007). Pawson and Tilley (1997) consider RCTs generally unable to reflect complex social issues, but RCTs continue to have considerable value.
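To make the logic of random assignment concrete, the following minimal Python sketch randomly assigns simulated participants to treatment and control and estimates the average treatment effect as a simple difference in group means. All numbers here are hypothetical and purely illustrative, not drawn from any study cited above.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

n = 200
participants = np.arange(n)
rng.shuffle(participants)                      # random order of participants
treatment_group = participants[: n // 2]       # first half receives treatment
control_group = participants[n // 2:]          # second half is the control

# Simulated outcomes: a true treatment effect of 2.0 plus random noise.
outcomes = rng.normal(loc=10.0, scale=3.0, size=n)
outcomes[treatment_group] += 2.0

# With random assignment, the difference in group means is an unbiased
# estimate of the average treatment effect.
ate_estimate = outcomes[treatment_group].mean() - outcomes[control_group].mean()
print(f"Estimated average treatment effect: {ate_estimate:.2f}")
```

Because assignment is random, confounders are balanced across the two groups in expectation, which is why the naive difference in means suffices here but not in the quasi-experimental designs discussed below.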
In quasi-experiments, randomisation is not possible since the treatment does not take place in a controlled environment, but rather in a "natural social setting in which the research person can introduce something like experimental design" (Campbell and Stanley, 1963, p. 204). While these studies can be valuable, validity concerns in particular must be considered during the design.
One available avenue for gaining validity in a non-randomised study is to measure a treatment more than once, with one or more combinations of factors (Huck, 1999). In a repeated treatment design, a treatment is introduced, removed, and re-introduced, as often as desired or possible (Shadish, Cook, and Campbell, 2001). If the no-treatment baseline remains stable throughout, and there is a consistent shift between this control phase and each treatment phase, a researcher can infer a causal relationship on a much stronger basis than is possible from a single case (Robson, 2011). Confounding variables can be effectively controlled for, given the repetition of baseline and treatment conditions (Robson, 2011). Francesconi, Jenkins, and Siedler (2010) analyse the role of family in schooling outcomes by comparing the outcomes of children from five different family structures. Cases of a child whose father died are compared with those of a sibling born to a step-father, who had therefore always had an 'intact family.' This allows the baseline of an 'intact family' to be compared with many cases of 'non-intactness.' Compared with a single case study examining only children with different family intactness from different families, this repeated treatment design seems more efficient. However, far more data could be collected if only one factor were required of each participant, so the external validity of findings in repeated-measures designs is often limited compared with the same amount of data from single cases (Huck, 1999). Moreover, earlier treatments always have the possibility of affecting subsequent ones (Huck, 1999). Likewise, relying so heavily on a small sample chosen on the basis of having only a few variables in common could mask a more significant factor that explains differences in outcome (Shadish, Cook, and Campbell, 2001). So, although seeking out repeated measurements seems like a practical method for approximating the benefits of randomisation, internal and external validity are threatened in many cases of repeated measures.
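As an illustration of the repeated treatment (withdrawal) logic, the sketch below simulates an A-B-A-B sequence, where A is a baseline phase and B a treatment phase. The phase levels and the assumed 2-point treatment effect are hypothetical; the point is that a shift that recurs at every reintroduction of the treatment is harder to attribute to a confound than a single before/after change.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# A-B-A-B design: baseline (A) and treatment (B) phases alternate.
# Hypothetical scores; the simulated treatment lifts the level by ~2 points.
phase_means = {}
for label in ["A1", "B1", "A2", "B2"]:
    base = 12.0 if label.startswith("B") else 10.0   # treatment phases sit higher
    phase_means[label] = rng.normal(base, 1.0, size=5).mean()

# A consistent shift between every baseline and treatment phase supports a
# causal claim more strongly than one A-B comparison would.
shifts = [phase_means["B1"] - phase_means["A1"],
          phase_means["B2"] - phase_means["A2"]]
print({k: round(v, 2) for k, v in phase_means.items()})
print("Baseline-to-treatment shifts:", [round(s, 2) for s in shifts])
```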
Matching, or placing a participant into a group because a particular characteristic is similar to that group, is another, perhaps more obvious, method of gaining a correlated sample that mimics a randomised sample (Huck, 1999). Twin studies such as Ashenfelter and Krueger's (1994) are unique in the sense that genetic and environmental factors are controlled for, so a specific factor like schooling level can be researched. By comparing sets of twins who differ in their level of schooling, the researcher can make strong claims about the effect of schooling. Increasingly, matching and reweighting methods are used in tandem to re-create randomisation in quasi-experimental and experimental design (Achen, 1986, p. 84). Reweighting is a method that statistically weights certain characteristic or demographic categories to balance an unrepresentative sample (Achen, 1986). A sample whose categories are not represented in proportion to the population will not be an effective means of exploring the extent of differences (Achen, 1986). The combined result of both matching and reweighting is two or more groups of alike subjects who are weighted to approximate the population. Matching specifically faces issues of undermatching, or matching on variables that are less determinate of outcomes than other, unmatched variables (Shadish, Cook, and Campbell, 2001). Equally, the most obvious categories to be matched may lie at the extremes of the actual distribution, so the results do not reflect the reality of the population (see Campbell and Erlebacher, 1970). Weighting can easily inflate the estimated effect of a treatment because, even with the power of statistical software, the researcher only has access to the sample (Achen, 1986). Again, matching and reweighting are able to approximate the effect of randomisation, but even with careful alignment of the sample with population data, validity may be sacrificed.
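The sketch below illustrates both techniques on simulated data: nearest-neighbour matching on a single covariate, followed by a simple post-stratification reweighting. All variable names, shares, and effect sizes are hypothetical and not taken from Ashenfelter and Krueger or Achen.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

# Hypothetical data: years of schooling (covariate) and an outcome, for a
# treated pool and an untreated pool. Nearest-neighbour matching pairs each
# treated unit with the untreated unit closest on schooling.
treated_x = rng.normal(14, 2, 30)
treated_y = 2.0 * treated_x + 3.0 + rng.normal(0, 1, 30)   # true effect = 3.0
control_x = rng.normal(12, 2, 100)
control_y = 2.0 * control_x + rng.normal(0, 1, 100)

matched_diffs = []
for x, y in zip(treated_x, treated_y):
    j = np.argmin(np.abs(control_x - x))        # closest untreated unit
    matched_diffs.append(y - control_y[j])
print(f"Matched estimate of treatment effect: {np.mean(matched_diffs):.2f}")

# Reweighting sketch: if a group is 50% of the population but only 25% of the
# sample, each sampled member of that group gets weight 0.5 / 0.25 = 2.0.
population_share = {"group_a": 0.5, "group_b": 0.5}
sample_share = {"group_a": 0.25, "group_b": 0.75}
weights = {g: population_share[g] / sample_share[g] for g in population_share}
print("Post-stratification weights:", weights)
```

The matched estimate is only as good as the covariates matched on: if an unmatched variable drives outcomes (undermatching), the estimate remains biased, which is exactly the risk noted above.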
Often used with test scores, regression discontinuity requires the researcher to discover or create a cut-off point that divides a sample into two groups based on a pre-determined value (Robson, 2011). For example, Lewis-Beck and Alford (1980) use the introduction of 1952 safety legislation in the US to see if there was a decrease in the number of serious coal mining injuries and deaths between 1941 and 1969. In this design, the 'treatment' group are the post-1952 coal miners, who are treated as such solely because of the 1952 cut-off point. Although injuries and fatalities fell below the expected trend after the legislation was introduced, it is possible that safer technology was introduced, or that the number of miners themselves decreased. In regression discontinuity designs, these variables must be considered and controlled for, unlike in a pure RCT experiment, where randomisation serves as the source of control. Even with careful consideration of these factors, comparing a treatment group to the predicted course of a trend makes generalising on this basis easily misrepresentative (Green, 2010). Formal statistical analysis also becomes more difficult when any confounding variables must be considered (Robson, 2011). Despite lacking the validity assurances offered by an RCT, this analysis allows researchers to study a real event that is often difficult to re-create with other methods (Green, 2010).
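A minimal sketch of a sharp regression discontinuity follows, loosely modelled on the coal-mining example but using entirely simulated numbers: a separate linear trend is fitted on either side of the 1952 cut-off, and the gap between the two predictions at the cut-off estimates the treatment effect.

```python
import numpy as np

rng = np.random.default_rng(seed=3)

# Simulated annual injury counts: a common downward trend, noise, and a
# hypothetical drop of 5 units once the 1952 legislation takes effect.
years = np.arange(1941, 1970)
injuries = 100 - 0.5 * (years - 1941) + rng.normal(0, 1.5, years.size)
injuries[years >= 1952] -= 5.0                  # simulated treatment effect

cutoff = 1952
pre = years < cutoff
# Fit a linear trend on each side of the cut-off and compare the two
# predictions at the cut-off itself: the jump estimates the effect.
b_pre = np.polyfit(years[pre], injuries[pre], 1)
b_post = np.polyfit(years[~pre], injuries[~pre], 1)
jump = np.polyval(b_post, cutoff) - np.polyval(b_pre, cutoff)
print(f"Estimated discontinuity at {cutoff}: {jump:.2f}")
```

The estimate is only credible if nothing else changed at the cut-off; a simultaneous technology shift or workforce decline would be absorbed into the jump, which is the confounding risk described above.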
A differences-in-differences (DD) method compares the differences between a treatment group and a non-treatment group before and after an intervention (Bertrand, Duflo, and Mullainathan, 2002). Looking specifically at the differences between the two units is a rather simple method of comparison, and can avoid many of the issues present in other methods that are more concerned with confounding variables. Card and Krueger (1993) analyse the effect of a minimum wage increase in New Jersey by comparing wages, employment, and pricing with those in Pennsylvania, where the minimum wage remained the same and acted as a control. To find out whether one unit affected the other, levels of each variable had to be measured before and after the introduction of the increased minimum wage. The crucial assumption here is that, without the intervention of the new minimum wage, New Jersey would have continued on the same trajectory as Pennsylvania. Abadie and Gardeazabal (2003) wanted to compare the Basque Country with other Spanish regions but, unlike Card and Krueger, found that the surrounding regions with the fewest confounding variables did not follow the same trend as the region they wanted to investigate. By combining differences-in-differences with a weighting technique, a combination of other regions approximated a control group, and an effective comparison of differences could be made. DD methods are of course susceptible to errors in measurement and the presence of confounding variables, making both internal and external validity a concern. Although it approximates an RCT in that it has a control unit and an imposed treatment, differences-in-differences is by no means a perfect replication.
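The two-by-two DD logic fits in a few lines. The sketch below uses simulated employment levels (not Card and Krueger's data) with an assumed common trend and an assumed treatment effect; subtracting the control group's change removes both the fixed level difference between the states and the shared trend.

```python
import numpy as np

rng = np.random.default_rng(seed=11)

# Hypothetical employment levels: the two states start at different levels,
# share a common trend of +0.5, and the treated state also receives a +1.5
# effect after the intervention.
nj_pre, pa_pre = 20.0, 23.0
common_trend, effect = 0.5, 1.5
nj_post = nj_pre + common_trend + effect + rng.normal(0, 0.1)
pa_post = pa_pre + common_trend + rng.normal(0, 0.1)

# DD: (change in treated unit) minus (change in control unit).
dd = (nj_post - nj_pre) - (pa_post - pa_pre)
print(f"Differences-in-differences estimate: {dd:.2f}")
```

The estimate recovers the effect only under the parallel-trends assumption stated above; if the two units would have diverged anyway, the divergence is wrongly attributed to the treatment.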
As illustrated in this paper, quasi-experimental methods require much consideration of potential problems, especially the requirement of validity. Randomised controlled trials are cited as the gold standard because they statistically provide the closest sample researchers have to the real population. However, the social world is full of cases where a researcher can effectively study a treatment without having to place participants under experimental conditions. A repeated treatment method allows a researcher to observe control and treatment conditions many times, but initial treatments can affect subsequent ones and can mislead the researcher into drawing conclusions that are not generalisable. Matching provides uniform groups to compare but, again, the variables used to define these groups may be misrepresentative. Reweighting, often used alongside matching in contemporary research, allows the researcher to create a sample more representative of the population, but the application of treatment can still have misleading outcomes. Regression discontinuity is very useful where there is an obvious cut-off point to be observed, but it is susceptible to confounding variables and misrepresented observations. A differences-in-differences method is a simple one, but it can become complicated or impossible without a baseline comparator. The quasi-experimental methods analysed here have many valuable applications. However, when comparing their potential issues of internal and external validity with those of randomised controlled trials, RCTs take precedence. The researcher must take these shortcomings into consideration in order to approximate the benefits of randomisation.



Achen, C. H. (1986) The Statistical Analysis of Quasi-Experiments. Berkeley: University of California Press.
Alferes, V. R. (2012) Methods of Randomization in Experimental Design. Available at: http://srmo.sagepub.com.gate2.library.lse.ac.uk/view/methods-of-randomization-in-experimental-design/n1.xml (Accessed: 13 January 2016).
Ashenfelter, O. and Krueger, A. (1994) 'Estimates of the economic return to schooling from a new sample of twins', American Economic Review, 84(5), pp. 1157–1173.
Bertrand, M., Duflo, E. and Mullainathan, S. (2002) 'How much should we trust differences-in-differences estimates?', Working Paper. Cambridge, MA: National Bureau of Economic Research.
Campbell, D. T. and Stanley, J. C. (1963) 'Experimental and Quasi-Experimental Designs for Research on Teaching', in Gage, N. L. (ed.) Handbook of Research on Teaching. Chicago: Rand-McNally and Company, pp. 171–247.
De Vaus, D. A. (2001) Research design in social research. Thousand Oaks, CA: SAGE Publications.
Francesconi, M., Jenkins, S. P. and Siedler, T. (2010) 'Childhood family structure and schooling outcomes: Evidence for Germany', Journal of Population Economics, 23(3), pp. 1073–1103. doi: 10.1007/s00148-009-0242-y.
Green, J. (2010) 'Points of Intersection between Randomized Experiments and Quasi-Experiments', The Annals of the American Academy of Political and Social Science, 628(97).
Huck, S. W. (1999) Reading statistics and research: Part 3. 3rd edn. New York: Longman.
Pawson, R. and Tilley, N. (1997) Realistic evaluation. Thousand Oaks, CA: SAGE Publications.
Robson, C. (2011) Real world research: A resource for users of social research methods in applied settings. 3rd edn. United Kingdom: Wiley-Blackwell (an imprint of John Wiley & Sons Ltd).
Shadish, W. R., Cook, T. D. and Campbell, D. T. (2001) Experimental and quasi-experimental designs for generalized causal inference. 2nd edn. Boston: Houghton Mifflin.
Sprinthall, R. C. (2011) Basic statistical analysis. 8th edn. Boston: Pearson Education (US).
Webster, M. and Sell, J. (2007) 'Experimentation and Theory', in Outhwaite, W. and Turner, S. P. (eds.) The SAGE Handbook of Social Science Methodology. Los Angeles, CA: Sage Publications.