Social Experiments “for” Skeptics: #1.3
CRITIQUE PART II: My first “reply” included some initial thoughts about the hypothetical study I’d described. Below are some other ideas, off the top of my head, that might be worth considering before committing resources to actually doing the study (in case anyone’s seriously considering it). The first several are potential problems with the initial study, and the later ones are more about extensions that address related issues. (In practice it’d be better to be more systematic and thorough by considering threats to each of the four major types of validity: statistical-conclusion, internal, construct, and external. My earlier thoughts mainly concerned the first two, but I’ll defer a fuller treatment.)
1. Did the survey actually include people likely to see the signs? For instance, what if people who work in a given city tended to see the signs, but people who live in that city were the ones surveyed about DG? Or what if some people saw both signs (e.g., commuters)? It might be useful to include a “manipulation check” in the survey by asking which sign (if any) each respondent saw, or perhaps heard about, while being careful not to let this influence their response to the DG question (e.g., by asking the DG question first).
2. Was DG measured appropriately? Whatever abstract, latent construct DG is meant to represent, a single self-report item probably doesn’t measure it reliably or validly. Would the results have been markedly different had the question been “Do you believe in a god?”, “Do you believe god exists?”, “Is there a god?”, “Are you sure there’s a god?”, “Do you doubt god?”, or any of several other variants? What if some type of rating scale had been used instead, or perhaps indirect or behavioral measures (e.g., physiological responses to god-related cues)? If no psychometrically sound measure of DG is practical for the survey (e.g., one that isn’t too time-consuming or complicated for respondents), a pilot study could be used to develop one.
3. How do people in the focal region actually interpret the message of each type of sign? For instance, what does it make them think about, or how does it make them feel? This could be addressed using focus groups or other qualitative methods. That’s not something I’m used to, so I’d be interested in thoughts about it.
4. Do the ad campaigns have negative side effects, like increasing the prevalence of “immoral” behavior, mental-health problems, natural disasters, pestilence, famine, etc.? If so, this would be important to know so these side effects could be avoided. If not, this would be valuable empirical evidence for responding to critics who say the ads are bad for the communities.
5. Do the ad campaigns differ in their long-term effects on DG, measured weeks, months, or years after a campaign ends? In some respects this matters more than their immediate impact, and for planning future campaigns it could be useful to know how long it takes for a campaign’s effect to “wear off.”
6. Which type of campaign is more cost effective? For example, given the results described earlier, campaign A might be judged much more cost effective if its signs were substantially cheaper to produce and display, yielding more bang for the buck.
7. Does a given ad campaign work better for some types of communities than others, or for some types of people than others? To address these questions we’d need to measure characteristics of the communities and of the individual respondents, and then use them as moderators of the campaign effect in more sophisticated analyses (e.g., multilevel/mixed models).
8. Are there interesting effects of variations in the ads, such as particular wording, visual design features (e.g., colors, layout, graphics, fonts), or placement of the signs (e.g., side of bus, back of bus, inside bus)? What about variations in the timing of the ad campaigns, such as the overall length or maybe the frequency of rotation among alternative signs (e.g., a different sign each week)?
9. How could similar campaigns be implemented in communities without public transportation? Posted flyers? Billboards? Sandwich boards? Rented ad vehicles? Sky-writing?
10. Would other study designs be better? For example, we might include the same people in both the before and after surveys, and/or we might run more than one campaign in each city, with breaks in between and with the order counterbalanced (e.g., ABC in one city, BCA in another, CAB in a third, and the reflected sequences CBA, ACB, and BAC in three more). Other, somewhat more technical design strategies could also help: Latin squares (the ABC, BCA, CAB orders already form one) or fractional factorial designs to study things that can be manipulated experimentally (e.g., features of the signs or their placement), or blocking cities on important characteristics to reduce error variance due to factors that can’t be manipulated (e.g., demographics, pre-campaign measures of religious behavior).
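To make the counterbalancing idea in item 10 a bit more concrete, here’s a minimal Python sketch, assuming three campaigns labeled A, B, and C and six cities (the “City 1” through “City 6” labels are hypothetical, just for illustration). It builds the cyclic orders ABC, BCA, CAB, adds their reflections, and checks two balance properties: each campaign appears equally often in each position, and each campaign immediately precedes each other campaign equally often, which should help separate a campaign’s own effect from carryover from the one before it. It’s only a sketch of the design logic, not an analysis plan.

```python
# Sketch: counterbalanced campaign orders across six cities.
# Assumptions (hypothetical): three campaigns "A", "B", "C"; cities are
# labeled "City 1" through "City 6" purely for illustration.
from collections import Counter


def cyclic_orders(campaigns):
    """Rotate the sequence one step at a time: ABC -> BCA -> CAB."""
    return [campaigns[i:] + campaigns[:i] for i in range(len(campaigns))]


def reflect(orders):
    """Reverse each order: ABC -> CBA, BCA -> ACB, CAB -> BAC."""
    return [order[::-1] for order in orders]


def position_counts(orders):
    """For each position in the sequence, count how often each campaign lands there."""
    return [Counter(order[pos] for order in orders) for pos in range(len(orders[0]))]


def carryover_counts(orders):
    """Count how often each campaign is immediately followed by each other one."""
    pairs = Counter()
    for order in orders:
        for first, second in zip(order, order[1:]):
            pairs[first + second] += 1
    return pairs


base = cyclic_orders("ABC")
design = base + reflect(base)

for city_number, order in enumerate(design, start=1):
    print(f"City {city_number}: {' -> '.join(order)}")

print(position_counts(design))
print(carryover_counts(design))
```

Running it prints one order per city (City 1: A -> B -> C through City 6: B -> A -> C). Every campaign appears twice in each position, and each of the six ordered pairs (AB, BA, AC, CA, BC, CB) occurs exactly twice. The reflected set is what evens out the carryover counts: with the three cyclic orders alone, A is always followed by B and never by C. With only three campaigns, the six orders also happen to be all possible permutations; with more campaigns that wouldn’t be true, which is where Latin-square-type designs save resources.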
That’s about all I have time for right now. Again, I’m curious about thoughts not only on this particular hypothetical study (admittedly a somewhat silly toy example, but perhaps an instructive one) but also on the bigger issue of whether CFI or similar organizations might ever undertake large-scale social experiments like this to gather scientific evidence about how to accomplish their stated missions more effectively.
Cheers.