A Good Instrument Is Hard to Find: What to Do?

By Alexandra Cirone, April 9, 2021

Translated by 向迪; proofread by 红牛

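No matter which academic discipline you belong to, estimating causal effects is hard. And one key characteristic of HPE research is that the data are observational, so we as researchers cannot control treatment assignment. We weren’t there! That time has passed!

So today I want to talk about instrumental variables. For researchers doing causal inference in HPE, instrumental variable (IV) analysis is one potential identification strategy (for an introduction to IV, see Dunning or Sovey and Green). The TL;DR: if the independent variable we care about (our supposed treatment) is endogenous, we try to find a third, “as-if random” variable that helps determine selection into the treatment condition. That third variable is the instrument. Papers that find a successful instrument are often really cool papers.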
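
The logic above can be sketched with a small simulation. This is a minimal illustration under assumed conditions, not anything taken from the original post: the data-generating process, the coefficients (0.8, 1.5, and a true effect of 2.0), and the function names `cov` and `simulate` are all made up for the example. An unobserved confounder `u` drives both the treatment `x` and the outcome `y`, so naive OLS is biased, while the ratio Cov(z, y) / Cov(z, x) recovers the true effect because the simulated instrument `z` is random and affects `y` only through `x`.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)


def simulate(n=20_000, beta=2.0, seed=0):
    """Return (OLS slope, IV ratio estimate) for one simulated dataset."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]  # instrument: as-if random
    u = [rng.gauss(0, 1) for _ in range(n)]  # unobserved confounder
    # Treatment depends on the instrument AND the confounder (endogenous).
    x = [0.8 * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
    # Outcome depends on treatment and confounder, but not directly on z
    # (the exclusion restriction holds by construction in this toy example).
    y = [beta * xi + 1.5 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]
    ols = cov(x, y) / cov(x, x)  # naive regression slope of y on x
    iv = cov(z, y) / cov(z, x)   # IV (Wald ratio) estimate
    return ols, iv


if __name__ == "__main__":
    ols, iv = simulate()
    print(f"OLS: {ols:.2f}  IV: {iv:.2f}  (true effect: 2.00)")
```

With these assumed coefficients, OLS converges to roughly 2.57 (that is, 2 + 1.5/2.64) rather than 2, while the IV ratio should land close to 2. In real historical data, of course, nobody hands you a `z` that is known to satisfy the assumptions.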

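That sounds great, except that IV is easily maligned. This is partly because we have to pin our identification claims on an “exclusion restriction” for which there is no statistical test (which leaves a lot of room for fudging), and partly because naive (confused?) scholars think IV analysis is easy to do (it is not). Finding historical instruments can also be complicated.

But don’t panic! As the saying goes, forewarned is forearmed...

The Exclusion Restriction in HPE

Let’s start with the hard part.

This research design involves a number of identifying assumptions, but the most salient is that the instrument affects the dependent variable only through its relationship with the independent variable. This is the infamous “exclusion restriction,” and it is where applied IV papers succeed or fail. I won’t reinvent the wheel here; I only want to point out that historical research faces several additional challenges on this dimension.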
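
To see why so much rides on this assumption, consider a stylized linear model. The notation and the single-instrument, constant-effect setup are assumptions made for illustration, not something taken from the original post. Let $Z_i$ be the instrument, $X_i$ the treatment, and $Y_i$ the outcome, and let $\gamma$ capture any direct effect of the instrument on the outcome:

```latex
Y_i = \beta X_i + \gamma Z_i + u_i, \qquad \operatorname{Cov}(Z_i, u_i) = 0

\beta_{IV}
  = \frac{\operatorname{Cov}(Z_i, Y_i)}{\operatorname{Cov}(Z_i, X_i)}
  = \frac{\beta \operatorname{Cov}(Z_i, X_i) + \gamma \operatorname{Var}(Z_i)}{\operatorname{Cov}(Z_i, X_i)}
  = \beta + \gamma \, \frac{\operatorname{Var}(Z_i)}{\operatorname{Cov}(Z_i, X_i)}
```

The exclusion restriction is the claim that $\gamma = 0$. When it fails, the IV estimand is off by the second term, and that term is larger when the first stage $\operatorname{Cov}(Z_i, X_i)$ is weak, so even a small direct effect can matter a great deal. Because $\gamma$ cannot be estimated separately from $\beta$ in this setup, the data alone cannot settle the question.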

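One thing to remember is that the exclusion restriction is sometimes harder to justify over a long period of time. If a potential instrument is associated with exposure at multiple points in time, then there are multiple paths to the outcome (and potentially more exclusion restriction violations). Or, if a lagged value of a covariate is used as an instrument and that covariate affects outcomes that are durable or persistent, identification is again a problem (more specifically, those studying historical migration, ethnolinguistic fractionalization, or religion should definitely look at the “Broken Instruments” working paper by Gallen and Raymond).

More generally, it is hard to map out confounders and causal pathways over decades, and we often underestimate the downstream effects that interventions can have. Readers should be aware of “post-instrument bias” and read the working paper by Glynn and Rueda. They note that researchers often include post-instrument covariates to help justify the exclusion restriction, but they argue that this can undo all the benefits of a natural experiment.

Any exclusion restriction must be justified with case-specific knowledge, and the best defenses usually combine historical primary source evidence, creative descriptive data, and citations from other fields (hello, historians!). Luckily, this is the kind of research where HPE scholars typically shine. (That said, missing data and gaps in the historical record can also make the exclusion restriction harder to defend.)

Finally, although this may be less of an issue for historical instruments, since they are harder to find, it is worth remembering that for popular instruments, the very fact that they have been used to predict many different things indicates that the exclusion restriction may be violated. Rainfall, for example, is a plausibly exogenous instrument that is also very popular. Jon Mellon demonstrates this in his brilliantly titled paper “Rain, Rain, Go Away...”. He reviews 185 social science studies and finds 137 distinct variables linked to weather (and this is a conservative estimate). Mellon also helps the reader by laying out steps for systematically reviewing how a popular instrument has been used in the existing literature, in order to find potential exclusion restriction violations.

It is sometimes easy to critique instruments, but if you read the critiques (for more papers, see Bazzi and Clemens and Betz, Cook and Hollenbach), you will have a better chance of knowing what you are up against!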

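A Good Instrument Is Hard to Find

How do you find a good instrument for historical research?

The best advice is to be very, very familiar with the case in question (and to read Thad Dunning’s textbook on natural experiments, so that your subconscious knows what to look for). Defending the exclusion restriction will require detailed knowledge anyway, but sometimes the best instruments are found while doing background research. That is how my coauthor and I found a lottery-based procedure that we used as an instrument to estimate the causal effect of committee service on careers: I was reading a 19th-century French book in the archives and stumbled across it.

Another great piece of advice comes directly from Scott Cunningham’s Mixtape. He writes: “A necessary but not a sufficient condition for having an instrument that can satisfy the exclusion restriction is if people are confused when you tell them about the instrument’s relationship to the outcome. Instruments are jarring… because these two things (Zi and Yi) don’t seem to go together. If they did, it would likely mean that the exclusion restriction was violated. But if they don’t, then the person is confused, and that is at minimum a possible candidate for a good instrument.”

Alternatively, you can read a lot of “historical research + IV” papers for inspiration. I have listed some favorites here: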

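Nunn (2008): analyzes the impact of the slave trade on Africa’s economic development; uses distance from major slave ports as an instrument for the intensity of the slave trade.

Dube and Harish (2020): studies whether European queens were more likely to go to war in the 15th to 20th centuries; uses the gender of the firstborn child and whether previous monarchs had a female sibling as instruments for queenly rule.

Acharya, Blackwell and Sen (2018): studies how slavery in 1860 relates to present-day political attitudes and party affiliation; uses cotton suitability as an instrument for the prevalence of slavery.

Biavaschi, Giulietti and Siddique (2017): studies how migrants “Americanized” their names to improve their career prospects; uses an index of linguistic complexity based on Scrabble points as an instrument for name Americanization.

Cirone and van Coppenolle (2018): studies how budget committee service affects long-term political careers; uses a lottery-based procedure as an instrument for committee selection.

Kern and Hainmueller (2009): studies whether exposure to West German television made East German citizens less supportive of the communist regime; uses district-level access to West German television as the instrument.

Gihleb and Giuntella (2017): studies the effect of Catholic school attendance on student outcomes; uses the abrupt decline in female vocations (following the reforms of the Second Vatican Council) as an instrument for Catholic schooling.

I felt like I didn’t have enough pictures, so let’s pretend this guy is looking for an instrument. Source: https://www.loc.gov/item/2016854431/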

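So You Want to Use an Instrument

IV is a perfectly plausible identification strategy for historical research, but here are some do’s and don’ts to keep in mind.

DO put a separate section in your paper that discusses the identification assumptions. While you can’t “prove” the exclusion restriction, you can provide descriptive data or historical justifications (cite other fields!) that support your interpretation of the causal model. The Glynn and Rueda paper mentioned above collected data on IV papers in the top three political science journals (APSR, AJPS, and JOP) and found that out of 155 papers using IV, only 116 explicitly discussed the exclusion restriction. What were the others doing? Who knows.

DON’T add an IV lightly. This is not an easy identification strategy to defend, and it should not be treated like a robustness check. A lonely paragraph and a regression table in the back won’t get you past referees at top journals (and is more likely to accidentally signal to the reviewers that you don’t know the method).

DO consider alternative estimation strategies in the same paper. Discussing or including “naive” specifications such as OLS can be important for understanding the bias in IV and potential exclusion restriction violations; there can also be a fruitful discussion of the differences between the models.

DON’T use it just because you want to be able to use the word “causal” in your abstract. There are other identification strategies out there for observational data (difference-in-differences, regression discontinuity, matching, synthetic control), and a poorly done IV is not causal.

DO learn to love Directed Acyclic Graphs (DAGs): they can help you anticipate exclusion restriction violations and better understand your own research (check out the Mixtape to learn more).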
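
As a rough sketch of how a DAG can flag trouble, the snippet below enumerates every directed path from an instrument to an outcome and reports the ones that bypass the treatment. Everything in it is a hypothetical illustration rather than part of the original post: the helper names `directed_paths` and `exclusion_violations`, and the toy graph in which rainfall serves as an instrument for economic growth in a study of conflict.

```python
def directed_paths(graph, start, end, path=None):
    """Yield every directed path from start to end in an acyclic graph."""
    path = (path or []) + [start]
    if start == end:
        yield path
        return
    for child in graph.get(start, []):
        yield from directed_paths(graph, child, end, path)


def exclusion_violations(graph, instrument, treatment, outcome):
    """Return directed instrument-to-outcome paths that bypass the treatment."""
    return [
        p for p in directed_paths(graph, instrument, outcome)
        if treatment not in p
    ]


# Hypothetical DAG (an assumption for illustration only): rainfall may also
# affect conflict through road conditions, a channel that bypasses growth.
dag = {
    "rainfall": ["growth", "road_conditions"],
    "growth": ["conflict"],
    "road_conditions": ["conflict"],
}

violations = exclusion_violations(dag, "rainfall", "growth", "conflict")
print(violations)  # [['rainfall', 'road_conditions', 'conflict']]
```

This only checks directed paths out of the instrument. A fuller analysis would also look for common causes of the instrument and the outcome, and the graph itself still has to come from case knowledge, which is exactly where historical expertise earns its keep.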

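Finally, an evergreen public service announcement for graduate students reading this post: when a faculty member retweets some seemingly exogenous or unexpected event in the world with the words “instrumental variable,” there is an 80% chance they are being sarcastic. Don’t say I didn’t warn you.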

A Good Instrument Is Hard To Find

Posted on April 9, 2021 by Alexandra Cirone

No matter what academic discipline you’re in, estimating causal effects is hard. And one key characteristic of HPE research is that the data is observational, and so we as researchers can’t control the treatment assignment. We weren’t there! That time has passed!

So today, I’m going to talk about instruments. For those who do causal inference in HPE, instrumental variable (IV) analysis is one potential identification strategy (see Dunning or Sovey and Green for a crash course). TL;DR: If our independent variable of interest (our supposed treatment) is endogenous, then we try to find a third variable that is “as-if random” and that helps determine selection into the treatment condition. This third variable is an instrument. When done successfully, these often make for really cool papers.

Sounds great, except IV is easily maligned. Partially because we have to pin our identification claims on an “exclusion restriction” for which there is no statistical test (and so lots of room for fudging), and also because naive (confused?) scholars think IV analysis is easily done (it’s not). It’s also sometimes complicated to find historical instruments.

But fear not! Forewarned is forearmed…

Exclusion Restriction for HPE

Let’s start with the hard stuff.

There are a number of identifying assumptions in this research design, but the most salient is that IV assumes that the instrument only affects the dependent variable via its relationship with the independent variable. This is the infamous “exclusion restriction,” and this is where IV papers sink or swim. I won’t reinvent the wheel, except to note in historical research there are a few additional challenges on this dimension.

One thing to remember is that the exclusion restriction is sometimes harder to justify over a long period of time. If the potential instrument is associated with exposure at multiple time points, then there are multiple paths to the outcome (and potentially more exclusion restriction violations). Or if a lagged value of a covariate is used as an instrument, and that covariate affects outcomes that are durable and/or persistent, then that again presents a problem for identification (more specifically, those studying historical immigration, ethnolinguistic fractionalization, or religion should definitely check out the “Broken Instruments” working paper by Gallen and Raymond).

Generally, it’s also hard to map out confounders and causal pathways over decades, and we often underestimate the downstream effects interventions can have. Readers should be aware of “post-instrument bias”, and read the working paper by Glynn and Rueda. Here, they note that researchers often include post-instrument covariates to help justify the exclusion restriction, but argue this can undo all the benefits of a natural experiment.

Any exclusion restriction must be justified with case-specific knowledge, and the best defenses usually involve a combination of historical primary source evidence, creative descriptive data, and citations from other fields (hello, historians!). Luckily, this type of research is typically where HPE scholars shine. (Though missing data and lack of historical records might also make defending the exclusion restriction more difficult.)

Finally, though this might be less of an issue for historical instruments, since they are harder to find, it’s worth remembering that for popular instruments, the very fact that they have been used to predict many different things indicates there could be violations of the exclusion restriction. Rainfall, for example, is a plausibly exogenous instrument that is also very popular. This is demonstrated in Jon Mellon’s brilliantly titled paper “Rain, Rain, Go Away…”. He reviews 185 social science studies, finding 137 distinct variables linked to weather (and this is a conservative estimate). Mellon also helps the reader by providing steps to take to systematically review the existing literature for the use of popular instruments, to find potential exclusion restriction violations.

It’s sometimes easy to critique instruments, but if you read the critiques (see more papers by Bazzi and Clemens and by Betz, Cook and Hollenbach), you’ll have a better chance of knowing what you’re up against!

A Good Instrument is Hard to Find

How does one find a good instrument for historical research?

The best advice is to be very, very familiar with your case in question (and read Thad Dunning’s textbook on natural experiments, so your subconscious knows what to look for). The defense of the exclusion restriction will require detailed knowledge anyway, but sometimes the best instruments are found in the process of doing background research. That’s how my coauthor and I found a lottery-based procedure which we used as an instrument to estimate the causal effect of committee service on careers, because I was reading a 19th century French book in the archives, and stumbled across it.

Another great piece of advice comes directly from Scott Cunningham’s Mixtape. He writes “A necessary but not a sufficient condition for having an instrument that can satisfy the exclusion restriction is if people are confused when you tell them about the instrument’s relationship to the outcome. Instruments are jarring… because these two things (Zi and Yi) don’t seem to go together. If they did, it would likely mean that the exclusion restriction was violated. But if they don’t, then the person is confused, and that is at minimum a possible candidate for a good instrument.”

Or, you can read a bunch of historical + IV papers and get inspiration. I’ve listed some favorites here:

Nunn (2008): Looks at the impact of the slave trade on Africa’s economic development; uses distance from major slave ports as an instrument for the intensity of the slave trade.

Dube and Harish (2020): Looks at how European queens were more likely to go to war in 15th-20th century Europe; uses gender of the first born and presence of a female sibling among previous monarchs as instruments for queenly rule.

Acharya, Blackwell, and Sen (2018): Looks at how slavery in 1860 correlates with present-day political attitudes and party affiliation; uses cotton suitability as an instrument for slavery prevalence.

Biavaschi, Giulietti, and Siddique (2017): Looks at how migrants “Americanized” their names to improve their career prospects; uses index of linguistic complexity based on Scrabble points as an instrumental variable that predicts name Americanization.

Cirone and van Coppenolle (2018): Looks at how budget committee service affects long term political careers; uses lottery-based procedure as an instrument for committee selection.

Kern and Hainmueller (2009): Looks at whether exposure to West German TV made East German citizens less supportive of the communist regime; uses district-level access to West German television as instrument.

Gihleb and Giuntella (2017): Looks at the effect of Catholic school attendance on better student outcomes; uses abrupt decline in female vocations (from reforms made at the Second Vatican Council) as an instrument for Catholic schooling.

I felt like I didn’t have enough pictures, so let’s pretend this guy is looking for an instrument. Source: https://www.loc.gov/item/2016854431/

So You Want to Use An Instrument

IV is a perfectly plausible identification strategy for historical work, but here are some do’s and don’ts to remember:

DO put a separate section in your paper, discussing identification assumptions. While you can’t “prove” the exclusion restriction, you can provide descriptive data or historical justifications (cite other fields!) that support your interpretation of the causal model. The Glynn and Rueda paper mentioned above collected data on IV papers in the top three political science journals (APSR, AJPS, and JOP) and found that out of 155 papers using IV, only 116 explicitly discussed the exclusion restriction. What were the others doing?? Who knows.

DON’T add an IV lightly. This is not an easy identification strategy to defend, and should not be treated like a robustness check. A lonely paragraph and a regression table in the back won’t get you past referees at top journals (and is more likely to accidentally prime the reviewers that you don’t know the method).

DO consider alternative estimation strategies, in the same paper. Discussing or including “naive” specifications like OLS can be important to understanding the bias in IV and potential exclusion restriction violations; there also can be fruitful discussion of the difference between the models.

DON’T use it because you want to be able to use the word “causal” in your abstract. There are other identification strategies out there for observational data (difference-in-differences, regression discontinuity designs, matching, synthetic control), and a poorly done IV is not causal.

DO learn to love Directed Acyclic Graphs (DAGs): these can help you anticipate exclusion restriction violations, and better help you understand your own research (check out the Mixtape to learn more).

Finally, an evergreen PSA for graduate students reading this post: when a faculty member retweets some plausibly exogenous and/or unexpected event in the world with the words “instrumental variable,” there’s an 80% chance they are being sarcastic. You’ve been warned.
