Record linkage using Stata: Preprocessing, linking, and reviewing utilities
Nada Wasi
Survey Research Center
Institute for Social Research
University of Michigan
Ann Arbor, MI
Aaron Flaaen
Division of Research and Statistics
Federal Reserve Board of Governors
Washington, DC
Abstract. In this article, we describe Stata utilities that facilitate probabilistic
record linkage, the technique typically used to merge two datasets
that share no common record identifier. While the preprocessing tools were developed
specifically for linking two company databases, the other tools can be used
for many other types of linkage. Specifically, the stnd_compname
and stnd_address commands parse and standardize company names and
addresses to improve match quality when linking. The reclink2
command is a generalized version of Blasnik's reclink (2010,
Statistical Software Components S456876, Department of Economics, Boston
College) that allows for many-to-one matching. Finally, clrevmatch is
an interactive tool for reviewing matched results efficiently. Rather than exporting
results to another file format (for example, Excel), entering clerical reviews there,
and importing them back into Stata, one can use clrevmatch to conduct all of these
steps within Stata. This makes matching, which often involves multiple runs,
faster and more flexible.
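The abstract names three stages: standardize names, score candidate pairs by fuzzy string similarity, and allow several master records to link to the same using record (many-to-one). The following is a minimal Python sketch of that pipeline, for illustration only; it is not the Stata implementation of stnd_compname or reclink2. The abbreviation and entity-type tables, the Dice bigram score, and the 0.6 threshold are all hypothetical assumptions. The score is only in the spirit of the bigram string comparison used by reclink, whose exact formula and per-field weights are not reproduced here.

```python
import re
from collections import Counter

# Hypothetical, tiny pattern tables for illustration only; stnd_compname
# relies on its own, much larger pattern files, which are not reproduced.
ABBREVIATIONS = {"CORPORATION": "CORP", "INCORPORATED": "INC",
                 "COMPANY": "CO", "LIMITED": "LTD"}
ENTITY_TYPES = {"CORP", "INC", "CO", "LTD", "LLC"}


def standardize_name(raw: str) -> tuple[str, str]:
    """Uppercase, drop punctuation, abbreviate common words, and split a
    trailing legal-entity type off the core name."""
    text = raw.upper().replace("&", " AND ")
    text = re.sub(r"[^A-Z0-9 ]", " ", text)
    tokens = [ABBREVIATIONS.get(t, t) for t in text.split()]
    entity = tokens.pop() if tokens and tokens[-1] in ENTITY_TYPES else ""
    return " ".join(tokens), entity


def bigram_similarity(a: str, b: str) -> float:
    """Dice coefficient on character bigrams (multiset overlap), in [0, 1]."""
    ga = Counter(a[i:i + 2] for i in range(len(a) - 1))
    gb = Counter(b[i:i + 2] for i in range(len(b) - 1))
    total = sum(ga.values()) + sum(gb.values())
    if total == 0:
        return 1.0 if a == b else 0.0
    return 2 * sum((ga & gb).values()) / total


def link_many_to_one(master, using, min_score=0.6):
    """Match each master record to its best-scoring using record.

    Several master records may share one using record (many-to-one).
    Ties keep the first candidate seen. Pairs scoring below min_score
    get None as the match, leaving them for clerical review.
    """
    links = {}
    for mid, mname in master.items():
        best_id, best_score = None, 0.0
        for uid, uname in using.items():
            score = bigram_similarity(mname, uname)
            if score > best_score:
                best_id, best_score = uid, score
        links[mid] = (best_id if best_score >= min_score else None, best_score)
    return links


# Hypothetical example records.
master = {1: "Acme Corporation", 2: "ACME Corp.", 3: "Zenith Ltd",
          4: "Acmee Corp", 5: "Globex Inc"}
using = {"A": "Acme Corp", "B": "Zenith Limited"}

# Standardize both sides first, then compare only the core names.
std_master = {k: standardize_name(v)[0] for k, v in master.items()}
std_using = {k: standardize_name(v)[0] for k, v in using.items()}
links = link_many_to_one(std_master, std_using)
for mid, (uid, score) in links.items():
    print(mid, uid, round(score, 3))
```

Here records 1, 2, and 4 all link to using record "A". This is the many-to-one case that a strict one-to-one matcher would reject. Record 4 matches despite its typo because standardization has already removed the entity-type noise. Record 5 stays unmatched and would be a candidate for clerical review.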
Keywords: reclink2, clrevmatch, reclink, stnd_compname, stnd_address, record linkage, fuzzy matching, string standardization