Speaking Stata: Distinct observations
Nicholas J. Cox
Department of Geography
Durham University
Durham City, UK
[email protected]
Gary M. Longton
Fred Hutchinson Cancer Research Center
Seattle, WA
[email protected]
Abstract. Distinct observations are those that differ with respect to one or more
variables, considered either individually or jointly. Distinctness is thus a
key aspect of the similarity or difference of observations. It is sometimes
confounded with uniqueness. Counting the number of distinct observations may
be required at any point from initial data cleaning or checking to
subsequent statistical analysis. We review how far existing commands in
official Stata offer solutions to this issue, and we show how to answer
questions about distinct observations from first principles by using the
by prefix and the egen command. The new distinct
command is offered as a convenience tool.
Keywords: distinct, by, egen, distinctness, uniqueness, data management