txttool: Utilities for text analysis in Stata
Abstract. This article describes txttool, a command that provides a set of
tools for managing free-form text. The command integrates several built-in Stata
functions with new text capabilities. The new capabilities include a utility to
create a bag-of-words representation of text and an implementation of Porter’s
(1980, Program: Electronic library and information systems 14: 130–137) word-stemming
algorithm. Collectively, these utilities provide a text-processing suite
for text mining and other text-based applications in Stata.
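
As an illustration of the bag-of-words idea mentioned above, the following is a minimal sketch in Python rather than Stata. It is not txttool's implementation or syntax. The function name bag_of_words, the tokenization rule (lowercase runs of letters and digits), and the sample sentences are all assumptions made for this example. Each document becomes a row of word counts, with one column per distinct word.

```python
import re
from collections import Counter


def bag_of_words(documents):
    """Return (vocabulary, rows), where rows[i][j] is the number of
    times vocabulary[j] occurs in documents[i].

    Hypothetical helper for illustration only; not part of txttool.
    """
    # Assumed tokenization: lowercase, then keep runs of letters/digits.
    tokenized = [re.findall(r"[a-z0-9]+", doc.lower()) for doc in documents]

    # One column per distinct word, in alphabetical order.
    vocabulary = sorted({word for tokens in tokenized for word in tokens})

    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)  # missing words count as 0
        rows.append([counts[word] for word in vocabulary])
    return vocabulary, rows


# Made-up sample text, used only to show the shape of the result.
docs = ["The cat sat on the mat.", "The dog sat."]
vocab, matrix = bag_of_words(docs)
```

Word order within each document is discarded; only the counts survive, which is what makes the representation convenient as a set of numeric variables.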
Authors: Unislawa Williams, Sean P. Williams
Keywords: txttool, text mining, Porter stemmer, bag of words, cleaning, stop words, subwords