Text Analysis

What is text analysis?

Text analysis, also known as text mining or text data mining (TDM), is a research method where large amounts of text are compiled, organized, and quantitatively analyzed in order to derive new information. Researchers can examine text from a variety of sources such as books, journal and magazine articles, or social media posts in order to answer research questions using various analytical tools. General research questions that text analysis can answer include: How are these texts connected? What types of language and emotion are contained within these texts? How are these texts similar or different? and How has the use of this word or phrase changed over time within these texts?

Methods of text analysis

Text analysis can help you identify topics and themes in your text-based dataset. The following list includes common methods used in text analysis:

  • Word frequency - Counts how often a word, or set of words, appears in your text dataset.
  • Concordance - Lists each occurrence of a word along with the context in which it is used.
  • Collocation - Identifies words that frequently appear next to or near one another.
  • Topic modeling - Discovers topics that emerge from a group of texts.
  • Significant terms - Identifies the terms that are most important or distinctive within your text dataset.
  • Sentiment analysis - Measures the sentiment within text, often on a scale of positive to negative, to determine the affect or emotion.
  • Text summarization - Develops a simple summary from long-form text. This method often uses large language models to write summaries.
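
To make the first three methods in the list concrete, here is a minimal Python sketch that uses only the standard library. The sample text and the function names (tokenize, word_frequency, concordance, collocations) are invented for illustration, and counting adjacent word pairs is only a simple stand-in for the statistical collocation measures that dedicated tools provide. The tokenizer also ignores sentence boundaries, so context windows and word pairs can span two sentences.

```python
import re
from collections import Counter

# Toy text invented for illustration only.
TEXT = (
    "The river bank was quiet. The bank approved the loan. "
    "She walked along the river bank at dawn."
)

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequency(tokens):
    """Word frequency: count how often each token occurs."""
    return Counter(tokens)

def concordance(tokens, keyword, window=2):
    """Concordance: each occurrence of keyword with `window` tokens of context per side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

def collocations(tokens):
    """Collocation (simplified): count adjacent word pairs, also called bigrams."""
    return Counter(zip(tokens, tokens[1:]))

tokens = tokenize(TEXT)
freq = word_frequency(tokens)
kwic = concordance(tokens, "bank")
pairs = collocations(tokens)

print(freq.most_common(3))
for line in kwic:
    print(line)
print(pairs.most_common(3))
```

In this toy text, "bank" appears three times and the pair "river bank" appears twice, which is the kind of pattern these methods are meant to surface in a much larger corpus.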

 

How to perform text analysis

Not all text analysis projects are alike, but following some basic guidelines helps ensure a replicable and defensible product. Below is a basic outline of how a text analysis project may proceed.

 

1. Develop a corpus - In text analysis, a corpus is your text dataset. It can include journal articles, books, social media posts, speeches, etc. You, the researcher, must determine the criteria you will use for adding items to your corpus.

2. Prepare your corpus for analysis (pre-processing) - Here you perform all pre-analysis steps to ensure your results are accurate. This includes filtering your dataset to include only the documents relevant to your analysis, tokenizing the texts, removing stop words, and lemmatizing or stemming words.

3. Explore your data - Understand the basics of your corpus with methods like word frequencies, significant terms, etc. Here you can develop testable hypotheses to answer with more sophisticated analyses. 

4. Analyze your data - Use a large language model to test your hypotheses. Further refine your data with pre-processing methods, if needed.

5. Visualize and summarize your findings - Distill important information with visual representations. Develop simplified findings from your corpus.
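
Steps 2, 3, and 5 can be sketched in a few lines of standard-library Python. Everything here is an assumption made for illustration: the three-sentence corpus, the tiny stop-word list, the crude_stem helper (a naive stand-in for a real stemmer or lemmatizer), and the use of TF-IDF as one possible way to score significant terms. Treat it as a starting point under those assumptions, not a recommended pipeline.

```python
import math
import re
from collections import Counter

# Hypothetical three-document corpus, invented for illustration.
CORPUS = {
    "doc1": "The cats are chasing the mice in the barn.",
    "doc2": "A cat chased a mouse, and the dogs barked.",
    "doc3": "The dogs are sleeping in the barn.",
}

# A tiny stop-word list; real projects usually rely on a longer published list.
STOP_WORDS = {"a", "and", "are", "in", "the"}

def crude_stem(token):
    """Strip a few common suffixes. A naive stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Step 2: tokenize, remove stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [crude_stem(t) for t in tokens]

def tf_idf(docs):
    """Step 3: score terms by term frequency times inverse document frequency."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))  # count each term once per document
    scores = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        scores[name] = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores

processed = {name: preprocess(text) for name, text in CORPUS.items()}
overall = Counter(t for tokens in processed.values() for t in tokens)
scores = tf_idf(processed)
top_doc3 = max(scores["doc3"], key=scores["doc3"].get)

# Step 5: a plain-text bar chart of the most frequent stems.
for term, count in overall.most_common(3):
    print(f"{term:<6}{'#' * count}")
print("Most distinctive term in doc3:", top_doc3)
```

Note that the suffix-stripping helper turns "cats" into "cat" but leaves "mice" and "mouse" as separate terms. Merging irregular forms like these is the job of lemmatization, which is one reason the choice between stemming and lemmatizing should be a deliberate, documented decision.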