Meaning extraction from large text data: Thematic analysis via corpus linguistics

Course Information

The problem: Your team has collected thousands of words of data. You attempt a traditional thematic analysis of the text, but colour coding, close reading, and writing ad hoc reflections soon become too onerous. You begin to doubt the validity of your observations and wish there were a way to streamline the process: one that would extract the key themes in your data faster and in an empirically valid way.

Solution: Join us for a session in which we showcase empirical methods for extracting and analysing meaning, concepts, and themes in texts. The session will provide training in corpus linguistics and mixed-method tools that enable the analysis of texts in an empirical, bottom-up fashion. Through a range of case studies, you will be guided in extracting meaning and other thematic patterns from texts to gain insight into the thoughts and behaviours of their authors. We will share best practice in the thematic analysis of various data types, such as diaries, interview transcripts, data scraped from the web, and outputs of both new and traditional media. We will also demonstrate ways of building the results of such analyses into answering research questions, developing business strategy, or shaping public policy.

This session will be run by researchers from the University of Sussex’s Concept Analytics Lab (https://conceptanalytics.org.uk/), using texts from the Mass Observation Archive (https://massobs.org.uk/) to showcase approaches to thematic analysis. We will demonstrate solutions we have developed for a variety of problems and text types in our work in medical sciences, psychology, economics, and the energy industry. We will also show how linguistic patterns within or between texts (e.g. texts that differ demographically or diachronically) can be explored, particularly through new visualisation techniques. The workshop will conclude with a showcase of next-generation textual analysis tools developed at the Concept Analytics Lab.

This will be a practical session, giving attendees hands-on experience with corpus analysis tools. The course consists of six hours of training over one day (9.30am to 5pm) and will be delivered online.

The course covers:

How to extract meaning from large textual data
How to build a corpus using textual data
How to engage with existing corpora, such as multi-billion-word corpora scraped from the web
How to use corpus methods for bottom-up and top-down research
Techniques for the visualisation of unstructured language data
An introduction to discourse analysis and its application to corpora (corpus-assisted discourse analysis)

By the end of the course participants will:

Know how to use a suite of mixed-method corpus linguistic tools to extract meaning from a corpus
Be able to use corpora to answer a variety of research questions
Be able to build their own corpora
Be able to conduct comparative corpus analysis (e.g. between texts that differ demographically or diachronically)

Programme:

9:30: Welcome and introduction to corpus linguistics

10:00: Interrogating existing corpora - quantitative analysis

12:00: Lunch

13:00: Interrogating existing corpora - qualitative analysis

15:00: Break

15:15: Building your own corpus

16:15: The Concept Cruncher: The next generation of text analysis

16:45: Final remarks

Payment through the Online Store can only be made by Visa or Mastercard credit/debit card, or by PayPal. AMEX is not accepted.
If you have not previously created an Online Store account, you will need to create one to make a booking.

Course Code

NCRMTAVCL

Course Leader

Dr Justyna Robinson and Dr Rhys Sandow

Start	End	Places Left	Course Fee
23/09/2026	23/09/2026	0
