Qualitative discourse analysis is crucial for social scientists studying human interaction. This project leverages large language models (LLMs) to support such analysis, a task that traditionally demands high inter-rater reliability among human coders. Coding is exceedingly labor-intensive: coders must fully understand the discussion context, consider each participant's perspective, and grasp how each sentence relates both to the preceding discussion and to shared general knowledge.
The goal is to develop a model that categorizes postings in online discussions, such as those in a corpus discussing the story "The Lady, or the Tiger?", while remaining capable of generalizing to other discussions.
Our approach incorporates multiple features to identify topic shifts driven by individual users. We fine-tuned several LLMs, such as LLaMA and Mistral, with the LoRA technique to keep training efficient, and defined a generic prompt, adaptable to both models, that includes the chat history, context from the relevant articles or stories, and a codebook of labels with examples.
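A minimal sketch of this setup using the Hugging Face `transformers` and `peft` libraries is shown below. The base checkpoint name, LoRA hyperparameters, and prompt wording are illustrative assumptions, not the exact configuration used in this project.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Assumed base checkpoint; Mistral would be handled the same way.
BASE_MODEL = "meta-llama/Llama-2-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# LoRA adapters on the attention projections keep the number of trainable
# parameters small, which is what makes per-model fine-tuning affordable.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Generic prompt shared by both models: chat history, story context,
# and the codebook of labels with examples (template wording is hypothetical).
PROMPT_TEMPLATE = """You are coding an online discussion about a story.
Story context:
{story_context}

Codebook (label: definition, example):
{codebook}

Discussion so far:
{chat_history}

Posting to classify:
{posting}

Label:"""
```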
Finally, an ensemble approach combined the predictions from the individual models, with a final model using few-shot learning to select the best prediction. To ensure explainability, we generated textual explanations with LLaMA, making the model's decisions accessible to non-expert users while guarding against hallucinations.
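The sketch below illustrates one way such an ensemble selection step could look: each fine-tuned model proposes a label, and a final LLaMA call receives a few-shot prompt asking it to pick the best candidate and justify it. Function names and prompt wording here are hypothetical, under the assumption that selection is prompt-based rather than a learned aggregator.

```python
from typing import Callable, Dict, List

def ensemble_predict(
    posting: str,
    model_predictions: Dict[str, str],        # e.g. {"llama": "Topic shift", "mistral": "Elaboration"}
    few_shot_examples: List[Dict[str, str]],  # solved examples: posting, candidates, chosen label
    selector_generate: Callable[[str], str],  # wrapper around the final LLaMA model
) -> str:
    """Build a few-shot selection prompt and return the selector's decision."""
    # Few-shot block: previously coded postings with their candidate labels
    # and the label a human coder ultimately chose.
    shots = "\n\n".join(
        f"Posting: {ex['posting']}\nCandidates: {ex['candidates']}\nBest label: {ex['label']}"
        for ex in few_shot_examples
    )
    candidates = ", ".join(f"{name}: {label}" for name, label in model_predictions.items())
    prompt = (
        f"{shots}\n\n"
        f"Posting: {posting}\n"
        f"Candidates: {candidates}\n"
        "Best label (answer with one codebook label, then a one-sentence explanation "
        "grounded in the posting):"
    )
    return selector_generate(prompt)
```

Asking the selector to ground its one-sentence explanation in the posting itself is one way to keep the generated rationale tied to the data and reduce the risk of hallucinated justifications.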