12. What is knowledge about language?

Explanations and visualisations

[Image: language world]

Grammar vs. linguistics

  • Grammar is what we study in school: the rules of a single language
  • Linguistics aims to explain why grammars are the way they are
    • What is common to all grammars?
    • What kinds of rules may or may not exist?
  • There is a big gap between linguistic theory and descriptive grammars
  • Most of what is called "linguistics" is about grammars (descriptive rather than theoretical)

 

Most important terminology in grammar and linguistics with the corresponding NLP tasks

  • Phonetics describes the physical properties of sounds (place of articulation, pitch, duration, intonation, etc.); phonology describes rules over abstract sound representations
    • NLP task: automatic speech recognition (ASR)
  • Morphology describes the rules of word formation (derivation, inflection)
    • NLP tasks: stemming, lemmatisation, subword tokenisation
  • Syntax describes the rules of sentence formation (dependencies between words), usually represented as constituency or dependency trees
    • NLP task: parsing (lemmatisation and dependency parsing are illustrated in the sketch after this list)
  • Semantics deals with meaning, which can be lexical (the meaning of words) or propositional (the meaning of sentences)
    • NLP tasks: word sense disambiguation, semantic role labelling, co-reference resolution
  • Pragmatics deals with meaning in context: how we understand meanings that are not stated explicitly
    • NLP task: intent classification
  • Discourse analysis describes the rules of combining sentences into higher structures
    • NLP tasks: dialogue and interactive systems (chatbots), identifying discourse relations
  • Sociolinguistics: linguistic differences between social groups (e.g. young vs. old speakers, men vs. women, levels of education)
    • NLP task: computational sociolinguistics (classifying social media users)
  • Psycholinguistics, neurolinguistics: where is language located in the brain? What structures are harder for the brain to process?
    • NLP task: cognitive modelling (simulating human language processing)
  • Language acquisition: how language develops in children
    • NLP task: cognitive modelling (simulating human language processing)
  • Second language acquisition: how a foreign language is learnt
    • no particular NLP task
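
A minimal sketch of two of these levels in practice, using spaCy (my choice of library, not prescribed by the list above; it assumes `pip install spacy` and a downloaded `en_core_web_sm` model):

```python
# Morphology (lemmatisation) and syntax (dependency parsing) with spaCy.
# Assumes: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The cats were sleeping on the mat.")

for token in doc:
    # surface form, lemma, part of speech, dependency relation, syntactic head
    print(f"{token.text:10} {token.lemma_:10} {token.pos_:6} "
          f"{token.dep_:8} {token.head.text}")
```

The same `doc` object also exposes sentence boundaries and named entities, so a single pipeline call covers several of the levels listed above.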

 

Most important theoretical problems

  • Arbitrariness of the sign: there is no natural connection between a word and its meaning
    • but the bouba/kiki effect shows that some connection exists
    • what is the mapping between meaning and form?
  • Double articulation (duality of patterning): merging meaningless units into meaningful ones, and merging meaningful units into higher-order meaningful units
    • sometimes only the latter is regarded as language proper
    • relevant to the question of what the smallest units of language are
    • relevant to the question of the form of language competence: is it just one operation (merge)?
  • Displacement: we can talk about things we don't see
    • but it seems that we don't use this freedom all the time; a short video about that (by me): What you see is what you say, or is it?
    • relevant to the question of what knowledge can be extracted from texts
  • Innateness: are we born with a specialised language faculty, or is it all just general cognition?
    • a famous puzzle: Poverty of the stimulus
    • distantly relevant to the generalisation and universality of NLP models
    • related to the topic of universality

 

Information theory: text measures and quantitative laws

  • Shannon entropy and complexity: text as a sequence of outcomes of a stochastic process (see the sketch after this list)
  • Zipf-Mandelbrot Law: the relation between word frequency and its frequency rank is universal
  • Zipf's Law of Abbreviation: more frequent words tend to be shorter
  • Menzerath-Altmann’s Law: the bigger the whole, the smaller the parts
  • Uniform information density, constant information rate: tendency to keep the amount of information constant
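
A minimal sketch of the first three measures on a toy corpus, in plain Python (the inline `text` is a stand-in; the laws only emerge clearly on corpora of realistic size):

```python
# Unigram entropy, Zipf's rank-frequency law, and the law of abbreviation.
import math
from collections import Counter

# Stand-in corpus; replace with any large plain-text file for real estimates.
text = ("the quick brown fox jumps over the lazy dog "
        "the dog barks and the fox runs away over the hill")
words = text.split()

freq = Counter(words)
total = sum(freq.values())

# Shannon entropy of the unigram word distribution: H = -sum p * log2(p).
entropy = -sum((c / total) * math.log2(c / total) for c in freq.values())
print(f"unigram entropy: {entropy:.2f} bits/word")

# Zipf's law predicts frequency ~ 1/rank, so rank * frequency stays roughly
# constant (the Mandelbrot version adds two free parameters).
for rank, (word, count) in enumerate(freq.most_common(5), start=1):
    print(f"rank {rank}: {word!r} freq={count} rank*freq={rank * count}")

# Law of abbreviation: log frequency and word length should correlate
# negatively (a crude Pearson correlation over the vocabulary).
pairs = [(math.log(c), len(w)) for w, c in freq.items()]
n = len(pairs)
mx = sum(x for x, _ in pairs) / n
my = sum(y for _, y in pairs) / n
cov = sum((x - mx) * (y - my) for x, y in pairs)
sx = math.sqrt(sum((x - mx) ** 2 for x, _ in pairs))
sy = math.sqrt(sum((y - my) ** 2 for _, y in pairs))
print(f"corr(log freq, length) = {cov / (sx * sy):.2f}")  # expected < 0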

 

Linguistics vs. NLP

  • Symbolic rule-based methods relied heavily on grammars
  • Statistical methods used annotated texts; the Penn Treebank set the pattern for many others, and today Universal Dependencies plays that role (a hand-made sample of its format follows this list)
  • Both rules and annotations are partially formalised grammars, not scientific theories
  • LLMs and self-supervised learning often work without any explicit linguistic knowledge
  • A popular question: is there still any room for linguistics in NLP?
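
To make "annotations are partially formalised grammars" concrete, here is a tiny hand-written example in the CoNLL-U format used by Universal Dependencies (columns ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC, tab-separated in real files; the analysis is mine, not taken from an actual treebank):

```
# text = The dog barks.
1   The    the    DET    DT   Definite=Def|PronType=Art        2   det     _   _
2   dog    dog    NOUN   NN   Number=Sing                      3   nsubj   _   _
3   barks  bark   VERB   VBZ  Number=Sing|Person=3|Tense=Pres  0   root    _   _
4   .      .      PUNCT  .    _                                3   punct   _   _
```

Each row encodes morphology (LEMMA, FEATS) and syntax (HEAD, DEPREL) for one token, i.e. exactly the levels from the terminology list above, but as data rather than as a theory.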