This repository contains the code, dialogue datasets, and results for "Prior Knowledge Matters: An Information-Theoretic Analysis of In-Context Learning in LLM for L2 Dialogues".
| Languages | Metrics | Quantifiers & Numerals | Tense & Agreement | Reference & Word | Numbers & Agreement | Speech & Acts | Subject Verb Agreement | Modal Verbs & Expressions | Noun Verb Collocation |
|---|---|---|---|---|---|---|---|---|---|
| yue | dbi | ↑0.145 | ↓0.275 | ↑0.109 | ↑0.099 | ↑0.145 | ↑0.073 | ↑0.052 | ↓0.066 |
| | dmono | 0.725 | 0.027 | 0.188 | 0.489 | 0.203 | 0.318 | 0.123 | 0.029 |
| th | dbi | ↑0.049 | ↑0.013 | ↑0.097 | ↑0.130 | ↑0.188 | ↑0.060 | ↑0.120 | ↑0.121 |
| | dmono | 9.227 | 0.570 | 0.222 | 0.265 | 0.400 | 0.913 | 0.180 | 0.190 |
| ja | dbi | ↑0.044 | ↑0.082 | ↑0.265 | ↑0.190 | ↑0.212 | ↑0.087 | ↑0.053 | ↑0.073 |
| | dmono | 1.954 | 0.514 | 0.273 | 0.330 | 0.520 | 0.874 | 0.452 | 0.232 |
| ko | dbi | ↑0.033 | ↑0.019 | ↑0.009 | ↑0.051 | ↑0.109 | ↑0.148 | ↓0.131 | ↑0.183 |
| | dmono | 0.654 | 0.296 | 0.108 | 0.259 | 0.247 | 0.605 | 0.069 | 0.295 |
| ms | dbi | ↑0.007 | ↑0.036 | ↑0.027 | ↑0.092 | ↑0.076 | ↑0.026 | ↑0.065 | ↓0.096 |
| | dmono | 1.039 | 0.321 | 0.109 | 0.341 | 0.279 | 0.477 | 0.097 | 0.080 |
| cmn | dbi | ↑0.027 | ↑0.038 | ↑0.065 | ↑0.037 | ↑0.161 | ↑0.023 | ↑0.082 | ↑0.059 |
| | dmono | 1.382 | 0.277 | 0.099 | 0.375 | 0.319 | 0.741 | 0.212 | 0.108 |
| ur | dbi | ↑0.079 | ↑0.073 | ↑0.062 | ↑0.050 | ↑0.043 | ↑0.046 | ↓0.126 | ↑0.044 |
| | dmono | 0.918 | 0.145 | 0.192 | 0.282 | 0.158 | 0.386 | 0.115 | 0.046 |
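
For readers who want to analyze the `dbi` rows numerically, the arrow-prefixed cells can be turned into signed floats. The following is a minimal sketch, not part of the repository: the helper name `parse_signed` is hypothetical, and mapping `↑` to a positive sign and `↓` to a negative sign is an assumption about how the arrows should be read.

```python
def parse_signed(cell):
    """Convert a table cell such as '↑0.145' or '↓0.275' to a signed float.

    Assumption: '↑' marks a positive value and '↓' a negative one.
    Cells without an arrow (the dmono rows) are parsed as plain floats.
    """
    cell = cell.strip()
    if cell.startswith("↑"):
        return float(cell[1:])
    if cell.startswith("↓"):
        return -float(cell[1:])
    return float(cell)


# dbi cells for the `yue` row, copied from the table above.
yue_dbi = ["↑0.145", "↓0.275", "↑0.109", "↑0.099",
           "↑0.145", "↑0.073", "↑0.052", "↓0.066"]

signed = [parse_signed(c) for c in yue_dbi]

# Count how many categories carry a downward arrow in this row.
n_down = sum(v < 0 for v in signed)
print(n_down)
```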
To fully test our code, make sure you have the following dependencies installed:
- PyTorch
- transformers
- openai
- Matplotlib
- pandas
- numpy
- scipy
- scikit-learn (imported as `sklearn`)
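
Before running the scripts, it can help to confirm that each dependency is importable. The sketch below is an illustration rather than part of the repository: `find_missing` is a hypothetical helper, and the module names are the usual import names for the packages listed above (PyTorch imports as `torch`, scikit-learn as `sklearn`). It checks only that the modules can be located, not their versions.

```python
import importlib.util

# Import names for the dependencies listed above.
REQUIRED_MODULES = [
    "torch", "transformers", "openai", "matplotlib",
    "pandas", "numpy", "scipy", "sklearn",
]


def find_missing(modules):
    """Return the names in `modules` that cannot be located for import."""
    return [name for name in modules
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```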
We use the ICNALE benchmark.
mkdir dataset # you can put datasets and intermediate results here.
cd lib
python gpt_incontext.py --openai_api_key <your openai api key> --output_dir <Path to the output directory>
For both benchmark data and generated data, you can use the same annotation script.
cd lib
python annotation.py --openai_api_key <your openai api key> --input_dir <Path to the input directory> --output_dir <Path to the output directory> --samples_dir <Path to the intermediate samples directory>
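
If you prefer to launch the annotation step from Python, for example to loop over several input directories, the command above can be wrapped as follows. This is a sketch under stated assumptions: the helpers `build_annotation_cmd` and `run_annotation` are hypothetical, the API key is assumed to be stored in an `OPENAI_API_KEY` environment variable, and only the four flags shown above are used.

```python
import os
import subprocess
import sys


def build_annotation_cmd(api_key, input_dir, output_dir, samples_dir):
    """Assemble the argument list for annotation.py from the flags shown above."""
    return [
        sys.executable, "annotation.py",
        "--openai_api_key", api_key,
        "--input_dir", input_dir,
        "--output_dir", output_dir,
        "--samples_dir", samples_dir,
    ]


def run_annotation(input_dir, output_dir, samples_dir, lib_dir="lib"):
    """Run annotation.py from `lib_dir`, mirroring `cd lib` in the shell version."""
    # Assumption: the key lives in the OPENAI_API_KEY environment variable.
    api_key = os.environ["OPENAI_API_KEY"]
    # Resolve paths first, because the script is started with cwd=lib_dir.
    paths = [os.path.abspath(p) for p in (input_dir, output_dir, samples_dir)]
    cmd = build_annotation_cmd(api_key, *paths)
    subprocess.run(cmd, cwd=lib_dir, check=True)
```

Passing the arguments as a list avoids shell quoting problems when a directory name contains spaces, and `check=True` raises an error if the script exits with a non-zero status.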
Scripts and results for the distribution calculation are in `notebooks/`. For reference, we also provide example generated dialogue data and the corresponding annotations in the `data/` and `annotations/` directories.