Emnlp #7

Open · wants to merge 93 commits into base: main

Changes from 3 commits · 93 commits
a2ca02a
Add ATP-specific generate DVs workflow
vinhowe May 12, 2023
095912c
Apparently we could generate DVs automatically
vinhowe May 12, 2023
298d028
Add survey cost estimation code
vinhowe May 16, 2023
b6a536e
Add code to automatically generate ATP configs
vinhowe May 16, 2023
a65478c
Lowercase variables because it works for neurips
vinhowe May 16, 2023
39026e1
Add 'culling sampled below n' code (note!)
vinhowe May 16, 2023
f937f59
Merge branch 'main' into neurips
vinhowe May 19, 2023
80110d9
Add output for debugging prompt construction
vinhowe May 19, 2023
08e9bb5
Revert "Lowercase variables because it works for neurips"
vinhowe May 19, 2023
7e7adef
Undo ValidOption lowercasing
vinhowe May 19, 2023
7435097
Revert "Add 'culling sampled below n' code (note!)"
vinhowe May 20, 2023
e0df35c
Remove n_culle_sampled_below everywhere
vinhowe May 20, 2023
97e84c8
Unsort imports in survey.py
vinhowe May 20, 2023
b013cb9
Ignore .DS_Store
vinhowe May 20, 2023
298cf23
Merge branch 'main' into emnlp
vinhowe May 20, 2023
a990667
s/config_filename/variables_filename/g
vinhowe May 20, 2023
5fc9c00
Filter out out-of-schema variable values
vinhowe May 20, 2023
bd7a057
Working async openai sampler impl
vinhowe May 20, 2023
490e07c
Reformat
vinhowe May 20, 2023
81c22f7
Remove extra comment
vinhowe May 20, 2023
3bd43df
Update example_configure_survey.py
vinhowe May 20, 2023
7b68fa6
Remove unused argparse import
vinhowe May 20, 2023
fea3a1f
Reformat
vinhowe May 20, 2023
b4a8fe9
Merge branch 'emnlp' into async-openai-sampling
vinhowe May 20, 2023
3728470
Remove unused prompt printing
vinhowe May 20, 2023
a218599
Merge branch 'emnlp' into async-openai-sampling
vinhowe May 20, 2023
2877aab
Remove comments for copilot
vinhowe May 20, 2023
4e8155d
Merge branch 'emnlp' into async-openai-sampling
vinhowe May 20, 2023
10b50a5
replace responses.csv -> data.csv
vinhowe May 20, 2023
a41a1bd
Update folder structure.
alexgshaw May 22, 2023
27615c8
Add style check GitHub workflow
vinhowe May 22, 2023
056761b
Merge branch 'emnlp' into async-openai-sampling
vinhowe May 22, 2023
8035a00
Bug fix + add slot for response object.
alexgshaw May 22, 2023
98b4acf
Fix index typing error.
alexgshaw May 22, 2023
687dcd0
Merge branch 'emnlp' into async-openai-sampling
vinhowe May 22, 2023
bd381ba
Merge branch 'emnlp' into async-openai-sampling
vinhowe May 22, 2023
a7febdf
Update async openai sampler to return response
vinhowe May 22, 2023
82c356c
Add ordinal property to variables.json
vinhowe May 22, 2023
08f02c5
Handle AutoModel async
vinhowe May 22, 2023
05ddeef
Merge pull request #8 from BYU-PCCL/async-openai-sampling
vinhowe May 22, 2023
2ed52ae
Merge branch 'emnlp' into add-ordinal-to-variables
vinhowe May 22, 2023
2c31a63
Merge pull request #11 from BYU-PCCL/add-ordinal-to-variables
alexgshaw May 22, 2023
64afba1
Fix survey sampling.
alexgshaw May 22, 2023
23bb834
Merge branch 'emnlp' of https://github.com/BYU-PCCL/lm-survey into emnlp
alexgshaw May 22, 2023
2b82b88
Update atp configuration
vinhowe May 22, 2023
f126bc5
Add ordinal functionality to Question class.
alexgshaw May 22, 2023
639d687
Merge branch 'emnlp' of https://github.com/BYU-PCCL/lm-survey into emnlp
alexgshaw May 22, 2023
3740cfc
Rework folder structure.
alexgshaw May 22, 2023
d7198fb
Temp. revert "Add style check GitHub workflow"
vinhowe May 23, 2023
b0b14e5
bug fix.
alexgshaw May 23, 2023
aee5f3c
Minor bug fix on question.
alexgshaw May 23, 2023
f260e71
rename schema to variable
alexgshaw May 23, 2023
83bdc97
gitignore update
alexgshaw May 23, 2023
ea66dd5
Added crude representativeness scoring
chrisrytting May 24, 2023
8008a9f
Merge branch 'emnlp' of github.com:BYU-PCCL/lm-survey into emnlp
chrisrytting May 24, 2023
ac835b2
Update estimate survey to experiment config file
vinhowe May 24, 2023
d47640a
Merge branch 'emnlp' of github.com:BYU-PCCL/lm-survey into emnlp
vinhowe May 24, 2023
e3c4c8e
breadth experiment
chrisrytting May 24, 2023
f9347ac
Merge branch 'emnlp' of github.com:BYU-PCCL/lm-survey into emnlp
chrisrytting May 24, 2023
67bbf30
Merge branch 'emnlp' of github.com:BYU-PCCL/lm-survey into emnlp
vinhowe May 24, 2023
5940ff0
Add create atp experiment script
vinhowe May 24, 2023
39c2fe4
Check in variables
vinhowe May 24, 2023
95da00f
Clean up create atp experiment imports
vinhowe May 24, 2023
562fb9d
Do formatting for create atp experiment
vinhowe May 24, 2023
28e3ac9
Update check survey prompts for experiments
vinhowe May 24, 2023
4e79da7
Update estimate survey to experiment config file
vinhowe May 24, 2023
cde6e24
Fix ATP configuration script
vinhowe May 24, 2023
9076e10
Use ordinals to find invalid options
vinhowe May 24, 2023
9eba1c9
Fix
vinhowe May 24, 2023
f0cf51d
Fix rate limit error import
vinhowe May 25, 2023
f510c3e
Bump up rate limit to what OpenAI says we have
vinhowe May 25, 2023
9aaead3
Push updates variables
vinhowe May 26, 2023
ac8e3fb
Add force flag.
alexgshaw May 26, 2023
6e9d77f
Sampler fix.
alexgshaw May 26, 2023
0a24acd
Added logging, added functionality to fill in missing response_objects
chrisrytting May 26, 2023
ad4525d
Merge branch 'emnlp' of github.com:BYU-PCCL/lm-survey into emnlp
chrisrytting May 26, 2023
b058a5f
Bug fix.
alexgshaw May 26, 2023
34115df
Merge branch 'emnlp' of github.com:BYU-PCCL/lm-survey into emnlp
chrisrytting May 27, 2023
5b3c884
added some helpers
chrisrytting May 29, 2023
07fdbbe
Added tests for infilling
chrisrytting May 29, 2023
21b8e75
Added infilling ability, some more logging, and some extra DVS functi…
chrisrytting May 29, 2023
446672c
Removed data push mistake
chrisrytting May 29, 2023
835da77
Merge branch 'emnlp' of github.com:BYU-PCCL/lm-survey into emnlp
vinhowe May 29, 2023
1aff991
Remove unused import in async sampler
vinhowe May 29, 2023
8f7e6ad
Add updated estimate_survey.py
vinhowe May 30, 2023
849e91e
Minor fixes.
alexgshaw May 31, 2023
91b8041
Get rid of horrible rate limit print
vinhowe May 31, 2023
e1b1c52
Merge branch 'emnlp' of github.com:BYU-PCCL/lm-survey into emnlp
vinhowe May 31, 2023
f65c797
Made the rep calculation neater
chrisrytting Jun 2, 2023
c96a889
Merge branch 'emnlp' of github.com:BYU-PCCL/lm-survey into emnlp
chrisrytting Jun 2, 2023
0909387
Merge branch 'emnlp' of github.com:BYU-PCCL/lm-survey into emnlp
vinhowe Jun 2, 2023
df555c6
Added weighting and an ability to extract D_H from opinionqa dataset
chrisrytting Jun 3, 2023
a0ba33e
Merge branch 'emnlp' of github.com:BYU-PCCL/lm-survey into emnlp
chrisrytting Jun 3, 2023
23 changes: 9 additions & 14 deletions calculate_representativeness.py
@@ -1,5 +1,6 @@
 from lm_survey.survey.results import SurveyResults
 import numpy as np
+from lm_survey.helpers import *
 import os
 import json
 from lm_survey.survey.dependent_variable_sample import DependentVariableSample
@@ -10,28 +11,22 @@
 
 # Grab all the files called "results.json" in the "experiments" directory
 input_filepaths = glob.glob(
-    os.path.join("experiments/breadth", "**", "results.json"), recursive=True
+    os.path.join("experiments/breadth", "**", "*3.5*/results.json"),
+    recursive=True,
 )
 
 print(input_filepaths)
 
 # read input filepaths into pandas dfs
-dfs = []
 mean_reps = {}
 
+question_samples = []
 for input_filepath in input_filepaths:
-    with open(input_filepath, "r") as file:
-        results = json.load(file)
-    question_samples = [
-        DependentVariableSample(
-            **sample_dict,
-        )
-        for sample_dict in results
-    ]
-
-    survey_results = SurveyResults(question_samples=question_samples)
+    question_samples += filepath_to_DVS_list(input_filepath)
     wave = input_filepath.split("/")[3][-3:]
     # mean_reps[wave] = survey_results.get_representativeness()
-    print(f"{wave}: {survey_results.calculate_avg_samples()}")
+
+survey_results = SurveyResults(question_samples=question_samples)
+
 
 # print("Average representativeness: ", np.mean(list(mean_reps.values())))
 # print(
# print(
9 changes: 7 additions & 2 deletions lm_survey/survey/dependent_variable_sample.py
@@ -96,14 +96,19 @@ def __init__(
         independent_variables: typing.Dict[str, str],
         variable_name: str,
         question: typing.Union[Question, typing.Dict[str, typing.Any]],
-        prompt: str,
         completion: typing.Union[Completion, typing.Dict[str, typing.Any]],
+        prompt: typing.Optional[str] = None,
+        chat_prompt: typing.Optional[str] = None,
         **kwargs,
     ) -> None:
         self.index = index
         self.independent_variables = independent_variables
         self.variable_name = variable_name
-        self.prompt = prompt
+        if chat_prompt:
+            self.prompt = chat_prompt
+        elif prompt:
+            self.prompt = prompt
+        # self.prompt = prompt
 
         if isinstance(completion, Completion):
             self.completion = completion
144 changes: 101 additions & 43 deletions lm_survey/survey/results.py
@@ -12,10 +12,15 @@
 class SurveyResults:
     def __init__(self, question_samples: typing.List[DependentVariableSample]):
         df = pd.DataFrame(
-            data=[self._flatten_layer(sample.to_dict()) for sample in question_samples]
+            data=[
+                self._flatten_layer(sample.to_dict())
+                for sample in question_samples
+            ]
         )
         # Make the index the index column and sort by it
         df.set_index("index", inplace=True)
+        df["wave"] = df["variable_name"].str[-3:]
+
 
         self.df = df.sort_index()
         self._engineer_columns()
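The `df["wave"]` column added here derives the ATP wave from the last three characters of each variable name. A minimal pandas sketch of that derivation, using hypothetical variable names in the same `*_W##` style:

```python
import pandas as pd

# Hypothetical ATP-style variable names (assumed format: NAME_W<wave>)
df = pd.DataFrame({"variable_name": ["WORRY_W26", "INDUSTRY_W27"]})

# Same derivation as the diff: the wave is the last three characters
df["wave"] = df["variable_name"].str[-3:]

print(df["wave"].tolist())  # → ['W26', 'W27']
```

Note this slicing assumes every wave suffix is exactly three characters; a four-character wave like `W100` would be truncated.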
@@ -79,7 +84,9 @@ def _estimate_standard_error(
         groups: pandas.core.groupby.generic.DataFrameGroupBy,
         n_bootstraps: int = 1000,
     ) -> pd.Series:
-        bootstraps = pd.concat([self._bootstrap(groups) for _ in range(n_bootstraps)])
+        bootstraps = pd.concat(
+            [self._bootstrap(groups) for _ in range(n_bootstraps)]
+        )
 
         return bootstraps.groupby(level=0).std()
 
@@ -106,52 +113,64 @@ def get_mean_score(self, slice_by: typing.List[str] = []) -> pd.DataFrame:
 
         return scores_df
 
+    def get_accuracy(self):
+        """Calculate the accuracy"""
+
     def get_representativeness(
         self,
+        groups: pandas.core.groupby.generic.DataFrameGroupBy,
         # slice_by: typing.List[str] = []
     ):
         """
         Calculate the representativeness for each dv and print it
         """
-        df = self.df.copy()
-        reps = []
-        for dv in sorted(df.variable_name.unique()):
-            if dv == "INDUSTRY_W27":
-                print("here")
-            tdf = df[df.variable_name == dv]
-            ordinal = [k["ordinal"] for k in tdf.iloc[0]["valid_options"]]
-            # Make a dictionary where the keys are the possible_completions and the values are the number of times they appear
-            D = {k.strip(): 0.0 for k in tdf.iloc[0]["possible_completions"]}
-            D_H = D.copy()
-            D_M = D.copy()
-            for k, v in (
-                tdf.correct_completion.str.strip().value_counts(normalize=True).items()
-            ):
-                D_H[k] = v
-            for k, v in (
-                tdf.top_completion.str.strip().value_counts(normalize=True).items()
-            ):
-                D_M[k] = v
-
-            # Calculate the WD between the columns D_H and D_M
-            wd = wasserstein_distance(
-                ordinal, ordinal, list(D_H.values()), list(D_M.values())
-            ) / self._get_max_wd(ordinal)
-            rep = 1 - wd
-            # For a dv, print the keys of D_H and D_M and then their respective values
-            tab = 2
-            print("For dv:", dv)
-            print(f"{tab * ' '}Rep:", rep)
-            # print(f"{tab * ' '}WD:", wd)
-            print(f"{tab * ' '}{''.ljust(tab)}D_M D_H")
-            for k in D_H.keys():
-                print(f"{tab * ' '}{k.ljust(tab)}{D_M[k]} {D_H[k]}")
-            print("\n")
-            reps.append(rep)
-        mean_score = np.mean(reps)
-        if np.isnan(mean_score):
-            raise ValueError("Mean score is NaN")
-        return mean_score
+        rep = lambda df: 1 - self._get_wd(df)
+        return groups.apply(rep)
+
+        # How we were doing before
+        #
+        # df = self.df.copy()
+        # reps = []
+        # for dv in sorted(df.variable_name.unique()):
+        #     tdf = df[df.variable_name == dv]
+        #     ordinal = [k["ordinal"] for k in tdf.iloc[0]["valid_options"]]
+        #     # If the variable is categorical, skip it
+        #     if len(set(ordinal)) == 1:
+        #         continue
+        #     # Make a dictionary where the keys are the possible_completions and the values are the number of times they appear
+        #     D = {k.strip(): 0.0 for k in tdf.iloc[0]["possible_completions"]}
+        #     D_H = D.copy()
+        #     D_M = D.copy()
+        #     for k, v in (
+        #         tdf.correct_completion.str.strip()
+        #         .value_counts(normalize=True)
+        #         .items()
+        #     ):
+        #         D_H[k] = v
+        #     for k, v in (
+        #         tdf.top_completion.str.strip()
+        #         .value_counts(normalize=True)
+        #         .items()
+        #     ):
+        #         D_M[k] = v
+
+        #     # Calculate the WD between the columns D_H and D_M
+        #     wd = wasserstein_distance(
+        #         ordinal, ordinal, list(D_H.values()), list(D_M.values())
+        #     ) / self._get_max_wd(ordinal)
+        #     rep = 1 - wd
+        #     # For a dv, print the keys of D_H and D_M and then their respective values
+        #     tab = 2
+        #     print("For dv:", dv)
+        #     print(f"{tab * ' '}Rep:", rep)
+        #     # print(f"{tab * ' '}WD:", wd)
+        #     print(f"{tab * ' '}{''.ljust(tab)}D_M D_H")
+        #     for k in D_H.keys():
+        #         print(f"{tab * ' '}{k.ljust(tab)}{D_M[k]} {D_H[k]}")
+        #     print("\n")
+        #     reps.append(rep)
+        # mean_score = np.mean(reps)
+        # return mean_score
 
     def calculate_avg_samples(self):
         """
@@ -166,11 +185,50 @@ def calculate_avg_samples(self):
         )
         return (df.shape[0] / len(df.variable_name.unique()),)
 
+    def _calculate_D_M(self, df, by_top_completion=True, by_distribution=False):
+        """Calculate the D_M metric for a dataframe, either by looking at the sampled completion or by considering the average distribution generated by a language model."""
+        possible_completions = df.possible_completions.iloc[0]
+        prop_dict = {k.strip(): 0.0 for k in possible_completions}
+        if by_top_completion:
+            df["top_completion"] = df["top_completion"].str.strip()
+            proportions = df.groupby("top_completion").size() / df.shape[0]
+            for prop in proportions.index:
+                prop_dict[prop] = proportions[prop]
+        return prop_dict
+
+    def _calculate_D_H(self, df, use_opinion_qa=False):
+        """Calculate the D_H metric for a dataframe."""
+        if use_opinion_qa:
+            pass  # TODO: extract D_H from the OpinionQA dataset
+        else:
+            possible_completions = df.possible_completions.iloc[0]
+            df["correct_completion"] = df["correct_completion"].str.strip()
+            proportions = df.groupby("correct_completion").size() / df.shape[0]
+            prop_dict = {k.strip(): 0.0 for k in possible_completions}
+            for prop in proportions.index:
+                prop_dict[prop] = proportions[prop]
+
+        return prop_dict
+
+    def _get_wd(self, df):
+        D_M = self._calculate_D_M(df)
+        D_H = self._calculate_D_H(df)
+        ordinal = [k["ordinal"] for k in df.iloc[0]["valid_options"]]
+
+        wd = wasserstein_distance(
+            ordinal, ordinal, list(D_H.values()), list(D_M.values())
+        ) / self._get_max_wd(ordinal)
+        return wd
+
     def _get_max_wd(self, ordered_ref_weights):
-        d0, d1 = np.zeros(len(ordered_ref_weights)), np.zeros(len(ordered_ref_weights))
+        d0, d1 = np.zeros(len(ordered_ref_weights)), np.zeros(
+            len(ordered_ref_weights)
+        )
         d0[np.argmax(ordered_ref_weights)] = 1
         d1[np.argmin(ordered_ref_weights)] = 1
-        max_wd = wasserstein_distance(ordered_ref_weights, ordered_ref_weights, d0, d1)
+        max_wd = wasserstein_distance(
+            ordered_ref_weights, ordered_ref_weights, d0, d1
+        )
        return max_wd
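The helpers above define representativeness as 1 minus a Wasserstein distance between the human answer distribution (D_H) and the model answer distribution (D_M), computed over the options' ordinal positions and normalized by the worst case from `_get_max_wd` (all mass on the highest ordinal versus all mass on the lowest). A standalone sketch of that calculation with made-up distributions (the real code builds D_H and D_M from survey responses and model completions):

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Hypothetical 4-point ordinal scale and made-up answer distributions:
# d_h = human proportions per option, d_m = model proportions per option
ordinal = [1, 2, 3, 4]
d_h = [0.1, 0.2, 0.3, 0.4]
d_m = [0.2, 0.2, 0.3, 0.3]

def max_wd(ordered_ref_weights):
    # Worst case: point mass on the highest ordinal vs. point mass on the lowest
    d0 = np.zeros(len(ordered_ref_weights))
    d1 = np.zeros(len(ordered_ref_weights))
    d0[np.argmax(ordered_ref_weights)] = 1
    d1[np.argmin(ordered_ref_weights)] = 1
    return wasserstein_distance(ordered_ref_weights, ordered_ref_weights, d0, d1)

# Normalized distance in [0, 1]; representativeness = 1 - distance
wd = wasserstein_distance(ordinal, ordinal, d_h, d_m) / max_wd(ordinal)
representativeness = 1 - wd
print(representativeness)
```

Here the raw distance is 0.3 (the area between the two CDFs) and the maximum is 3 (the span of the scale), so representativeness comes out to 0.9. The normalization makes scores comparable across questions with different numbers of options.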