Any ideas on why this code doesn't work? I was trying to use sliding windows, but it seems like Mistral doesn't support that. So then I tried truncating the input and setting a higher token limit, but that doesn't seem to work either. (Sorry for all the print statements; I'm just trying to debug this.)
import pandas as pd
from ctransformers import AutoModelForCausalLM
import time

# Start measuring overall execution time
overall_start_time = time.time()

# Load your DataFrame with the first 5 rows
print("Loading the DataFrame...")
dataframe_start_time = time.time()
df = pd.read_json('Russia/parliamint_russia_sents_paras.jsonl', lines=True)[:5]  # Adjust path as necessary
dataframe_end_time = time.time()
dataframe_loading_time = dataframe_end_time - dataframe_start_time
print(f"DataFrame Loaded in {dataframe_loading_time:.2f} seconds\n")

# Load the model
print("Loading the model...")
model_start_time = time.time()
llm = AutoModelForCausalLM.from_pretrained("TheBloke/Mistral-7B-v0.1-GGUF", model_file="mistral-7b-v0.1.Q4_K_M.gguf", model_type="mistral", gpu_layers=50)
model_end_time = time.time()
model_loading_time = model_end_time - model_start_time
print(f"Model Loaded in {model_loading_time:.2f} seconds\n")

# Define your example and keyword prompts
example_prompt = """<s>[INST]
I have the following document:
- Europe, ladies and gentlemen, the Community of European States, is multicultural. That is a fact, a circumstance that is to be accepted. And I think more: We are currently experiencing an ethnic-national earthquake that is changing Europe's political map more than the two world wars have done.
Please give me the keywords that are present in this document and separate them with commas.
Make sure you to only return the keywords (nounphrases with 1, 2 or 3 words) and say nothing else. For example, don't say:
"Here are the keywords present in the document"
[/INST] Europe, ladies, gentlemen, Community of European States, multicultural earthquake, ethnic-national earthquake, map, political map, Europe's political map, world war</s>"""

keyword_prompt_template = """
[INST]
I have the following document:
- [DOCUMENT]
Please give me the keywords that are present in this document and separate them with commas.
Make sure you to only return the keywords (nounphrases with 1, 2 or 3 words) and say nothing else. For example, don't say:
"Here are the keywords present in the document"
[/INST]"""

# Set the token limit for the Mistral model
max_new_tokens = 4096

# Initialize a list to store execution times
execution_times = []
generated_keywords = []  # To store generated keywords

# Process the first 5 rows
for i in range(5):
    # Select one row from the DataFrame
    row = df.iloc[i]

    # Extract the paragraph text
    para_text = row['para_text']

    # Truncate the paragraph to the token limit
    para_text = para_text[:max_new_tokens - len(example_prompt) - len(keyword_prompt_template)]

    # Construct the full prompt
    full_prompt = example_prompt + keyword_prompt_template.replace("[DOCUMENT]", para_text)

    # Print that processing is starting for this example
    print(f"Processing Example {i+1}...\n")

    # Print inputs
    print(f"Input Paragraph Text {i+1}:\n{para_text}\n")
    print(f"Full Prompt {i+1}:\n{full_prompt}\n")

    # Measure the time it takes to generate a response
    start_time = time.time()
    response = llm(full_prompt)
    end_time = time.time()
    execution_time = end_time - start_time
    execution_times.append(execution_time)

    # Print the generated response
    print(f"Generated Response {i+1}:\n{response}\n")
    print(f"Execution Time {i+1}: {execution_time:.2f} seconds\n")

    # Extract keywords from the response (you may need to modify this part)
    # For now, we'll simply split the response into keywords
    keywords = response.split(", ")
    generated_keywords.append(keywords)

    # Print that processing for this example has finished
    print(f"Processing for Example {i+1} finished\n")

# Calculate and print the total execution time
total_execution_time = sum(execution_times)
print(f"Total Execution Time for 5 rows: {total_execution_time:.2f} seconds")

# Measure overall execution time
overall_end_time = time.time()
overall_execution_time = overall_end_time - overall_start_time
print(f"Overall Execution Time: {overall_execution_time:.2f} seconds")

# Display the generated keywords for each example
for i, keywords in enumerate(generated_keywords):
    print(f"Generated Keywords for Example {i+1}: {', '.join(keywords)}\n")
This is what I get. It also takes about 2 minutes per input; I'm not sure whether that's only because of the errors or not.
Model Loaded in 5.73 seconds
Processing Example 1...
Input Paragraph Text 1:
The dead – according to the results so far – are no longer in this place for about 50 years. Because of the bone remains, it must have been about 20 to 22 years of age, namely persons of male sex. The upper and lower jaw remains indicate that they are probably not refugees or prisoners of war from Russia because their teeth were usually in worse condition, as I am told.
Full Prompt 1:
<s>[INST]
I have the following document:
- Europe, ladies and gentlemen, the Community of European States, is multicultural. That is a fact, a circumstance that is to be accepted. And I think more: We are currently experiencing an ethnic-national earthquake that is changing Europe's political map more than the two world wars have done.
Please give me the keywords that are present in this document and separate them with commas.
Make sure you to only return the keywords (nounphrases with 1, 2 or 3 words) and say nothing else. For example, don't say:
"Here are the keywords present in the document"
[/INST] Europe, ladies, gentlemen, Community of European States, multicultural earthquake, ethnic-national earthquake, map, political map, Europe's political map, world war</s>
[INST]
I have the following document:
- The dead – according to the results so far – are no longer in this place for about 50 years. Because of the bone remains, it must have been about 20 to 22 years of age, namely persons of male sex. The upper and lower jaw remains indicate that they are probably not refugees or prisoners of war from Russia because their teeth were usually in worse condition, as I am told.
Please give me the keywords that are present in this document and separate them with commas.
Make sure you to only return the keywords (nounphrases with 1, 2 or 3 words) and say nothing else. For example, don't say:
"Here are the keywords present in the document"
[/INST]
Number of tokens (513) exceeded maximum context length (512).
Number of tokens (514) exceeded maximum context length (512).
Number of tokens (515) exceeded maximum context length (512).
Number of tokens (516) exceeded maximum context length (512).
Number of tokens (517) exceeded maximum context length (512).
Number of tokens (518) exceeded maximum context length (512).
Number of tokens (519) exceeded maximum context length (512).
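From those messages it looks like the context window itself is capped at 512 tokens, no matter what I set max_new_tokens to. What I'm thinking of trying next is passing context_length when loading the model - assuming that's actually the right ctransformers option for this (untested on my side):

# Untested idea: raise the context window at load time instead of only
# setting max_new_tokens. context_length=4096 is a guess; the errors
# above suggest the default is 512.
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Mistral-7B-v0.1-GGUF",
    model_file="mistral-7b-v0.1.Q4_K_M.gguf",
    model_type="mistral",
    gpu_layers=50,
    context_length=4096,  # room for the example prompt + document
    max_new_tokens=256,   # the keyword list itself should be short
)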
Any suggestions? Or are there maybe other (newer?) models that WOULD allow for sliding windows? Thanks!
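P.S. In case the intent of the truncation step above wasn't clear: what I really want is to cut the document by tokens rather than characters. Something like this is what I have in mind, assuming llm.tokenize() and llm.detokenize() behave the way I expect (also untested):

# Sketch of token-based truncation to replace the character-based slicing above.
# Assumes the model was loaded with context_length=4096 as in the previous snippet.
context_length = 4096
overhead = len(llm.tokenize(example_prompt + keyword_prompt_template))  # tokens used by the prompt scaffolding
budget = context_length - overhead - 256  # leave headroom for the generated keywords

para_tokens = llm.tokenize(para_text)
if len(para_tokens) > budget:
    para_text = llm.detokenize(para_tokens[:budget])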