Feat/llm predictor #1

915-Muscalagiu-AncaIoana · 2024-11-24T07:33:06Z

No description provided.

iusztinpaul · 2024-11-25T15:46:45Z

recsys/inference/llm_ranking_predictor.py

+            "article_ids": article_ids,
+        }
+
+    def _postprocess_output(self, output):


Move def _postprocess_output(self, output): after def _preprocess_features(self, features): so the methods follow the natural reading order.

iusztinpaul · 2024-11-25T15:47:45Z

recsys/inference/llm_ranking_predictor.py

+               - Categorical features: These describe qualitative aspects, like product category, color, and material.
+            3. Your response should only include the probability of purchase for the positive class (e.g., likelihood of being purchased), as a value between 0 and 1.
+
+            ### Product Features:


Are these all product features? Aren't some of them customer features?

Indeed, the list also includes customer features.

iusztinpaul · 2024-11-25T15:50:23Z

recsys/inference/llm_ranking_predictor.py

+
+        scores = []
+        for feature in preprocessed_features:
+            langchain_output = self.llm.invoke(feature)


Why do we invoke the LLM once per feature?

It is not invoked per individual feature, but per candidate: the feature set of one candidate together with the customer features forms a single data point that needs to be predicted.
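To make the per-candidate call concrete, here is a minimal sketch, assuming each data point merges the customer features with one candidate's features into a single prompt. The names `build_prompt`, `score_candidates`, `fake_llm`, the "### Customer Features:" heading, and the sample feature values are hypothetical and are not taken from the PR.

```python
from typing import Callable, Dict, List


def build_prompt(customer: Dict[str, object], candidate: Dict[str, object]) -> str:
    """Render one data point: the customer features plus one candidate's features."""
    lines = ["### Customer Features:"]  # assumed heading, not from the PR
    lines += [f"- {name}: {value}" for name, value in customer.items()]
    lines.append("### Product Features:")
    lines += [f"- {name}: {value}" for name, value in candidate.items()]
    return "\n".join(lines)


def score_candidates(
    llm: Callable[[str], float],
    customer: Dict[str, object],
    candidates: List[Dict[str, object]],
) -> List[float]:
    """Call the model once per candidate, not once per individual feature."""
    return [llm(build_prompt(customer, candidate)) for candidate in candidates]


def fake_llm(prompt: str) -> float:
    """Stub standing in for the real model so the sketch runs offline."""
    return 0.9 if "color: black" in prompt else 0.1


customer = {"age": 31, "club_member_status": "ACTIVE"}
candidates = [
    {"article_id": "a1", "color": "black"},
    {"article_id": "a2", "color": "red"},
]
scores = score_candidates(fake_llm, customer, candidates)
```

The result has one score per candidate, which is what the ranking step needs.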

iusztinpaul · 2024-11-25T15:53:53Z

recsys/hopsworks_integration/feature_store.py

    return project, project.get_feature_store()


+def get_secrets_api():
+    connection = hopsworks.connection(host="c.app.hopsworks.ai",
+                                      hostname_verification=False,


Have you tried this without adding host="c.app.hopsworks.ai", hostname_verification=False, port=443?

You could try to access the get_secrets_api() method directly from the project returned by login. That is usually how it was done throughout the project.

The problem is that the secrets API is not linked to the project instance in Hopsworks; it lives at the user level (hence you also need a token with the user scope checked to access it, which is not a default token scope). Therefore you have to take it from hopsworks.connection(), not from the project. Trying this with the default connection() configuration produces this error:

  File "/Users/ancaioanamuscalagiu/Documents/hands-on-recommender-system/.venv/lib/python3.11/site-packages/hopsworks/client/external.py", line 38, in __init__
    raise exceptions.ExternalClientError("host")
hopsworks.client.exceptions.ExternalClientError: host

which, if we look at their source code, can be traced back to:

    """Initializes a client in an external environment such as AWS Sagemaker."""
    if not host:
        raise exceptions.ExternalClientError("host")

So when the client is external (such as my local machine trying to connect), it expects the host parameter to be set.

iusztinpaul · 2024-11-25T15:55:06Z

recsys/hopsworks_integration/feature_store.py

+    secrets = secrets_api.get_secrets()
+    existing_secret_keys = [secret.name for secret in secrets]
+    # Create the OPENAI_API_KEY secret if it doesn't exist
+    if "OPENAI_API_KEY" not in existing_secret_keys:


If this runs in the cloud, the settings will not load the .env file, so this step would store an empty value. I would avoid doing this step. It is better to assert that these values are present and crash the program with a clear message if they are missing.

A safer option would be to override the settings object at start-up time, based on the Hopsworks secrets. That way the settings object is the single source of truth.

To avoid adding these secrets manually, we could create a separate script that adds them based on the settings file, but I would make it an explicit operation to avoid weird behavior.
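A minimal sketch of the two suggestions above, assuming the settings object exposes its values as attributes and the secrets arrive as a plain name-to-value mapping. `Settings`, `override_from_secrets`, and `require` are hypothetical stand-ins, not the project's actual classes or functions.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class Settings:
    """Hypothetical stand-in for the project's settings object."""

    OPENAI_API_KEY: Optional[str] = None


def override_from_secrets(settings: Settings, secrets: Mapping[str, str]) -> Settings:
    """Copy known, non-empty secrets onto settings so it stays the single source of truth."""
    for name, value in secrets.items():
        if hasattr(settings, name) and value:
            setattr(settings, name, value)
    return settings


def require(settings: Settings, *names: str) -> None:
    """Fail fast with a clear message if any required setting is missing or empty."""
    missing = [name for name in names if not getattr(settings, name, None)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")


# At start-up: override first, then validate.
settings = override_from_secrets(Settings(), {"OPENAI_API_KEY": "sk-example"})
require(settings, "OPENAI_API_KEY")
```

Running the override before the check means a missing secret stops the program at start-up with a clear message, instead of surfacing later as an empty API key.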

iusztinpaul · 2024-11-25T15:59:59Z

recsys/inference/llm_ranking_predictor.py

+        }
+
+    def _postprocess_output(self, output):
+        return float(output['text'].split(':')[1].strip())


I would also check that the output is a float; otherwise, wrap the parsing in a try/except and return a minimum score (or similar) to avoid crashing the program.

You can do that easily with Pydantic classes + LangChain to check that the value is a float within an expected range.
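As a dependency-free sketch of the check described above (not the Pydantic + LangChain variant), assuming the model replies with text such as "Probability: 0.83" and that 0.0 is an acceptable fallback score. `MIN_SCORE` and the standalone `postprocess_output` function are hypothetical.

```python
MIN_SCORE = 0.0  # assumed fallback when the reply cannot be parsed


def postprocess_output(output: dict, fallback: float = MIN_SCORE) -> float:
    """Parse the score from the LLM reply; return the fallback instead of crashing."""
    try:
        score = float(output["text"].split(":")[1].strip())
    except (KeyError, IndexError, ValueError, AttributeError, TypeError):
        # Missing key, no ':' separator, non-numeric text, or a non-dict reply.
        return fallback
    if not 0.0 <= score <= 1.0:
        # Out-of-range values (and NaN) are treated as invalid probabilities.
        return fallback
    return score
```

With Pydantic, the same range constraint could be declared on a model field using `Field(ge=0, le=1)`, which keeps the validation next to the schema instead of inside the parsing code.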

Porting to 4.0

915-Muscalagiu-AncaIoana force-pushed the feat/llm_predictor branch 3 times, most recently from c3add2d to a97f838 Compare November 24, 2024 15:07

iusztinpaul reviewed Nov 25, 2024

iusztinpaul pushed a commit that referenced this pull request Nov 28, 2024

Merge pull request #1 from manu-sj/deployment-fix

0e1ebf8

Porting to 4.0

915-Muscalagiu-AncaIoana added 6 commits November 30, 2024 19:33

added LLM as ranking model

a90c891

finished draft LLM as ranking model

072469f

added LLM predictor deployment notebook

0a6e506

added creation of secrets & model cleanup

f5319b2

added llm model registering

3acc838

added output validation of the llm & fix pr comments

fb8fe48

915-Muscalagiu-AncaIoana force-pushed the feat/llm_predictor branch 2 times, most recently from 7b5d5d5 to 80c1a99 Compare December 3, 2024 07:34

added setup for environment and secrets

093be19

915-Muscalagiu-AncaIoana force-pushed the feat/llm_predictor branch from 80c1a99 to 093be19 Compare December 3, 2024 20:19

915-Muscalagiu-AncaIoana added 2 commits December 4, 2024 00:40

changed pydantic output validator to parser

c9a0121

updated readme

7ebb271
