could you provide the topic entity in each sentence? #8

Open
ToneLi opened this issue Jul 2, 2024 · 5 comments
Comments


ToneLi commented Jul 2, 2024

No description provided.

Collaborator

Wuyxin commented Jul 2, 2024

Hi, are you referring to the paper topic (field) of STaRK-MAG?

Author

ToneLi commented Jul 2, 2024

The topic entity in the input sentence (query) of STaRK-MAG, STaRK-Prime, and STaRK-Amazon.

Author

ToneLi commented Jul 2, 2024

thanks for that

Author

ToneLi commented Jul 2, 2024

The entity's surface name that appears in the sentence, such as 'Yuriy Norshteyn,' is shown below.
image

Collaborator

Wuyxin commented Jul 8, 2024

Got it! Thanks for the explanations.

For now, we treat the topic entity IDs as part of the metadata of the natural language query. We prefer not to release them at this point: having them may make retrieval easier, while they may not always be available in real-world cases.

Another note, if you want to try using an NER model to recognize the entities: some entities (like products) may share the same name. We also found that a proportion of authors in STaRK-MAG are "Unknown".

While the tasks could be much more challenging because the topic entities are unavailable and names can be ambiguous, as mentioned, we hope to keep the tasks close to real-world situations. Let me know if this makes sense to you!
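
To illustrate the ambiguity point, here is a minimal sketch of why a surface name recovered from a query does not necessarily pin down a single entity ID. Everything in it is an assumption made for illustration: the toy entity table, the example queries, and the helper names `build_name_index` and `link_mentions` are hypothetical and are not part of the STaRK data or code, and plain substring matching stands in for a real NER model.

```python
from collections import defaultdict

# Hypothetical toy knowledge base: (entity_id, entity_type, name).
# These rows are made up for illustration; they are not taken from STaRK.
TOY_ENTITIES = [
    (0, "product", "USB-C Cable"),
    (1, "product", "USB-C Cable"),   # a different product with the same name
    (2, "author", "Unknown"),
    (3, "author", "Unknown"),        # a different author with a placeholder name
    (4, "person", "Yuriy Norshteyn"),
]


def build_name_index(entities):
    """Map each lowercased surface name to every entity id that carries it."""
    index = defaultdict(list)
    for entity_id, _entity_type, name in entities:
        index[name.lower()].append(entity_id)
    return index


def link_mentions(query, index):
    """Return {surface name: candidate ids} for names found verbatim in the query.

    Substring matching is a stand-in for an NER model (an assumption of this sketch).
    """
    text = query.lower()
    return {name: ids for name, ids in index.items() if name in text}


index = build_name_index(TOY_ENTITIES)

# A unique name resolves to exactly one candidate id.
candidates = link_mentions("Which films did Yuriy Norshteyn direct?", index)

# A shared name resolves to several candidate ids, so the mention alone
# is not enough to recover the topic entity.
ambiguous = link_mentions("Is the USB-C Cable compatible with my laptop?", index)

print(candidates)
print(ambiguous)
```

In this toy setup, the mention of a shared product name comes back with more than one candidate ID, and the "Unknown" authors collapse onto a single surface name in the same way, so any entity-linking step built on names alone would need extra context to disambiguate.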
