Korean to KSL #129

Closed
AmitMY opened this issue Jan 17, 2024 · 12 comments

Comments

@AmitMY
Contributor

AmitMY commented Jan 17, 2024

Hello,

I have a similar question.

I want to translate from Korean (natural language) to Korean Sign Language (KSL), or vice versa.

But I found that there is no translation to KSL. Is that because only a limited dataset is available for it?

Also, how could I train on a Korean and KSL dataset, and how did you train the other available languages?

Thanks!

Originally posted by @tmdtmdqorekf in #128 (comment)

@AmitMY
Contributor Author

AmitMY commented Jan 17, 2024

Hi @tmdtmdqorekf
If you know of a Korean Sign Language dictionary, and its owners give us permission to use their data, I can include it.

In general, this is not a trained model. At the moment, it uses dictionary entries and stitches them together:
https://github.com/ZurichNLP/spoken-to-signed-translation
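
To illustrate the dictionary-and-stitching idea, here is a minimal Python sketch. It is not the code from the linked repository: the `translate` function, the `Lexicon` type, and the toy entries are all hypothetical, and plain strings stand in for the pose frames a real dictionary entry would hold.

```python
from typing import Dict, List

# Hypothetical lexicon: maps a lowercase word to the frames of its sign.
# Strings are placeholders for pose keypoints taken from dictionary videos.
Lexicon = Dict[str, List[str]]


def translate(text: str, lexicon: Lexicon) -> List[str]:
    """Look up each word and concatenate the frames of the signs that exist."""
    frames: List[str] = []
    for word in text.lower().split():
        sign = lexicon.get(word)
        if sign is None:
            # No dictionary entry: this sketch simply skips the word.
            continue
        frames.extend(sign)
    return frames


toy_lexicon: Lexicon = {
    "hello": ["hello_0", "hello_1"],
    "world": ["world_0"],
}

print(translate("Hello big world", toy_lexicon))
# ['hello_0', 'hello_1', 'world_0']
```

Because nothing is learned, coverage depends entirely on which words the dictionary contains, which is why a usable KSL dictionary is the prerequisite here.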

@tmdtmdqorekf

tmdtmdqorekf commented Jan 17, 2024

Thanks for your reply!

Below is a KSL dictionary. Note that the site and all of its entries are written in Korean.

URL: https://sldict.korean.go.kr/front/main/main.do

The site policy says that we can use the data as long as we credit the author (source).

(In case you need it, the source is the 'National Institute of the Korean Language (NIKL)'.)

@AmitMY
Contributor Author

AmitMY commented Jan 17, 2024

Materials on this website subject to the 'Creative Commons Attribution-NonCommercial-NoDerivatives 2.0 Korea License' can be freely used for non-profit purposes. However, in order to use the copyrighted work, the following conditions must be observed.

  • Author indication: When using materials, the author must be indicated.
  • No changes: The materials must be used as is without any changes.

This means we are not allowed to run pose estimation on the videos and show the results.

@tmdtmdqorekf

Oh, is that because of the second condition?

@AmitMY
Contributor Author

AmitMY commented Jan 17, 2024

Yes. If they relax that condition, it would be possible. Ideally CC-BY, but CC-BY-NC would also be ok.
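
As a rough illustration of the rule of thumb used in this thread (a "no changes" condition rules out pose estimation, while attribution and non-commercial conditions do not), here is a small Python sketch. The helper name `allows_derivatives` is hypothetical, and it only inspects the short license identifier; it does not read the actual license text.

```python
def allows_derivatives(license_id: str) -> bool:
    """Return True if a Creative Commons identifier has no NoDerivatives (ND) element."""
    parts = license_id.upper().replace("_", "-").split("-")
    if not parts[0].startswith("CC"):
        raise ValueError(f"not a Creative Commons identifier: {license_id!r}")
    return "ND" not in parts


for lic in ["CC-BY", "CC-BY-NC", "CC-BY-NC-ND"]:
    print(lic, allows_derivatives(lic))
# CC-BY True
# CC-BY-NC True
# CC-BY-NC-ND False
```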

@tmdtmdqorekf

I'll ask about it.

In the meantime, I found another dataset that looks usable.

URL:
https://aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=realm&dataSetSn=103

Could you take a look at it, please?

@tmdtmdqorekf

tmdtmdqorekf commented Jan 17, 2024

But in this case, only Koreans can request access to the API.

Would it be possible for me to request it myself and then pass it on to you?

Or should I modify your code and integrate the data on my own?

@AmitMY
Contributor Author

AmitMY commented Jan 18, 2024

This is a very large dataset - 2.63 TB - so I am not sure it contains videos of single words. Does it?

Also, can you direct me to where I can see the license of the data?

@tmdtmdqorekf

tmdtmdqorekf commented Jan 18, 2024

Below is more information about the data. So yes, it contains videos of individual words.

Contents and volume of the released data

A total of 536,000 sign language video clips (.mp4 files)

  • Videos of 2,000 sign language sentences, 3,000 sign language words, and 1,000 finger number/fingerspelling items
  • Sign language sentence/word videos (500,000 clips) recorded in a studio with 20 signers ("language providers"), filmed simultaneously from 5 angles
  • Finger number/fingerspelling videos (21,000 clips) collected from 21 signers through crowdsourced filming
  • Handwritten/word videos (15,000 clips) produced with avatars
  • Morpheme and non-manual element annotations for all 536,000 sign language videos (JSON files)
  • Keypoint values for all 536,000 sign language videos, extracted from frames split at 30 fps (JSON files)
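
The clip counts in this listing are consistent with each other, which the short Python check below makes explicit. It assumes that every studio signer recorded every sentence and word at every angle, and that every crowdsourced signer recorded all 1,000 items of the third category; the listing does not state this, but the arithmetic matches. The avatar figure is taken directly from the listing.

```python
# Studio recordings: sentences and words, by signer, by camera angle.
sentences, words = 2000, 3000
studio_signers, angles = 20, 5
studio_clips = (sentences + words) * studio_signers * angles

# Crowdsourced recordings of the 1,000 remaining items.
other_items, crowd_signers = 1000, 21
crowd_clips = other_items * crowd_signers

# Avatar-generated clips, as given in the listing.
avatar_clips = 15_000

total = studio_clips + crowd_clips + avatar_clips
print(studio_clips, crowd_clips, total)
# 500000 21000 536000
```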

@tmdtmdqorekf

tmdtmdqorekf commented Jan 18, 2024

Also, for the license, here's the link. You can check it in the first section.

https://www.aihub.or.kr/intrcn/guid/usagepolicy.do?currMenu=151&topMenu=105

I have included a translation of the content below.

Data Introduction

AI learning data provided by the AI hub (hereinafter referred to as "AI Data") was established as part of the "Building Infrastructure for the Intelligent Information Industry" project by the Ministry of Science and ICT and the Korea Intelligent Information Society Agency.

All rights to data, AI application models and data authoring tools, various manuals, etc. (hereinafter referred to as "AI Data, etc."), which are tangible and intangible results of this project, are held by AI data and participating organizations (hereinafter referred to as "executing organizations, etc.") and the Korea Intelligent Information Society Agency.

This AI data has been established for the development of artificial intelligence technology, products, and services, and can be used for commercial and non-profit research and development purposes in various fields such as intelligent products, services, and chatbots.

Data Utilization Policy

To use this AI data, etc., you must agree to and comply with the following:

1. When using this AI data, etc., you must state that it is a result of a project of the Korea Intelligent Information Society Agency, and the same must be stated in any secondary work that uses this AI data, etc.

2. For a corporation, organization, or individual located abroad to use the AI data, a separate agreement with the executing organization and the Korea Intelligent Information Society Agency is required.

3. Taking this AI data out of the country requires a separate agreement with the executing organization and the Korea Intelligent Information Society Agency.

4. This AI data may be used only for training artificial intelligence models.

5. If the purpose, method, or content of the use of the AI data is deemed illegal or inappropriate, the Korea Intelligent Information Society Agency may refuse to provide it and, if it has already been provided, may request that its use be suspended and that the AI data be returned and disposed of.

6. The AI data, etc. provided shall not be provided, transferred, rented, or sold to any other corporation, organization, or individual that has not been approved by the Korea Intelligent Information Society Agency.

7. All civil and criminal liability arising from unauthorized access, provision, transfer, rental, sale, etc. of the AI data, etc. beyond the purpose under paragraph (4) lies with the corporation, organization, or individual using the AI data, etc.

8. If personal information, etc. is found in a dataset provided by the AI hub, the user shall immediately report this to the AI hub and delete the downloaded dataset.

9. The de-identified information (including reproduction information) provided by the AI hub shall be used safely for the purpose of developing artificial intelligence services, etc., and no act shall be performed to re-identify an individual using it.

10. If the Korea Intelligent Information Society Agency later conducts a fact-finding survey on use cases and outcomes, the user shall cooperate with it in good faith.

Thanks.

@tmdtmdqorekf

Whoever downloads the data cannot pass the dataset on to a third party, so you would need to apply to the agency directly if you want to use the dataset outside Korea.

So I think it will be really difficult for you to get access to a KSL dataset.

I hope this process can be eased in the future.

BTW thanks for your quick feedback!

@AmitMY
Contributor Author

AmitMY commented Jan 18, 2024

Well, if you have access to their data, you can use it in https://github.com/ZurichNLP/spoken-to-signed-translation
I will try to request access at some point, but it is not my main priority right now.

@bipinkrish bipinkrish mentioned this issue Feb 7, 2024
@AmitMY AmitMY closed this as completed Aug 29, 2024

2 participants