plan: reputation group #76

Open
0xtsukino opened this issue Feb 1, 2023 · 10 comments
@0xtsukino
Contributor

0xtsukino commented Feb 1, 2023

TODO:

  • sunset old interep groups (cannot join this group anymore, but should work for anyone already in the group)
  • decide threshold for twitter/reddit/github membership tiers
  • implement reputation verification logic (gatekeeping for joining the group)

zk-group permissioned group for twitter/reddit/github

  • at some point we can use zk-group to do rep verification too
  • for now we should just keep this group centralized and private on zkitterd (indexer)

dao, and github contributors

  • for now we should just keep this group centralized and private on zkitterd (indexer)

  • on-chain: eth2 validators (plan later)
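The "sunset" behaviour in the TODO list (no new joins, existing members keep working) could look roughly like the sketch below. This is only an illustration: the `ReputationGroup` class, its `sunset` flag and the string commitments are hypothetical names, not part of zkitterd or interep.

```python
class ReputationGroup:
    """Hypothetical in-memory group; not the zkitterd or interep API."""

    def __init__(self, name, sunset=False):
        self.name = name
        self.sunset = sunset      # True once the group no longer accepts joins
        self._members = set()     # identity commitments, kept as opaque strings

    def add_member(self, commitment):
        # Joining is refused once the group has been sunset.
        if self.sunset:
            raise PermissionError(f"group {self.name!r} no longer accepts new members")
        self._members.add(commitment)

    def is_member(self, commitment):
        # Membership checks keep working for anyone already in the group.
        return commitment in self._members


old_group = ReputationGroup("old-interep-group")
old_group.add_member("commitment-1")
old_group.sunset = True           # from here on, only existing members are honoured
```

Keeping the membership check independent of the `sunset` flag is what lets existing members continue to use the group after it is closed to new joins.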

@0xtsukino 0xtsukino added this to Zkitter Feb 1, 2023
@0xtsukino 0xtsukino converted this from a draft issue Feb 1, 2023
@0xtsukino 0xtsukino changed the title plan: refactor twitter/reddit/github reputation group plan: reputation group Feb 1, 2023
@0xtsukino
Contributor Author

When inserting a new member, let's keep track of usernames.

@0xtsukino 0xtsukino moved this from 🏗 In progress to 📋 Backlog in Zkitter Feb 1, 2023
@sripwoud
Member

sripwoud commented Feb 3, 2023

For the "decide threshold for twitter/reddit/github membership tiers" step:

  1. Decide which stats to collect
    Selection criteria:

    • Should be publicly available: we need to be able to scrape this data, so private data is excluded.
    • An API to fetch the data points should exist (I don't want to scrape HTML 😅).
    • Should contain useful information, i.e. something that can be used in a reputation calculation: numerical info is better, and URLs such as the profile image are useless.

    Suggested stats to include in reputation score:

    | Twitter | GitHub | Reddit |
    |---|---|---|
    | followers count | total stars earned | comment karma |
    | following count | total PRs | link karma |
    | creation date (account age) | total commits | creation date (account age) |
    | botometer score | total issues opened | premium subscription? |
    | verified | followers? | coins? |
    | | sponsorships? | linked identities? |
    | | total contributions? | |
    | | total received stars? | |

    Differences from the stats used in the current interep thresholds:

    • github: exclude "proPlan" as it is not available in public data (only on authenticated profile data).
    • twitter: keep same, adjust thresholds
    • reddit: to be defined (are coins, linked identities and premium subscription stats publicly available?)
  2. Collect public data samples for each OAuth provider: at least ~1000 data points each.

  3. Bin data according to a given distribution to find thresholds
    Find the bin edges that match a target distribution, e.g. bronze/silver/gold at 60/30/10 (i.e. edges at the 60th and 90th percentiles). These bin edges will be our thresholds.
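As a rough sketch of step 3, the bin edges for a 60/30/10 split are the 60th and 90th percentiles of the collected sample. The snippet below uses only the Python standard library; the `tier_thresholds` helper and the synthetic sample are made up for illustration and are not real collected data.

```python
import random
import statistics


def tier_thresholds(values, cumulative_percents=(60, 90)):
    """Return the sample values at the given cumulative percentiles (the bin edges)."""
    # With n=100, statistics.quantiles returns the 99 cut points P1..P99.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return [cuts[p - 1] for p in cumulative_percents]


# Synthetic heavy-tailed stand-in for something like follower counts (not real data).
rng = random.Random(0)
sample = [int(rng.paretovariate(1.2)) for _ in range(1000)]

# bronze: below silver_min, silver: from silver_min up to gold_min, gold: gold_min and above
silver_min, gold_min = tier_thresholds(sample)
```

Passing a different `cumulative_percents` tuple gives the edges for any other integer-percentile split.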

@sripwoud
Member

sripwoud commented Feb 3, 2023

@0xtsukino @AtHeartEngineer what do you think?

@cedoor For the reddit stats: premium subscription, coins and linked identities are used by interep. Do you know how to fetch this data? Is it even available when fetching public user data with the reddit API?
I am struggling a bit to use the reddit API to collect data (it is not very well documented), so I might just end up using a public dataset like this one.

@cedoor

cedoor commented Feb 5, 2023

Hi @r1oga!

I am a bit struggling to use the reddit api (it is not very well documented) to collect data, so I might just end using a public dataset like this one.

Isn't that dataset old? Or is it updated?

Do you know how to fetch this data? Is it even available by fetching public user data with the reddit api?

Interep is using this API: https://github.com/interep-project/reputation-service/blob/main/src/services/reddit/index.ts. I don't think you can fetch user data without their OAuth2 token.

@sripwoud
Member

sripwoud commented Feb 6, 2023

Thanks for the info.
Yes, that dataset is probably a bit old.

can't fetch user data without an OAuth token

Oh so it would mean you can only fetch the data of an authenticated user?!

I am going to try this one https://www.reddit.com/dev/api/#GET_api_user_data_by_account_ids
to fetch user data from randomly generated ids.
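A minimal sketch of the "randomly generated ids" idea, under a few assumptions: Reddit account fullnames are a `t2_` prefix followed by a base36 id, and the endpoint takes an `ids` query parameter holding a comma-separated list of fullnames (worth double-checking against the API docs linked above). The helper names and the id length are hypothetical, and the snippet only builds the request URL; it sends nothing and does not handle the OAuth token.

```python
import random
import string
from urllib.parse import urlencode

BASE36 = string.digits + string.ascii_lowercase


def random_account_fullname(rng, length=6):
    """Random base36 id with the `t2_` account prefix (length is an arbitrary choice)."""
    return "t2_" + "".join(rng.choice(BASE36) for _ in range(length))


def build_request_url(fullnames):
    # Assumption: the endpoint accepts `ids` as a comma-separated list of fullnames.
    query = urlencode({"ids": ",".join(fullnames)})
    return f"https://oauth.reddit.com/api/user_data_by_account_ids?{query}"


rng = random.Random(42)
candidate_ids = [random_account_fullname(rng) for _ in range(5)]
url = build_request_url(candidate_ids)
```

Many random ids will probably not map to an existing account, so the sample size per request batch would need some headroom.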

@cedoor

cedoor commented Feb 6, 2023

Oh so it would mean you can only fetch the data of an authenticated user?!

It's something we should test, but I think you need a token to fetch some data, maybe not everything though.

I am going to try this one https://www.reddit.com/dev/api/#GET_api_user_data_by_account_ids to fetch user data from randomly generated ids.

Which data does it return?

@0xtsukino
Contributor Author

I am going to try this one https://www.reddit.com/dev/api/#GET_api_user_data_by_account_ids to fetch user data from randomly generated ids.

how about:

It seems both require an OAuth token, but they can be used to query any user/listing.

@sripwoud
Member

sripwoud commented Feb 7, 2023

Thanks, will try that too.

@sripwoud
Member

sripwoud commented Feb 7, 2023

@0xtsukino
For the GH data, here are intermediate results and a first suggestion for new thresholds (in the interep org, as I had started this work there):
interep-project/interep-groups-eda#7

Instead of the 60/90 quantiles I had suggested above, 90/99/99.9 seem to work better, at least for the GH data.
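To illustrate how three quantile edges would map an account to a tier, here is a small sketch. It assumes the 90/99/99.9 edges are the lower bounds of bronze, silver and gold, and that accounts below the first edge get no tier; that reading, the `assign_tier` helper and the threshold values are all assumptions for illustration, not the numbers from the analysis.

```python
from bisect import bisect_right

TIERS = ("unrated", "bronze", "silver", "gold")


def assign_tier(value, thresholds):
    """thresholds: ascending lower bounds for bronze, silver and gold."""
    # bisect_right counts how many lower bounds the value has reached.
    return TIERS[bisect_right(thresholds, value)]


# Hypothetical threshold values, for illustration only.
example_thresholds = [10, 100, 1000]
tiers = [assign_tier(v, example_thresholds) for v in (5, 10, 999, 1000)]
```

A value exactly equal to an edge lands in the higher tier, since each edge is treated as an inclusive lower bound.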

@sripwoud
Member

sripwoud commented Feb 16, 2023

update (for sprint planning today):

  • sunset old interep groups (cannot join this group anymore, but should work for anyone already in the group): not done, I'll discuss with cedoor how to do that

  • decide threshold for twitter/reddit/github membership tiers: 80% done. Reddit and twitter done, github data collected but need to clean/crunch it and run some simulations.

  • implement reputation verification logic (gatekeeping for joining the group): not done (some of the services are ready, like this API, which returns the data for the GH groups)

  • zk-group permissioned group for twitter/reddit/github: still need to sync in more detail with andy and cedoor

  • dao, and github contributors: partly done, see https://github.com/zkitter/groups

  • eth2 validators (plan later): we can reuse an endpoint I built for https://github.com/privacy-scaling-explorations/e2e-zk-ecdsa (see the /beacon endpoint)

@0xtsukino

@sripwoud sripwoud moved this from 📋 Backlog to 🏗 In progress in Zkitter Feb 16, 2023
@AtHeartEngineer AtHeartEngineer moved this from 🏗 In progress to 👀 In Review in Zkitter Mar 15, 2023
Status: 👀 In Review