Commit

data substitutions, improved import and query docs
mchmarny committed May 30, 2022
1 parent d9c75a9 commit 68a2973
Showing 13 changed files with 442 additions and 332 deletions.
26 changes: 16 additions & 10 deletions README.md
@@ -30,29 +30,33 @@ One of the core principles on which `dctl` is built is that open source contribu

## How

`dctl` imports all contribution metadata for a specific repo(s) using the [GitHub API](https://docs.github.com/en/rest), and augments that data with developer affiliations from sources like [CNCF](https://github.com/cncf/gitdm) and [Apache Foundation](https://www.apache.org/foundation/members.html). More about importing data [here](docs/IMPORT.md).
`dctl` imports all contribution metadata for a specific repo(s) using the [GitHub API](https://docs.github.com/en/rest), and augments that data with developer affiliations from sources like [CNCF](https://github.com/cncf/gitdm). More about importing data [here](docs/IMPORT.md).

Once downloaded, `dctl` exposes that data using a local UI with option to drill-downs different aspects of the project activity (screenshot above). The instructions on how to start the integrated server and access the UI in your browser are located [here](docs/SERVER.md).
Once downloaded, `dctl` exposes that data using a local UI with an option to drill down into different aspects of the project activity (screenshot above). The instructions on how to start the integrated server and access the UI in your browser are located [here](docs/SERVER.md).

`dctl` can also be used to query the imported data from the CLI and output JSON payloads for subsequent postprocessing in another tool (e.g. [jq](https://stedolan.github.io/jq/)). More about the CLI query option [here](docs/QUERY.md).

Whichever way you decide to use `dctl`, you will be able to use time period, contribution type, and developer name filters to further scope your data and identify specific trends with direct links to the original detail in GitHub for additional context. And, since all this data is cached locally in [sqlite](https://www.sqlite.org/index.html) DB, you can even use another tool to further customized your queries using SQL without the need to re-download data. More about that [here](docs/QUERY.md)
Whichever way you decide to use `dctl`, you will be able to use time period, contribution type, and developer name filters to further scope your data and identify specific trends with direct links to the original detail in GitHub for additional context.

And, since all this data is cached locally in a [SQLite](https://www.sqlite.org/index.html) DB, you can even use another tool to further customize your queries using SQL without the need to re-download the data. More about that [here](docs/QUERY.md).

## Usage

`dctl` is a dual-purpose utility that can be either used as a `CLI` to [authentication](#authentication), [import](docs/IMPORT.md) data, and [query](docs/QUERY.md) the data, or as a [server](docs/SERVER.md) to launch a local app that provides UI to access the imported data in your browser.
`dctl` is a dual-purpose utility that can be used either as a CLI or as a server that you access locally in your browser:

* [authentication](#authentication)
* [import](docs/IMPORT.md)
* [query](docs/QUERY.md)
* [server](docs/SERVER.md)
* [Authenticating to GitHub](#authentication)
* [Importing Data](docs/IMPORT.md)
* [Querying Data](docs/QUERY.md)
* [Using Server](docs/SERVER.md)

## Authentication

To get access to the higher GitHub API rate limits, `dctl` uses OAuth authentication to obtain a token:
To import data, `dctl` uses the GitHub API. While you can use the GitHub API without authentication, `dctl` uses OAuth-based authentication to obtain a GitHub user token, which avoids throttling and gives access to the higher API rate limits:

> `dctl` doesn't ask for any access scopes, so the resulting token has only access to already public data
To authenticate to GitHub using `dctl`:

```shell
dctl auth
```
@@ -65,7 +69,9 @@ The result should look something like this:
3). Hit enter to complete the process:
```

Once you enter the provided code in the GitHub UI prompt and hit enter, `dctl` will persist the token in your home directory for all subsequent queries. Should you need to, you can revoke that token in your [GitHub Settings](https://docs.github.com/en/developers/apps/managing-oauth-apps/deleting-an-oauth-app).
Follow these steps. Once you enter the provided code in the GitHub UI prompt and hit enter, `dctl` will persist the token in your home directory for all subsequent queries. Should you need to, you can revoke that token in your [GitHub Settings](https://docs.github.com/en/developers/apps/managing-oauth-apps/deleting-an-oauth-app).

Once authenticated, try [importing some data](docs/IMPORT.md).

## Disclaimer

83 changes: 42 additions & 41 deletions cmd/cli/import.go
@@ -2,6 +2,7 @@ package main

import (
"context"
"database/sql"
"encoding/json"
"fmt"
"os"
@@ -71,28 +72,22 @@ var (
Action: cmdImportAffiliations,
},
{
Name: "names",
Aliases: []string{"n"},
Usage: "Updates imported developer names with Apache Foundation data",
Action: cmdUpdateAFNames,
},
{
Name: "updates",
Aliases: []string{"u"},
Usage: "Update all previously imported org, repos, and affiliations",
Action: cmdUpdate,
},
{
Name: "substitute",
Name: "substitutions",
Aliases: []string{"s"},
Usage: "Global substitute imported data (e.g. replacing entity name)",
Action: cmdSubstitute,
Usage: "Create global data substitutions (e.g. standardize location or entity name)",
Action: cmdSubstitutes,
Flags: []cli.Flag{
subTypeFlag,
oldValFlag,
newValFlag,
},
},
{
Name: "updates",
Aliases: []string{"u"},
Usage: "Update all previously imported org, repos, and affiliations",
Action: cmdUpdate,
},
},
}
)
@@ -105,8 +100,10 @@ type EventImportResult struct {
}

type EventUpdateResult struct {
Duration string `json:"duration,omitempty"`
Imported map[string]int `json:"imported,omitempty"`
Duration string `json:"duration,omitempty"`
Imported map[string]int `json:"imported,omitempty"`
Updated *data.AffiliationImportResult `json:"updated,omitempty"`
Substituted []*data.Substitution `json:"substituted,omitempty"`
}

func cmdUpdate(c *cli.Context) error {
@@ -127,10 +124,27 @@ func cmdUpdate(c *cli.Context) error {
return errors.Wrap(err, "failed to import events")
}

db := getDBOrFail()
defer db.Close()

// update final state
res.Imported = m
res.Duration = time.Since(start).String()

// also update affiliations
a, err := importAffiliations(db)
if err != nil {
return errors.Wrap(err, "failed to import affiliations")
}
res.Updated = a

// also update substitutes
sub, err := data.ApplySubstitutions(db)
if err != nil {
return errors.Wrap(err, "failed to apply substitutions")
}
res.Substituted = sub

if err := json.NewEncoder(os.Stdout).Encode(res); err != nil {
return errors.Wrapf(err, "error encoding list: %+v", res)
}
@@ -190,41 +204,34 @@ func cmdImportEvents(c *cli.Context) error {
return nil
}

func cmdImportAffiliations(c *cli.Context) error {
func importAffiliations(db *sql.DB) (*data.AffiliationImportResult, error) {
token, err := getGitHubToken()
if err != nil {
return errors.Wrap(err, "failed to get GitHub token")
return nil, errors.Wrap(err, "failed to get GitHub token")
}

if token == "" {
return cli.ShowSubcommandHelp(c)
return nil, errors.New("no GitHub token")
}

ctx := context.Background()
client := net.GetOAuthClient(ctx, token)

db := getDBOrFail()
defer db.Close()

res, err := data.UpdateDevelopersWithCNCFEntityAffiliations(ctx, db, client)
if err != nil {
return errors.Wrap(err, "failed to import affiliations")
return nil, errors.Wrap(err, "failed to import affiliations")
}

if err := json.NewEncoder(os.Stdout).Encode(res); err != nil {
return errors.Wrapf(err, "error encoding list: %+v", res)
}

return nil
return res, nil
}

func cmdUpdateAFNames(c *cli.Context) error {
func cmdImportAffiliations(c *cli.Context) error {
db := getDBOrFail()
defer db.Close()

res, err := data.UpdateNoFullnameDevelopersFromApache(db)
res, err := importAffiliations(db)
if err != nil {
return errors.Wrap(err, "failed to update names from apache foundation")
return errors.Wrap(err, "failed to import affiliations")
}

if err := json.NewEncoder(os.Stdout).Encode(res); err != nil {
@@ -234,7 +241,7 @@ func cmdUpdateAFNames(c *cli.Context) error {
return nil
}

func cmdSubstitute(c *cli.Context) error {
func cmdSubstitutes(c *cli.Context) error {
sub := c.String(subTypeFlag.Name)
old := c.String(oldValFlag.Name)
new := c.String(newValFlag.Name)
@@ -246,18 +253,12 @@ func cmdSubstitute(c *cli.Context) error {
db := getDBOrFail()
defer db.Close()

res, err := data.UpdateDeveloperProperty(db, sub, old, new)
res, err := data.SaveAndApplyDeveloperSub(db, sub, old, new)
if err != nil {
return errors.Wrap(err, "failed to save and apply substitution")
}

m := make(map[string]interface{})
m["updated"] = res
m["substitution_type"] = sub
m["old_value"] = old
m["new_value"] = new

if err := json.NewEncoder(os.Stdout).Encode(m); err != nil {
if err := json.NewEncoder(os.Stdout).Encode(res); err != nil {
return errors.Wrapf(err, "error encoding list: %+v", res)
}
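
What a "global substitution" does to imported data can be illustrated in memory. The sketch below is only an assumption about the general idea: the real `data.SaveAndApplyDeveloperSub` works against the local database and its internals are not part of this diff. The `developer` record, the property names `entity` and `location`, and `applySubstitution` are all hypothetical.

```go
package main

import "fmt"

// developer is a hypothetical, simplified record of an imported developer.
type developer struct {
	Username string
	Entity   string
	Location string
}

// substitution describes replacing one exact property value with another.
type substitution struct {
	Prop string // "entity" or "location" in this sketch
	Old  string
	New  string
}

// applySubstitution rewrites exact matches of s.Old in place and
// returns the number of records that changed.
func applySubstitution(devs []developer, s substitution) (int, error) {
	// Pick the target field once, so an unsupported property fails
	// even when there are no records to process.
	var pick func(d *developer) *string
	switch s.Prop {
	case "entity":
		pick = func(d *developer) *string { return &d.Entity }
	case "location":
		pick = func(d *developer) *string { return &d.Location }
	default:
		return 0, fmt.Errorf("unsupported property: %q", s.Prop)
	}

	changed := 0
	for i := range devs {
		field := pick(&devs[i])
		if *field == s.Old {
			*field = s.New
			changed++
		}
	}
	return changed, nil
}

func main() {
	devs := []developer{
		{Username: "a", Entity: "GOOGLE", Location: "US"},
		{Username: "b", Entity: "Google", Location: "US"},
		{Username: "c", Entity: "GOOGLE", Location: "UK"},
	}
	n, err := applySubstitution(devs, substitution{Prop: "entity", Old: "GOOGLE", New: "Google"})
	if err != nil {
		panic(err)
	}
	fmt.Println("records changed:", n)
}
```

Saving the substitution alongside applying it, as the renamed function suggests, is what lets `cmdUpdate` replay the same rules after each fresh import.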

100 changes: 63 additions & 37 deletions cmd/cli/query.go
@@ -86,51 +86,77 @@ var (
Usage: "List data query operations",
Subcommands: []*cli.Command{
{
Name: "developers",
Usage: "List developers",
Action: cmdQueryDevelopers,
Flags: []cli.Flag{
developerLikeQueryFlag,
queryLimitFlag,
},
},
{
Name: "developer",
Usage: "Get specific CNCF developer details, identities and associated entities",
Action: cmdQueryDeveloper,
Flags: []cli.Flag{
ghUserNameQueryFlag,
},
},
{
Name: "entities",
Usage: "List entities (companies or organizations with which users are affiliated)",
Action: cmdQueryEntities,
Flags: []cli.Flag{
entityLikeQueryFlag,
queryLimitFlag,
Name: "developer",
Usage: "List developer operations",
Aliases: []string{"d"},
Subcommands: []*cli.Command{
{
Name: "list",
Usage: "List developers",
Aliases: []string{"l"},
Action: cmdQueryDevelopers,
Flags: []cli.Flag{
developerLikeQueryFlag,
queryLimitFlag,
},
},
{
Name: "detail",
Usage: "Get specific developer details, identities and associated entities",
Aliases: []string{"d"},
Action: cmdQueryDeveloper,
Flags: []cli.Flag{
ghUserNameQueryFlag,
},
},
},
},
{
Name: "entity",
Usage: "Get specific CNCF entity and its associated developers",
Action: cmdQueryEntity,
Flags: []cli.Flag{
entityNameQueryFlag,
Name: "entity",
Usage: "List entity operations",
Aliases: []string{"e"},
Subcommands: []*cli.Command{
{
Name: "list",
Usage: "List entities (companies or organizations with which users are affiliated)",
Aliases: []string{"l"},
Action: cmdQueryEntities,
Flags: []cli.Flag{
entityLikeQueryFlag,
queryLimitFlag,
},
},
{
Name: "detail",
Usage: "Get specific entity and its associated developers",
Aliases: []string{"d"},
Action: cmdQueryEntity,
Flags: []cli.Flag{
entityNameQueryFlag,
},
},
},
},
{
Name: "repositories",
Usage: "List GitHub org/user repositories",
Action: cmdQueryOrgRepos,
Flags: []cli.Flag{
orgNameFlag,
Name: "org",
Usage: "List GitHub org/user operations",
Aliases: []string{"o"},
Subcommands: []*cli.Command{
{
Name: "repos",
Usage: "List GitHub org/user repositories",
Action: cmdQueryOrgRepos,
Flags: []cli.Flag{
orgNameFlag,
},
},
},
},
{
Name: "events",
Usage: "List GitHub repository events",
Action: cmdQueryEvents,
Name: "events",
Usage: "List GitHub events",
Aliases: []string{"e"},
Action: cmdQueryEvents,
Flags: []cli.Flag{
orgNameFlag,
repoNameFlag,
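
The hunk above regroups the flat query commands into nested ones (`developer list`, `developer detail`, `entity list`, `entity detail`, `org repos`) and adds short aliases. The following sketch models that tree with a small hand-written resolver to show how a path such as `d l` would be resolved. The `command` type and `resolve` function are hypothetical and use a first-match lookup, which is an assumption about resolver behavior rather than a statement about the CLI library. Under that assumption, note that `entity` and `events` both declare the alias `e` in this diff, so `e` would only ever reach `entity`.

```go
package main

import "fmt"

// command is a hypothetical, minimal model of a CLI command node.
type command struct {
	Name    string
	Aliases []string
	Sub     []*command
}

// hasName reports whether n matches the command name or one of its aliases.
func (c *command) hasName(n string) bool {
	if c.Name == n {
		return true
	}
	for _, a := range c.Aliases {
		if a == n {
			return true
		}
	}
	return false
}

// resolve walks path through the tree, taking the first match at each level,
// and returns the full names of the matched commands.
func resolve(root []*command, path ...string) ([]string, error) {
	cmds := root
	var names []string
	for _, p := range path {
		var found *command
		for _, c := range cmds {
			if c.hasName(p) {
				found = c
				break
			}
		}
		if found == nil {
			return nil, fmt.Errorf("unknown command: %q", p)
		}
		names = append(names, found.Name)
		cmds = found.Sub
	}
	return names, nil
}

// queryCommands reproduces the names and aliases from the diff.
func queryCommands() []*command {
	return []*command{
		{Name: "developer", Aliases: []string{"d"}, Sub: []*command{
			{Name: "list", Aliases: []string{"l"}},
			{Name: "detail", Aliases: []string{"d"}},
		}},
		{Name: "entity", Aliases: []string{"e"}, Sub: []*command{
			{Name: "list", Aliases: []string{"l"}},
			{Name: "detail", Aliases: []string{"d"}},
		}},
		{Name: "org", Aliases: []string{"o"}, Sub: []*command{
			{Name: "repos"},
		}},
		{Name: "events", Aliases: []string{"e"}},
	}
}

func main() {
	for _, path := range [][]string{{"d", "l"}, {"entity", "detail"}, {"e"}, {"events"}} {
		names, err := resolve(queryCommands(), path...)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(path, "resolves to", names)
	}
}
```

Reusing `d` and `l` inside different parents is harmless because lookups are scoped per level; only aliases shared by siblings can shadow one another.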
@@ -339,7 +365,7 @@ func cmdQueryOrgRepos(c *cli.Context) error {
return errors.Wrap(err, "failed to get GitHub token")
}

if org == "" && token == "" {
if org == "" || token == "" {
return cli.ShowSubcommandHelp(c)
}
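
The one-character change in this guard, from `&&` to `||`, alters when the subcommand help is shown. A small truth-table sketch makes the difference concrete. The function names `needsHelpOld` and `needsHelpNew` and the sample values are hypothetical and exist only for this comparison.

```go
package main

import "fmt"

// needsHelpOld is the pre-commit guard: help is shown only when
// both the org and the token are missing.
func needsHelpOld(org, token string) bool {
	return org == "" && token == ""
}

// needsHelpNew is the post-commit guard: help is shown when
// either the org or the token is missing.
func needsHelpNew(org, token string) bool {
	return org == "" || token == ""
}

func main() {
	cases := [][2]string{
		{"", ""},
		{"example-org", ""},
		{"", "example-token"},
		{"example-org", "example-token"},
	}
	for _, c := range cases {
		fmt.Printf("org=%q token=%q old=%t new=%t\n",
			c[0], c[1], needsHelpOld(c[0], c[1]), needsHelpNew(c[0], c[1]))
	}
}
```

With the old condition, a call that supplied only one of the two values slipped past the guard and continued with an empty org or token; the new condition stops it early.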
