Output structured diff results #34

Closed
EyalLavi opened this issue Feb 16, 2019 · 7 comments · Fixed by #92
Labels: awaiting-code-review, feature (New functionality - requires a feature branch)

Comments

@EyalLavi
Contributor

In addition to a single WER metric (#32), return the results of the diff metrics (s/d/i/c) in a structured format.

Something like:

{
"hypothesis-file": "file1.txt",
"diff-results":{"substitution":5, "deletions":10, "insertions":1, "correct":37}
}

Include vendor name? See also #33

@EyalLavi EyalLavi added feature New functionality - requires a feature branch for discussion Further information is requested labels Feb 16, 2019
@EyalLavi EyalLavi added this to the Release 1 milestone Feb 16, 2019
@amessina71
Contributor

This is a good idea

@EyalLavi
Contributor Author

EyalLavi commented Mar 5, 2019

There will be other metrics. Each metric should have its own structure.

{
"hypothesis": "machine-transcript.txt",
"reference":"human-transcript.txt",
"metrics":[
  {"type":"word-diff", "result":{"substitution":5, "deletions":10, "insertions":1, "correct":37}},
  {"type":"wer", "result":0.76}
]
}
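
A minimal sketch of how a report with one entry per metric could be assembled. The helper name `build_report` is hypothetical, and the WER formula used here, (S + D + I) / (S + D + C), is the common definition and an assumption about what the project computes. The figures in the example above are illustrative, so the value printed here will differ from them.

```python
import json


def build_report(hypothesis, reference, counts):
    """Assemble a report with one entry per metric (hypothetical helper)."""
    errors = counts["substitution"] + counts["deletions"] + counts["insertions"]
    # Every reference word is either correct, substituted, or deleted.
    ref_words = counts["substitution"] + counts["deletions"] + counts["correct"]
    # Assumption: WER is undefined (None) for an empty reference.
    wer = errors / ref_words if ref_words else None
    return {
        "hypothesis": hypothesis,
        "reference": reference,
        "metrics": [
            {"type": "word-diff", "result": counts},
            {"type": "wer", "result": wer},
        ],
    }


report = build_report(
    "machine-transcript.txt",
    "human-transcript.txt",
    {"substitution": 5, "deletions": 10, "insertions": 1, "correct": 37},
)
print(json.dumps(report, indent=2))
```

Because each metric carries its own `type`, a client can ignore entries it does not understand, and new metrics can be appended without changing the outer structure.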

@EyalLavi EyalLavi removed the for discussion Further information is requested label Mar 5, 2019
@EyalLavi
Contributor Author

EyalLavi commented Mar 6, 2019

@MikeSmithEU in the future we may report metrics that are not numbers, so perhaps we should make all values strings, like:

{"type":"word-diff", "result":{"substitution":"5", "deletions":"10", "insertions":"1", "correct":"37"}},
  {"type":"wer", "result":"0.76"}

This way we leave the interpretation to the client.

@MikeSmithEU
Contributor

Indeed, @EyalLavi. Yet when a value fits a JS type, I would pass it as that type rather than as a string.
In practice this means supporting: array, object, int, float, string, null/undefined (would use null, never undefined).
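
To illustrate the point about native types, here is a small sketch using Python's standard `json` module. The two extra metric entries (`example-label`, `example-missing`) are made up for the demonstration and are not part of the project.

```python
import json

metrics = [
    {"type": "word-diff",
     "result": {"substitution": 5, "deletions": 10, "insertions": 1, "correct": 37}},
    {"type": "wer", "result": 0.76},
    {"type": "example-label", "result": "n/a"},   # hypothetical non-numeric metric
    {"type": "example-missing", "result": None},  # hypothetical; serialised as null
]

encoded = json.dumps(metrics)
decoded = json.loads(encoded)

# Each value comes back with its native type, so the client does not
# have to guess how to parse a string.
for entry in decoded:
    print(entry["type"], type(entry["result"]).__name__)
```

Numbers, strings, objects, arrays and null all survive the round trip, so a non-numeric metric can simply be a string without forcing every other value to become one.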

@EyalLavi
Contributor Author

The discussion above revolved around the summary of the diff results. I would like to add the diffs themselves to the scope. This will allow apps to do their own processing on the results. How do we output those? As a standard diff format wrapped in the JSON? Or separately, as a standard diff?

@EyalLavi
Contributor Author

EyalLavi commented May 21, 2019

For Release 1, output the worddiffs metric. Something like this:

{
    "worddiffs": [
        {"ref": "hello", "hyp": "hello"},   // match
        {"ref": "world", "hyp": null},      // deletion
        {"ref": null, "hyp": "foo"},        // insertion
        {"ref": "bar", "hyp": "gar"}        // replacement
    ]
}

For future releases, consider using the canonical transcript format.
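
As a sketch of how a client might interpret the `ref`/`hyp` pairs proposed above, the following labels each pair by checking which side is null. The function name `classify` is hypothetical, and the label names are taken from the comments in the example.

```python
from collections import Counter


def classify(pair):
    """Label a ref/hyp pair (hypothetical helper, not project code)."""
    ref, hyp = pair["ref"], pair["hyp"]
    if ref is None and hyp is None:
        raise ValueError("a pair needs at least one word")
    if hyp is None:
        return "deletion"      # word is in the reference only
    if ref is None:
        return "insertion"     # word is in the hypothesis only
    return "match" if ref == hyp else "replacement"


worddiffs = [
    {"ref": "hello", "hyp": "hello"},
    {"ref": "world", "hyp": None},
    {"ref": None, "hyp": "foo"},
    {"ref": "bar", "hyp": "gar"},
]

labels = [classify(pair) for pair in worddiffs]
summary = Counter(labels)
print(labels)
print(dict(summary))
```

With the pairs available, the s/d/i/c summary discussed earlier can be derived on the client side by counting the labels.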

@EyalLavi
Contributor Author

EyalLavi commented Jul 9, 2019

Discussion with @MikeSmithEU today:

  • Output JSON format is metric-dependent
  • Currently, -o json only

Example:

--worddiffs -o json

Produces:

[
    {
        "metric": "worddiffs",
        "result": [
            {
                "hypothesis": "this",
                "type": "equal",
                "reference": "this"
            },
            {
                "hypothesis": "is",
                "type": "equal",
                "reference": "is"
            },
            {
                "hypothesis": null,
                "type": "delete",
                "reference": "reference"
            }
        ]
    }
]
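
For illustration, output of this shape can be produced with Python's standard `difflib`. This is a sketch under several assumptions, not the project's implementation: the function name `word_diffs` is hypothetical, words are split on whitespace, and the `insert` and `replace` type names are borrowed from `difflib`'s opcode tags (only `equal` and `delete` appear in the example above). `SequenceMatcher` also does not guarantee a minimal edit script.

```python
import json
from difflib import SequenceMatcher
from itertools import zip_longest


def word_diffs(reference, hypothesis):
    """Return one entry per aligned word pair (hypothetical helper)."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    matcher = SequenceMatcher(None, ref_words, hyp_words, autojunk=False)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # zip_longest pads the shorter side with None, which also covers
        # "delete" (no hypothesis words) and "insert" (no reference words).
        for ref, hyp in zip_longest(ref_words[i1:i2], hyp_words[j1:j2]):
            if tag == "replace" and hyp is None:
                kind = "delete"   # leftover reference word in an uneven replace
            elif tag == "replace" and ref is None:
                kind = "insert"   # leftover hypothesis word in an uneven replace
            else:
                kind = tag
            result.append({"hypothesis": hyp, "type": kind, "reference": ref})
    return result


output = [{"metric": "worddiffs",
           "result": word_diffs("this is reference", "this is")}]
print(json.dumps(output, indent=4))
```

Keeping `metric` and `result` as separate keys means the shape of `result` can vary per metric, in line with the note that the output JSON format is metric-dependent.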

3 participants