New package metric: README length/complexity #1691
Replies: 3 comments 1 reply
-
CocoaPods used this to score their README files: http://clayallsopp.github.io/readme-score/ It wasn't perfect, and I remember being a little frustrated that it scored some README files that I considered good so low, but it's worth bearing in mind.
-
A bit of natural language processing to extract helpful bits of information 😂 This might be separate, but perhaps we could track README language too? I know it was proposed in the past (though I can't find the issue) that we support a language selector for READMEs, where available, but we discussed the fact that there isn't a standard/official approach for this.
-
I think the best README is the shortest one possible. I also don't think it is possible to assess complexity. For example, a solution to a complex problem will have a more complex README, and will rank "worse" than a complex solution to a simple problem, because the complexity score won't account for the multiplier that is the complexity of the problem itself. As another example, people who manage to describe something complex in one header and two paragraphs should win a Nobel prize, but instead will have their packages ranked lower than those who just put all their unstructured thoughts into many paragraphs. So it will promote lower-quality packages (which might even be harmful), giving them a big advantage over ones that are quality solutions.
-
There are many ways we could score a “good” README, but I’d suggest we start with a fairly crude calculation.
We should store a `readme_score` in addition to the `score` against each package. That `readme_score` can then count towards a package’s score calculation.
A simple `readme_score` could consist of:
I’d suggest we “band” each of these, for example, 1-2 headings, 3-5 headings, 5+ headings, rather than counting anything directly. That will cut down on the impact of things like flooding a README with 40 paragraphs of Lorem ipsum.
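To make the banding idea concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the two signals (headings and paragraphs) are taken only from the examples mentioned in this comment, the band edges resolve the overlap at 5 by treating the top band as 6+, and the point values and function names are hypothetical.

```python
import re

# Hypothetical bands as (minimum count, points). The edges follow the
# "1-2, 3-5, 5+" example, with the top band assumed to start at 6.
HEADING_BANDS = [(1, 1), (3, 2), (6, 3)]
PARAGRAPH_BANDS = [(1, 1), (3, 2), (6, 3)]


def band(count, bands):
    """Return the points of the highest band whose minimum is <= count."""
    points = 0
    for minimum, value in bands:
        if count >= minimum:
            points = value
    return points


def count_headings(markdown):
    """Count ATX-style headings (lines starting with 1-6 '#' characters).

    Crude on purpose: '#' comments inside fenced code would also be counted.
    """
    return len(re.findall(r"^#{1,6}[ \t]+\S", markdown, flags=re.MULTILINE))


def count_paragraphs(markdown):
    """Count blank-line-separated blocks that are not headings."""
    blocks = re.split(r"\n\s*\n", markdown.strip())
    return sum(1 for b in blocks if b.strip() and not b.lstrip().startswith("#"))


def readme_score(markdown):
    """Sum the banded signals instead of using the raw counts."""
    return (band(count_headings(markdown), HEADING_BANDS)
            + band(count_paragraphs(markdown), PARAGRAPH_BANDS))


if __name__ == "__main__":
    flooded = "# Title\n\n" + "\n\n".join(["Lorem ipsum dolor sit amet."] * 40)
    modest = "# Title\n\n" + "\n\n".join(["A useful paragraph."] * 6)
    # Both land in the same top paragraph band, so flooding gains nothing.
    print(readme_score(flooded), readme_score(modest))
```

Because only the band counts, 40 paragraphs of filler earn no more than 6 real ones, which is the effect the banding is meant to have.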
We could also use something like the Flesch–Kincaid readability tests to give a better sense of the wording inside the README, but I’d suggest we start with some simple counts.
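For reference, a minimal sketch of the Flesch reading-ease formula (one of the Flesch–Kincaid tests) in Python. The formula itself is the standard one; the syllable counter is a deliberately crude vowel-group heuristic and the function names are hypothetical, so treat the numbers as approximate.

```python
import re


def count_syllables(word):
    """Rough heuristic: one syllable per group of consecutive vowels, minimum one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text):
    """Flesch reading ease: higher means easier to read.

    206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
    Returns None when the text has no words or no sentences.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return None
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


if __name__ == "__main__":
    print(flesch_reading_ease("The cat sat. The dog ran."))
    print(flesch_reading_ease(
        "Comprehensive documentation necessitates considerable deliberation."))
```

A real README would need its Markdown syntax and code blocks stripped first, otherwise identifiers and URLs skew the word and sentence counts.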