New package metric: README length/complexity #1691

daveverwer · 2022-04-12T15:31:00Z

daveverwer
Apr 12, 2022
Maintainer

There are many ways we could score a “good” README, but I’d suggest we start with a fairly crude calculation.

We should store a readme_score in addition to the score against each package. That readme_score can then count towards a package’s score calculation.

A simple readme_score could consist of:

Number of headings
Number of code blocks
Number of images
Number of paragraphs/total number of words

I’d suggest we “band” each of these, for example, 1-2 headings, 3-5 headings, 5+ headings, rather than counting anything directly. That will cut down on the impact of things like flooding a README with 40 paragraphs of Lorem ipsum.
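To make the banding idea concrete, here is a rough sketch of what such a calculation could look like. Everything in it is an assumption for illustration: the function names, the regexes used to spot headings and images, and especially the band thresholds (I've read the heading bands as 1-2, 3-5 and 6 or more so they don't overlap). It is not a proposal for the real implementation.

```python
import re

# Markdown code fence marker, built from single backticks so this
# example can itself sit inside a fenced block.
FENCE = "`" * 3

# Hypothetical band thresholds: a count earns one point for every
# threshold it meets or exceeds.
BANDS = {
    "headings": (1, 3, 6),
    "code_blocks": (1, 3),
    "images": (1,),
    "words": (50, 200, 500),
}


def band(count, thresholds):
    """Return how many thresholds `count` meets or exceeds (0 = lowest band)."""
    return sum(1 for t in thresholds if count >= t)


def readme_counts(markdown):
    """Take crude counts from raw Markdown text."""
    in_code = False
    headings = 0
    code_blocks = 0
    prose = []
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith(FENCE):
            if not in_code:
                code_blocks += 1  # count each opening fence once
            in_code = not in_code
            continue
        if in_code:
            continue  # ignore the contents of code blocks
        if re.match(r"#{1,6}\s", stripped):
            headings += 1
        else:
            prose.append(stripped)
    text = "\n".join(prose)
    # Markdown image syntax or a raw <img> tag.
    images = len(re.findall(r"!\[[^\]]*\]\([^)]*\)|<img\b", text))
    # Very rough word count; image alt text and URLs are counted too.
    words = len(re.findall(r"\b\w+\b", text))
    return {
        "headings": headings,
        "code_blocks": code_blocks,
        "images": images,
        "words": words,
    }


def readme_score(markdown):
    """Sum the band reached for each count."""
    counts = readme_counts(markdown)
    return sum(band(counts[key], BANDS[key]) for key in BANDS)
```

Because only the band counts, a README with 5,000 words of filler scores the same on the word metric as one with 500, which is the point of banding.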

We could also use something like the Flesch–Kincaid readability tests to give a better sense of the wording inside the README, but I’d suggest we start with some simple counts.
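For reference, the two Flesch–Kincaid formulas only need sentence, word and syllable counts. The sketch below uses the standard published constants, but the sentence splitting and the syllable counter are crude heuristics I've assumed for illustration (counting runs of vowels), so the numbers are indicative at best. The function names are hypothetical.

```python
import re


def count_syllables(word):
    """Rough heuristic: count runs of vowels, with a minimum of one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_scores(text):
    """Return Flesch reading ease and Flesch-Kincaid grade level, or None if empty."""
    # Naive sentence split on terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return None
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return {
        "reading_ease": 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word,
        "grade_level": 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59,
    }
```

A README would need its Markdown syntax and code blocks stripped before being passed to something like this, otherwise identifiers and URLs would skew the result.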

daveverwer · 2022-04-12T15:37:08Z

daveverwer
Apr 12, 2022
Maintainer Author

CocoaPods used this to score their README files: http://clayallsopp.github.io/readme-score/

It wasn't perfect, and I remember being a little frustrated that it scored some README files that I considered good so low, but it's worth bearing in mind.

1 reply

daveverwer Apr 12, 2022
Maintainer Author

That score does have an HTTP API: https://github.com/clayallsopp/readme-score-api

If we ever decide to use this, we should talk to the owner and check in on how they feel about it. I don't want to rack up someone's Heroku bill because we're constantly asking them to score README files!

Sherlouk · 2022-04-12T15:59:52Z

Sherlouk
Apr 12, 2022
Collaborator Sponsor

Bit of natural language processing to extract helpful bits of information 😂

Might be separate to this, but perhaps we could track README language too?

I know it was proposed in the past (though I can't find the issue) that we support a language selector for READMEs, where available, but we discussed the fact that there isn't a standard/official approach for this.

0 replies

MaximBazarov · 2022-10-28T08:44:04Z

MaximBazarov
Oct 28, 2022

I think the better README is the shortest one possible. I also don't think it is possible to assess complexity. For example:

A solution to a complex problem will have a more complex README, and will rank "worse" than a simple problem with a complex solution, because the complexity score won't account for the multiplier that is the complexity of the problem itself.

Another example: people who manage to describe something complex in one header and two paragraphs should win a Nobel prize, but instead will have their packages ranked lower than those who just put all their unstructured thoughts into many paragraphs.

So it will promote lesser-quality packages (which might even be harmful), giving them a big advantage over ones that are quality solutions.

0 replies
