Generating package summaries with GPT #2368
Replies: 5 comments 10 replies
-
Results
All results below were generated from the unprocessed README file, copied and pasted from the rendered Markdown, using the prompt above. No editing has been done to the results.
Alamofire
SemanticVersion
Satin
FlyingFox
ArgumentParser
Swift Crypto
-
If you go forward with this, then it must be opt-in, not opt-out, for package maintainers. I personally would be very upset if I learned that descriptions that may or may not contain confidently wrong statements had been automatically generated for my packages without asking me first.
-
Naturally, running READMEs, which are entirely unstructured and unstandardised, through an LLM is going to produce some unexpected cases. I'm not sure if we have analytics for this, but I've definitely seen packages with no README, packages with READMEs in other languages (such as Chinese), and boilerplate READMEs (either with no information in them, or just the standard "This is a Swift Package" template). There are READMEs containing FAQs, TODO lists, version history, testimonials, and links to podcast episodes and books. There's so much variety here, and plenty of areas where GPT could get confused.

But no system is perfect, and I think I've seen enough examples here (and separately) where it's output very strong descriptions, even in some of these tough scenarios. I think the prompt could be tweaked over time to weed out bad results. There's also more we can do in the implementation to make it safer, including filtering out bad READMEs and potentially starting with only the top 500 packages (based on internal scoring) to test it out.

I also think introducing a button in the UI to report a package summary could be handy: one which simply opens a GitHub issue with a pre-filled form. This would help surface bad results so we, as a community, can reassess and make changes to the prompt, to the README, and, if necessary, override using the SPI YML.
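To make the "filtering out bad READMEs" idea concrete, here is a minimal sketch of a pre-filter. Everything in it is an illustrative assumption: the word-count threshold and the boilerplate phrases are guesses, not anything SPI actually uses.

```python
# Hypothetical pre-filter for READMEs before sending them to the LLM.
# The threshold and boilerplate markers below are illustrative guesses,
# not part of any real SPI pipeline.

BOILERPLATE_MARKERS = [
    "this is a swift package",        # stock template text
    "a description of this package",  # default README stub wording
]

def is_summarisable(readme: str, min_words: int = 30) -> bool:
    """Return False for READMEs unlikely to yield a useful summary."""
    text = readme.strip().lower()
    if not text:
        return False                  # no README at all
    if len(text.split()) < min_words:
        return False                  # too short to say anything useful
    if any(marker in text for marker in BOILERPLATE_MARKERS):
        return False                  # unedited template boilerplate
    return True
```

A filter like this would also pair naturally with the "top 500 packages" rollout: only packages that pass both the internal score cut-off and the README check would be summarised at first.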
-
Heard about this conversation through the SPI podcast and wanted to chime in. I think using it to create a 2-3 sentence summary for a package could be a fantastic use case, specifically to supplement search results. That said, I think the summary should be something that a package developer can override, perhaps with a couple-of-sentence summary in the structure of the SPI.yml file. I also think the field should be clearly identified, maybe with a background color or a caption-like subtitle underneath it (my design skills suck, so I'll defer to nearly anyone else about good affordances here). The idea being that when generated by an LLM, the summary should be highlighted as dynamic content.

Somewhat related: using LLM generation to deal with stemming issues in search is absolutely brilliant. A longer summary, generated and stored as a text field within the index and then used with the Postgres-style search mechanism (the classic stemmer/ranking mechanisms), would, I suspect, improve ranked relevance for those quirky results where we've got two or more words CamelCased together: easy to visually disambiguate, but something that stemmer fails with, and extending that system in Postgres is an unfortunate challenge.

I don't know the details of ChatGPT as a tool, but I'm wondering if it's also multilingual (Whisper, also by OpenAI, clearly is). If so, that could further increase the value of summarised search results if we can ask it to translate into English for a search-only summary view. I wouldn't want to display that translated, search-focused result, but I suspect it may solve the issue of getting a reasonable summary from some of the packages that have a README in another language (the ones I remember being Mandarin, but it wouldn't surprise me to see Japanese or various European languages either). There are some downsides to that, but I think overall the search results would be notably improved.
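The CamelCase problem described above can be sketched quickly: a stock stemmer treats "FlyingFox" as one opaque token, but splitting it before indexing lets each word be stemmed and ranked normally. This helper is a hypothetical pre-processing step, not part of SPI's actual search pipeline.

```python
import re

# Split a CamelCased identifier into its component words, so that a
# stemmer sees "Flying" and "Fox" rather than the opaque token
# "FlyingFox". Handles ALL-CAPS runs like "URLSession" too.

def split_camel_case(token: str) -> list[str]:
    """Split a CamelCased identifier into its component words."""
    return re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+", token)
```

Feeding the split words (alongside, not instead of, the original token) into the indexed text is one cheap way to get the disambiguation without extending Postgres itself.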
-
Finally got access to a GPT-4 API key and did some more testing with this tonight. The results are different, but still very good. There's much more structure to the prompt, which is great. All of these results were generated from raw, unprocessed Markdown files, so we would not need to render or transform anything. This is with a smaller word limit than above, but it really struggles with word limits 😂

Prompt
You are a technical editor that summarises Markdown input. Output
-
We are not trying to find a use for GPT in SPI, but there are a couple of places (one in this thread and one in #2369) where it could be genuinely useful.
The first and most practical one is README summarisation. GPT is good at summarising text, and we don’t have good, concise descriptions for most packages in the index. We have the package name and sometimes a one-sentence description, and then the full README file, which is too long and often contains much more than a package description.
Having reliably good summaries would be very useful for package search results, category and author lists, and also in a slightly redesigned package page metadata area. It would also give us another relevant field for search results without bringing in the entire README file.
It’s important to note that if we do include package summaries as metadata, they would be overridable by humans via a new key in the
.spi.yml
file. We would always prioritise human-entered text over AI-generated text.

We’re not quite at “New Issue“ with it yet, but we’re convinced enough that it’s worth a discussion.
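The precedence rule described above is simple enough to sketch. Note the "summary" key is a hypothetical name for the proposed new .spi.yml field; nothing with that name exists today.

```python
# Hypothetical precedence rule: a human-entered summary from .spi.yml
# always wins over an AI-generated one. "summary" is an illustrative
# key name, not an existing .spi.yml field.

def effective_summary(spi_yml: dict, generated: str) -> str:
    human = (spi_yml.get("summary") or "").strip()
    if human:
        return human      # human-entered text always takes priority
    return generated      # fall back to the GPT-generated summary
```

Only regenerating the fallback when the README changes (see Costs below) would leave the human-entered value untouched indefinitely.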
Costs
Running v3 queries is very cheap: some back-of-the-envelope calculations suggest we could summarise every README in the current index for under $50. We would then only need to regenerate a summary when we see the README change, and at most something like once a week.
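For anyone wanting to redo the back-of-the-envelope calculation, the model is just tokens times price. All three inputs below are illustrative assumptions, not figures from this post: plug in the real package count, average request size, and per-1K-token price to get an actual estimate.

```python
# Back-of-the-envelope cost model for the one-off summarisation run.
# The example inputs (5,000 packages, 500 tokens per request, $0.02
# per 1K tokens) are illustrative assumptions, not figures from SPI.

def summarisation_cost(packages: int, tokens_per_request: int,
                       price_per_1k_tokens: float) -> float:
    """Total cost in dollars to summarise every package once."""
    total_tokens = packages * tokens_per_request
    return total_tokens / 1000 * price_per_1k_tokens

# summarisation_cost(5_000, 500, 0.02) -> 50.0
```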
GPT-4 queries are much more expensive, so the results would need to be correspondingly better. TBD when we get the new API key.
Results
These results were made using the
text-davinci-003
model. We are on the waiting list for a GPT-4 API key. The full text of each README was entered without any pre-processing and processed with the following prompt:

Results follow this post.
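As a sketch of how a README could be packaged into a Completions request for this model: the instruction text and max_tokens value below are placeholders, not the actual prompt used here (which isn't reproduced above), though the payload fields themselves (model, prompt, max_tokens, temperature) are standard Completions API parameters.

```python
# Sketch of an OpenAI Completions request payload for text-davinci-003.
# The instruction wording and max_tokens are placeholder guesses, not
# the prompt from this post.

def build_completion_payload(readme_markdown: str) -> dict:
    instruction = "Summarise the following package README in 2-3 sentences."
    return {
        "model": "text-davinci-003",
        "prompt": f"{instruction}\n\n{readme_markdown}",
        "max_tokens": 120,   # keeps the summary short; value is a guess
        "temperature": 0.2,  # low temperature for factual summaries
    }
```

The payload would then be POSTed to the Completions endpoint with an API key; only the dictionary construction is shown here.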
Alternatives
We did some experimentation with the Kagi summariser, but it produces summaries that are too long and always include installation instructions, a problem that GPT had too, although one we could work around with the prompt.