A little Python application to generate pictures from a text prompt.
Based on Stable Diffusion.
Report a bug or request a feature
This repository is tested on Python 3.7+ and PyTorch LTS 1.8.2. It works only on Nvidia graphics cards, and CUDA must be installed.
You should install Picture Machine in a virtual environment. If you're unfamiliar with Python virtual environments, check out the user guide. First, create a virtual environment with the version of Python you're going to use and activate it.
Then, you will need to install PyTorch. Please refer to the PyTorch installation page for the specific install command for your platform.
When PyTorch is installed, 🤗 Transformers can be installed using pip as follows:
pip install transformers
You can refer to the repository of 🤗 Transformers for more information.
Then you will need to install diffusers using pip as follows:
pip install diffusers
Finally, you will need to install PySide6, a set of Qt bindings for Python used for the graphical interface. It can be installed using pip as follows:
pip install pyside6
Optional but recommended: for faster generation, you can also install scipy and ftfy:
pip install scipy ftfy
Follow the instructions above, then clone the repo (git clone https://github.com/torresflo/Picture-Machine.git).
You can now run main.py.
Whether image generation is possible, and how fast it is, depends on your hardware. This project has been tested on an Nvidia RTX 3070 with 8 GB of VRAM; with this hardware, it generates a 768 x 768 pixel image in around 15 seconds.
Enter your prompt and then press "generate" to generate an image. You can then click on the image to save it if you want to.
Image size (width and height):
These are some recommendations to choose good image sizes:
- Make sure height and width are both multiples of 8.
- Going below 768 might result in lower quality images.
- Going over 768 in both dimensions will repeat image areas (global coherence is lost).
- The best way to create non-square images is to use 768 in one dimension, and a value larger than that in the other one.
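As a quick illustration (this helper is not part of Picture Machine itself), the recommendations above can be expressed as a small validation function:

```python
def check_image_size(width, height):
    """Return a list of warnings for a requested image size,
    following the size recommendations above."""
    warnings = []
    if width % 8 != 0 or height % 8 != 0:
        warnings.append("width and height should both be multiples of 8")
    if min(width, height) < 768:
        warnings.append("going below 768 may produce lower-quality images")
    if width > 768 and height > 768:
        warnings.append("going over 768 in both dimensions may repeat image areas")
    return warnings

# A 768 x 1024 portrait follows all the recommendations:
print(check_image_size(768, 1024))  # []
```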
Iteration steps:
You can change the number of inference steps using this parameter. In general, results improve with more steps. Stable Diffusion, being one of the more recent models, works well with a relatively small number of steps (the default is 50). If you want faster results, you can use a smaller number here.
Guidance scale:
The last parameter is the guidance scale. It is a way to increase adherence to your prompt as well as overall sample quality; in simple terms, it forces the generation to better match your prompt. Values like 7 or 8.5 give good results. If you use a very large value, the images might still look good, but they will be less diverse.
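For intuition: Stable Diffusion implements this with classifier-free guidance, where the model predicts noise both with and without the prompt and the guidance scale pushes the result toward the prompt-conditioned prediction. A minimal numeric sketch of that formula (plain Python, not the actual pipeline code):

```python
def apply_guidance(noise_uncond, noise_cond, guidance_scale):
    """Classifier-free guidance: move the unconditional prediction
    toward the prompt-conditioned one, scaled by guidance_scale."""
    return [u + guidance_scale * (c - u)
            for u, c in zip(noise_uncond, noise_cond)]

# With scale 1.0 you get the conditioned prediction back unchanged;
# larger scales exaggerate the difference, increasing prompt adherence.
print(apply_guidance([0.0, 1.0], [1.0, 1.0], 7.5))  # [7.5, 1.0]
```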
Seed:
If you want deterministic output, you can set a fixed seed that will be given to the image generator. Every time you use the same seed with the same prompt, you will get the same image.
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Distributed under the GNU General Public License v3.0. See LICENSE
for more information.