Added new project called "Turn Images Into Story" #1069

Merged 4 commits on Nov 4, 2024

75 changes: 75 additions & 0 deletions Generative-AI/Turn Pictures Into Story/README.md
@@ -0,0 +1,75 @@
# Turn Images into Story with AI


## Project Overview

This project uses artificial intelligence to generate stories from images. The goal is to create a system that can take an image as input and produce a coherent and engaging story as output.

## How it Works

**Image Input**: The user uploads an image to the system.

**Image Analysis**: The system uses computer vision techniques to analyze the image and extract relevant features such as objects, scenes, and actions.

**Story Generation**: The system uses a natural language processing (NLP) model to generate a story based on the extracted features.

**Post-processing**: The system refines the generated story to ensure coherence, grammar, and readability.
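
The four stages above can be sketched as a chain of plain functions. This is a minimal illustration under stated assumptions, not the project's implementation: `ImageFeatures`, `generate_story`, `postprocess`, and `image_to_story` are hypothetical names, and the template-based story step only stands in for a real language model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ImageFeatures:
    """Features extracted from an image (hypothetical structure)."""
    objects: List[str] = field(default_factory=list)
    scene: str = ""
    actions: List[str] = field(default_factory=list)


def generate_story(features: ImageFeatures) -> str:
    """Stand-in for the NLP model: fills a one-sentence template."""
    subject = " and ".join(features.objects) or "something"
    action = features.actions[0] if features.actions else "waiting"
    scene = features.scene or "scene"
    return f"in the {scene}, {subject} was {action}"


def postprocess(story: str) -> str:
    """Collapse whitespace, capitalize, and ensure closing punctuation."""
    text = " ".join(story.split())
    if not text:
        return text
    text = text[0].upper() + text[1:]
    return text if text[-1] in ".!?" else text + "."


def image_to_story(image: object,
                   analyze: Callable[[object], ImageFeatures]) -> str:
    """Run analysis, generation, and post-processing in order.

    `analyze` is supplied by the caller, so any vision model can be used.
    """
    return postprocess(generate_story(analyze(image)))
```

Passing the analysis step in as a callable keeps the sketch independent of any particular vision library; a real system would plug its own model in at that point.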

## Features

**Image Upload**: Users can upload images in various formats (e.g., JPEG, PNG, GIF).

**Story Generation**: The system generates a story based on the uploaded image.

**Story Customization**: Users can customize the story by selecting from various genres, tones, and styles.

**Image-Story Pairing**: The system allows users to view the original image alongside the generated story.
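
One way story customization could work is to fold the selected genre, tone, and style into the prompt given to a text-generation model. The sketch below is an assumption for illustration only: `build_story_prompt` is a hypothetical helper, and the prompt wording is not taken from the project.

```python
from typing import Optional


def build_story_prompt(caption: str,
                       genre: Optional[str] = None,
                       tone: Optional[str] = None,
                       style: Optional[str] = None) -> str:
    """Combine an image caption with optional genre, tone, and style."""
    # Keep only the options the user actually selected.
    descriptors = [d.strip() for d in (tone, genre) if d and d.strip()]
    kind = " ".join(descriptors + ["story"])
    scene = caption.strip().rstrip(".")
    prompt = f"Write a short {kind} about the following scene: {scene}."
    if style and style.strip():
        prompt += f" Use a {style.strip()} style."
    return prompt
```

All three options default to `None`, which matches the Usage section's note that selecting them is optional.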

## Technical Requirements

**Programming Languages**: Python 3.x, JavaScript (for frontend)

**Frameworks**: TensorFlow, Keras (for image analysis and NLP), React (for frontend)

**Libraries**: OpenCV, Pillow (for image processing), NLTK, spaCy (for NLP)

**Hardware**: GPU acceleration recommended for faster image processing and story generation
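
Because a GPU is recommended rather than required, the application can fall back to the CPU when none is available. The sketch below mirrors the device check in this pull request's `app.py`; the `pick_device` wrapper and its guarded import are assumptions added so the snippet also runs where PyTorch is not installed.

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA GPU is usable, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the check works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Running on: {device}")
```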

## Installation

**Clone the repository**: git clone https://github.com/NANDAGOPALNG/ML-Nexus.git (then change into this project's directory; git cannot clone a subfolder URL directly)

**Install dependencies**: pip install -r requirements.txt

**Set up the environment**: python setup.py

**Run the application**: python app.py

## Usage

1. Upload an image to the system.
2. Select the desired story genre, tone, and style (optional).
3. Click the "Generate Story" button.
4. View the generated story alongside the original image.

## Contributing

Contributions are welcome! If you'd like to contribute to this project, please:

1. Fork the repository.
2. Create a new branch for your feature or bug fix.
3. Commit your changes with a descriptive message.
4. Open a pull request.

## License
This project is licensed under the MIT License. See the LICENSE file for details.

## Acknowledgments

We acknowledge the contributions of the open-source community and the developers of the libraries and frameworks used in this project.

## Future Work

**Improving Story Quality**: Refine the NLP model to generate more coherent and engaging stories.

**Expanding Image Support**: Add support for more image formats and sizes.

**User Feedback Mechanism**: Implement a feedback system that lets users rate the generated stories so the ratings can guide improvements.
49 changes: 49 additions & 0 deletions Generative-AI/Turn Pictures Into Story/app.py
@@ -0,0 +1,49 @@
import torch
import gradio as gr
from PIL import Image
import scipy.io.wavfile as wavfile

# Use a pipeline as a high-level helper
from transformers import pipeline

device = "cuda" if torch.cuda.is_available() else "cpu"

# model_path = ("../Models/models--Salesforce--blip-image-captioning-large"
# "/snapshots/2227ac38c9f16105cb0412e7cab4759978a8fd90")
#
# tts_model_path = ("../Models/models--kakao-enterprise--vits-ljs/snapshots"
# "/3bcb8321394f671bd948ebf0d086d694dda95464")

caption_image = pipeline("image-to-text",
                         model="Salesforce/blip-image-captioning-large",
                         device=device)

narrator = pipeline("text-to-speech",
                    model="kakao-enterprise/vits-ljs")

# caption_image = pipeline("image-to-text",
# model=model_path, device=device)
#
# narrator = pipeline("text-to-speech",
# model=tts_model_path)

def generate_audio(text):
    # Generate the narrated text
    narrated_text = narrator(text)

    # Save the audio to a WAV file
    wavfile.write("output.wav", rate=narrated_text["sampling_rate"],
                  data=narrated_text["audio"][0])
    # Return the path to the saved audio file
    return "output.wav"


def caption_my_image(pil_image):
    # Caption the image, then narrate the caption as audio
    semantics = caption_image(images=pil_image)[0]['generated_text']
    return generate_audio(semantics)

demo = gr.Interface(fn=caption_my_image,
                    inputs=[gr.Image(label="Select Image", type="pil")],
                    outputs=[gr.Audio(label="Image Caption")],
                    title="@GenAILearniverse Project 8: Image Captioning",
                    description="THIS APPLICATION WILL BE USED TO CAPTION THE IMAGE.")
demo.launch()
5 changes: 5 additions & 0 deletions Generative-AI/Turn Pictures Into Story/requirements.txt
@@ -0,0 +1,5 @@
transformers
torch
gradio
scipy
phonemizer