Merge pull request #1069 from NANDAGOPALNG/main
Added new project called "Turn Images Into Story"
UTSAVS26 authored Nov 4, 2024
2 parents 1e0fcf9 + 16b79da commit 588da58
Showing 3 changed files with 129 additions and 0 deletions.
75 changes: 75 additions & 0 deletions Generative-AI/Turn Pictures Into Story/README.md
@@ -0,0 +1,75 @@
# Turn Images into Story with AI


## Project Overview

This project uses artificial intelligence to generate stories from images. The goal is to create a system that can take an image as input and produce a coherent and engaging story as output.

## How it Works

**Image Input**: The user uploads an image to the system.

**Image Analysis**: The system uses computer vision techniques to analyze the image and extract relevant features such as objects, scenes, and actions.

**Story Generation**: The system uses a natural language processing (NLP) model to generate a story based on the extracted features.

**Post-processing**: The system refines the generated story to ensure coherence, grammar, and readability.
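The four stages above can be sketched as a small pipeline. This is only an illustration under stated assumptions: `StoryPipeline`, `describe_image`, `write_story`, and `tidy` are hypothetical names, and the two lambdas are toy stand-ins for a real vision model and a real language model.

```python
from dataclasses import dataclass
from typing import Callable, List


def tidy(text: str) -> str:
    """Post-processing: collapse whitespace, capitalize, ensure end punctuation."""
    cleaned = " ".join(text.split())
    if not cleaned:
        return cleaned
    cleaned = cleaned[0].upper() + cleaned[1:]
    if cleaned[-1] not in ".!?":
        cleaned += "."
    return cleaned


@dataclass
class StoryPipeline:
    # Both callables are hypothetical stand-ins for real models.
    describe_image: Callable[[bytes], List[str]]  # image bytes -> extracted features
    write_story: Callable[[List[str]], str]       # features -> raw story text

    def run(self, image_bytes: bytes) -> str:
        features = self.describe_image(image_bytes)  # image analysis
        draft = self.write_story(features)           # story generation
        return tidy(draft)                           # post-processing


# Toy stand-ins so the sketch runs without downloading any model.
pipeline = StoryPipeline(
    describe_image=lambda _img: ["a dog", "a beach", "running"],
    write_story=lambda feats: "once upon a time there was  " + ", ".join(feats),
)
story = pipeline.run(b"fake image bytes")
print(story)
```

Keeping each stage behind a plain callable means a captioning model or a text generator can be swapped in later without touching the rest of the flow.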

## Features

**Image Upload**: Users can upload images in various formats (e.g., JPEG, PNG, GIF).

**Story Generation**: The system generates a story based on the uploaded image.

**Story Customization**: Users can customize the story by selecting from various genres, tones, and styles.

**Image-Story Pairing**: The system allows users to view the original image alongside the generated story.
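One plausible way to wire story customization into generation is to fold the optional genre, tone, and style into the text prompt passed to the language model. The sketch below assumes that approach; `build_story_prompt` and the prompt wording are hypothetical and not taken from the project code.

```python
from typing import Optional


def build_story_prompt(
    caption: str,
    genre: Optional[str] = None,
    tone: Optional[str] = None,
    style: Optional[str] = None,
) -> str:
    """Combine an image caption with optional user preferences into one prompt."""
    options = []
    if genre:
        options.append(f"genre: {genre}")
    if tone:
        options.append(f"tone: {tone}")
    if style:
        options.append(f"style: {style}")

    prompt = f"Write a short story inspired by this scene: {caption.strip()}"
    if options:
        # Only mention the preferences the user actually selected.
        prompt += " (" + "; ".join(options) + ")"
    return prompt


print(build_story_prompt("a dog on a beach", genre="mystery", tone="playful"))
```

Because every preference is optional, the function returns the plain caption prompt when the user selects nothing.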

## Technical Requirements

**Programming Languages**: Python 3.x, JavaScript (for frontend)

**Frameworks**: TensorFlow, Keras (for image analysis and NLP), React (for frontend)

**Libraries**: OpenCV, Pillow (for image processing), NLTK, spaCy (for NLP)

**Hardware**: GPU acceleration recommended for faster image processing and story generation
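A minimal sketch of choosing between GPU and CPU at startup, assuming PyTorch is the backend. The helper name `pick_device` is hypothetical; the fallback to `"cpu"` when PyTorch is not installed is an added convenience for illustration.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed; nothing to accelerate.
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Running on: {device}")
```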

## Installation

**Clone the repository**: `git clone https://github.com/NANDAGOPALNG/ML-Nexus.git`, then change into the `Generative Models/Turn Images Into Story with AI` folder

**Install dependencies**: `pip install -r requirements.txt`

**Set up the environment**: `python setup.py`

**Run the application**: `python app.py`

## Usage

1. Upload an image to the system.
2. Select the desired story genre, tone, and style (optional).
3. Click the "Generate Story" button.
4. View the generated story alongside the original image.

## Contributing

Contributions are welcome! If you'd like to contribute to this project, please:

1. Fork the repository.
2. Create a new branch for your feature or bug fix.
3. Commit your changes with a descriptive message.
4. Open a pull request.

## License
This project is licensed under the MIT License. See the LICENSE file for details.

## Acknowledgments

We acknowledge the contributions of the open-source community and the developers of the libraries and frameworks used in this project.

## Future Work

**Improving Story Quality**: Refine the NLP model to generate more coherent and engaging stories.

**Expanding Image Support**: Add support for more image formats and sizes.

**User Feedback Mechanism**: Implement a feedback system to allow users to rate and improve the generated stories.
49 changes: 49 additions & 0 deletions Generative-AI/Turn Pictures Into Story/app.py
@@ -0,0 +1,49 @@
import torch
import gradio as gr
from PIL import Image
import scipy.io.wavfile as wavfile

# Use a pipeline as a high-level helper
from transformers import pipeline

device = "cuda" if torch.cuda.is_available() else "cpu"

# model_path = ("../Models/models--Salesforce--blip-image-captioning-large"
# "/snapshots/2227ac38c9f16105cb0412e7cab4759978a8fd90")
#
# tts_model_path = ("../Models/models--kakao-enterprise--vits-ljs/snapshots"
# "/3bcb8321394f671bd948ebf0d086d694dda95464")

caption_image = pipeline("image-to-text",
                         model="Salesforce/blip-image-captioning-large", device=device)

narrator = pipeline("text-to-speech",
                    model="kakao-enterprise/vits-ljs")

# caption_image = pipeline("image-to-text",
# model=model_path, device=device)
#
# narrator = pipeline("text-to-speech",
# model=tts_model_path)

def generate_audio(text):
    # Generate the narrated text
    narrated_text = narrator(text)

    # Save the audio to a WAV file
    wavfile.write("output.wav", rate=narrated_text["sampling_rate"],
                  data=narrated_text["audio"][0])
    # Return the path to the saved audio file
    return "output.wav"


def caption_my_image(pil_image):
    semantics = caption_image(images=pil_image)[0]['generated_text']
    return generate_audio(semantics)

demo = gr.Interface(fn=caption_my_image,
                    inputs=[gr.Image(label="Select Image", type="pil")],
                    outputs=[gr.Audio(label="Image Caption")],
                    title="@GenAILearniverse Project 8: Image Captioning",
                    description="THIS APPLICATION WILL BE USED TO CAPTION THE IMAGE.")
demo.launch()
5 changes: 5 additions & 0 deletions Generative-AI/Turn Pictures Into Story/requirements.txt
@@ -0,0 +1,5 @@
transformers
torch
gradio
scipy
phonemizer
