I gave GPT-4 eyes. "Eyes watching six roads, ears listening in eight directions."
Here’s what I did:
- added some data to a vision model
- gave the AI camera access
- asked it questions about the scene
- it identified objects
- it searched the web for info
- it used that info to answer accurately
Watch it get 3 questions 100% correct!
- Twitter https://twitter.com/mckaywrigley/status/1651291367224807424?s=20
- YouTube https://www.youtube.com/watch?v=w-wxguIs-0I
https://github.com/sponsors/Charmve?frequency=one-time&sponsor=Charmve
This repo was only available to my sponsors on GitHub Sponsors until I reached 15 sponsors.
Learn more about Sponsorware at github.com/sponsorware/docs 💰.
- Frontend: React
- Image Analysis API: TensorFlow Models - MobileNet
- Text Generation API: GPT API
- Clone the repository:
git clone https://github.com/Charmve/gpt-eyes.git
- Navigate to the project directory:
cd gpt-eyes
- Install dependencies:
npm install
- Create the required accounts and obtain API keys for TensorFlow Models - MobileNet and the GPT API.
- Update the configuration file with your API keys:
- TensorFlow Models - MobileNet:
/path/to/config.js
- GPT API:
/path/to/config.js
- Start the development server:
npm start
- Open your browser and visit:
http://localhost:3000
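For the configuration step above, a small check at startup can catch a missing key before the first request fails. The sketch below is only an illustration: the `AppConfig` shape, the field names `mobilenetApiKey` and `gptApiKey`, and the `missingKeys` helper are all hypothetical, and the real `config.js` in this repo may be laid out differently.

```typescript
// Hypothetical shape of the configuration object; the real config.js may differ.
interface AppConfig {
  mobilenetApiKey: string;
  gptApiKey: string;
}

// Returns the names of required keys that are absent or blank.
function missingKeys(config: Partial<AppConfig>): string[] {
  const required: (keyof AppConfig)[] = ["mobilenetApiKey", "gptApiKey"];
  return required.filter((name) => {
    const value = config[name];
    return typeof value !== "string" || value.trim() === "";
  });
}
```

Calling `missingKeys(config)` before `npm start` serves the first page, and logging the returned names, is one way to surface a forgotten key early.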
- The device camera captures an image.
- The application uses the TensorFlow Models - MobileNet API to analyze the image and extract object information.
- The application sends the analyzed object information to the GPT API.
- The GPT API generates text describing the analyzed object.
- The application displays the analyzed image and the generated text.
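The hand-off between the image analysis step and the text generation step can be sketched as follows. This is a minimal illustration, not the code in this repo. It assumes predictions in the `{ className, probability }` shape that MobileNet's `classify()` returns in `@tensorflow-models/mobilenet`, and it assumes the "GPT API" is OpenAI's Chat Completions endpoint. The helper names `buildMessages` and `buildChatRequest`, the `0.2` probability cutoff, and the `gpt-4` model name are illustrative choices.

```typescript
// One classifier prediction (assumed shape: label plus confidence in [0, 1]).
interface Prediction {
  className: string;
  probability: number;
}

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

interface ChatRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Hypothetical helper: keep confident predictions and phrase them as a prompt.
function buildMessages(predictions: Prediction[], minProbability = 0.2): ChatMessage[] {
  const labels = predictions
    .filter((p) => p.probability >= minProbability)
    .sort((a, b) => b.probability - a.probability)
    .map((p) => `${p.className} (${Math.round(p.probability * 100)}%)`);

  const scene = labels.length > 0 ? labels.join(", ") : "no confident detections";
  return [
    { role: "system", content: "You describe objects seen by a camera." },
    { role: "user", content: `The camera detected: ${scene}. Describe what it is looking at.` },
  ];
}

// Hypothetical helper: assemble the HTTP request without sending it.
// The endpoint and model name are assumptions about which GPT API is used.
function buildChatRequest(predictions: Prediction[], apiKey: string, model = "gpt-4"): ChatRequest {
  return {
    url: "https://api.openai.com/v1/chat/completions",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages: buildMessages(predictions) }),
    },
  };
}
```

In the browser, the result could be passed to `fetch(request.url, request.init)`, with the generated text read from `choices[0].message.content` in the JSON response. Building the request separately from sending it keeps the prompt logic easy to test without a network call or a real key.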