This project was created as a learning experience to explore computer vision and AI technologies, specifically object detection and segmentation models. It served as a practical way to gain hands-on experience with Python backend development while implementing modern AI capabilities. The application provides a basic interface for interactive object detection using FastSAM (Fast Segment Anything Model). Since the emphasis was on backend development and AI implementation, I chose plain JavaScript (JSX) over TypeScript to keep the frontend simple and minimize development overhead.
- Basic drag and drop or file selection for image upload
- Interactive object selection by clicking on the image
- Display of cropped object images
- API response time tracking
├── fe/                  # Frontend directory
│   ├── src/
│   │   ├── components/
│   │   ├── styles/
│   │   └── utils/
│   └── package.json
└── be/                  # Backend directory
    ├── main.py
    ├── models/          # AI model files
    ├── utils/
    └── requirements.txt
- Navigate to the backend directory:
cd be
- Create and activate a virtual environment (recommended):
python -m venv venv
source venv/bin/activate # On Windows, use `venv\Scripts\activate`
- Install the required Python packages:
pip install flask flask-cors Pillow numpy ultralytics lapx
- Download the FastSAM model:
  - Create a `models` directory in the backend folder:
    mkdir models
  - Download either the FastSAM-s or the FastSAM-x model:
    - FastSAM-s from https://github.com/ultralytics/assets/releases/download/v8.2.0/FastSAM-s.pt
    - FastSAM-x from https://github.com/ultralytics/assets/releases/download/v8.2.0/FastSAM-x.pt
  - Place the downloaded file in the `be/models/` directory
  - If you choose FastSAM-x, change `model = FastSAM("models/FastSAM-s.pt")` to `model = FastSAM("models/FastSAM-x.pt")` in `be/main.py`
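If you prefer to script the download step, here is a minimal sketch using only the Python standard library. The helper name `download_model` and the `MODEL_URLS` dictionary are made up for this example and are not part of the project; the URLs are the ones listed above.

```python
from pathlib import Path
from urllib.request import urlretrieve

# URLs copied from the download step above.
MODEL_URLS = {
    "FastSAM-s": "https://github.com/ultralytics/assets/releases/download/v8.2.0/FastSAM-s.pt",
    "FastSAM-x": "https://github.com/ultralytics/assets/releases/download/v8.2.0/FastSAM-x.pt",
}


def download_model(url: str, models_dir: str = "models") -> Path:
    """Download `url` into `models_dir` unless the file is already there.

    Hypothetical helper for illustration; not part of be/main.py.
    """
    target_dir = Path(models_dir)
    target_dir.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir models`
    target = target_dir / url.rsplit("/", 1)[-1]   # keep the original file name
    if not target.exists():
        urlretrieve(url, target)
    return target

# Usage (run from the be/ directory):
#   download_model(MODEL_URLS["FastSAM-s"])
```

The function skips the download when the file already exists, so it is safe to run more than once.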
- Navigate to the frontend directory:
cd fe
- Install the dependencies:
npm install
- Make sure you're in the backend directory and your virtual environment is activated
- Optionally, you can start the app without warming up the model. To do so, change `is_warmup_model = True` to `is_warmup_model = False` in `be/main.py`.
- Run the Flask application:
python main.py
The backend server will start on http://localhost:5000
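For context on the `is_warmup_model` flag: a warm-up usually means running one throwaway inference at startup so the first real request does not pay the one-time initialization cost. The sketch below illustrates the idea only. `warm_up` and `dummy_predict` are hypothetical names, and the actual warm-up code in `be/main.py` may look different.

```python
import time
from typing import Callable

is_warmup_model = True  # flag name taken from be/main.py


def warm_up(predict: Callable[[], object]) -> float:
    """Call `predict` once and return how long it took, in seconds."""
    start = time.perf_counter()
    predict()
    return time.perf_counter() - start


def dummy_predict() -> None:
    # Hypothetical stand-in for a real model call on a small placeholder image.
    time.sleep(0.01)


if is_warmup_model:
    elapsed = warm_up(dummy_predict)
    print(f"Warm-up took {elapsed:.3f}s")
```

Disabling the flag makes the server start faster, at the cost of a slower first request.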
- In a new terminal, navigate to the frontend directory
- Start the Vite development server:
npm run dev
The frontend will be available at http://localhost:5173
- Open your browser and go to http://localhost:5173
- Upload an image using drag & drop or the file selector
- Click the "Select Object" button
- Click on any object in the image you want to detect
- The application will display the cropped images of detected objects
- Use the "Cancel" button to exit selection mode
- You can select a new file to process at any time
- Basic error handling in frontend
- Minimalistic UI with potential bugs
- No loading states for failed API requests
- Limited input validation
- Memory usage not optimized for large images
- No progress indicators for model processing
- Basic CORS configuration
- No input size limitations implemented
Accepts JSON payload with:
- `x`: X-coordinate of the clicked point
- `y`: Y-coordinate of the clicked point
- `image`: Base64 encoded image data
Returns:
- Array of detected object images in base64 format
- 200 status code on success
- Error messages with appropriate status codes on failure
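To try the API without the frontend, a small Python client along these lines should work. Treat it as a sketch with several assumptions: the route path in `API_URL` is a placeholder because this README does not name it, the response body is assumed to be a bare JSON array of base64 strings, and the `image` field is assumed to be plain base64 without a data-URL prefix. Check `be/main.py` for the real values.

```python
import base64
import json
from urllib.request import Request, urlopen

# Hypothetical endpoint path: the route name is not given in this README.
API_URL = "http://localhost:5000/detect"


def build_payload(image_bytes: bytes, x: int, y: int) -> bytes:
    """Encode the click position and the image as the JSON body described above."""
    body = {"x": x, "y": y, "image": base64.b64encode(image_bytes).decode("ascii")}
    return json.dumps(body).encode("utf-8")


def decode_crops(encoded_images: list[str]) -> list[bytes]:
    """Turn the returned base64 strings back into raw image bytes."""
    return [base64.b64decode(item) for item in encoded_images]


def detect(image_path: str, x: int, y: int, url: str = API_URL) -> list[bytes]:
    """POST one click to the backend and return the cropped images as bytes."""
    with open(image_path, "rb") as f:
        payload = build_payload(f.read(), x, y)
    request = Request(url, data=payload, headers={"Content-Type": "application/json"})
    with urlopen(request) as response:  # raises HTTPError on 4xx/5xx responses
        return decode_crops(json.load(response))
```

With the backend running, `detect("photo.jpg", 120, 80)` would return a list of byte strings that you can write to disk as image files.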
- This is an experimental project focused on learning computer vision and AI implementation
- The primary focus was on the backend and AI integration, with the frontend serving as a basic testing interface
- The frontend is built with React and Vite
- The backend uses Flask, with flask-cors handling CORS for the API
- Temporary files are automatically cleaned up after processing
- Code structure is focused on functionality rather than production-ready features
- Node.js v18.16.0
- Python v3.12
If you encounter any issues:
- Ensure all required Python packages are installed correctly
- Verify that the FastSAM model file is present in `be/models/`
- Check that the CORS origins in `main.py` match your frontend URL
- For large images, you might need to increase your system's available memory
- If the API returns errors, check the browser console and Python logs for details
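The first two checks can be automated with a short script. This is only a sketch: `missing_modules` and `find_model_files` are hypothetical helpers, and the module list uses the import names of the packages from the install step (Pillow imports as `PIL`, flask-cors as `flask_cors`). lapx is left out because its import name is not confirmed here.

```python
from importlib.util import find_spec
from pathlib import Path

# Import names for the packages from the install step (lapx intentionally omitted).
REQUIRED_MODULES = ["flask", "flask_cors", "PIL", "numpy", "ultralytics"]


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level modules from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


def find_model_files(models_dir: str = "models") -> list[str]:
    """Return the FastSAM weight files found in `models_dir`, sorted by name."""
    return sorted(p.name for p in Path(models_dir).glob("FastSAM-*.pt"))

# Usage (run from the be/ directory with the virtual environment active):
#   print("Missing packages:", missing_modules(REQUIRED_MODULES))
#   print("Model files:", find_model_files())
```

An empty "Missing packages" list and at least one entry under "Model files" mean the first two troubleshooting items are covered.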
This is an experimental project, but suggestions and improvements are welcome. Feel free to fork and experiment with the code.
This project is intended for experimental and educational purposes. Feel free to use the code as you see fit.
Project created by Ido Zamir. Feel free to reach out through GitHub if you have questions about the project or want to contribute.
This application is for development and learning purposes only. It is not intended for production use and comes with no warranties or guarantees.