A delivery fare estimation script that also filters out suspicious deliveries and incorrect data entries caused by faulty GPS devices.

Erfanm83/delivery-fare-estimation

Delivery Fare Estimation

Project Overview

This project estimates delivery fares from GPS data logs. The program reads GPS data for deliveries, filters out invalid points, calculates distances with the Haversine formula, and generates fare estimates. It processes the data concurrently, which makes it suitable for large datasets.


Folder Structure

1. input_dataset/

This folder contains the input CSV files with the GPS data for deliveries. The input CSV file should have the following format:

id_delivery,lat,lng,timestamp
  • id_delivery: A unique identifier for each delivery.
  • lat: The latitude coordinate of the delivery point.
  • lng: The longitude coordinate of the delivery point.
  • timestamp: A UNIX timestamp (seconds since the epoch) giving the time at which the point was recorded.

Example:

1,35.706552,51.412262,1723697700
1,35.702591,51.412704,1723697730
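
As an illustration of the format above, the following is a minimal sketch of how such rows could be parsed in Go. The `Point` type and the `parseRecord` and `parsePoints` helpers are hypothetical names chosen for this example, not necessarily what `main.go` uses, and the sketch assumes the file has no header row.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// Point is a hypothetical type for one GPS sample; the project's own
// struct may be named and shaped differently.
type Point struct {
	DeliveryID string
	Lat, Lng   float64
	Timestamp  int64 // UNIX time in seconds
}

// parseRecord converts one CSV record (id_delivery,lat,lng,timestamp)
// into a Point, reporting which field failed to parse.
func parseRecord(rec []string) (Point, error) {
	if len(rec) != 4 {
		return Point{}, fmt.Errorf("expected 4 fields, got %d", len(rec))
	}
	lat, err := strconv.ParseFloat(rec[1], 64)
	if err != nil {
		return Point{}, fmt.Errorf("lat: %w", err)
	}
	lng, err := strconv.ParseFloat(rec[2], 64)
	if err != nil {
		return Point{}, fmt.Errorf("lng: %w", err)
	}
	ts, err := strconv.ParseInt(rec[3], 10, 64)
	if err != nil {
		return Point{}, fmt.Errorf("timestamp: %w", err)
	}
	return Point{DeliveryID: rec[0], Lat: lat, Lng: lng, Timestamp: ts}, nil
}

// parsePoints reads every record from CSV text (assumed to have no header).
func parsePoints(data string) ([]Point, error) {
	r := csv.NewReader(strings.NewReader(data))
	r.FieldsPerRecord = 4
	records, err := r.ReadAll()
	if err != nil {
		return nil, err
	}
	points := make([]Point, 0, len(records))
	for _, rec := range records {
		p, err := parseRecord(rec)
		if err != nil {
			return nil, err
		}
		points = append(points, p)
	}
	return points, nil
}

func main() {
	sample := "1,35.706552,51.412262,1723697700\n1,35.702591,51.412704,1723697730\n"
	points, err := parsePoints(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, p := range points {
		fmt.Printf("%+v\n", p)
	}
}
```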

2. output_dataset/

This folder will contain the output files generated by the program:

  • filtered_data.csv: Contains the valid delivery points after filtering.
  • fares.csv: Contains the fare estimates for each delivery, in the format id_delivery, fare_estimate.

How to Run the Project

Prerequisites

Make sure you have Go installed on your system. You can verify by running:

go version

Running the Project

  1. Place your input data file (e.g., sample_data.csv) inside the input_dataset/ folder.
  2. Run the program using the following command:
go run main.go

This will process the input data, filter out invalid points, calculate delivery fares, and generate two output files:

  • output_dataset/filtered_data.csv
  • output_dataset/fares.csv

Using Custom Datasets

If you want to use your own dataset, follow these steps:

  1. Ensure your input CSV file follows the required format: id_delivery,lat,lng,timestamp.
  2. Place the file in the input_dataset/ folder.
  3. Update the path in the main.go file to reference your dataset:
chunks, err := readDataChunks("input_dataset/your_custom_data.csv")
  4. Run the program:
go run main.go

The program will generate the filtered data and fare estimates for your dataset in the output_dataset/ folder.

Testing

Unit Tests

The unit tests validate individual functions like:

  • haversine(): For distance calculation between two geographical points.
  • filterInvalidPoints(): To ensure points that exceed the speed threshold are filtered.
  • calculateFare(): For correct fare estimation based on filtered points.

Run the unit tests with:

go test -v

End-to-End (E2E) Tests

The E2E tests cover the entire flow, from reading raw data through processing it to generating the output files. They simulate how the program behaves in a real-world scenario and verify that all functions integrate correctly.

To run the end-to-end tests, use:

go test -v -run TestEndToEndFlow

This will check:

  • Reading the input dataset.
  • Filtering invalid points.
  • Calculating fares.
  • Writing the final output (fares.csv and filtered_data.csv).


Short Implementation Overview

  • Data Ingestion: The program reads GPS data in chunks from the input file.
  • Concurrency: Each chunk is processed concurrently using Go's goroutines, speeding up the filtering and fare calculation process.
  • Filtering: The program uses the Haversine formula to calculate the distance between consecutive points and filters out any point whose implied speed (distance divided by elapsed time) exceeds 100 km/h.
  • Fare Calculation: Fares are calculated based on the distance, time of day (daytime or nighttime rates), and idle time. The minimum fare for any delivery is set to 3.47.
  • Output: The program writes filtered data to filtered_data.csv and fare estimates to fares.csv.
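
The filtering step in the list above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names `haversine` and `filterInvalidPoints` come from the Testing section, but their signatures, the `Point` type, the Earth radius of 6371 km, and the choice to compare each point against the last accepted point are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"math"
)

const (
	earthRadiusKm = 6371.0 // mean Earth radius (assumed value)
	maxSpeedKmh   = 100.0  // speed threshold stated in the overview
)

// Point is a hypothetical GPS sample type used only in this sketch.
type Point struct {
	Lat, Lng  float64
	Timestamp int64 // UNIX time in seconds
}

// haversine returns the great-circle distance in kilometres between
// two coordinates given in degrees.
func haversine(lat1, lng1, lat2, lng2 float64) float64 {
	toRad := func(deg float64) float64 { return deg * math.Pi / 180 }
	dLat := toRad(lat2 - lat1)
	dLng := toRad(lng2 - lng1)
	a := math.Sin(dLat/2)*math.Sin(dLat/2) +
		math.Cos(toRad(lat1))*math.Cos(toRad(lat2))*math.Sin(dLng/2)*math.Sin(dLng/2)
	// Clamp to 1 to guard against floating-point overshoot before Asin.
	return 2 * earthRadiusKm * math.Asin(math.Sqrt(math.Min(1, a)))
}

// filterInvalidPoints keeps the first point and then every point whose
// speed relative to the last kept point is at most maxSpeedKmh.
// Points with a non-increasing timestamp are dropped (an assumption).
func filterInvalidPoints(points []Point) []Point {
	if len(points) == 0 {
		return nil
	}
	valid := []Point{points[0]}
	for _, p := range points[1:] {
		last := valid[len(valid)-1]
		hours := float64(p.Timestamp-last.Timestamp) / 3600
		if hours <= 0 {
			continue
		}
		km := haversine(last.Lat, last.Lng, p.Lat, p.Lng)
		if km/hours > maxSpeedKmh {
			continue
		}
		valid = append(valid, p)
	}
	return valid
}

func main() {
	points := []Point{
		{35.706552, 51.412262, 1723697700},
		{35.702591, 51.412704, 1723697730},
		{35.802591, 51.412704, 1723697760}, // implausible jump, should be dropped
	}
	fmt.Printf("distance: %.3f km\n",
		haversine(points[0].Lat, points[0].Lng, points[1].Lat, points[1].Lng))
	fmt.Println("kept:", len(filterInvalidPoints(points)), "of", len(points))
}
```

For the two example rows from the input format section, the distance is roughly 0.44 km over 30 seconds, or about 53 km/h, so both points pass the filter. The third point in `main` is invented for the example and implies a speed far above the threshold.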

Notes

  • The program reads and processes the input in chunks, so it can handle large datasets efficiently.
  • File writing operations are synchronized using a mutex to avoid race conditions in concurrent environments.
  • The project includes comprehensive unit tests and end-to-end tests to ensure correctness and robustness.

License

This project is open-source and available for use under the MIT License.

Feel free to ask me about this project! 😊
