Josh Burke edited this page May 9, 2019 · 34 revisions

Welcome

Welcome to the wiki! This wiki offers an in-depth view of Hack4Impact’s initiatives this semester in partnership with GlobalGiving. This page serves as an index, linking to subpages that describe each individual approach to the problem. Check out the sidebar for quick links to the pages on this wiki.

Problem

GlobalGiving has many potential nonprofits to contact, so it needs to make informed choices about which ones to bring into its network. GlobalGiving’s network consists of many organizations based in the US, along with some nonprofits in other countries. However, it remains significantly easier for nonprofits within the United States to find and apply to GlobalGiving. In certain countries, factors such as limited internet connectivity and lack of access to the documents required by GlobalGiving’s vetting process have slowed onboarding and discovery. As a result, GlobalGiving aims to use data science techniques to proactively find and reach out to nonprofits around the world, streamlining the process of acquiring and benefiting more nonprofits outside the United States.

In late 2018, Hack4Impact provided a solution for GlobalGiving that gathered basic information about many organizations that are not yet part of GlobalGiving’s network. Now, GlobalGiving aims to fill out these records with more detailed information about the work each organization does and whom it serves, to further streamline the process of benefiting these organizations. Whereas last semester’s problem was about discovering the breadth of organizations around the world, this semester’s is about depth.

Approaches

Along with the many algorithms provided in this repo, we also spent some time developing a new categorization scheme that accounts for statistical trends in the data and includes mechanisms to enforce consistency with those trends. The idea came from trying to imagine an ideal categorization scheme for classifying new NGOs.


Classification:

Classification algorithms are one way to characterize the work of new, previously unknown NGOs. Feeding an NGO’s summary text into a properly trained classifier yields a set of categories that describes that NGO with some degree of accuracy.
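The classifiers in this repo are not reproduced on this page, so what follows is only a minimal sketch of the idea, not the models we actually trained. It uses one-vs-rest multinomial naive Bayes, a standard multi-label technique: one binary model per category decides whether that category applies, so a single NGO can receive several categories. The `OneVsRestNaiveBayes` class, the training summaries, and the category names are all hypothetical, and the sketch uses only the Python standard library (with scikit-learn, `OneVsRestClassifier` wrapping `MultinomialNB` would be the more usual route).

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase, split on whitespace, and keep purely alphabetic tokens.
    return [w for w in text.lower().split() if w.isalpha()]


class OneVsRestNaiveBayes:
    """Hypothetical multi-label classifier: one binary multinomial naive Bayes model per category."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace (add-one) smoothing constant

    def fit(self, texts, label_sets):
        docs = [tokenize(t) for t in texts]
        n = len(docs)
        self.vocab = {w for doc in docs for w in doc}
        self.models = {}
        for cat in sorted({c for labels in label_sets for c in labels}):
            pos, neg = Counter(), Counter()
            n_pos = 0
            for doc, labels in zip(docs, label_sets):
                if cat in labels:
                    pos.update(doc)
                    n_pos += 1
                else:
                    neg.update(doc)
            # Smoothed class priors for "has this category" vs. "does not".
            prior_pos = math.log((n_pos + self.alpha) / (n + 2 * self.alpha))
            prior_neg = math.log((n - n_pos + self.alpha) / (n + 2 * self.alpha))
            self.models[cat] = (
                (prior_pos, pos, sum(pos.values())),
                (prior_neg, neg, sum(neg.values())),
            )
        return self

    def _log_score(self, doc, log_prior, counts, total):
        # Log-probability of the document under one side of a binary model.
        denom = total + self.alpha * len(self.vocab)
        score = log_prior
        for w in doc:
            if w in self.vocab:  # words never seen in training are ignored
                score += math.log((counts[w] + self.alpha) / denom)
        return score

    def predict(self, text):
        doc = tokenize(text)
        # A category is assigned whenever its "yes" model outscores its "no" model.
        return {
            cat
            for cat, (yes, no) in self.models.items()
            if self._log_score(doc, *yes) > self._log_score(doc, *no)
        }


# Hypothetical NGO summaries and categories, for illustration only.
texts = [
    "school teachers books students",
    "school teachers students",
    "clinic vaccines doctors patients",
    "clinic doctors patients",
    "wells clean water sanitation",
    "clinic school health lessons",
]
labels = [
    {"education"}, {"education"}, {"health"}, {"health"}, {"water"},
    {"health", "education"},
]

model = OneVsRestNaiveBayes().fit(texts, labels)
print(sorted(model.predict("teachers books")))  # ['education']
print(sorted(model.predict("clinic school")))   # ['education', 'health']
```

Because each category gets its own yes/no decision, the second query receives two categories at once, which matches the goal of describing an NGO with a set of categories rather than a single label.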


Clustering:

Clustering algorithms can generate new sets of categories or give a better understanding of the connections and similarities between NGOs. K-Means with document embeddings seems to be the most promising approach in this category.
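A minimal sketch of the K-Means step, assuming each NGO has already been turned into a fixed-length document embedding (for example by a Doc2Vec-style model; how the repo produced its embeddings is not described on this page). The 2-D vectors and NGO names below are hypothetical stand-ins for real embeddings. The sketch normalizes each vector to unit length, so that Euclidean distance ranks neighbors the same way cosine similarity does, and then runs Lloyd's algorithm using only the standard library.

```python
import math
import random


def normalize(vec):
    # Scale to unit length so Euclidean distance ranks neighbors like cosine similarity.
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def kmeans(vectors, k, max_iter=100, seed=0):
    """Lloyd's algorithm: returns a cluster index per vector and the final centroids."""
    rng = random.Random(seed)
    # Initialize centroids from k distinct input vectors (seeded for reproducibility).
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors
        ]
        if new_assignment == assignment:
            break  # converged: no vector changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centroids


# Hypothetical 2-D "embeddings"; real document embeddings have many more dimensions.
embeddings = {
    "ngo_a": [0.9, 0.1],
    "ngo_b": [0.8, 0.2],
    "ngo_c": [0.1, 0.9],
    "ngo_d": [0.2, 0.8],
}
names = list(embeddings)
labels, _ = kmeans([normalize(embeddings[n]) for n in names], k=2)
clusters = dict(zip(names, labels))
print(clusters)
```

Each resulting cluster is a candidate data-driven category that still needs a human to inspect and name it; the cluster indices themselves carry no meaning, and the number of clusters `k` is a judgment call.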

We attempted to design a Semi-Supervised LDA algorithm based on an article published online, but were unable to get the code to run. Here is the article for reference.


Data Processing/Visualization:

Most of the processing work involved obtaining the data, exploring what it looked like, and getting it into a form we could analyze. Preprocessing steps such as TF-IDF scoring and count vectorizing are not included here, but stock SKLearn preprocessors were used in many of the algorithms.
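Since the TF-IDF and count-vectorizing code is not included here, the following stdlib-only sketch illustrates what those preprocessors compute. It is an illustration, not the repo's pipeline: in practice one would call `CountVectorizer` and `TfidfVectorizer` from `sklearn.feature_extraction.text`. The IDF formula matches scikit-learn's documented default (`smooth_idf=True`, followed by L2 row normalization), but the whitespace tokenizer is a simplifying assumption that differs from scikit-learn's regex-based one. The summaries are hypothetical.

```python
import math
from collections import Counter


def count_vectorize(docs):
    """Return a sorted vocabulary and a raw term-count row per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    counts = [Counter(toks) for toks in tokenized]
    return vocab, [[c[w] for w in vocab] for c in counts]


def tfidf(docs):
    """Weight counts by smoothed IDF, then L2-normalize each document row."""
    vocab, matrix = count_vectorize(docs)
    n = len(docs)
    # Document frequency: how many documents contain each term.
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(len(vocab))]
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1, so no term gets zero weight.
    idf = [math.log((1 + n) / (1 + d)) + 1 for d in df]
    rows = []
    for row in matrix:
        weighted = [tf * w for tf, w in zip(row, idf)]
        norm = math.sqrt(sum(x * x for x in weighted)) or 1.0
        rows.append([x / norm for x in weighted])
    return vocab, rows


# Hypothetical NGO summaries, for illustration only.
docs = [
    "clean water for villages",
    "water wells and sanitation",
    "schools for girls",
]
vocab, rows = tfidf(docs)
# "water" appears in two summaries, so it is down-weighted relative to "clean".
print(rows[0][vocab.index("clean")] > rows[0][vocab.index("water")])  # True
```

The down-weighting of words shared across many summaries is what makes TF-IDF vectors useful inputs for both the classifiers and the clustering described above.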


Conclusions

To do: for each of the approaches above, include suggestions on ways to move forward, discuss new, data-driven categories, and summarize what we learned.
