GitHub - horsehour/collegematch: College matching system, including training and prediction

#CollegeMatch

Match college is based on a simple bilinear model to make recommendations for high school students with their profiles, and colleges' multi-dimensional data. We expect that the system could help students to find the colleges that best match their preference.

Dependencies:

amqp-client-4.0.0.jar
commons-pool2-2.4.2.jar
jedis-2.9.0.jar

The training profiles include two parties: colleges and students. Both parties required to transform all the training data to numerical values. Their format are similar except their features and the dimensions.

Here are some simple examples to show what the trainning data looks like.

The input features of the collges should be preprocessed to the following format (csv):

# college_index f_1 f_2 f_3 ... f_dc

c001 0.3 0.8 0.0 ... 1.2

c002 0.9 0.2 0.7 ... 0.1

c003 1.2 0.0 0.0 ... 0.4

. . . . . .

c990 0.9 0.1 0.4 ... 0.2

The input features of the students should be preprocessed into the similar format (csv):

# student_index f_1 f_2 f_3 ... f_ds

s001 1.3 18 1.0 ... 0.2

s002 2.9 20 0.3 ... 1.4

s003 3.2 09 0.8 ... 0.5

. . . . . .

s999 0.9 10 0.0 ... 1.2

Moreover, the desired rankings of students over collges should be collected based on survey result. These preference rankings are the ground-truth labels which are the required answers in supervised machine learning. These rankings should be associated to the corresponding students' index. Here is the specification for the preference rankings:

# student_index college_first college_second ...

s020 c009 c001 c999

s029 c008 c002 c001 c909

s038 c287 c678 c21

s056 c629 c001 c200 c871 c323 c111 c002

The preference rankings over colleges provided by students are allowed different size. Some of them could just provide only two collges that (s)he most likes or dislikes. Also, (s)he can give more detailed feedback on his or her preferences over known colleges with a list of size 15.

Based on these training data, CollegeMatch will be able to train a bilinear model by transforming both the original college features and the students features to a smaller feature space. The matchness between a student and a college could be computed in this reduced dimension. Each pair of student-college will be assigned a real score, and colleges could be ranked based on these scores. The higher a student-college's score is, the higher the probability that the student likes the college in the pair.

The training algorithm is under-development, and will support much advanced functionalities and produces more accurate results with increasing numbers of high-quality training data provided by students.

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
imatch		imatch
README.md		README.md
Student-College Match.pdf		Student-College Match.pdf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Releases

Packages

Languages

horsehour/collegematch

Folders and files

Latest commit

History

Repository files navigation

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages