This is the code for the paper "Predicting Risk of Acute Lung Injury Using Distant Supervision". Documentation will be added eventually. To run the code, you need access to the MIMIC dataset (a username and password) and a high-memory runtime. If you plan to run it on Google Colab, please select a GPU runtime (not a TPU), since the Spark connection to the Java servers occasionally fails for reasons that seem to be related to memory usage. If this happens, try restarting the kernel. More details will be added soon. Finally, keep in mind that the performance of many models varies from run to run because no random seed is set.
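If you want more reproducible runs, one option is to seed the common sources of randomness before training. The sketch below is not part of the original code; the `set_seed` helper and the `SEED` value are hypothetical, and it only covers Python's `random` module and, if installed, NumPy's legacy global generator. Libraries that keep their own generators (e.g. scikit-learn estimators via `random_state`, or deep learning frameworks) need to be seeded separately.

```python
import os
import random

SEED = 42  # hypothetical value; the original code does not set any seed


def set_seed(seed: int = SEED) -> None:
    """Seed Python's and (if available) NumPy's global random generators.

    A minimal sketch under the assumptions above, not the authors' method.
    """
    # Only affects hash randomization in subprocesses started after this call.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)  # legacy global NumPy generator
    except ImportError:
        pass  # NumPy not installed; nothing else to seed here


if __name__ == "__main__":
    set_seed()
    first = [random.random() for _ in range(3)]
    set_seed()
    second = [random.random() for _ in range(3)]
    print(first == second)  # same seed, same sequence
```

For the Spark side, PySpark's `DataFrame.randomSplit` and `DataFrame.sample` accept a `seed` argument, so passing a fixed value there would be needed as well if the pipeline splits or samples data with Spark.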