Various code for mammography data (including mammography image type derivation and ROI extraction)
MammoROI
Description: The MammoROI code is used to identify image type, special views, and extract ROI coordinates of images from cohorts Google Drive link: https://docs.google.com/document/d/1ytwhTBsamVCoEyJM22xBvLxR3chpXnEgv4eyPYY04Yw/edit?usp=sharing
Libraries: dataframe_processing - functions for dataframe manipulations roi_extractions - functions for roi extraction
dataframe_processing Functions:
load_simple_df(): loads metadata csvs as pandas dataframes with only necessary columns
Input(s): df_path, df_type=None df_path: path to cohort csv e.g.: ‘/mnt/PACSNAS/…/...cohort_X.csv’ metadata_anon_cohort_X.csv All anonymized data metadata_orig_and_anon_cohort_X.csv Same as metadata_anon_cohort_X.csv but contains original PHI information df_type: default None, optional ‘ROI_merge’, ‘ROI_extract’ None: reads just the columns necessary for image type and special view derivations ‘ROI_extract’: reads just the columns necessary for ROI extraction ‘ROI_merge’: reads just the columns necessary for ROI dataframe merging Output(s): loaded pandas dataframe on a specified variable Sample Usage: cohort_1_df = load_simple_df(‘/path/.../metadata_anon_cohort_1 .csv’)
correct_root_paths(): replaces png_path roots to correct ones (if needed)
Input(s): dataframe, root_a, root_b dataframe: loaded pandas dataframe with file paths column ‘png_path’ root_a: root path to be replacee e.g. ‘/opt/ssd-data/’ root_b: root path to replace with e.g. ‘/mnt/PACS_NAS1/’ Output(s): Pandas dataframe with new columns [‘corrected_png_path’, ‘folder_path’, ‘filename’] ‘corrected_png_path’: new file paths with root_a changed to root_b ‘folder_path’: folder path to the accession/instance ‘filename’: png filename Sample Usage: cohort_1_df = correct_paths_PACS(cohort_1_df, ‘/opt/ssd-data/’, ‘/mnt/PACS_NAS1/’)
derive_imgType(): derives/identifies image types and lateralities Input(s): df_in df_in: dataframe with columns [‘SeriesDescription’ , ‘ImageLaterality’] Loaded variable with load_simple_df(‘/path/.../metadata_anon_cohort_1 .csv’) Output(s): df_out df_out: dataframe with new columns added [‘LateralityDeriveFlag’, ‘ImageLateralityFinal’, ‘FinalImageType’] ‘LateralityDeriveFlag’: was the laterality derived from the series description with the code or was laterality already available 0 or 1 ‘ImageLateralityFinal’: the final laterality L or R No bilaterals (‘B’) ‘FinalImageType’: the final image type cview, 2D, 3D, ROI_SSC, ROI_SS, other Other includes bilateral types Sample Usage: cohort_1_df = derive_imgType(cohort_1_df)
get_spotmag_flags(): derives/identifies special images Input(s): df_in df_in: dataframe with columns [‘0_ViewCodeSequence_0_ViewModifierCodeSequence_CodeMeaning’] Loaded variable with load_simple_df(‘/path/.../metadata_anon_cohort_1 .csv’) Output(s): df_out df_out: dataframe with new column added [‘spot_mag’] ‘spot_mag’: is the image a special view or not 1 or NaN Sample Usage: cohort_1_df = get_spotmag_flags(cohort_1_df)
match_roi(): matches the correct ROIs to the actual mammogram Input(s): main_df, roi_df main_df: dataframe with metadata columns and [‘png_path’] roi_df: output dataframe from ROI extraction with column [‘Matching_Mammo’, ‘ROI_coord’] Output(s): df_out df_out: dataframe with new column added [‘ROI_coord’] ‘ROI_coord’: is the ROI coordinates of the corresponding mammogram or screen save in their size [1251, 1536, 1354, 1856] (one list) for one ROI OR [[154, 1567, 264, 1897], [1251, 1536, 1354, 1856]] list of lists for multiple. Sample Usage: main_df= match_roi(main_df, roi_df)
replace_old_png_path(): removes and replaces the ‘png_path’ with the correct path Input(s): df_in df_in: dataframe with columns [‘png_path’] and [‘corrected_png_path’] Output(s): df_out df_out: dataframe with [‘png_path’] replaced with [‘corrected_png_path’]’s values Sample Usage: cohort_1_df = replace_old_png_path(cohort_1_df)
make_screensave_dict(): makes a dictionary of screensave filenames and paths for ROI extraction Input(s): df_in df_in: dataframe with columns [‘ROI_SSC’], [‘folder_path’], and [‘filename’] Output(s):out_dict out_dict: dictionary with key value pairs of {filename: folder_path} Sample Usage: out_dict = make_screensave_dict(ROI_SSC_df)
read_df(): reads the ‘final’ metadata csv and loads the ROI_coord column as a list not a string Input(s): path_to_csv path_to_csv: path to the final metadata csv with all metadata with the lateralities, spotmags, and merged ROI coordinates Output(s): df_out df_out: with final metadata Sample Usage: cohort_1_df = read_df(‘./.../.../metadata_cohort_1_ROI.csv’)
Tensorflow Object Detection Install: https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/install.html
Installing Object Detection API: https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/install.html#downloading-the-tensorflow-model-garden
Install protobuf: https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/install.html#protobuf-installation-compilation
In the end, a root folder should have: ‘models’ folder: created from the object detection api install ‘OD_Files’ folder: a provided folder with parameters for object detection model ‘training’ folder: a provided folder with the trained model and checkpoints
roi_extractions Functions: First need to start a new instance of the class: ROI = ROI_extraction(‘.../.../path to root directory with tensorflow object detections) e.g.: ROI = ROI_extraction('/home/jupyter-jjjeon3')
run_extractions(): loads metadata csvs as pandas dataframes with only necessary columns
Input(s): ss_dict ss_dict: dictionary output from make_screensave_dict() from dataframe_processing Output(s): ROI_coords_ssc, ROI_coords_mammo, ROI_matching_ssc, ROI_matching_mammo ROI_coords_ssc: list of screen save coordinates (in screen save size) ROI_coords_mammo: list of mammography coordinates (in mammography size and flipped accordingly) ROI_matching_ssc: the path to the screen save image for the same index of the ROI_coords_ssc ROI_matching_mammo: the path to the mammogram image for the same index of the ROI_coords_ssc Sample Usage: cohort_1_df = load_simple_df(‘/path/.../metadata_anon_cohort_1 .csv’)
Trained model weights available by request.