updated SAR2SAR notebook to provide more detail about the algorithm and its preprocessing steps
Drew-1771 committed Feb 8, 2024
1 parent 16f0a3d commit 90dbb7e
Showing 5 changed files with 136 additions and 20 deletions.
126 changes: 116 additions & 10 deletions SAR2SAR.ipynb
@@ -61,7 +61,7 @@
"id": "vJfhq_QbGvkg"
},
"source": [
"## 2. Set up the data enviornment\n",
"## 2. Set up the data environment\n",
"The model.py contains the run_model function. Here is the docstring for the run_model function:\n",
" \n",
" Runs the despeckling algorithm\n",
@@ -75,7 +75,7 @@
" stride: U-Net is scanned over the image with a default stride of 64 pixels when the image dimension\n",
" exceeds 256. This parameter modifies the default stride in pixels. Lower pixel count = higher quality\n",
" results, at the cost of higher runtime\n",
" store_noisy: Whether to store the \"noisy\" or input in the save_dir. Default is True\n",
" store_noisy: Whether to store the \"noisy\" or input in the save_dir. Default is False\n",
" generate_png: Whether to generate PNG of the outputs in the save_dir. Default is True\n",
" debug: Whether to generate print statements at runtime that communicate what is going on\n",
"\n",
@@ -98,15 +98,75 @@
"current_dir = Path(os.getcwd())\n",
"\n",
"# set the path of the input and save directories\n",
"example_input_dir = str(current_dir / \"src\" / \"test_data\" / \"grd_test_data\")\n",
"example_input_dir = str(current_dir / \"src\" / \"test_data\" / \"example_test_data\")\n",
"example_save_dir = str(current_dir / \"example_output\")\n",
"\n",
"# set the path of your own input and save directory\n",
"my_input_dir = str()\n",
"my_save_dir = str()\n",
"input_dir = str(current_dir / \"my_data\" / \"input\")\n",
"save_dir = str(current_dir / \"my_data\" / \"results\")\n",
"\n",
"print(f\"Input directory set to: {my_input_dir}\")\n",
"print(f\"Save directory set to: {my_save_dir}\")"
"print(f\"Input directory set to: {input_dir}\")\n",
"print(f\"Save directory set to: {save_dir}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Converting .tif to .npy\n",
"If your data is in .tif form, and you would like to run this algorithm, you will need to convert it to .npy (and subsequently convert it back to tif when it is done, though this can be more complicated based on your input and how you want your results). An easy way to do that is with the rasterio python library and this function, which converts all single band .tif and .TIF files in the input_dir directory to .npy files. Multi-band rasters are more complicated, and should be split up into single band rasters on your own terms."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import rasterio\n",
"import numpy as np\n",
"\n",
"# file extensions\n",
"tif_extensions = [\".tif\", \".TIF\"]\n",
"npy_extensions = [\".npy\"]\n",
"\n",
"def tifToNpy(input_dir: str, extensions: list=[\".tif\", \".TIF\"]) -> None:\n",
" \"\"\"\n",
" Converts the files in the input_dir directory with the given extensions to .npy files\n",
"\n",
" Arguments:\n",
" input_dir: Path to the input directory\n",
" extensions: list of valid extensions for files\n",
" Returns:\n",
" None\n",
" \"\"\"\n",
" input_dir = Path(input_dir)\n",
" # get each .tif/.TIF file in the input_dir directory\n",
" for file in input_dir.iterdir():\n",
" if not file.is_dir() and file.suffix in extensions:\n",
" # open the file and read the data\n",
" with rasterio.open(str(file)) as src:\n",
" data = src.read()\n",
" # save as a .npy\n",
" if data.shape[0] == 1:\n",
" # if there is only one band\n",
" path_to_output = file.with_suffix(\".npy\")\n",
" np.save(path_to_output, np.squeeze(data))\n",
" else:\n",
" # if there are multiple bands, this introduces many complications when trying to re-combine later. Everyone's setup and needs\n",
" # are different so you will have to write your own code to handle these cases. Here is an example of what it could look like\n",
" \"\"\"\n",
" for i in range(data.shape[0]):\n",
" filename = str(Path(file.name).with_suffix(\"\"))\n",
" path_to_output = file.with_name(filename + f\"_B{i}\").with_suffix(\n",
" \".npy\"\n",
" )\n",
" np.save(path_to_output, np.squeeze(data[i]))\n",
" \"\"\"\n",
" raise ValueError(\"Multiple bands, please split up into single band rasters for easier processing\")\n",
"\n",
"# convert tif files in the input directory to .npy\n",
"tifToNpy(input_dir, tif_extensions)"
]
},
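For completeness, if you do need to process a multi-band raster, a minimal sketch of the split step could look like the helper below. This is a hypothetical helper, not part of the repository, and it assumes each band is despeckled independently and re-combined later using the `_B{i}` suffix, as in the commented example above.

```python
from pathlib import Path

import numpy as np
import rasterio


def splitMultibandTifToNpy(tif_path: str) -> list:
    """Sketch: write each band of a multi-band .tif to its own .npy file.

    Hypothetical helper -- band order is encoded in a _B{i} suffix so the
    bands can be re-combined after despeckling.
    """
    tif_path = Path(tif_path)
    outputs = []
    with rasterio.open(str(tif_path)) as src:
        data = src.read()  # shape: (bands, rows, cols)
    for i in range(data.shape[0]):
        out = tif_path.with_name(tif_path.stem + f"_B{i}").with_suffix(".npy")
        np.save(out, data[i])  # each band saved as its own 2D array
        outputs.append(out)
    return outputs
```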
{
@@ -116,8 +176,8 @@
},
"source": [
"## 3. Run the example model\n",
"Run the example model to make sure that everything has been installed correctly and is ready to run. This code was originally written for Tensorflow V1, so the tensorflow library will throw a lot of warnings.\n",
"When the model is done, you should see a folder named example_output with noisy_ and denoised_ files inside of it.\n",
"Run the example model to make sure that everything has been installed correctly and is ready to run. **This code was originally written for Tensorflow V1, so the tensorflow library will throw a lot of warnings.**\n",
"When the model is done, you should see a folder named example_output with the results\n",
"\n",
"***The model will print this line when it has finished:***\n",
"\n",
@@ -153,7 +213,53 @@
"outputs": [],
"source": [
"tf.compat.v1.reset_default_graph()\n",
"run_model(my_input_dir, my_save_dir)"
"run_model(input_dir, save_dir)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Convert the data back to .tif\n",
"If your data was in .tif form, you probably want your results to be in .tif form too. Converting back can be more complicated because of the tif's associated metadata. Here is a simple approach that uses the original .tif as a mirror for the metadata of the original file."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def npyToTif(processed_npy_file: str, metadata_mirror: str) -> None:\n",
" \"\"\"\n",
" Converts the processed .npy file back to .tif using the metadata from the metadata mirror tif,\n",
" the original .tif before processing\n",
"\n",
" Arguments:\n",
" processed_npy_file: Path to the processed .npy file\n",
" metadata_mirror: Path to the original .tif file this .npy file as generated from. The metadata mirror\n",
" allows rasterio to write the metadata of the original .tif onto the despeckled result.\n",
" Returns:\n",
" None\n",
" \"\"\"\n",
" # open the orignal tif to get its metadata\n",
" with rasterio.open(metadata_mirror) as src:\n",
" # rename the file\n",
" filename = str(Path(processed_npy_file).with_suffix(\"\").name)\n",
" path_to_output = Path(processed_npy_file).with_name(\"denoised_\" + filename).with_suffix(\".tif\")\n",
" # open the new denoised_ .tif and write the .npy w/ the mirror metadata\n",
" with rasterio.open(path_to_output, \"w\", **src.meta) as dst:\n",
" dst.write(np.stack([np.load(processed_npy_file)]))\n",
"\n",
"# get the original files and the despeckled results and search for matches on file name\n",
"original_files, despeckled_files = [file for file in Path(input_dir).iterdir() if not file.is_dir() and file.suffix in tif_extensions], [file for file in Path(save_dir).iterdir() if not file.is_dir() and file.suffix in npy_extensions]\n",
"for i in range(len(despeckled_files)):\n",
" for j in range(len(original_files)):\n",
" # if the file names are the same, convert .npy to .tif\n",
" if despeckled_files[i].with_suffix(\"\").name == original_files[j].with_suffix(\"\").name:\n",
" print(f\"Converting {despeckled_files[i].name} to tif with metadata mirror at {original_files[j].name} to {'denoised_' + original_files[j].name}\")\n",
" npyToTif(str(despeckled_files[i]), str(original_files[j]))\n",
" "
]
}
],
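As a usage note, the matching loop above is only needed when converting a whole directory of results; for a single result the helper can be called directly. The file names below are placeholders:

```python
# placeholder paths: the despeckled .npy from run_model and the original .tif it came from
npyToTif(
    "my_data/results/scene_001.npy",  # despeckled output
    "my_data/input/scene_001.tif",    # original .tif used as the metadata mirror
)
```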
7 changes: 7 additions & 0 deletions requirements.txt
@@ -1,4 +1,5 @@
absl-py==2.1.0
affine==2.4.0
anyio==4.2.0
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
@@ -14,6 +15,9 @@ cachetools==5.3.2
certifi==2024.2.2
cffi==1.16.0
charset-normalizer==3.3.2
click==8.1.7
click-plugins==1.1.1
cligj==0.7.2
colorama==0.4.6
comm==0.2.1
debugpy==1.8.0
@@ -81,12 +85,14 @@ pyasn1==0.5.1
pyasn1-modules==0.3.0
pycparser==2.21
Pygments==2.17.2
pyparsing==3.1.1
python-dateutil==2.8.2
python-json-logger==2.0.7
pywin32==306
pywinpty==2.0.12
PyYAML==6.0.1
pyzmq==25.1.2
rasterio==1.3.9
referencing==0.33.0
requests==2.31.0
requests-oauthlib==1.3.1
@@ -98,6 +104,7 @@ scipy==1.12.0
Send2Trash==1.8.2
six==1.16.0
sniffio==1.3.0
snuggs==1.4.7
soupsieve==2.5
stack-data==0.6.3
tensorboard==2.15.1
4 changes: 2 additions & 2 deletions src/model.py
@@ -166,7 +166,7 @@ def run_model(
save_dir: str,
checkpoint_dir: str = None,
stride=64,
store_noisy=True,
store_noisy=False,
generate_png=True,
debug=True,
) -> None:
@@ -182,7 +182,7 @@
stride: U-Net is scanned over the image with a default stride of 64 pixels when the image dimension
exceeds 256. This parameter modifies the default stride in pixels. Lower pixel count = higher quality
results, at the cost of higher runtime
store_noisy: Whether to store the "noisy" or input in the save_dir. Default is True
store_noisy: Whether to store the "noisy" or input in the save_dir. Default is False
generate_png: Whether to generate PNG of the outputs in the save_dir. Default is True
debug: Whether to generate print statements at runtime that communicate what is going on
Binary file added src/test_data/example_test_data/Alpes1_21.npy
19 changes: 11 additions & 8 deletions src/utils.py
@@ -67,27 +67,30 @@ def save_sar_images(
generate_png = False
print("\t[!] Threshold calculated to be 0, could not store as PNG properly")

denoisedfilename = Path(save_dir) / str("denoised_" + imagename)
denoisedfilename = Path(save_dir) / str(imagename)
if not denoisedfilename.exists():
denoisedfilename.touch(exist_ok=True)
denoisedfilename = str(denoisedfilename)
np.save(denoisedfilename, denoised)
if debug:
print(f"\t[*] Saved to {denoisedfilename}")
print(f"\t[*] Saved denoised file to {denoisedfilename}")
if generate_png:
store_data_and_plot(denoised, threshold, denoisedfilename)
if debug:
print(f"\t[*] Saved png of {denoisedfilename.replace('npy', 'png')}")
print(
f"\t[*] Saved png of denoised file to {denoisedfilename.replace('npy', 'png')}"
)

noisyfilename = Path(save_dir) / str("noisy_" + imagename)
if store_noisy:
noisyfilename = Path(save_dir) / str("noisy_" + imagename)
if not noisyfilename.exists():
noisyfilename.touch(exist_ok=True)
noisyfilename = str(noisyfilename)
np.save(noisyfilename, noisy)
if debug:
print(f"\t[*] Saved to {noisyfilename}")
if generate_png:
store_data_and_plot(noisy, threshold, noisyfilename)
if debug:
print(f"\t[*] Saved png of {noisyfilename.replace('npy', 'png')}")
if generate_png:
noisyfilename = str(noisyfilename)
store_data_and_plot(noisy, threshold, noisyfilename)
if debug:
print(f"\t[*] Saved png of {noisyfilename.replace('npy', 'png')}")
