Improve path management #22

iannesbitt · 2023-05-17T13:39:34Z

When handling file paths, keep in mind that when this code is run across different nodes, the local file system might not be shared. On Delta, each node has access to its own local file system, and some access to a shared network file system. Reading and writing tend to be faster when files are on the node that is doing the work, rather than on the shared filesystem. So in many cases, we copy smaller files onto each node before starting a job.
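A minimal sketch of that staging step, assuming the shared file and the node-local directory are both passed in explicitly. The function name `stage_to_local` is hypothetical and is not part of viz-points:

```python
import shutil
from pathlib import Path


def stage_to_local(shared_file: Path, local_dir: Path) -> Path:
    """Copy one file from shared storage into a node-local directory.

    Hypothetical helper for illustration; both paths are chosen by the caller.
    """
    # Create the node-local directory if this is the first job on the node.
    local_dir.mkdir(parents=True, exist_ok=True)
    local_copy = local_dir / shared_file.name
    # copy2 copies the file contents and also tries to preserve timestamps.
    shutil.copy2(shared_file, local_copy)
    return local_copy
```

Each worker would call this once before processing and then read from the returned path instead of the shared one.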

Let's take home = os.path.expanduser('~').replace('\\', '/') in __init__ as an example.

The os.path.expanduser('~') function returns the path of the current user's home directory. On a cluster, different nodes might have different filesystems. The user's home directory might not be the same on all nodes, or might not even exist on some nodes.

Even if all nodes have the same filesystem, the user's home directory might not be shared across all nodes. If the home directory is not shared, a file that is written to the home directory on one node will not be available on other nodes.

Since we need to access and save files to a specific path, it would be safer to use an absolute path to a directory that we know exists and is shared across all nodes in the cluster.
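One way this could look, as a sketch only: take the base directory from an explicit argument or an environment variable and refuse anything that is not an absolute, existing directory. The function name `resolve_base_dir` and the variable name `VIZ_POINTS_BASE_DIR` are assumptions made up for this example:

```python
import os
from pathlib import Path
from typing import Optional


def resolve_base_dir(configured: Optional[str] = None,
                     env_var: str = "VIZ_POINTS_BASE_DIR") -> Path:
    """Return an absolute, existing base directory instead of guessing from '~'.

    `env_var` is a hypothetical variable name used only for illustration.
    """
    raw = configured or os.environ.get(env_var)
    if not raw:
        raise ValueError(f"no base directory given and {env_var} is not set")
    base = Path(raw)
    if not base.is_absolute():
        raise ValueError(f"base directory must be absolute, got {raw!r}")
    # This only checks the node running the code; whether the directory is
    # shared across nodes has to be guaranteed by the cluster setup.
    if not base.is_dir():
        raise FileNotFoundError(f"base directory does not exist: {base}")
    return base
```

The check cannot prove that the directory is shared, so the shared location still has to be agreed on for the cluster in question.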

Replacing '\\' with '/' might not always give the correct results, especially if we want the code to be cross-platform. It would be better to use pathlib.
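For illustration, a short sketch of the pathlib equivalents. The directory names are made up for the example:

```python
from pathlib import Path, PureWindowsPath

# Path.home() replaces os.path.expanduser('~') and returns a Path object,
# so no manual separator replacement is needed.
home = Path.home()

# The / operator joins components with the correct separator for the platform.
log_dir = home / "viz-points" / "logs"  # hypothetical directory names

# If a forward-slash string is really needed, as_posix() produces it.
windows_style = PureWindowsPath(r"C:\Users\me\data").as_posix()  # 'C:/Users/me/data'
```

Keeping paths as Path objects until the last moment avoids the string replacement entirely.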

In parts of viz-points, the input path is used to construct the output path. However, we will need the ability to write files to a different location than the location from where we read them in. In other words, the base path for input files could differ from the base path for output files. For example, sometimes we read input files from the shared network, but write files to the local node that is processing the output (to speed up writing).
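A minimal sketch of decoupling the two base paths, assuming input files always live somewhere under a known input base. The name `output_path_for` and its parameters are hypothetical and do not reflect the current viz-points API:

```python
from pathlib import Path
from typing import Optional


def output_path_for(input_file: Path, input_base: Path, output_base: Path,
                    suffix: Optional[str] = None) -> Path:
    """Mirror an input file's relative location under a separate output base.

    Hypothetical helper for illustration only.
    """
    # relative_to raises ValueError if input_file is not under input_base.
    relative = Path(input_file).relative_to(input_base)
    out = Path(output_base) / relative
    # Optionally swap the file extension for the output format.
    return out.with_suffix(suffix) if suffix else out
```

With this shape, reading from the shared network and writing to node-local storage is just a matter of passing a different `output_base`.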

Originally posted by @robyngit in #12 (review)

iannesbitt · 2023-07-06T15:38:27Z

Reopening as not all points have been fully addressed. I need to know where we usually store parallel processing artifacts in order to make paths more cluster-friendly.

iannesbitt added the enhancement New feature or request label May 17, 2023

iannesbitt self-assigned this May 17, 2023

iannesbitt added a commit that referenced this issue Jun 29, 2023

returning pathlib.Path instead of string (#22)

ed3b871

iannesbitt closed this as completed Jul 5, 2023

iannesbitt added this to the 0.0.2 milestone Jul 5, 2023

iannesbitt mentioned this issue Jul 5, 2023

Release 0.0.2 of LiDAR tiling workflow #12

Merged

iannesbitt removed this from the 0.0.2 milestone Jul 6, 2023

iannesbitt reopened this Jul 6, 2023

mbjones added this to VizWorkflow and Visualization Workflow Jan 23, 2024
