Fix Photo Download Logic to Avoid Duplicate Downloads and Improve Error Handling and Path Management #2
Pull Request Description (English):
In this Pull Request, I have optimized the photo download logic, error handling, and path management to ensure that files are not downloaded multiple times and are correctly stored in the specified directory. Key changes include:
Avoiding Duplicate Downloads:
The code now checks the filenames in the BauduPhoto directory against those extracted from the JSON files, preventing the download of files that already exist.
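A minimal sketch of this check, not the code from this PR. It assumes the JSON data has already been parsed into a list of dictionaries and that each one carries a `filename` key; both the key name and the helper names `existing_filenames` and `pending_downloads` are hypothetical.

```python
from pathlib import Path


def existing_filenames(photo_dir):
    """Collect the names of all files already stored under photo_dir."""
    root = Path(photo_dir)
    if not root.is_dir():
        return set()
    # rglob also walks subfolders, so files sorted into date folders count too.
    return {p.name for p in root.rglob("*") if p.is_file()}


def pending_downloads(entries, photo_dir):
    """Return the entries whose filename is not on disk yet.

    `entries` is assumed to be a list of dicts with a "filename" key
    (a hypothetical layout, the real JSON schema may differ).
    """
    already_have = existing_filenames(photo_dir)
    return [e for e in entries if e["filename"] not in already_have]
```

Building the set once and testing membership against it keeps the check cheap even when the directory holds many files.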
Improved Error Handling:
Added a retry mechanism: if an IncompleteRead error or another network request failure occurs, the program retries up to 3 times, waiting 5 seconds between attempts.
A 30-second timeout has been added to network requests to prevent the program from hanging due to network issues.
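The retry and timeout behaviour could look roughly like the sketch below. It uses only the standard library (`urllib.request` and `http.client.IncompleteRead`); the PR may rely on a different HTTP client. The function names are hypothetical, and "up to 3 times" is read here as three attempts in total, which is an assumption.

```python
import time
import urllib.request
from http.client import IncompleteRead

MAX_ATTEMPTS = 3   # assumption: 3 attempts in total
RETRY_DELAY = 5    # seconds to wait between attempts
TIMEOUT = 30       # seconds before a request is abandoned


def fetch_bytes(url, timeout=TIMEOUT):
    """Download the body of `url`, giving up after `timeout` seconds."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_with_retry(url, fetch=fetch_bytes, attempts=MAX_ATTEMPTS,
                        delay=RETRY_DELAY, sleep=time.sleep):
    """Call fetch(url), retrying on truncated reads and network errors."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except (IncompleteRead, OSError) as exc:
            # URLError, timeouts and connection errors are all OSError subclasses.
            last_error = exc
            if attempt < attempts:
                sleep(delay)
    raise RuntimeError(f"giving up on {url} after {attempts} attempts") from last_error
```

Passing `fetch` and `sleep` as parameters is a design choice for this sketch only: it lets the retry loop be exercised without real network access or real waiting.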
Path Management Improvements:
Introduced the sanitize_filename function, which replaces illegal characters in filenames (such as : and /) so that files cannot be written to an unintended location or to the root directory.
Files are organized by date in subfolders within the BauduPhoto directory, ensuring that photos are stored in an orderly fashion.
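One possible shape for these two pieces is sketched below. Only the name `sanitize_filename` comes from this description; the exact set of replaced characters, the `_` replacement, the `YYYY-MM-DD` folder naming, and the helper `target_path` are assumptions made for illustration.

```python
import re
from datetime import date
from pathlib import Path

# Characters that are path separators or are rejected by common filesystems.
_ILLEGAL = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def sanitize_filename(name, replacement="_"):
    """Replace illegal characters so the name cannot escape its folder."""
    cleaned = _ILLEGAL.sub(replacement, name).strip(" .")
    # Names made only of dots or spaces (such as "..") collapse to a placeholder.
    return cleaned or "unnamed"


def target_path(photo_dir, taken_on, filename):
    """Return photo_dir/<YYYY-MM-DD>/<safe name>, creating the folder if needed."""
    folder = Path(photo_dir) / taken_on.isoformat()
    folder.mkdir(parents=True, exist_ok=True)
    return folder / sanitize_filename(filename)


if __name__ == "__main__":
    print(sanitize_filename("2021:05/01 beach.jpg"))
    print(target_path("BauduPhoto", date(2021, 5, 1), "a/b.jpg"))
```

Because `/` is replaced before the path is joined, a name such as `/etc/passwd` stays inside the date folder instead of resolving to an absolute path.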
These changes improve the efficiency and reliability of the download process, while avoiding redundant operations and ensuring proper file path management.