Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Kernel] Support clean expired log #3212

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

horizonzy
Copy link
Contributor

The kernel lib support clean expired logs.

@vkorukanti
Copy link
Collaborator

vkorukanti commented Aug 16, 2024

Hi @horizonzy, thank you for the PR! Could you please rebase this PR? Once rebased we can review and get it merged. There has been a formatting change since the PR posted. Refer here on how to reformat the changes.

@horizonzy
Copy link
Contributor Author

I fixed the conflict, but the code style does not suit the rules, I have no idea how to fix it. The kernel-checkstyle.xml is not suitable for the IntelliJ idea settings.

@vkorukanti
Copy link
Collaborator

I fixed the conflict, but the code style does not suit the rules, I have no idea how to fix it. The kernel-checkstyle.xml is not suitable for the IntelliJ idea settings.

can you try the instructions here?

@horizonzy
Copy link
Contributor Author

I fixed the conflict, but the code style does not suit the rules, I have no idea how to fix it. The kernel-checkstyle.xml is not suitable for the IntelliJ idea settings.

can you try the instructions here?

Thanks!

Copy link
Collaborator

@vkorukanti vkorukanti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR. One comment to address keeping the last checkpoint.

logger.info(
"{}: Starting the deletion of log files older than {}", tablePath, fileCutOffTime);
int numDeleted = 0;
try (CloseableIterator<FileStatus> files =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should check there atleast one checkpoint to after the latest deleted file to make sure the table is still time travelable.

008.json
009.json
010.json
010.checkpoint.parquet
011.json
012.json
013.json
014.json
...
020.json
020.checkpoint.parquet

If the retention selection 8 to 13.json files, we should only delete up to the 10.json.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants