Skip to content
This repository has been archived by the owner on Oct 23, 2019. It is now read-only.

MAX_SIZE minor bug fix #99

Open
wants to merge 6 commits into
base: master
Choose a base branch
from
Open

Conversation

parasteh
Copy link

@parasteh parasteh commented Dec 5, 2018

Hi,
Thanks to your work, I just used it and noticed there is a minor problem with max_size, actually, it did not work and files with size larger than I defined just downloaded. So after a while, I figured out there should be a comparison with media.document.size and max_size. I did it, hope it works

Again thank you, It saved me a lot of hours.
BR

Copy link
Collaborator

@Lonami Lonami left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Undo the .gitignore change and confirm my question about None. Besides that, looks good!

.gitignore Outdated
@@ -85,6 +85,9 @@ target/
# pyenv
.python-version

#pycharm
.idea
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This belongs in your global .gitignore, so please undo this change.

@@ -80,7 +80,8 @@ def _check_media(self, media):
"""
Checks whether the given MessageMedia should be downloaded or not.
"""
if not media or not self.max_size:
# It is needed to chek the size with the max_size defined in config file
if not media or media.document.size > self.max_size:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you check at least twice the Telethon documentation to make sure document is not None and size is not None (or they can't be None, not present at all)? We don't want an error here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will fix it, let you know the result

Copy link
Author

@parasteh parasteh Dec 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some points need consideration, Media has multiple types and it is not easy to get the size of each object, I mean it is depended to the Type and calling media.document.size will only work for document type and no other. I think we need to use polymorphism to do that. each type has to implement an interface for general attributes like size, so calling something like media.getsize() return the size of the file regardless of the type of it,

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

each type has to implement an interface for general attributes like size

More like some utils method get_size(media), unless you're willing to do this change in Telethon which would be by far harder :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I found an easier way to do that, but not sure yet, so far I realized that just document types have size attribute, so when calling export_utils.get_media_type(media) returns "document" we can refer to media.document.size and compare it to the max_size,
What's your idea?

Copy link
Author

@parasteh parasteh Dec 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also in utils there is a method named get_file_location(media):, So already it has been implemented, I will use it and after test I will commit the code

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just add another function to utils.py to get_file_size for media, comparing media type with isinstance.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its done !, how can I pull the changes ?

@parasteh
Copy link
Author

parasteh commented Dec 5, 2018

Sorry, there is a problem for media types that are not document, I made a mistake!,
Cause I need it for document types. I will fix it and commit the changes

…xsize which is defined by user, and in some cases there were a minor bug with filename, it threw error when file names consist os restricted characters like :, I fixed it
Copy link
Author

@parasteh parasteh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed the gitignore, I finished the code to consider the maxfile size, it is now working, I have tested it for both new exporting files as well as download-past-media option, Also there was a minor filename issue with restricted characters like ':' I solve it by adding a pice of code from here filename

Copy link
Collaborator

@Lonami Lonami left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a careful review there are a lot other things to consider and irrelevant changes that do not belong here.


def format_filename(s):
"""Take a string and return a valid filename constructed from the string.
Uses a whitelist approach: any characters not present in valid_chars are
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This formatting is terrible and not consistent with the rest at all. A simple:

"""Removes invalid file characters from the input string"""

Would be enough.

an invalid filename.

"""
valid_chars = "-_.() %s%s" % (string.ascii_letters, string.digits)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about people using Chinese, Japanese, Persian or any other language that don't have ASCII characters in the file name?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right, I had to consider that, Thank you for the point, I will fix it

"""
valid_chars = "-_.() %s%s" % (string.ascii_letters, string.digits)
filename = ''.join(c for c in s if c in valid_chars)
filename = filename.replace(' ', '_') # I don't like spaces in filenames.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't care about personal opinions in the code.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I just used the code I refered, So I will write my own code to fix it again, Thank you for your comments :)


"""
valid_chars = "-_.() %s%s" % (string.ascii_letters, string.digits)
filename = ''.join(c for c in s if c in valid_chars)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A far better option is to use a regex and replace all "invalid" characters with _

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

def format_filename(s): """Removes invalid file characters from the input string""" return re.sub(r'[\\/:"*?<>|]', '_', s)
what about this code ? this restriction is only for windows file system, is it necessary to make it conditional upon os type ? I mean only do it for windows os

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That code is much better, I would not normally use those characters in any filename either way. But, I still think this should be a separate pull request.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Me neither, I think for some reasons it uses time for the file name! so the ':' in time format caused the exception for me! the code I have submitted is working and using it downloaded around 12000 docs from two separated group

valid_chars = "-_.() %s%s" % (string.ascii_letters, string.digits)
filename = ''.join(c for c in s if c in valid_chars)
filename = filename.replace(' ', '_') # I don't like spaces in filenames.
return filename
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No newline at the end of the file.

@@ -259,3 +260,20 @@ def parse_proxy_str(proxy_str):
else:
proxy = (proxy_type, host, port)
return proxy


def format_filename(s):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is irrelevant to the pull request and should not belong here. If you want to make this change use a new pull request.

@@ -254,6 +267,9 @@ def _get_name(self, peer_id):
if not ext:
ext = export_utils.get_extension(media_row[4])

if isinstance(filename, str):
filename = export_utils.format_filename(filename)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is irrelevant to the pull request and should not belong here. If you want to make this change use a new pull request.

Copy link
Author

@parasteh parasteh Dec 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please let me know how to do this, I think I have to create a new branch and commit this part of code there and then create a new pull request? right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making a new branch is a bit harder because you have to revert the changes you have made in master first. Easiest way for you is:

  1. Wait until MAX_SIZE bug fix is merged.
  2. Delete your fork.
  3. Fork again.
  4. Make new changes.

But you can look up online how to do it without deleting the fork if you want.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -82,9 +81,18 @@ def _check_media(self, media):
"""
if not media or not self.max_size:
return False

if not self.types:
return True
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bypasses the max_size check. Are you sure this is the expected behaviour?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is if you check the config file, it says "Setting to "0" will not download any media."
So I think there is no problem with this one

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that's not what I meant. I mean, if there are no types (i.e. "all media is valid"), the maximum size is not checked at all.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right, So the file size should be checked before this line!

telegram_export/downloader.py Show resolved Hide resolved
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants