TVP extractor redacted for new url schema and extract method #32028

Open
wants to merge 26 commits into
base: master

bibiak1

@bibiak1 bibiak1 commented Apr 11, 2023


In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • Where I am not the original author of this code it was released under the same terms at https://github.com/yt-dlp.

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

This PR switches VOD extraction to new APIs.

The original proposal added an extractor TVappIE. After alignment with the yt-dlp extractor from yt-dlp/yt-dlp#6989, there are now two new extractors, TVPStreamIE and TVPVODVideoIE, plus TVPVODSeriesIE replacing TVPWebsiteIE: thanks @selfisekai.

These changes have been applied over the aligned code:

  • use traverse_obj() for safer extraction
  • fix tests that are not blocked from UK.
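As a rough illustration of what "safer extraction" means here, the sketch below uses a simplified stand-in for `traverse_obj()`. The helper name `safe_get` and the sample payload are hypothetical; the real utility in yt-dl supports much more than this.

```python
def safe_get(obj, *path, **kwargs):
    """Walk nested dicts/lists along `path`; return `default` instead of raising."""
    default = kwargs.get('default')
    for key in path:
        try:
            obj = obj[key]
        except (KeyError, IndexError, TypeError):
            # Missing key, short list or wrong container type: give up quietly
            return default
    return obj


# Hypothetical API payload, for illustration only
data = {'externalUid': '12345', 'images': {'16x9': [{'url': '//example.invalid/a.jpg'}]}}

thumbnail = safe_get(data, 'images', '16x9', 0, 'url')
season = safe_get(data, 'season', 'number')  # absent in the payload, so None
```

The point of the pattern is that a missing or reshaped field degrades to `None` rather than aborting the whole extraction with a `KeyError`.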

Compared with the yt-dlp extractor,

  • episode number 0 in site metadata is treated as missing (None), since it seems to apply to one-off shows
  • as the yt-dl download test harness now supports test cases with variable ids, a real test has been added for TVPStreamIE.
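The episode-number rule above could be sketched as follows. This is only an illustration: the function name and the `number` metadata key are assumptions, not taken from the PR code.

```python
def episode_number_or_none(meta):
    """Return a positive episode number, or None when missing, 0 or malformed."""
    try:
        number = int(meta.get('number'))
    except (TypeError, ValueError):
        # Key absent (None) or not numeric
        return None
    # 0 appears to mark one-off shows, so treat it as "no episode number"
    return number if number > 0 else None
```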

Contributor

@dirkf dirkf left a comment

Thanks for your work!

I've made a few suggestions and will enable the CI test once you've had time to address them.

@selfisekai

what we do in yt-dlp is, we take the externalUid and use it with the embed extractor, which has much less strict geo-blocking.
https://github.com/yt-dlp/yt-dlp/blob/master/yt_dlp/extractor/tvp.py#L492-L493
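A minimal sketch of that hand-off, assuming the usual extractor result-dict conventions. The function name is hypothetical, and the `tvp:` URL prefix and the `TVPEmbed` extractor key are assumptions about how the embed extractor is addressed; check them against the linked source.

```python
def embed_handoff(vod_data, video_id):
    """Build a result that delegates to the embed extractor via externalUid."""
    external_uid = vod_data.get('externalUid')
    if not external_uid:
        # Nothing to delegate to; the caller would need a fallback
        return None
    return {
        '_type': 'url_transparent',
        'url': 'tvp:%s' % external_uid,  # assumed internal URL form
        'ie_key': 'TVPEmbed',            # assumed extractor key
        'id': video_id,
    }
```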

@dirkf
Contributor

dirkf commented May 4, 2023

But see yt-dlp/yt-dlp#6987, where the external ID just gives (in Polish) "payment required".

@bibiak1, I should also have asked you to complete the "Explanation", especially since there's no related issue. Is this a new scheme or an alternative as suggested above? If the latter, it should be built into the existing TVPIE.

@bibiak1
Author

bibiak1 commented May 6, 2023

It's a new scheme. I have checked unpaid content only. I can try to implement login, as I was not focusing on that.

@dirkf
Contributor

dirkf commented May 7, 2023

Let's back-port the yt-dlp version that was just committed using the new scheme. I'd like to try to keep the two as similar as possible. If you fancy doing that, please go ahead.

The diff from yt-dlp should apply straightforwardly to the base yt-dl code. Then any Python3-only syntax in the new code has to be tweaked (`yield from ...` -> `for from_ in ...: yield from_`; `f'formatted text with {value}'` -> `'formatted text with {0}'.format(value)`; `str` -> `compat_str`; etc.). Finally, see if there are any features of the existing PR that should be added.
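For illustration, the kind of back-port tweaks described here might look like the sketch below. The function names are hypothetical, and the `compat_str` shim is a local stand-in for the one yt-dl provides in its compat module.

```python
from __future__ import unicode_literals

try:
    compat_str = unicode  # Python 2 text type
except NameError:
    compat_str = str  # Python 3


def flatten_entries(pages):
    # Python 3 original: `yield from page`
    for page in pages:
        for entry in page:
            yield entry


def describe(value):
    # Python 3 original: f'formatted text with {value}'
    return 'formatted text with {0}'.format(compat_str(value))
```

Both functions behave the same on Python 2 and Python 3, which is the goal of the back-port.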

dirkf added 2 commits May 9, 2023 16:34
* pull changes from yt-dlp/yt-dlp#6989, thanks selfisekai
* use `traverse_obj()` for safer extraction
* fix tests that are not blocked from UK

Co-authored-by: selfisekai
Add `txt_or_none()` shim
@dirkf
Contributor

dirkf commented May 9, 2023

@bibiak1, I've merged the yt-dlp changes. If you're able to check (and fix if necessary) the geo-blocked tests, that would be great.

Let me know if you want to add login code; or you could make a separate PR?

@selfisekai

hmmmm. tested from NL. turns out https://vod.tvp.pl/filmy-dokumentalne,163/krzysztof-krawczyk--cale-moje-zycie,332512 plays in browser if logged in to a free account. the geoblock error message I'm getting when not logged in is this:

For licensing reasons, this material is unavailable in your country.

EU regulations allow subscribers who purchased the TVP+ package in their country of residence (in this case PL) to retain access to the materials included in the package while visiting another EU country.

so I guess they wanted to implement this but someone made an assumption that an account logged in means SVOD and not logged in means AVOD? login is by cookies so --cookies-from-browser firefox worked for me on yt-dlp master

@dirkf
Contributor

dirkf commented May 9, 2023

From UK, the extractor just sees the truthy .isGeoBlocked member of the API result. From experience, if that is ignored, the stream URLs give the special video that says "Video not available in your region".
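A sketch of the check described here, assuming the API result is a plain dict. The exception class and function name are hypothetical stand-ins; a real extractor would presumably use its own geo-restriction helper instead.

```python
class GeoRestricted(Exception):
    """Hypothetical stand-in for the extractor's geo-restriction error."""


def check_geo_block(api_result, countries=('PL',)):
    """Fail early instead of returning the 'not available in your region' clip."""
    if api_result.get('isGeoBlocked'):
        raise GeoRestricted(
            'Video only available from: %s' % ', '.join(countries))
    return api_result
```

Failing early matters because, as noted, ignoring the flag still yields stream URLs, just for the placeholder video.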

@selfisekai

that's on embed (tvplayer2 api), this workaround is specific to VOD playlist api

@dirkf
Contributor

dirkf commented May 10, 2023

When I run that URL in the current PR, it's handled by TVPVODVideoIE and then handed off to TVPEmbedIE using the externalUid, where the geo-block is detected.

The data from the VOD API includes these values:

...
'countries': [{'id': 92, 'key': 'polska', 'name': 'Polska'}],
...
'geoipManualLock': False,
...
'loginRequired': False,
...

The test case https://vod.tvp.pl/website/krzysztof-krawczyk-cale-moje-zycie,51374466 is redirected via TVPIE to 332512 with the same result.

        elif fatal:
            raise RegexNotFoundError('Unable to extract %s' % _name)
        else:
            self._downloader.report_warning('unable to extract %s' % _name + bug_reports_message())
Contributor

Suggested change
- self._downloader.report_warning('unable to extract %s' % _name + bug_reports_message())
+ self.report_warning('unable to extract %s' % _name + bug_reports_message())

@bibiak1
Author

bibiak1 commented May 15, 2023

It works from PL without logging in. Most likely the restriction is based on geolocation: if you are not connecting from a PL IP address, you need to log in.

I have read all the comments and do not know what I am supposed to do. The tvp.py code starting at line 255 (class TVPappIE) is mine. I've just added support for the new schema/player and didn't want to change anything that existed before.

bibiak1 and others added 6 commits May 22, 2023 21:21
Co-authored-by: dirkf <fieldhouse@gmx.net>
Co-authored-by: dirkf <fieldhouse@gmx.net>
Co-authored-by: dirkf <fieldhouse@gmx.net>
Co-authored-by: dirkf <fieldhouse@gmx.net>
Co-authored-by: dirkf <fieldhouse@gmx.net>
Author

@bibiak1 bibiak1 left a comment

Workflow tested. All OK

@bibiak1 bibiak1 requested a review from dirkf May 22, 2023 19:37
@bibiak1
Author

bibiak1 commented Jun 19, 2023

@dirkf can you merge this pull req?

@selfisekai selfisekai mentioned this pull request Aug 10, 2023
5 tasks
@pawisoon

@dirkf can this be reviewed and merged in? 🙏

Author

@bibiak1 bibiak1 left a comment

Please proceed with merge

4 participants