Broadcaster's validation fails for vertical videos #2649

Open
thomshutt opened this issue Nov 10, 2022 · 2 comments
@thomshutt
Contributor

thomshutt commented Nov 10, 2022

Describe the bug
When local validation is enabled on the Broadcaster with the -localVerify flag, vertical videos appear to fail this validation consistently.

To Reproduce

  1. Segment any vertical MP4 and push the segments through a Broadcaster with verification enabled
  2. Observe that segments fail to transcode with a mixture of PM Check Failed, PixelsAbsent, and PixelsMismatch errors. @cyberj0g pointed out that the first of these can happen for a couple of different reasons.

Expected behavior
Verification passes

@yondonfu
Member

yondonfu commented Nov 10, 2022

Re: the PixelsMismatch errors

Context:

The term "local verification" refers to:

  • Pixel count verification
  • Signature verification

Local verification is enabled by default unless:

  • -localVerify=false is passed to the node
  • The node is in off-chain mode and -localVerify is not specified
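The two rules above can be read as a small decision function. This is only a sketch: `localVerifyEnabled` and its parameters are hypothetical names for illustration, not the node's actual flag-handling code, and it assumes an explicitly passed value always wins.

```go
package main

import "fmt"

// localVerifyEnabled is a hypothetical helper mirroring the two rules above.
// flag is nil when -localVerify was not specified on the command line.
func localVerifyEnabled(flag *bool, offChain bool) bool {
	if flag != nil {
		// Assumption: an explicitly passed value always takes precedence.
		return *flag
	}
	// Flag not specified: on by default, except in off-chain mode.
	return !offChain
}

func main() {
	disabled := false
	fmt.Println("on-chain, flag unset: ", localVerifyEnabled(nil, false))
	fmt.Println("off-chain, flag unset:", localVerifyEnabled(nil, true))
	fmt.Println("-localVerify=false:   ", localVerifyEnabled(&disabled, false))
}
```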

Pixel Count Verification

O calculates the number of pixels encoded for each transcoded rendition and reports it back to B. This count matters because the number of pixels encoded for a rendition, multiplied by the current price per pixel, determines the fee that O charges for returning the rendition, which in turn determines how often B needs to send payment tickets to O.

B calculates the number of pixels in a transcoded result. Under the hood, this involves calling the LPMS transcode3() method without any TranscodeOptions (i.e. the param value is nil), which tells LPMS to only decode the input. The TranscodeResults struct returned will include the pixels decoded in the Decoded field. The pixels decoded by this call should be equal to the pixels encoded as reported by O. If the pixel count comparison fails, a PixelsMismatch error is returned.

In my testing, I observed that the following videos resulted in PixelsMismatch errors during transcoding when local verification was enabled:

  • w3s.link/ipfs/bafybeid7m3zxjzezyxsqcmay764au4o73qcoblmg4oxqms2kfkd5vh4hgi
  • w3s.link/ipfs/bafybeihmwckalrtm3hylltkrd6p3j4kizsglpxguffpgp4b7aioy27staq

The videos were passed into the Catalyst VOD pipeline and were first segmented by Mist.

One way to debug these PixelsMismatch errors could be to take the segmented m3u8 playlist for one of the example videos above and:

  • Run an LPMS test to transcode each of the segments on Nvidia and note the encoded pixel count
  • Run an LPMS test to decode each of the segments on a CPU and note the decoded pixel count
  • Run an LPMS test to decode each of the segments on Nvidia and note the decoded pixel count

We can then see whether any of these values differ. If they do (which we suspect they will), we can move forward by understanding why there is a difference when our belief is that all of the values should theoretically be the same.
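Once the three counts per segment are collected, the comparison itself could look something like the sketch below. The type and function names are hypothetical, and the numbers in `main` are placeholders for illustration, not measured values.

```go
package main

import "fmt"

// segmentCounts holds the three pixel counts noted for one segment.
type segmentCounts struct {
	Name          string
	EncodedNvidia int64 // pixels reported by the Nvidia transcode
	DecodedCPU    int64 // pixels counted by a CPU decode
	DecodedNvidia int64 // pixels counted by an Nvidia decode
}

// consistent reports whether all three counts agree.
func (c segmentCounts) consistent() bool {
	return c.EncodedNvidia == c.DecodedCPU && c.DecodedCPU == c.DecodedNvidia
}

// inconsistentSegments returns the segments whose counts do not all agree.
func inconsistentSegments(all []segmentCounts) []segmentCounts {
	var out []segmentCounts
	for _, c := range all {
		if !c.consistent() {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	// Placeholder numbers, not real measurements.
	counts := []segmentCounts{
		{"seg0.ts", 2211840, 2211840, 2211840},
		{"seg1.ts", 2211840, 2242560, 2242560},
	}
	for _, c := range inconsistentSegments(counts) {
		fmt.Printf("%s: encoded(nvidia)=%d decoded(cpu)=%d decoded(nvidia)=%d\n",
			c.Name, c.EncodedNvidia, c.DecodedCPU, c.DecodedNvidia)
	}
}
```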

A few questions for us to consider as we investigate:

  • Why is the pixel count decoded on a CPU by B different from the pixel count encoded by Nvidia on O?
  • Is the difference due to Nvidia encoding? Can it be addressed with changes to LPMS? What is the effort required?
  • If the difference cannot be addressed with changes to LPMS, would doing CPU encoding for these cases be a viable solution?

Note: This approach for pixel count verification is known to be inefficient right now since it requires a decode by B. We might be able to make improvements here such as relying on the pixel count reported by a trusted O.

@cyberj0g
Contributor

Update

This issue is NVENC related. I found that for some resolutions of portrait videos (e.g. 144x256), FFmpeg's AVCodecContext structure reports the correct width, consistent with the AVFrame and the requested parameters, but the frames actually written have a width of WIDTH+N, N<=2, consistent with the values in the SPS NAL (frame sequence headers). This produces a fully valid output file, but with a slightly different width.

Because the data structures we use for counting pixels during encoding do not reflect this, the reported pixel count will not match the actual pixel count calculated by decoding the frames. It may also be the case that neither pixel counting approach is correct, because we are not taking the conformance cropping window into account, which needs to be applied to the video frame resolution to get the resolution that is actually displayed. I tried aligning the width for portrait videos to a multiple of 32, as discussed here, to no avail. NVENC seems to have obscure rules for calculating which resolution it will actually output.
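For reference, this is roughly how the cropping window maps coded size to displayed size. It is a sketch assuming H.264 with 4:2:0 chroma, where the crop offsets are expressed in units of 2 luma samples horizontally and 2 (or 4 for field-coded streams) vertically. The struct and function names are made up for illustration, and nothing here is taken from the LPMS code.

```go
package main

import "fmt"

// spsDims holds the H.264 SPS fields that determine frame dimensions.
type spsDims struct {
	PicWidthInMbsMinus1       int
	PicHeightInMapUnitsMinus1 int
	FrameMbsOnly              bool // frame_mbs_only_flag
	CropLeft, CropRight       int  // frame_crop_left_offset, frame_crop_right_offset
	CropTop, CropBottom       int  // frame_crop_top_offset, frame_crop_bottom_offset
}

// displayedSize applies the conformance cropping window to the coded size.
// Assumes 4:2:0 chroma subsampling (SubWidthC = SubHeightC = 2).
func displayedSize(s spsDims) (width, height int) {
	const subWidthC, subHeightC = 2, 2
	fieldFactor := 2 // 2 - frame_mbs_only_flag
	if s.FrameMbsOnly {
		fieldFactor = 1
	}
	codedW := (s.PicWidthInMbsMinus1 + 1) * 16
	codedH := fieldFactor * (s.PicHeightInMapUnitsMinus1 + 1) * 16
	width = codedW - subWidthC*(s.CropLeft+s.CropRight)
	height = codedH - subHeightC*fieldFactor*(s.CropTop+s.CropBottom)
	return width, height
}

func main() {
	// Familiar case: 1920x1080 is coded as 1920x1088 with 8 rows cropped off the bottom.
	w, h := displayedSize(spsDims{
		PicWidthInMbsMinus1:       119,
		PicHeightInMapUnitsMinus1: 67,
		FrameMbsOnly:              true,
		CropBottom:                4,
	})
	fmt.Println(w, h)
}
```

Counting pixels from the coded size versus the cropped size gives different totals, so whichever side of the comparison ignores the crop would be off by the cropped area per frame.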

This needs further investigation. A temporary workaround might be to introduce a tolerance threshold when matching pixel counts, to account for the slight video width mismatch.
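A minimal sketch of what such a tolerance check could look like. The function name, the relative-tolerance formulation and the 2% value are assumptions for illustration, not a proposed final design.

```go
package main

import "fmt"

// withinTolerance reports whether the decoded pixel count is within a relative
// tolerance of the count reported by the encoder side.
func withinTolerance(reported, decoded int64, relTol float64) bool {
	if reported <= 0 || decoded <= 0 {
		return false // a missing or zero count never passes
	}
	diff := reported - decoded
	if diff < 0 {
		diff = -diff
	}
	return float64(diff) <= relTol*float64(reported)
}

func main() {
	const frames = 60
	reported := int64(144 * 256 * frames) // width taken from the requested parameters
	decoded := int64(146 * 256 * frames)  // width actually written (WIDTH+2)
	fmt.Println("strict match:  ", reported == decoded)
	fmt.Println("2% tolerance:  ", withinTolerance(reported, decoded, 0.02))
}
```

For a 144-wide rendition, two extra columns are about 1.4% more pixels, so the threshold would need to be sized for the narrowest rendition we expect. A looser threshold also weakens the check, since it lets O over-report by up to the same margin.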

I updated the portrait resolution test to include pixel count matching, and it now fails. One more test case we may want to add is for videos with variable resolutions. We already have such a test for the software encoder.

Tagging @AlexKordic as this is interconnected with portrait resolutions and rotation flags.


3 participants