fix issues with parsing .ssa subtitles #68

Open: wants to merge 2 commits into base: master

@wiedymi commented Aug 12, 2024

Hi! I've fixed issues with parsing real-world, complex .ass subtitles.

@wiedymi (Author) commented Aug 12, 2024

@adracea please review the code.

@@ -688,7 +691,7 @@ mod parse {
strikeout: parse_str_to_bool(
get_line_value(
&headers,
"Strikeout",
"StrikeOut",
Contributor:

I think it needs to support both Strikeout and StrikeOut. The spec's section 5 overview calls the field StrikeOut, but the more detailed explanation beneath it (field 9.2) calls it Strikeout.

Author:

The get_line_value function converts the name to lowercase anyway, so the casing no longer matters.
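For illustration, a case-insensitive lookup lets both spellings from the spec resolve to the same column. This is a minimal sketch with a hypothetical `field_index` helper; it is not rsubs-lib's actual `get_line_value`, whose full signature is not shown in this diff.

```rust
/// Hypothetical helper (not rsubs-lib's get_line_value): return the position
/// of a column name in a split `Format:` header, ignoring ASCII case so that
/// "Strikeout" and "StrikeOut" both match the same column.
fn field_index(headers: &[&str], name: &str) -> Option<usize> {
    headers
        .iter()
        // Header entries usually carry a leading space after the comma.
        .position(|h| h.trim().eq_ignore_ascii_case(name))
}
```

For example, with a `Format:` line of `Name, Fontname, Bold, StrikeOut` split on commas, both `field_index(&headers, "Strikeout")` and `field_index(&headers, "StrikeOut")` return `Some(3)`.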

Comment on lines +121 to +138
pub outline: f32,
/// If [SSAStyle::border_style] is `1`, then this specifies the depth of the drop shadow behind
/// the text (in pixels). Values may be `0`, `1`, `2`, `3` or `4`. Drop shadow is always used in
/// addition to an outline - SSA will force an outline of 1 pixel if no outline width is given.
pub shadow: i8,
pub shadow: f32,
/// Sets how text is "justified" within the Left/Right onscreen margins, and also the vertical
/// placing.
pub alignment: Alignment,
/// Defines the Left Margin in pixels.
pub margin_l: i32,
pub margin_l: f32,
/// Defines the Right Margin in pixels.
pub margin_r: i32,
pub margin_r: f32,
/// Defines the Vertical Left Margin in pixels.
pub margin_v: i32,
pub margin_v: f32,
/// Specifies the font character set or encoding and on multilingual Windows installations it
/// provides access to characters used in multiple than one language. It is usually 0 (zero)
/// for English (Western, ANSI) Windows.
pub encoding: i32,
pub encoding: f32,
@bytedream (Contributor) commented Aug 18, 2024

Why did you change the types?

@wiedymi (Author) commented Aug 22, 2024

In practice, those properties are floats.

@wiedymi (Author) commented Aug 22, 2024

[screenshot]
This is from the libass documentation.

I need to update the types to match the .ssa specification.

Contributor:

Okay, I see. I only used this spec as a reference when rewriting the .ass parsing, and I interpreted the "Values may be ..." part as "values must be ...".
I don't want to sound like a know-it-all, but the libass documentation is not an ass/ssa specification; they state that right at the top. Still, I think most subtitle software that handles ass subtitles is built around libass, so being compatible with it is a good thing, I guess.
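To make the type question concrete: an integer field rejects a value like `1.5` outright, while `f32` accepts both `2` and `1.5`. The following is a minimal sketch assuming a hypothetical `parse_style_number` helper (not part of rsubs-lib) applied to fields such as Outline, Shadow, or MarginL.

```rust
use std::num::ParseFloatError;

/// Hypothetical helper (not part of rsubs-lib): parse a numeric style field
/// as f32. This accepts integer text ("2") as well as fractional text ("1.5"),
/// whereas `"1.5".parse::<i32>()` would fail and reject the whole line.
/// Note: `str::parse::<f32>` also accepts "inf" and "NaN"; a stricter parser
/// may want to check `is_finite()` on the result.
fn parse_style_number(value: &str) -> Result<f32, ParseFloatError> {
    value.trim().parse::<f32>()
}
```

With this, `parse_style_number("2")` yields `Ok(2.0)` and `parse_style_number("1.5")` yields `Ok(1.5)`, so files written to the narrower "values may be 0 to 4" reading still parse unchanged.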

@@ -246,7 +246,10 @@ impl SSA {
let mut line_num = 0;

let mut blocks = vec![vec![]];
for line in content.as_ref().lines() {

let contents: String = content.as_ref().to_string().replace("\u{feff}", "");
Contributor:

What does this character do, and why remove it?

Author:

When I open any subtitle file, the first character is \u{feff}, and it breaks parsing, so we need to remove it. I don't know why it appears when I call read_to_string.

Contributor:

Apparently \ufeff is a byte order mark (BOM). It is mainly associated with UTF-16 and UTF-32, but some editors also write it at the start of UTF-8 files. Rust is UTF-8 oriented, and read_to_string decodes the file as UTF-8 without stripping the BOM, so it ends up as the first character of the string. For me, that's something the user should sanitize, not the library.
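A minimal sketch of sanitizing on the caller's side, as suggested above. Both `strip_bom` and `read_subtitle_text` are hypothetical helpers, not part of rsubs-lib. Unlike the `replace("\u{feff}", "")` call in the diff, this only removes a BOM at the very start of the text and leaves any later U+FEFF untouched.

```rust
use std::{fs, io, path::Path};

/// Hypothetical helper: drop a single leading U+FEFF byte order mark, if present.
fn strip_bom(text: &str) -> &str {
    text.strip_prefix('\u{feff}').unwrap_or(text)
}

/// Hypothetical caller-side wrapper: read a UTF-8 subtitle file and remove
/// the BOM before the text is handed to the parser.
fn read_subtitle_text(path: &Path) -> io::Result<String> {
    // read_to_string keeps the BOM as the first char of the String.
    let raw = fs::read_to_string(path)?;
    Ok(strip_bom(&raw).to_owned())
}
```

For example, `strip_bom("\u{feff}[Script Info]")` returns `"[Script Info]"`, while text without a BOM is returned as-is.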

Urantij added a commit to Urantij/rsubs-lib that referenced this pull request Dec 15, 2024