Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fails to recognize (properly ignore?) colors #129

Open
097115 opened this issue Dec 14, 2022 · 13 comments · Fixed by #130
Open

Fails to recognize (properly ignore?) colors #129

097115 opened this issue Dec 14, 2022 · 13 comments · Fixed by #130
Labels
enhancement New feature or request

Comments

@097115
Copy link

097115 commented Dec 14, 2022

When colors are added to messages, kirc ignores the control sequences (which is probably a good thing) but still displays the actual codes.

A simple example would be newsly bot from the ##news channel at libera.chat. With the log entry like this (^B and ^C being the actual binary entries, the raw log is available here):

Wed Dec 13 23:02:36 2022::newsly!~newsly@user/yano/bot/rssly PRIVMSG ##news :[^B^C6ECNS for China^C^B] ^BU.S. panel investigating Capitol riot to release final report next week^B http://www.ecns.cn/news/2022-12-14/detail-ihchtzzp6213103.shtml

kirc displays it like this -- note 6 after the opening square bracket:

newsly [6ECNS for China] U.S. panel investigating Capitol riot to release final report next week http://www.ecns.cn/news/2022-12-14/detail-ihchtzzp6213103.shtml

Would it possible to catch those somehow? And probably hide / ignore altogether?

@mcpcpc mcpcpc added the enhancement New feature or request label Dec 15, 2022
@mcpcpc
Copy link
Owner

mcpcpc commented Dec 15, 2022

Should be doable. The logging function is pretty dumb at the moment and does not handle escape sequences gracefully. Give me some time though (probably until after the new year) and I will prepare a new release (also with some overdue refactoring).

@mcpcpc
Copy link
Owner

mcpcpc commented Dec 16, 2022

@097115 had some bandwidth to work on this issue. Let me know if the 0.3.2 branch resolves your issue.

@097115
Copy link
Author

097115 commented Dec 16, 2022

Let me know if the 0.3.2 branch resolves your issue.

Sorry but it seems like no :)

I did git clone -b 0.3.2 https://github.com/mcpcpc/kirc/, but it still prints out the color codes, like this (note the 13 code):

newsly [13Google News - India] Parliament Winter Session updates...

And the logging now has [almost] the same line twice :) Like this:

Fri Dec 16 06:23:07 2022::newsly!~newsly@user/yano/bot/rssly PRIVMSG ##news :[��13Google News - India��] �Parliament Winter Session updates | December 15 2022 - The Hindu� https://www.thehindu.com/news/national/parliament-winter-session-updates-15-december-2022/article66265787.ece 2022-12-15T05:06:00
:newsly!~newsly@user/yano/bot/rssly PRIVMSG ##news :[13Google News - India] Parliament Winter Session updates | December 15 2022 - The Hindu https://www.thehindu.com/news/national/parliament-winter-session-updates-15-december-2022/article66265787.ece 2022-12-15T05:06:00

:)

@mcpcpc
Copy link
Owner

mcpcpc commented Dec 16, 2022

@097115 My apologies… I forgot to remove the original printed string with the fix. But you can see that the second “duplicate” string removes the non-ASCII characters from being printed (e.g. ^B and ^C). Try the latest commit to the 0.3.2 Branch and let me know if that looks better =).

Regarding the 13 being printed, that will have to be addressed in the message parser directly (since it also prints to the console). So I'll need a little more time to dig into that.

@097115
Copy link
Author

097115 commented Dec 16, 2022

Yeah, the log is now cleaner, indeed :) However, this binary stuff was never the actual problem :)

As I mentioned earlier, the problem was color codes (numbers) that follow the control sequences. E.g., in the string [13Google News - India] Parliament Winter Session updat the number 13 is such a code. Ctrl-B makes the text bold, and Ctrl-C followed by a color code paints it.

So, now the control characters, like ^B and ^C are ignored (which is good) but the color codes are still shown (both in the log and in kirc itself).

Thus, you probably shouldn't just outright ignore those sequences but in order not to keep / not to show this number garbage you likely should also analyze them to some degree -- and to ignore not only the binary stuff but also the following codes, too :).

@mcpcpc
Copy link
Owner

mcpcpc commented Dec 16, 2022

@097115 The latest commit removes the garbage characters in addition to the non-ASCII characters. I need to spend more time considering other impacted use cases for multi-byte characters, but this should work for the time being.

@097115
Copy link
Author

097115 commented Jan 5, 2023

Not to sound like a prick but no, it still has problems:

  • The captured buffer from kirc itself: screenshot, text
  • The corresponding part from the log: text

Unfortunately, I'm not using kirc at the moment, so can't be much of help, but as you can see, the log still has color codes parts if the code in question was double digits, and the program itself still displays them in any case.

@mcpcpc
Copy link
Owner

mcpcpc commented Jan 5, 2023

Oof. I see it in my own logs now re-reviewing them. Looks like I forgot to push my last commit before the holidays.

@mcpcpc mcpcpc reopened this Jan 5, 2023
@stefan11111
Copy link
Contributor

I made a patch that ignores color codes.

diff --git a/kirc.c b/kirc.c
index 9a96b88..2c026c9 100644
--- a/kirc.c
+++ b/kirc.c
@@ -615,9 +615,12 @@ static void log_append(char *str, char *path)
     while (*str != '\0') {
         if (*str >= 32 && *str < 127) {
             fwrite(str, sizeof(char), 1, out);
-        } else if (*str == 3 || *str == 4) {
+        }
+        /* this may not be needed anymore */
+        else if (*str == 3 || *str == 4) {
             str++;
-        } else if (*str == '\n') {
+        }
+        else if (*str == '\n') {
             fwrite("\n", sizeof(char), 1, out);
         }
         str++;
@@ -1071,6 +1074,47 @@ static void param_print_channel(param p)
 
 static void raw_parser(char *string)
 {
+    int len = strlen(string);
+    char *_str1 = malloc(len + 1);
+    char *str1 = _str1;
+    char *str = string;
+    while(*str) {
+        if (*str == 3 || *str == 4) {
+            /* color codes are ^Cxx,yy */
+            /* xx and yy are numbers, either 1 or 2 digits */
+            /* the ,yy part is optional */
+            /* I also notiuced that sometimes, color codes are terminated with '\267' */
+            str++;
+            char *p = str;
+            while (*p && *p != ',' && *p >= '0' && *p <= '9' && p < str + 2) {
+                p++;
+                len--;
+            }
+            str = p;
+            if (*str != ',') {
+                continue;
+            }
+            str++;
+            p = str;
+            while (*p && *p >= '0' && *p <= '9' && p < str + 2) {
+                p++;
+                len--;
+            }
+            str = p;
+            if (*str != '\267') {
+                continue;
+            }
+            str++;
+            continue;
+        }
+        *str1 = *str;
+        str1++;
+        str++;
+    }
+    *str1 = '\0';
+    memcpy(string, _str1, len + 1);
+    free(_str1);
+
     if (!memcmp(string, "PING", sizeof("PING") - 1)) {
         string[1] = 'O';
         raw("%s\r\n", string);

@mcpcpc I saw in the code that '\004' is also special cased, along with '\003'.
However, when reading the 'spec' for this, I found no mention of '\004'. Why was what added?

I say 'spec', because this looks to me like less of an actual spec, and more like 'some clients implemented this, and it spread'.

This is what I read when implementing this:
https://www.mirc.com/colors.html

Is there something I'm missing here?
I also added handling for '\267', because of what I've noticed in my testing, and not because of any 'spec'.

@stefan11111
Copy link
Contributor

I was asked to also strip spaces that sometimes appear after color codes via email.
Here's a patch for that:

diff --git a/kirc.c b/kirc.c
index 9a96b88..eb395bf 100644
--- a/kirc.c
+++ b/kirc.c
@@ -615,9 +615,12 @@ static void log_append(char *str, char *path)
     while (*str != '\0') {
         if (*str >= 32 && *str < 127) {
             fwrite(str, sizeof(char), 1, out);
-        } else if (*str == 3 || *str == 4) {
+        }
+        /* this may not be needed anymore */
+        else if (*str == 3 || *str == 4) {
             str++;
-        } else if (*str == '\n') {
+        }
+        else if (*str == '\n') {
             fwrite("\n", sizeof(char), 1, out);
         }
         str++;
@@ -1071,6 +1074,66 @@ static void param_print_channel(param p)
 
 static void raw_parser(char *string)
 {
+    int len = strlen(string);
+    char _str1[MSG_MAX + 1]; /* char *string is at most this large */
+    char *str1 = _str1;
+    char *str = string;
+    while(*str) {
+        if (*str != 3 && *str != 4) {
+            *str1 = *str;
+            str1++;
+            str++;
+            continue;
+        }
+        /* color codes are ^Cxx,yy */
+        /* xx and yy are numbers, either 1 or 2 digits */
+        /* the ,yy part is optional */
+        /* I also noticed that sometimes, color codes are terminated with '\267' */
+        /* Likewise, I noticed that color codes are also sometimes terminated with */
+                                        /* '\302', then '\225' or '\272', then ' ' */
+        str++;
+        len--;
+        char *p = str;
+        while (*p && *p != ',' && *p >= '0' && *p <= '9' && p < str + 2) {
+            p++;
+            len--;
+        }
+        str = p;
+        if (*str != ',') {
+            continue;
+        }
+        str++;
+        len--;
+        p = str;
+        while (*p && *p >= '0' && *p <= '9' && p < str + 2) {
+            p++;
+            len--;
+        }
+        str = p;
+        switch (*str) {
+        case '\267':
+            str++;
+            len--;
+        continue;
+        case '\302':
+            str++;
+            len--;
+            if (*str != '\225' && *str != '\272') {
+                continue;
+            }
+            str++;
+            len--;
+            if (*str != ' ') {
+                continue;
+            }
+            str++;
+            len--;
+        continue;
+        }
+    }
+    *str1 = '\0';
+    memcpy(string, _str1, len + 1);
+
     if (!memcmp(string, "PING", sizeof("PING") - 1)) {
         string[1] = 'O';
         raw("%s\r\n", string);

I also found out what the deal was with '\004':
https://modern.ircdocs.horse/formatting#color

This looks like some ircv3 stuff, which kirc doesn't implement.

This begs the question, how much is too much.
Should I just strip the codes?
Should I also handle '\267' ?
Should I also handle '\302' spaces?
Should I handle neither?

@mcpcpc I would like your input on this?

@stefan11111
Copy link
Contributor

And it turns out that the above is too much.
It breaks unicode.

Now the question remains:

Should I just strip the codes?
Should I also handle '\267' ?
Or should I handle neither?

If I see no spec for '\267', I'd go with the first.

stefan11111 added a commit to stefan11111/kirc that referenced this issue Sep 11, 2024
@stefan11111
Copy link
Contributor

I also added a switch for this.
If you want to strip color codes, you have to pass -f to kirc.

stefan11111 added a commit to stefan11111/kirc that referenced this issue Sep 14, 2024
@mcpcpc
Copy link
Owner

mcpcpc commented Sep 14, 2024

And it turns out that the above is too much. It breaks unicode.

Now the question remains:

Should I just strip the codes? Should I also handle '\267' ? Or should I handle neither?

If I see no spec for '\267', I'd go with the first.

Given I can't dig up anything on 267 either, I am leaning toward stripping them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants