Expressions not parsed on these pages #2

Open
martinratinaud opened this issue Mar 15, 2016 · 9 comments

@martinratinaud

Hi,

Here are some pages where quote detection does not work

Also I guess you should parse

:-)

Thanks again

@mitica
Owner

mitica commented Mar 15, 2016

Hi Martin,

quote-parser extracts quotes from plain text, not from HTML. It does not work with HTML tags or structure.
You need a module that finds quotes by HTML tags like blockquote.
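
As a rough illustration of what "finding quotes by HTML tags" could look like, here is a minimal sketch. `extractBlockquotes` is a hypothetical helper, not part of quote-parser, and the sample HTML is made up. It uses regular expressions only to stay dependency-free; a real HTML parser such as cheerio would be more robust, and nested blockquotes are not handled.

```javascript
// Hypothetical helper, not part of quote-parser: collect the text of
// every <blockquote> element in an HTML string.
function extractBlockquotes(html) {
  const results = [];
  const blockquoteRe = /<blockquote\b[^>]*>([\s\S]*?)<\/blockquote>/gi;
  let match;
  while ((match = blockquoteRe.exec(html)) !== null) {
    const text = match[1]
      .replace(/<[^>]+>/g, ' ') // replace inner tags with a space
      .replace(/\s+/g, ' ')     // collapse runs of whitespace
      .trim();
    if (text) {
      results.push(text);
    }
  }
  return results;
}

// Made-up sample markup for illustration.
const sampleHtml =
  '<p>Intro</p>' +
  '<blockquote><p>This can make it hard to tell what is really going on.</p>' +
  '<cite>Dave Nicolette</cite></blockquote>';

const blockquotes = extractBlockquotes(sampleHtml);
console.log(blockquotes);
```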

Cheers

@martinratinaud
Author

All right :-)

But here is some code that strips the HTML, and it still does not detect the Dave Nicolette quote:

const $ = require('cheerio');
const parser = require('quote-parser');
const request = require('request');

const sites = [
  'https://www.blossom.co/blog/3-tips-for-quick-effective-stand-up-meetings'
];

sites.forEach((site) => {
  request(site, (error, response, body) => {
    // Skip pages that failed to download.
    if (error || response.statusCode !== 200) {
      return;
    }

    // Load the page body and drop <script> elements so their source
    // does not end up in the extracted text.
    const t = $.load(body)('body');
    t.find('script').remove(); // eslint-disable-line

    // Run quote-parser on the plain text of the page.
    const quotes = parser.parse(t.text(), 'en');
    console.log(); // eslint-disable-line
    console.log('-------------------------------------'); // eslint-disable-line
    console.log(site); // eslint-disable-line
    console.log(quotes); // eslint-disable-line
  });
});

@mitica
Owner

mitica commented Mar 16, 2016

Hi, I can't find any quotes with quotation marks in your example.

I see one quote:

  • HTML:
    Another issue with the conventional format is that tasks or workstreams aren’t discussed coherently; instead, each subject comes up briefly depending on the order in which team members speak. This can make it hard to tell what’s really going on.
    Dave Nicolette
  • TEXT:
    Another issue with the conventional format is that tasks or workstreams aren’t discussed coherently; instead, each subject comes up briefly depending on the order in which team members speak. This can make it hard to tell what’s really going on. — Dave Nicolette

quote-parser tries to detect quotes in the TEXT version and finds none, because there are no quotation marks.

It will work for this example:
"Another issue with the conventional format is that tasks or workstreams aren’t discussed coherently; instead, each subject comes up briefly depending on the order in which team members speak. This can make it hard to tell what’s really going on." — Dave Nicolette

:) Cheers
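
To make the point about quotation marks concrete, here is a small sketch. The pattern below is an illustrative assumption, not one of quote-parser's actual rules (those are not shown in this thread), and `findQuote` is a hypothetical helper. It only shows that a rule anchored on quotation marks has nothing to match in the first text.

```javascript
// Illustrative pattern only: a quoted span, a dash, then a short name.
// This is NOT quote-parser's real rule set.
const QUOTE_WITH_AUTHOR =
  /["“]([^"“”]{10,})["”]\s*[\u2010-\u2015-]\s*([^\n\r,]{3,30})/;

// Hypothetical helper: return the first quote and author, or null.
function findQuote(text) {
  const m = QUOTE_WITH_AUTHOR.exec(text);
  return m ? { quote: m[1], author: m[2].trim() } : null;
}

// Shortened versions of the two texts discussed above.
const plainText =
  'This can make it hard to tell what’s really going on. — Dave Nicolette';
const quotedText =
  '"This can make it hard to tell what’s really going on." — Dave Nicolette';

console.log(findQuote(plainText));  // no quotation marks, so nothing matches
console.log(findQuote(quotedText)); // quoted text followed by dash and name
```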

@martinratinaud
Author

Thanks Dumitru, I get it.

@martinratinaud
Author

Hi, OK, so if I understand correctly, on this page: https://www.goodreads.com/quotes

If I convert it to text, I get something like this:

“I believe that everything happens for a reason. People change so that you can learn to let go, things go wrong so that you appreciate them when they're right, you believe lies so you eventually learn to trust no one but yourself, and sometimes good things fall apart so better things can fall together.”
    ―
    Marilyn Monroe

which should be detected, right?

@mitica
Owner

mitica commented Mar 17, 2016

I'm not sure that would be correct (safe) in all situations.
Currently (0.1.4), it will not detect any quotes in the text below:

“I believe that everything happens for a reason...”
    ―
    Marilyn Monroe

But for this text it will:

“I believe that everything happens for a reason...”
    ― Marilyn Monroe
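
One possible workaround, sketched under the assumption that you pre-process the text yourself before calling the parser: pull a dash that sits alone on its own line onto the same line as the name. `joinDashAndName` is a hypothetical helper, not part of quote-parser.

```javascript
// Hypothetical pre-processing step, not part of quote-parser.
// Turns "<newline> ― <newline> Name" into "<newline>    ― Name".
function joinDashAndName(text) {
  return text.replace(
    /\n[ \t\u00A0]*([\u2010-\u2015])[ \t\u00A0]*\n[ \t\u00A0]*/g,
    '\n    $1 '
  );
}

const goodreadsText =
  '“I believe that everything happens for a reason...”\n' +
  '    ―\n' +
  '    Marilyn Monroe';

const joined = joinDashAndName(goodreadsText);
console.log(joined);
```

The result has the same shape as the second text above, the one that is reported to be detected in 0.1.4.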

@martinratinaud
Author

OK, thanks. Do you plan to change it, or do you think it is not relevant?

Thanks

@mitica
Owner

mitica commented Mar 17, 2016

I've added an option: extraRules - v0.1.5

You can use it. See the last test.

It's not an elegant solution, but it works :)

@martinratinaud
Author

Hi Dumitru,

OK, that works great, and I will include it in the next release of my Chrome extension.

Could you explain, though, why you don't include that pattern in your core rules?

Also, I have another problem with this:

    // Build the fragment directly from an HTML string.
    const t = $(`<p>“I teach people how to sell their books online. The less code they
      need the better. Zapier eliminates the need for code.”</p>
      <cite>—Paul Jarvis, <span>Web Designer &amp; Author<span></span></span></cite>`);
    t.find("script").remove(); // eslint-disable-line

    const options = {
      minLength: 10,
      extraRules: [
        {
          reg: /“([^\f\t\v“”„]{10,})”[ \t\u00A0]*[\n\r]+[ \t\u00A0]*[\u2010-\u2015-][ \t\u00A0\r\n]*([^\f\n\r\t\v,]{3,30})(?:$|[\n\r])/gi,
          quote: 0,
          name: 1
        }
      ]
    };

    // `site` is not defined in this snippet, so it is no longer logged.
    const quotes = parser.parse(t.text(), 'en', options);
    console.log(t.text());
    console.log(); // eslint-disable-line
    console.log("-------------------------------------"); // eslint-disable-line
    console.log(quotes); // eslint-disable-line

This produces the following string to parse:

“I teach people how to sell their books online. The less code they
      need the better. Zapier eliminates the need for code.”
      —Paul Jarvis, Web Designer & Author
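
A quick check with plain `RegExp`, independent of quote-parser, suggests one possible cause: the name group excludes commas and must be followed by a line break or the end of the string, but here the name is followed by ", Web Designer & Author". The sketch below tests the rule from the snippet (without the `g` flag, to avoid `lastIndex` state between calls) against that string, along with a relaxed variant that allows an optional trailing ", title". The relaxed variant is an assumption for illustration, and how quote-parser itself treats such a rule is not confirmed here.

```javascript
// The rule from the snippet above, copied without the `g` flag.
const originalRule = /“([^\f\t\v“”„]{10,})”[ \t\u00A0]*[\n\r]+[ \t\u00A0]*[\u2010-\u2015-][ \t\u00A0\r\n]*([^\f\n\r\t\v,]{3,30})(?:$|[\n\r])/i;

// Assumed variant: tolerate an optional ", job title" after the name.
const relaxedRule = /“([^\f\t\v“”„]{10,})”[ \t\u00A0]*[\n\r]+[ \t\u00A0]*[\u2010-\u2015-][ \t\u00A0\r\n]*([^\f\n\r\t\v,]{3,30})(?:,[^\n\r]*)?(?:$|[\n\r])/i;

const zapierText =
  '“I teach people how to sell their books online. The less code they\n' +
  '      need the better. Zapier eliminates the need for code.”\n' +
  '      —Paul Jarvis, Web Designer & Author';

const originalMatches = originalRule.test(zapierText);
const relaxedMatch = relaxedRule.exec(zapierText);

console.log(originalMatches);                 // does the original rule match?
console.log(relaxedMatch && relaxedMatch[2]); // name captured by the variant
```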

Thanks again for your help
