Hello.
Thanks for Crawlee ^^
Could you please explain whether there is a convenient way to handle the following:
===
Have:
- httpCrawler in use
- sessionPool in use
- requestQueue in use
- headerGenerator in use
- `gotOptions.sessionToken = crawlingContext.session;`
Want:
- When a new session is created (before it is used to process requests from the request queue), visit a certain URL and get cookies from it, while keeping the headers consistent for all requests within that session (including this "cookie-init" request).
Thoughts:
- I see the `createSessionFunction` parameter in `sessionPoolOptions`, but I can't make requests from there that are bound to the session being created: there is no `sendRequest` and no crawling context.
- If I create another got-scraping instance inside `createSessionFunction` (alongside the one created for the `HttpCrawler` being constructed), it will have its own storage for generated headers, so `gotOptions.sessionToken` will not keep the headers consistent between the two.
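To illustrate the concern in the last point, here is a simplified model of per-session header caching. It is not got-scraping's actual code; it only assumes that generated headers are stored in a cache keyed by the `sessionToken` object. `createHeaderCache` and `fakeGenerateHeaders` are hypothetical names, and a counter stands in for random header generation so the result is deterministic. The same session gets the same headers from one cache, but a second, independent cache generates a different set for that same session, which is the inconsistency described above.

```javascript
// Simplified model (NOT got-scraping's real implementation) of caching
// generated headers per session token. The token is the session object itself.
function createHeaderCache(generateHeaders) {
    // WeakMap so entries can be garbage-collected together with the session.
    const bySession = new WeakMap();
    return function headersFor(sessionToken) {
        if (!bySession.has(sessionToken)) {
            bySession.set(sessionToken, generateHeaders());
        }
        return bySession.get(sessionToken);
    };
}

// Hypothetical header generator: a counter replaces randomness.
let counter = 0;
const fakeGenerateHeaders = () => ({ 'user-agent': `agent-${++counter}` });

const session = {}; // stands in for a Crawlee Session object
const crawlerCache = createHeaderCache(fakeGenerateHeaders);
const separateCache = createHeaderCache(fakeGenerateHeaders);

const a = crawlerCache(session);  // first use: generates agent-1
const b = crawlerCache(session);  // cache hit: same object as `a`
const c = separateCache(session); // separate storage: generates agent-2
```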
```js
import { HttpCrawler, Session } from 'crawlee';

const crawler = new HttpCrawler({
    useSessionPool: true,
    persistCookiesPerSession: true,
    sessionPoolOptions: {
        createSessionFunction: (sessionPool, options) => {
            const sessionOptions = options?.sessionOptions || {};
            const resultSession = new Session({
                ...sessionOptions,
                sessionPool,
            });
            // need to visit page and use cookies from it during this session
            // keep headers consistent not random
            // ???
            return resultSession;
        },
    },
    preNavigationHooks: [
        async (crawlingContext, gotOptions) => {
            // Tie got-scraping's generated headers to the current session.
            gotOptions.sessionToken = crawlingContext.session;
            gotOptions.useHeaderGenerator = true;
        },
    ],
});

crawler.router.addHandler('SOME_LABEL', async ({ request, json, log }) => {
    // some work
});

export async function run() {
    await addRequestsToQueue(); // my own helper that fills the request queue (not shown)
    await crawler.run();
}
```
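One possible way to fill in the `// ???` part, sketched here under assumptions rather than as a documented Crawlee pattern: instead of making the request inside `createSessionFunction`, run a one-time "cookie-init" request lazily the first time each session is used. The helper below is hypothetical (`ensureSessionInitialized` and the `cookieInitDone` flag are made-up names). It only assumes that the session exposes a `userData` object, as Crawlee's `Session` does, and it uses a `WeakMap` of in-flight promises so that concurrent first uses of the same session trigger only one warm-up. If the warm-up fails, the flag stays unset and the next request retries it.

```javascript
// Hypothetical helper: run a one-time "cookie-init" step per session.
// Assumes `session.userData` is a plain object and `warmUp` is an async
// function that performs the init request.
const inFlight = new WeakMap(); // session -> pending init promise

async function ensureSessionInitialized(session, warmUp) {
    if (session.userData.cookieInitDone) return; // already initialized
    let pending = inFlight.get(session);
    if (!pending) {
        // First caller starts the warm-up; concurrent callers await the same promise.
        pending = (async () => {
            await warmUp();
            session.userData.cookieInitDone = true; // only set on success
        })().finally(() => inFlight.delete(session)); // allow a retry after failure
        inFlight.set(session, pending);
    }
    await pending;
}
```

Inside the existing `preNavigationHooks` entry it could be called as `await ensureSessionInitialized(crawlingContext.session, () => crawlingContext.sendRequest({ url: INIT_URL, method: 'GET' }))`, where `INIT_URL` is a placeholder. The assumption is that `sendRequest` goes through the crawler's own got-scraping with `sessionToken` set to the current session and with the session's cookies, so the init request should reuse the same generated headers and leave its cookies on the session before the main request is sent; this is worth verifying against the Crawlee version in use.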