Feature/updating indeed scraper (#166) #170
* - Updated to mobile endpoints and user agents to prevent CAPTCHA
  - Updated parsing of indeed scraper
  - Fixed tags not being parsed correctly
  - Fixed remoteness not being parsed correctly
  - Changed to only scrape the first page of each search by default for speed
* - Updated method of loading user agent files
  - Updated user agent file of indeed scraper
* - Updated versions in requirements.txt
  - Added in black configuration file for formatting
  - Added a pre-commit hook so all contributors will have consistent formatting on upload
  - Updated all python files to conform to black formatter
* Updated Python version
* More black formatting updates
* - Added prettierrc and prettierignore
  - Formatted all files other than python
* Updated prettierignore so prettier can search through subdirectories
* Reset formatting to longer line width
* Reverted to previous commit
* Updating again to longer line width after accounting for missing files
* Updated prettierrc and prettierignore files and reran formatting
* Updated version
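For readers unfamiliar with the user-agent change listed above, here is a minimal sketch of loading a user-agent list from a text file and picking one at random for a request. It is not the code from this PR: the function names `load_user_agents` and `build_headers`, the one-agent-per-line file layout, and the `#` comment convention are all assumptions made for illustration.

```python
import random
from pathlib import Path


def load_user_agents(path):
    """Return the non-empty, non-comment lines of a user-agent file.

    Assumes one user agent per line; lines starting with '#' are skipped.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [
        line.strip()
        for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    ]


def build_headers(user_agents, rng=random):
    """Pick one user agent at random and wrap it in a headers dict."""
    if not user_agents:
        raise ValueError("user-agent list is empty")
    return {"User-Agent": rng.choice(user_agents)}
```

The resulting dict could be passed as the `headers` argument of an HTTP client call; passing a seeded `random.Random` as `rng` makes the choice reproducible in tests.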
Let's put back the markdown + demo file, but other than this I'm approving it.
- Reverted settings_USA changes
- Updated readme
- Removed extra user-agent from phone user agents list
- Removed extra comments
wow makes me so happy to see this working again :)
a few things
- we should update readme to say that 3.11 is required at least
- we need to add `user_agent_mobile.txt` to the `MANIFEST.in` file so that it is properly packaged when you install/run using `pip install .` and the `funnel load -s...` per the instructions for new users in the readme.
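
As a rough illustration of the `MANIFEST.in` point, an `include` line along these lines would tell setuptools to ship the file in the source distribution. The directory `jobfunnel/resources/` is an assumption used only for this example; the real path depends on where `user_agent_mobile.txt` lives in the repo.

```text
# MANIFEST.in (sketch; the directory below is hypothetical)
include jobfunnel/resources/user_agent_mobile.txt
```

Note that `MANIFEST.in` controls what goes into the sdist; whether the file also ends up in the installed package depends on the project's packaging settings.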
…mobile user agent list to the MANIFEST.in
Thanks @PaulMcInnis, we should be good to go! As I was updating, I was wondering why this project still uses I will open a new issue (#172) on this as I believe we should probably switch while the project is still small, but would love to hear if there are any reasons not to.
Hey there, no reason other than age really - this is an old project and it needs some love
Description
Merging the fixed Indeed scraper into the master branch. For more information, see #166.
Additional changes include updates to formatting, adding black and prettier configuration files, and performing minor updates to the repo (versioning, updating modules, etc.).
The biggest point of discussion is whether we want to keep the current method of scraping each individual job page, or switch to scraping the job list with description summaries. The latter (the default implementation in this PR) is much faster, though it will result in less information in the description. It is possible to manually switch back, but do we want to keep it as the default implementation?
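
To make the trade-off concrete, here is a minimal sketch of the two strategies behind one switch. It is not the scraper code from this PR: the `Job` class, the `scrape` function, the `fetch_detail` callable, and the `title`/`url`/`summary` dict shape are all hypothetical names chosen for illustration. With `fetch_detail=None` only the list page is used (one request per search page); passing a callable costs one extra request per job but returns the full description.

```python
from dataclasses import dataclass


@dataclass
class Job:
    title: str
    url: str
    description: str


def scrape(listing, fetch_detail=None):
    """Build Job records from one already-parsed search-results page.

    listing: list of dicts with 'title', 'url' and 'summary' keys
        (a hypothetical shape for the parsed job list).
    fetch_detail: optional callable mapping a job URL to its full
        description. None means "use the list-page summary only".
    """
    jobs = []
    for item in listing:
        if fetch_detail is None:
            # Fast path: no extra request, shorter description.
            description = item["summary"]
        else:
            # Slow path: one extra request per job, full description.
            description = fetch_detail(item["url"])
        jobs.append(Job(item["title"], item["url"], description))
    return jobs
```

Under this shape, the default question in the PR reduces to which value `fetch_detail` takes when the user does not configure it.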
Context of change
Type of change
How Has This Been Tested?
Same tests as in #166
Checklist: