My goal is to use Selenium, a web test automation framework, to crawl users' profile pages on LinkedIn. I use regular expressions and some tricks to extract the information most recruiters are interested in, such as work experience and education, without being detected as a web crawler. I also define a rule to calculate a score for each user's profile: the higher the score, the more competitive the user is in the IT market.
Before running the program, you need firefox.exe, which the WebDriver uses as its browser engine. Pay attention to the versions of both Firefox and Selenium: I strongly recommend downloading Firefox from this project to make sure it is compatible with Selenium. We use selenium-server-standalone-2.44.0.jar and Firefox 33.1.
FirefoxBinary binary = new FirefoxBinary(new File("C:\\Program Files (x86)\\Mozilla Firefox\\Firefox.exe"));
id.sendKeys("********");
pass.sendKeys("********");
If you take the traditional approach, such as using web tools to construct the HTTP request and read the HTTP response, you will run into the following problems:
2.2.1: LinkedIn does a lot to block web crawlers. First, you have to log in before you can enter the site, which means you have to collect all the cookies from the website and construct your requests with them. The login form also contains a generated random number, which you have to send back to their server as well. I have done this in another project using Go.
2.2.2: Many websites do not serve static HTML; they use AJAX to load the page dynamically. To get the entire page, you have to construct all the AJAX requests yourself. It can be done, but it takes a lot of time and is prone to bugs.
2.2.3: Selenium does all of this for you. All you need to do is get the page and analyze it, which fits our needs.
WebElement position = driver.findElement(By.xpath("//h2[@class='pv-top-card-section__headline Sans-19px-black-85%']"));
For more information about Selenium, see the Selenium documentation.
We want to evaluate a user from several perspectives:
- Educational experience: the full score is 100.
- Work experience: the full score is 100.
- Skills that match the recruiter's requirements.
scores = (float) (0.4 * skillScore + 0.3 * eduScore + 0.3 * workScore);
If the candidate is not a graduate student from a university, the weight of the educational experience is lower:
scores = (float) (0.5 * skillScore + 0.4 * workScore + 0.1 * eduScore);
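The two weightings above can be put together in one small method. This is only a minimal sketch: the class name `ProfileScore`, the method name `score`, and the `isGraduateStudent` flag are hypothetical names chosen for illustration and are not taken from the project. It also assumes the skill score uses the same 0 to 100 scale as the other two, which is not stated above.

```java
// Hypothetical helper illustrating the weighting rule; names are not from the project.
class ProfileScore {

    // Combines the three partial scores into one overall score.
    // Assumption: all three inputs are on a 0-100 scale.
    static float score(float skillScore, float eduScore, float workScore,
                       boolean isGraduateStudent) {
        if (isGraduateStudent) {
            // Default weights: skills 0.4, education 0.3, work 0.3.
            return (float) (0.4 * skillScore + 0.3 * eduScore + 0.3 * workScore);
        }
        // Education counts for less when the candidate is not a graduate student.
        return (float) (0.5 * skillScore + 0.4 * workScore + 0.1 * eduScore);
    }

    public static void main(String[] args) {
        // Example inputs: skills 80, education 70, work 90.
        System.out.println(score(80f, 70f, 90f, true));
        System.out.println(score(80f, 70f, 90f, false));
    }
}
```

With skills 80, education 70 and work 90, the first weighting gives 0.4 * 80 + 0.3 * 70 + 0.3 * 90 = 80, while the second gives 0.5 * 80 + 0.4 * 90 + 0.1 * 70 = 83, so the same profile scores slightly higher when education carries less weight.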