Skip to content

Latest commit

 

History

History
16 lines (12 loc) · 687 Bytes

README.md

File metadata and controls

16 lines (12 loc) · 687 Bytes

php-nim-scraper

Naive approach in scraping all of Institut Teknologi Bandung NIMs (Nomor Induk Mahasiswa) from ITB Network Information Center (nic.itb.ac.id). To see all of the scraped data in action, see ITB NIM Finder.

How-to

  • cURL is needed for curl-crawl.php, whereas fgc-crawl.php only uses file_get_contents (but is considerably slower).
  • Include your logged-in cookie from nic.itb.ac.id to the source (try making a HTTP call here).
  • Run the following command via CLI:
$ php curl-crawl.php
or
$ php fgc-crawl.php
  • Sample output (without emails to avoid spam): crawled.out.