<!DOCTYPE html>
<html lang="en-US">
<head>
<meta charset='utf-8'>
<meta http-equiv="X-UA-Compatible" content="chrome=1">
<meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1">
<link href="https://fonts.googleapis.com/css?family=Josefin+Sans|Josefin+Slab|News+Cycle" rel="stylesheet">
<link rel="stylesheet" href="/Python-Scraping/assets/css/style.css?v=43669a44c3a7bf7be2afda0d8bf045eea639d5ce" media="screen" type="text/css">
<link rel="stylesheet" href="/Python-Scraping/assets/css/print.css" media="print" type="text/css">
<!--[if lt IE 9]>
<script src="//html5shiv.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
<!-- Begin Jekyll SEO tag v2.3.0 -->
<title>Python-Scraping | Python code for scraping Google (https://google.com), TripAdvisor (https://tripadvisor.com), JIIT (https://jiit.ac.in) and JIIT Simplified (https://jiitsimplified.com) (for self-learning and educational purposes only)</title>
<meta property="og:title" content="Python-Scraping" />
<meta property="og:locale" content="en_US" />
<meta name="description" content="Python code for scraping Google (https://google.com), TripAdvisor (https://tripadvisor.com), JIIT (https://jiit.ac.in) and JIIT Simplified (https://jiitsimplified.com) (for self-learning and educational purposes only)" />
<meta property="og:description" content="Python code for scraping Google (https://google.com), TripAdvisor (https://tripadvisor.com), JIIT (https://jiit.ac.in) and JIIT Simplified (https://jiitsimplified.com) (for self-learning and educational purposes only)" />
<link rel="canonical" href="https://newtein.github.io/Python-Scraping/" />
<meta property="og:url" content="https://newtein.github.io/Python-Scraping/" />
<meta property="og:site_name" content="Python-Scraping" />
<script type="application/ld+json">
{"name":"Python-Scraping","description":"Python code for scraping Google (https://google.com), TripAdvisor (https://tripadvisor.com), JIIT (https://jiit.ac.in) and JIIT Simplified (https://jiitsimplified.com) (for self-learning and educational purposes only)","author":null,"@type":"WebSite","url":"https://newtein.github.io/Python-Scraping/","image":null,"publisher":null,"headline":"Python-Scraping","dateModified":null,"datePublished":null,"sameAs":null,"mainEntityOfPage":null,"@context":"http://schema.org"}</script>
<!-- End Jekyll SEO tag -->
<style>
html, body {
max-width: 100%;
margin: 0;
background: #fdf9f2 !important;
font-weight: 400;
font-family: Josefin Sans,sans-serif,arial,serif !important;
color:#000000 !important;
}
p{
font-weight:400 !important;
color:#000000 !important;
margin-top:2px;
}
h2 {
width: 100%;
border-bottom: 2px solid #ed5565;
line-height: 1.5em;
margin: 5px 0px 10px 0px;
}
h3 {
width: 100%;
border-bottom: 2px solid #ed5565;
line-height: 1em;
margin: 5px 0px 10px 0px;
}
</style>
<script
src="https://code.jquery.com/jquery-3.3.1.min.js"
integrity="sha256-FgpCb/KJQlLNfOu91ta32o/NMZxltwRo8QtmkMRdAu8="
crossorigin="anonymous"></script>
<script>
$(function(){
$("#includedContentDiv").load("https://raw.githubusercontent.com/newtein/no_escape_search/master/tempfile.html");
});
</script>
</head>
<body>
<header style="background-image:None;background-color:#ed5565;">
<div class="inner">
<a href="https://newtein.github.io/Python-Scraping/">
<h1 style="color:#ffffff;">Python Scraping </h1>
</a>
<h2 style="color:#ffffff;"> Google, TripAdvisor, JIIT and JIIT Simplified </h2>
<a href="https://github.com/newtein/Python-Scraping" class="button"><small>View project on</small> GitHub</a>
</div>
</header>
<div id="content-wrapper">
<div class="inner clearfix">
<section id="main-content">
<h1 id="python-scraping">Python-Scraping</h1>
<p>Python code for scraping Google (https://google.com), TripAdvisor (https://tripadvisor.com), JIIT (https://jiit.ac.in) and JIIT Simplified (https://jiitsimplified.com)</p>
<h1 id="google-scraping">Google Scraping</h1>
<h2 id="problem-statement">Problem Statement</h2>
<h2 id="to-store-google-search-results-for-java-logging">To store Google Search results for Java Logging</h2>
<p><img src="https://raw.githubusercontent.com/newtein/Python-Scraping/master/1.%20Google_Scrapping/images/glogging.png" alt="alt text" /></p>
<h2 id="python-script-and-mysql-database">Python Script and MySQL Database</h2>
<p><img src="https://raw.githubusercontent.com/newtein/Python-Scraping/master/1.%20Google_Scrapping/images/1.%20Google_to_MySQL.JPG" alt="alt text" /></p>
<h2 id="mysql-database">MySQL Database</h2>
<p><img src="https://raw.githubusercontent.com/newtein/Python-Scraping/master/1.%20Google_Scrapping/images/2.%20database_snap.JPG" alt="alt text" /></p>
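<p>The screenshots above show scraped search results ending up in a database table. The following is only a minimal sketch of that storage step, not the project's actual script: it uses Python's built-in <code>sqlite3</code> module as a stand-in for MySQL so that it runs without a server, and the table name, column names, and sample rows are all hypothetical.</p>

```python
import sqlite3

# Hypothetical sample rows; a real run would take these from the scraper.
SAMPLE_RESULTS = [
    ("Java Logging Overview", "https://example.com/java-logging"),
    ("Logging in Java", "https://example.com/logging-in-java"),
]


def count_rows(conn):
    """Return how many search results are currently stored."""
    return conn.execute("SELECT COUNT(*) FROM search_results").fetchone()[0]


def store_results(conn, query, results):
    """Insert (title, url) pairs for one search query; return rows added."""
    # Assumed schema: the UNIQUE url column is what makes re-runs skip duplicates.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS search_results ("
        "id INTEGER PRIMARY KEY, query TEXT, title TEXT, url TEXT UNIQUE)"
    )
    before = count_rows(conn)
    # Placeholders keep scraped text out of the SQL string itself.
    conn.executemany(
        "INSERT OR IGNORE INTO search_results (query, title, url) "
        "VALUES (?, ?, ?)",
        [(query, title, url) for title, url in results],
    )
    conn.commit()
    return count_rows(conn) - before


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    added = store_results(connection, "java logging", SAMPLE_RESULTS)
    print("rows added:", added)
```

<p>With a MySQL driver the same idea should carry over, though the placeholder style is typically <code>%s</code> rather than <code>?</code>, and MySQL spells the duplicate-skipping insert as <code>INSERT IGNORE</code>.</p>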
<h1 id="tripadvisor-scraping">TripAdvisor Scraping</h1>
<h2 id="problem-statement-1">Problem Statement</h2>
<h2 id="to-scrap-flight-id-numbers-from-tripadvisor-used-for-self-education-only">To scrape flight ID numbers from TripAdvisor (used for self-education only)</h2>
<p><img src="https://raw.githubusercontent.com/newtein/Python-Scraping/master/2.%20TripAdvisor_Scrap/images/1.%2025_Pages_Of_Flight_On_TripAdvsior.JPG" alt="alt text" /></p>
<h2 id="page-allows-post-request-only">Page allows POST requests only</h2>
<p><img src="https://raw.githubusercontent.com/newtein/Python-Scraping/master/2.%20TripAdvisor_Scrap/images/2.%20challenge_as_page_changes_by_Post_request.JPG" alt="alt text" /></p>
<h2 id="post-to-get">POST to GET</h2>
<p><img src="https://raw.githubusercontent.com/newtein/Python-Scraping/master/2.%20TripAdvisor_Scrap/images/3.%20purged_network_and_converted_post_to_get.JPG" alt="alt text" /></p>
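<p>Converting a POST request into a GET request amounts to moving the form fields into the URL's query string. The sketch below illustrates that idea with the standard library only. The endpoint and field names are hypothetical, not TripAdvisor's real ones, and the approach works only when the server accepts the same parameters over GET.</p>

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def post_to_get_url(endpoint, form_fields):
    """Return a GET URL carrying the fields a POST form would have sent."""
    parts = urlsplit(endpoint)
    # Keep any query parameters already on the endpoint, then add the fields.
    params = parse_qsl(parts.query, keep_blank_values=True)
    params.extend(form_fields.items())
    return urlunsplit(parts._replace(query=urlencode(params)))


def page_urls(endpoint, base_fields, page_field, pages):
    """Build one GET URL per results page by varying a single field."""
    return [
        post_to_get_url(endpoint, {**base_fields, page_field: page})
        for page in range(1, pages + 1)
    ]


if __name__ == "__main__":
    # Hypothetical endpoint and field names, for illustration only.
    for url in page_urls("https://example.com/flights", {"sort": "price"}, "page", 3):
        print(url)
```

<p>Each generated URL can then be fetched with an ordinary GET request, so paging through the results becomes a simple loop instead of a sequence of form submissions.</p>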
<h2 id="database">Database</h2>
<p><img src="https://raw.githubusercontent.com/newtein/Python-Scraping/master/2.%20TripAdvisor_Scrap/images/4.%20Database.JPG" alt="alt text" /></p>
<h1 id="jiit-website-scraping">JIIT Website Scraping</h1>
<h2 id="problem-statement-2">Problem Statement</h2>
<h2 id="to-create-a-faculty-information-portal-used-for-self-education-only">To create a faculty information portal (Used for self-education only)</h2>
<p><img src="https://raw.githubusercontent.com/newtein/Python-Scraping/master/3.%20Jaypee_Faculty_Scrap/images/1.%20Faculty_Info_to_Scrap.JPG" alt="alt text" /></p>
<h2 id="python-code">Python Code</h2>
<p><img src="https://raw.githubusercontent.com/newtein/Python-Scraping/master/3.%20Jaypee_Faculty_Scrap/images/2.%20Python_Code.JPG" alt="alt text" /></p>
<h2 id="database-1">Database</h2>
<p><img src="https://raw.githubusercontent.com/newtein/Python-Scraping/master/3.%20Jaypee_Faculty_Scrap/images/3.%20Database%201.JPG" alt="alt text" /></p>
<p><img src="https://raw.githubusercontent.com/newtein/Python-Scraping/master/3.%20Jaypee_Faculty_Scrap/images/4.%20Database%202%20EduStorage.JPG" alt="alt text" /></p>
<h1 id="jiit-simplified-scraping">JIIT Simplified Scraping</h1>
<h2 id="problem-statement-3">Problem Statement</h2>
<h2 id="to-update-static-html-pages-at-jiit-simplified-move-static-information-to-mysql-and-recreate-dynamic-pages">To update static HTML pages at JIIT Simplified, move static information to MySQL and recreate Dynamic Pages.</h2>
<h3 id="old-pages-at-jiit-simplified-with-static-html">Old pages at JIIT Simplified with static HTML</h3>
<h3 id="example-visit-httpswwwjiitsimplifiedcomcompanyamazonphp">Example: visit https://www.jiitsimplified.com/company/amazon.php</h3>
<p><img src="https://raw.githubusercontent.com/newtein/Python-Scraping/master/4.%20JIIT_Simplified_Scrap/images/1.%20Old_Placement_Forum_Static_HTML_content.JPG" alt="alt text" /></p>
<h2 id="scarpped-database">Scraped database</h2>
<p><img src="https://raw.githubusercontent.com/newtein/Python-Scraping/master/4.%20JIIT_Simplified_Scrap/images/2.%20Scrapped%20Database%201.JPG" alt="alt text" /></p>
<p><img src="https://raw.githubusercontent.com/newtein/Python-Scraping/master/4.%20JIIT_Simplified_Scrap/images/3.%20Scrapped%20Database%202.JPG" alt="alt text" /></p>
<h2 id="new-dymanic-placement-forum-for-desktop-only">New Dynamic Placement Forum (for desktop only)</h2>
<h3 id="example-visit-httpswwwjiitsimplifiedcomdisplayphpcompanyamazon">Example: visit https://www.jiitsimplified.com/display.php?company=amazon</h3>
<p><img src="https://raw.githubusercontent.com/newtein/Python-Scraping/master/4.%20JIIT_Simplified_Scrap/images/4.%20New_Placement_Forum_Dynamic.JPG" alt="alt text" /></p>
<p><img src="https://raw.githubusercontent.com/newtein/Python-Scraping/master/4.%20JIIT_Simplified_Scrap/images/5.%20New%202.JPG" alt="alt text" /></p>
<p><img src="https://raw.githubusercontent.com/newtein/Python-Scraping/master/4.%20JIIT_Simplified_Scrap/images/6.%20New%203.JPG" alt="alt text" /></p>
</section>
<a href="#otherProjects"><h2 class="headB"> Other Projects </h2> </a>
<aside id="sidebar">
<h2> Self-Project </h2>
<p> Mar, 2017 - May, 2017 </p>
<h2> Domain </h2>
<p> Web Crawling and Scraping </p>
<h2 id="technologies-used">Tools/Technologies Used</h2>
<p> Python (Beautiful Soup, MechanicalSoup, Selenium), Database (MySQL) </p>
<div id="includedContentDiv"></div>
<p class="repo-owner"><a href="https://github.com/newtein/Python-Scraping">Python-Scraping</a> is maintained by <a href="https://github.com/newtein"> Harshit Gujral (newtein)</a>.</p>
<script src="https://use.fontawesome.com/8b09d5ebcd.js"></script>
<p> Made with <font color="#ed5565"> <i class="fa fa-heart" aria-hidden="true"></i> </font> by harshit</p>
</aside>
</div>
</div>
</body>
</html>