-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathindex.html
185 lines (119 loc) · 19.7 KB
/
index.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>FallenAngel Blog</title>
<meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1">
<meta name="description">
<meta property="og:type" content="website">
<meta property="og:title" content="FallenAngel Blog">
<meta property="og:url" content="http://yoursite.com/index.html">
<meta property="og:site_name" content="FallenAngel Blog">
<meta property="og:description">
<meta name="twitter:card" content="summary">
<meta name="twitter:title" content="FallenAngel Blog">
<meta name="twitter:description">
<link href="/webfonts/ptserif/main.css" rel='stylesheet' type='text/css'>
<link href="/webfonts/source-code-pro/main.css" rel="stylesheet" type="text/css">
<link rel="stylesheet" href="/css/style.css">
</head>
<body>
<div id="container">
<header id="header">
<div id="header-outer" class="outer">
<div id="header-inner" class="inner">
<a id="main-nav-toggle" class="nav-icon" href="javascript:;"></a>
<a id="logo" class="logo" href="/"></a>
<nav id="main-nav">
</nav>
<nav id="sub-nav">
<div id="search-form-wrap">
<form action="//google.com/search" method="get" accept-charset="UTF-8" class="search-form"><input type="search" name="q" results="0" class="search-form-input" placeholder="Search"><button type="submit" class="search-form-submit"></button><input type="hidden" name="sitesearch" value="http://yoursite.com"></form>
</div>
</nav>
</div>
</div>
</header>
<section id="main" class="outer">
<article id="post-阿里云API签名机制实现-Python" class="article article-type-post" itemscope itemprop="blogPost">
<div class="article-inner">
<header class="article-header">
<h1 itemprop="name">
<a class="article-title" href="/2016/08/02/阿里云API签名机制实现-Python/">阿里云API签名机制实现-Python</a>
</h1>
</header>
<div class="article-meta">
<a href="/2016/08/02/阿里云API签名机制实现-Python/" class="article-date">
<time datetime="2016-08-02T11:39:14.000Z" itemprop="datePublished">2016-08-02</time>
</a>
</div>
<div class="article-entry" itemprop="articleBody">
<p>用阿里云的云解析API作为例子,python版本为3.5(有些函数用法和python2不一样)<br><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br></pre></td><td class="code"><pre><span class="line"></span><br><span class="line">import requests,json,urllib.parse,datetime,hashlib,random,hmac,base64,time</span><br><span class="line">AliKey="testid"</span><br><span class="line">AliSecret="testsecret"</span><br><span class="line"></span><br><span class="line">def Url(a):</span><br><span class="line"> b=""</span><br><span class="line"> for i in range(0,len(a)):</span><br><span class="line"> try:</span><br><span class="line"> if a[i]=='/':</span><br><span class="line"> b+="%2F"</span><br><span class="line"> elif a[i]=='"':</span><br><span class="line"> b+="%22"</span><br><span class="line"> elif a[i]=='*':</span><br><span class="line"> b+="%2A"</span><br><span class="line"> elif a[i]=='=':</span><br><span class="line"> b+="%3D"</span><br><span class="line"> elif a[i]=='+':</span><br><span class="line"> b+="%20"</span><br><span class="line"> elif a[i]=='&':</span><br><span class="line"> b+="%26"</span><br><span class="line"> elif a[i]=='%':</span><br><span class="line"> b+="%25"</span><br><span class="line"> else:</span><br><span class="line"> b+=a[i]</span><br><span class="line"> except:</span><br><span class="line"> continue</span><br><span class="line"> return b</span><br><span class="line"></span><br><span class="line">def Ali_Sign(method,a):</span><br><span class="line"> Key=sorted(a.items(), key=lambda d: d[0])</span><br><span class="line"> Key2=urllib.parse.urlencode(Key)</span><br><span class="line"> Key2=Url(Key2)</span><br><span class="line"> Sign=method+"&%2F&"+Key2</span><br><span class="line"> hm=hmac.new(bytearray(AliSecret+"&",encoding="utf8"),Sign.encode("utf8"),digestmod=hashlib.sha1).digest()</span><br><span class="line"> b64hm=base64.b64encode(hm)</span><br><span class="line"> return str(b64hm,encoding="utf8")</span><br><span class="line"></span><br><span class="line">def Ali_Public():</span><br><span class="line"> Key={</span><br><span class="line"> "Format":"JSON","Version":"2015-01-09","AccessKeyId":AliKey,"SignatureMethod":"HMAC-SHA1",</span><br><span class="line"> "Timestamp":datetime.datetime.strftime(datetime.datetime.utcnow(),"%Y-%m-%dT%H:%M:%SZ"),"SignatureVersion":"1.0",</span><br><span class="line"> "SignatureNonce":str(random.randint(0,1000000000))+"-"+str(random.randint(0,100000000))}</span><br><span class="line"> return Key</span><br><span class="line"></span><br><span class="line">def Ali_Domain_GetRecord(Domainname,RRname):</span><br><span class="line"> Key=Ali_Public()</span><br><span class="line"> Key["Action"]="DescribeDomainRecords"</span><br><span class="line"> Key["DomainName"]=Domainname</span><br><span class="line"> Sign=Ali_Sign("GET",Key)</span><br><span class="line"> Key["Signature"]=Sign</span><br><span class="line"> req=requests.get("http://alidns.aliyuncs.com",params=Key)</span><br><span class="line"> print(req.content)</span><br><span class="line"> for i in json.loads(req.text)["DomainRecords"]["Record"]:</span><br><span class="line"> if i["RR"]==RRname:</span><br><span class="line"> return i</span><br><span class="line"> </span><br></pre></td></tr></table></figure></p>
<p>Url()用来二次处理编码,urllib处理后还是有些编码不符合要求.<br>Ali_Sign()对参数进行签名,返回签名值<br>Ali_Public()获取公共参数.我这里的唯一随机数采用两个随机数和一个”-“连接,增加重复难度.<br>Ali_Domain_GetRecord()获取指定二级域名解析记录的详细记录.</p>
<p>PS:代码比较挫而且没有对大部分异常或者错误进行处理.大家可以拿回去改改.</p>
</div>
</div>
</article>
<article id="post-Tessert-OCR-Linux下的编译以及训练" class="article article-type-post" itemscope itemprop="blogPost">
<div class="article-inner">
<header class="article-header">
<h1 itemprop="name">
<a class="article-title" href="/2016/06/13/Tessert-OCR-Linux下的编译以及训练/">Tessert-OCR Linux下的训练</a>
</h1>
</header>
<div class="article-meta">
<a href="/2016/06/13/Tessert-OCR-Linux下的编译以及训练/" class="article-date">
<time datetime="2016-06-13T13:44:34.000Z" itemprop="datePublished">2016-06-13</time>
</a>
</div>
<div class="article-entry" itemprop="articleBody">
<h2 id="工具及源码下载地址"><a href="#工具及源码下载地址" class="headerlink" title="工具及源码下载地址:"></a>工具及源码下载地址:</h2><p>Tessert-OCR :<a href="https://github.com/tesseract-ocr/tesseract" target="_blank" rel="external">https://github.com/tesseract-ocr/tesseract</a><br>编译教程:<a href="https://github.com/tesseract-ocr/tesseract/wiki/Compiling" target="_blank" rel="external">https://github.com/tesseract-ocr/tesseract/wiki/Compiling</a><br>Leptonica:<a href="http://www.leptonica.org/download.html" target="_blank" rel="external">http://www.leptonica.org/download.html</a><br>jTessBoxEditor:<a href="http://sourceforge.net/projects/vietocr/files/jTessBoxEditor/" target="_blank" rel="external">http://sourceforge.net/projects/vietocr/files/jTessBoxEditor/</a><br>注:编译完tesseract后一定要编译training哦<br><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">make training</span><br><span class="line">sudo make install training</span><br></pre></td></tr></table></figure></p>
<p>如果遇到 <span style="color:#ff0000"><strong>configure: error: leptonica library with pdf support (>= 1.71) is missing </strong></span><br><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">export LIBLEPT_HEADERSDIR=/usr/local/include/leptonica/</span><br><span class="line">./configure --with-extra-libraries=/usr/local/lib</span><br></pre></td></tr></table></figure></p>
<h2 id="训练过程"><a href="#训练过程" class="headerlink" title="训练过程"></a>训练过程</h2><h4 id="1-获取训练样本"><a href="#1-获取训练样本" class="headerlink" title="1.获取训练样本"></a>1.获取训练样本</h4><p>就是一堆你要识别的验证码图片,越多越好.越多后面工作量越大.</p>
<h4 id="2-合并样本"><a href="#2-合并样本" class="headerlink" title="2.合并样本"></a>2.合并样本</h4><p>运行jTessBoxEditor工具,在点击菜单栏中Tools—>Merge TIFF。在弹出的对话框中选择样本图像(按Shift选择多张),合并成一个tif文件<br>注:文件名格式<br>[lang].[fontname].exp[num].tif<br><span style="color:#ff0000"><strong>其中lang为语言名称,fontname为字体名称,num为序号,可以随便定义。</strong></span><br>我们这使用 num.font.exp0</p>
<h4 id="3-生成Box-File文件"><a href="#3-生成Box-File文件" class="headerlink" title="3.生成Box File文件"></a>3.生成Box File文件</h4><p>tesseract.exe num.font.exp0.tif num.font.exp0 batch.nochop makebox<br>注意替换为自己的文件名</p>
<h4 id="4-文字校正"><a href="#4-文字校正" class="headerlink" title="4.文字校正"></a>4.文字校正</h4><p>1.把上一步生成的num.font.exp0.box和原本的num.font.exp0.tif放在同一个文件夹里.一般都是在一个文件夹里的<br>2.运行jTessBoxEditor.选择box editor 然后点击open 然后选中num.font.exp0.tif,把所有错误的识别修改为正确的识别,如果有识别不出来的,需要踢出训练库从第二步重新开始<br>3.保存文件</p>
<h4 id="5-定义字体特征文件"><a href="#5-定义字体特征文件" class="headerlink" title="5.定义字体特征文件"></a>5.定义字体特征文件</h4><p>Tesseract-OCR3.01以上的版本在训练之前需要创建一个名称为font_properties的字体特征文件。<br>font_properties不含有BOM头,文件内容格式如下:<br><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><fontname> <italic> <bold> <fixed> <serif> <fraktur></span><br></pre></td></tr></table></figure><br>其中fontname为字体名称,必须与[lang].[fontname].exp[num].box中的名称保持一致。italic,bold,fixed,serif,fraktur的取值为1或0,表示字体是否具有这些属性。<br>这里在样本图片所在目录下创建一个名称为font_properties的文件,用记事本打开,输入以下下内容:<br><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">font 0 0 0 0 0</span><br></pre></td></tr></table></figure></p>
<h4 id="6-生成语言文件"><a href="#6-生成语言文件" class="headerlink" title="6.生成语言文件"></a>6.生成语言文件</h4><p>windows 脚本<br><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br></pre></td><td class="code"><pre><span class="line">rem 执行改批处理前先要目录下创建font_properties文件</span><br><span class="line"></span><br><span class="line">echo Run Tesseract for Training..</span><br><span class="line">tesseract.exe num.font.exp0.tif num.font.exp0 nobatch box.train</span><br><span class="line"></span><br><span class="line">echo Compute the Character Set..</span><br><span class="line">unicharset_extractor.exe num.font.exp0.box</span><br><span class="line">mftraining -F font_properties -U unicharset -O num.unicharset num.font.exp0.tr</span><br><span class="line"></span><br><span class="line">echo Clustering..</span><br><span class="line">cntraining.exe num.font.exp0.tr</span><br><span class="line"></span><br><span class="line">echo Rename Files..</span><br><span class="line">rename normproto num.normproto</span><br><span class="line">rename inttemp num.inttemp</span><br><span class="line">rename pffmtable num.pffmtable</span><br><span class="line">rename shapetable num.shapetable</span><br><span class="line"></span><br><span class="line">echo Create Tessdata..</span><br><span class="line">combine_tessdata.exe num.</span><br></pre></td></tr></table></figure></p>
<p>Linux 命令<br><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">tesseract num.font.exp0.tif num.font.exp0 nobatch box.train</span><br><span class="line">unicharset_extractor num.font.exp0.box</span><br><span class="line">mftraining -F font_properties -U unicharset -O num.unicharset num.font.exp0.tr </span><br><span class="line">cntraining num.font.exp0.tr</span><br><span class="line">mv normproto num.normproto</span><br><span class="line">mv inttemp num.inttemp</span><br><span class="line">mv pffmtable num.pffmtable</span><br><span class="line">mv shapetable num.shapetable</span><br><span class="line">combine_tessdata num.</span><br></pre></td></tr></table></figure></p>
<p>注:unicharset_extractor,cntraining,combine_tessdata在之前编译的tessert目录下的training目录下<br>所以一定要编译training 不要只编译tesseract哦</p>
<p><span style="color:#ff0000"><strong>执行命令后需确认打印结果中的Offset 1、3、4、5、13这些项不是-1。</strong></span>这样,一个新的语言文件就生成了。<br>num.traineddata便是最终生成的语言文件,将生成的num.traineddata拷贝到tessdata目录下。可以用它来进行字符识别了。</p>
<p>参考文章:<br><a href="http://blog.csdn.net/yasi_xi/article/details/8763385" target="_blank" rel="external">http://blog.csdn.net/yasi_xi/article/details/8763385</a></p>
</div>
</div>
</article>
<article id="post-Solr配置" class="article article-type-post" itemscope itemprop="blogPost">
<div class="article-inner">
<header class="article-header">
<h1 itemprop="name">
<a class="article-title" href="/2016/04/17/Solr配置/">Solr配置</a>
</h1>
</header>
<div class="article-meta">
<a href="/2016/04/17/Solr配置/" class="article-date">
<time datetime="2016-04-17T09:47:09.000Z" itemprop="datePublished">2016-04-17</time>
</a>
</div>
<div class="article-entry" itemprop="articleBody">
<p>配置参考:<a href="http://www.freebuf.com/articles/database/100423.html" target="_blank" rel="external">http://www.freebuf.com/articles/database/100423.html</a></p>
<p>现在说两个注意点</p>
<p>1.如果搭配<font color="red"><strong>Mysql</strong></font><br>打开db-data-config.xml,会看到一段类似的代码<br><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><dataConfig></span><br><span class="line"><dataSource name="sgk" type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://127.0.0.1:3306/newsgk" user="root" password="password" batchSize="-1" /></span><br><span class="line"></dataConfig></span><br></pre></td></tr></table></figure></p>
<p><font color="red"><strong>batchSize设置为-1</strong></font>(是否可以设置其他值没有尝试),否则一直无法导入</p>
<p>2.如果搭配<font color="red"><strong>MSSQL</strong></font><br>请<font color="red"><strong>不要设置batchSize</strong></font>(是否可以设置其他值没有尝试,-1是不可行的),否则无法导入</p>
</div>
</div>
</article>
</section>
<footer id="footer">
<div class="outer">
<div id="footer-info" class="inner">
© 2016 FallenAngel
Powered by <a href="http://hexo.io/" target="_blank">Hexo</a>, theme by <a href="http://github.com/ppoffice">PPOffice</a>
</div>
</div>
</footer>
<script src="/js/jquery.min.js"></script>
<script src="/js/script.js"></script>
</div>
</body>
</html>