diff --git a/.classpath b/.classpath
index 054cdd5..c1667f6 100644
--- a/.classpath
+++ b/.classpath
@@ -6,11 +6,11 @@
-
+
diff --git a/.settings/org.eclipse.core.resources.prefs b/.settings/org.eclipse.core.resources.prefs
new file mode 100644
index 0000000..d2e9a5a
--- /dev/null
+++ b/.settings/org.eclipse.core.resources.prefs
@@ -0,0 +1,2 @@
+eclipse.preferences.version=1
+encoding//testdata/doccn/dongxiaoutf8-2.txt=UTF-8
diff --git a/README.md b/README.md
index 9ec83ad..5155277 100644
--- a/README.md
+++ b/README.md
@@ -22,8 +22,8 @@
 This system builds on them through secondary development and wrapping. For moss, it adds a client access module that implements code file submission, result retrieval and parsing, and result sorting; sim and jplag are integrated into the system and can serve as substitutes when moss is unavailable, for example because of a network failure.
 
 Similarity comparison of Chinese and English document assignments is based on the [shinglecloud algorithm](https://www.kom.tu-darmstadt.de/de/research-results/0/1/shinglecloud/) (a fast, language-independent similarity computation method based on text fingerprints). The main document processing steps are:
-1. Use tika to read documents in different formats (txt, doc, docx, etc.) and convert them into text that can be processed uniformly;
-2. Use ikanalyzer to preprocess the text and perform precise word segmentation;
+1. Use tika to read the text content of files in different formats (txt, doc, docx, pdf, html, etc.) and different encodings, and convert it into text that can be processed uniformly;
+2. Use hanlp to preprocess and segment the text;
 3. Use the shinglecloud algorithm to compute the similarity between texts;
 4. Sort by similarity and output the comparison results.
 
@@ -33,6 +33,9 @@
 3. [Winnowing: Local Algorithms for Document Fingerprinting](http://theory.stanford.edu/~aiken/publications/papers/sigmod03.pdf) the core algorithm used by the moss system
 4. [A survey of software plagiarism detection research](https://faculty.ist.psu.edu/wu/papers/spd-survey-16.pdf)
 
+## Updates
+1. 2019.12.1 Switched to hanlp as the word segmentation component, added plagiarism checking for the text of pdf and html files, fixed several bugs, and released v2.8.6.
+
 ## TODO
 1. Integrate jplag into the system. Done.
 2. Support plagiarism checking of html and jsp files.
diff --git a/bin/.gitignore b/bin/.gitignore
index 6debac6..e960251 100644
--- a/bin/.gitignore
+++ b/bin/.gitignore
@@ -1,2 +1,2 @@
-/utils/
+/preprocess/
 /gui/
diff --git a/bin/gui/plag/edu/FileConvertFrame$4.class b/bin/gui/plag/edu/FileConvertFrame$4.class
index 3d5219e..b66aa2f 100644
Binary files a/bin/gui/plag/edu/FileConvertFrame$4.class and b/bin/gui/plag/edu/FileConvertFrame$4.class differ
diff --git a/bin/gui/plag/edu/FileConvertFrame.class b/bin/gui/plag/edu/FileConvertFrame.class
index a5bb7c8..51e4e7c 100644
Binary files a/bin/gui/plag/edu/FileConvertFrame.class and b/bin/gui/plag/edu/FileConvertFrame.class differ
diff --git a/bin/gui/plag/edu/PlagGUI.class b/bin/gui/plag/edu/PlagGUI.class
index 964a44c..af337f3 100644
Binary files a/bin/gui/plag/edu/PlagGUI.class and b/bin/gui/plag/edu/PlagGUI.class differ
diff --git a/bin/preprocess/plag/edu/IKAnalyzer.class b/bin/preprocess/plag/edu/IKAnalyzer.class
deleted file mode 100644
index 1d3e3cf..0000000
Binary files a/bin/preprocess/plag/edu/IKAnalyzer.class and /dev/null differ
diff --git a/bin/preprocess/plag/edu/TextExtractor.class b/bin/preprocess/plag/edu/TextExtractor.class
index 0297362..9a1ac00 100644
Binary files a/bin/preprocess/plag/edu/TextExtractor.class and b/bin/preprocess/plag/edu/TextExtractor.class differ
diff --git a/bin/shingle/plag/edu/ShingleSim$Fileter.class b/bin/shingle/plag/edu/ShingleSim$Fileter.class
index f3c2ba1..89c83ef 100644
Binary files a/bin/shingle/plag/edu/ShingleSim$Fileter.class and b/bin/shingle/plag/edu/ShingleSim$Fileter.class differ
diff --git a/bin/shingle/plag/edu/ShingleSim.class b/bin/shingle/plag/edu/ShingleSim.class
index d882a38..8eef08c 100644
Binary files a/bin/shingle/plag/edu/ShingleSim.class and b/bin/shingle/plag/edu/ShingleSim.class differ
diff --git a/help.txt b/help.txt
index 9966e05..f87fe3c 100644
--- a/help.txt
+++ b/help.txt
@@ -1,4 +1,4 @@
-ҵϵͳʹð(v2.8.2)
+ҵϵͳʹð(v2.8.6)
 
 һ
 ϵͳwindow10jdk11 64λвͨԡ
@@ -19,7 +19,8 @@
 2 ĵıƶȼ
ĵIJͳļⲽһ£ֻǡѡҵʱѡĵҵ -磺testdata/doccnµҵĵļչtxtdocdocxеһ֣ +磺testdata/doccnµҵĵļչtxtdocdocxpdfhtml +еһ֡ ҵǡıҵȻִбȽϡťȴȷϴ鿴 ťϵͳ򿪡ȽϽڣԲ鿴ȽϽ ıĵıȽĿǰݲ֧ͨҳпӻԱȡ diff --git a/lib/IKAnalyzer2012_u6.jar b/lib/IKAnalyzer2012_u6.jar deleted file mode 100644 index e3d9aa6..0000000 Binary files a/lib/IKAnalyzer2012_u6.jar and /dev/null differ diff --git a/lib/hanlp-portable-1.7.5.jar b/lib/hanlp-portable-1.7.5.jar new file mode 100644 index 0000000..a1b9db2 Binary files /dev/null and b/lib/hanlp-portable-1.7.5.jar differ diff --git a/out.txt b/out.txt index b28741f..d3dfe10 100644 --- a/out.txt +++ b/out.txt @@ -1,2 +1,57 @@ -1 8.0% testdata\python\stu1_demo.py testdata\python\stu1_lprcmd.py -from stanford:http://moss.stanford.edu/results/874773796 Fri Oct 25 19:19:17 CST 2019 \ No newline at end of file +1 99.51535% dongxiao-2.doc dongxiaogbk.txt +2 92.47312% gumingzhu-2.doc zhucuiyun_2.doc +3 91.408936% wangmeng-2.doc zhucuiyun_2.doc +4 87.63636% dongxiao-2.docx dongxiaoutf8-2.txt +5 84.717606% gumingzhu-2.doc wangmeng-2.doc +6 84.310844% dongxiao-2.doc dongxiao-2.pdf +7 84.168015% dongxiao-2.doc dongxiaoutf8-2.txt +8 83.870964% dongxiao-2.pdf dongxiaogbk.txt +9 83.68336% dongxiaogbk.txt dongxiaoutf8-2.txt +10 82.954544% dongxiao-2.docx dongxiaogbk.txt +11 82.552505% dongxiao-2.doc dongxiao-2.docx +12 75.74404% lijie-2.doc wangmeng-2.doc +13 74.96063% gumingzhu-2.doc wuchangqing-2.doc +14 71.703705% dongxiao-2.pdf dongxiaoutf8-2.txt +15 71.49254% dongxiao-2.docx dongxiao-2.pdf +16 69.92366% wuchangqing-2.doc zhucuiyun_2.doc +17 68.584076% lijie-2.doc zhucuiyun_2.doc +18 65.61151% wangmeng-2.doc wuchangqing-2.doc +19 65.12301% gumingzhu-2.doc lijie-2.doc +20 57.454544% dongxiaogbk.txt meitao-2.doc +21 57.246376% dongxiao-2.doc meitao-2.doc +22 52.258064% lijie-2.doc wuchangqing-2.doc +23 50.757576% dongxiao-2.docx meitao-2.doc +24 50.284416% dongxiao-2.pdf meitao-2.doc +25 48.87218% makai2.doc wangxuan_2.doc.doc +26 48.45869% dongxiaoutf8-2.txt meitao-2.doc +27 46.67074% liuchuanyang-2.doc 
tangwenpeng-2.doc +28 41.64096% heliwen_2.doc liufan_2.doc +29 40.54834% liufan_2.doc wangchunming_2.doc +30 38.75061% gechunlong-2.doc hanchao_2.doc +31 36.930233% luxiang-2.doc tangwenpeng-2.doc +32 36.89095% jiangfeng-2.doc lijie-2.doc +33 35.925926% weixiao-2.doc yinxu-2.doc +34 35.424637% liuchuanyang-2.doc wuliangchao-2.doc +35 35.039577% gechunlong-2.doc yinxu-2.doc +36 34.839073% gechunlong-2.doc weixiao-2.doc +37 34.325184% wangmeng-2.doc wuliangchao-2.doc +38 34.069096% guozhiquan -2.doc wuliangchao-2.doc +39 33.98907% wuliangchao-2.doc zhucuiyun_2.doc +40 32.858547% tangwenpeng-2.doc xuqiwei-2.doc +41 32.557137% tangwenpeng-2.doc wangchen-2.doc +42 32.296955% liuchuanyang-2.doc yinxu-2.doc +43 32.073547% lijie-2.doc wuliangchao-2.doc +44 32.070206% gechunlong-2.doc wangchen-2.doc +45 32.058823% jiangfeng-2.doc yinpeiyan_2.doc +46 31.946404% sunxiaolei-2.doc wangchunming_2.doc +47 31.471535% gumingzhu-2.doc wuliangchao-2.doc +48 30.698889% sunxiaolei-2.doc yinxu-2.doc +49 30.651136% liuchuanyang-2.doc xuqiwei-2.doc +50 30.63007% heliwen_2.doc wangchunming_2.doc +51 30.559345% liuchuanyang-2.doc weixiao-2.doc +52 30.494392% wangchen-2.doc xuqiwei-2.doc +53 30.429863% tangwenming-2.doc xuqiwei-2.doc +54 30.424183% tangwenming-2.doc wangchen-2.doc +55 30.095451% sunxiaolei-2.doc tangwenpeng-2.doc +56 30.065361% guozhiquan -2.doc liuchuanyang-2.doc +from fh Sun Dec 01 18:57:44 CST 2019 \ No newline at end of file diff --git a/src/gui/plag/edu/FileConvertFrame.java b/src/gui/plag/edu/FileConvertFrame.java index 7777095..180d6ff 100644 --- a/src/gui/plag/edu/FileConvertFrame.java +++ b/src/gui/plag/edu/FileConvertFrame.java @@ -121,11 +121,14 @@ public void actionPerformed(ActionEvent arg0) { if("python".equals(type)) { filter[0]="**/*.py"; } - if("doc".equals(type)){ //ĵ֧ͣdoc txt docx - filter = new String[3]; + if("doc".equals(type)){ //ĵ֧ͣdoc txt docx pdf html + filter = new String[6]; filter[0] = "**/*.doc"; filter[1] = "**/*.txt"; filter[2] = "**/*.docx"; 
+ filter[3] = "**/*.pdf"; + filter[4] = "**/*.html"; + filter[5] = "**/*.htm"; } String[] filestrs = AntFile.scanFiles(srcf, filter); //غĿ¼ļ diff --git a/src/gui/plag/edu/PlagGUI.java b/src/gui/plag/edu/PlagGUI.java index 311f6a1..d3d9b60 100644 --- a/src/gui/plag/edu/PlagGUI.java +++ b/src/gui/plag/edu/PlagGUI.java @@ -133,7 +133,7 @@ public void stateChanged(ChangeEvent arg0) { panel.add(radBntProgram); radBntText = new JRadioButton("\u6587\u672C\u4F5C\u4E1A"); - radBntText.setToolTipText("\u652F\u6301\u6587\u6863\u7C7B\u578B\uFF1Adoc docx txt"); + radBntText.setToolTipText("\u652F\u6301\u6587\u6863\u7C7B\u578B\uFF1Adoc docx txt pdf html\u7B49"); radBntText.addChangeListener(new ChangeListener() { public void stateChanged(ChangeEvent arg0) { //ıҵťѡ diff --git a/src/preprocess/plag/edu/IKAnalyzer.java b/src/preprocess/plag/edu/IKAnalyzer.java deleted file mode 100644 index b99cdc2..0000000 --- a/src/preprocess/plag/edu/IKAnalyzer.java +++ /dev/null @@ -1,70 +0,0 @@ -package preprocess.plag.edu; - -import java.io.IOException; -import java.io.StringReader; - -import org.wltea.analyzer.core.IKSegmenter; -import org.wltea.analyzer.cfg.*; -import org.wltea.analyzer.core.Lexeme; -/** - * 2013.7.25 ʹֲ - * 1 IKAnalyzer2012_u6.jar ,jar ѾԴֵ - * 2 IKAnalyzer.cfg.xmlstopword.dicĿ· - * 3 - * òûȥͣô a - * ܣcpuɼڴռò - * һչʿ⡢ͣô - * IKAnalyzer.cfg.xmlstopword.dic\binĿ¼ - * ԭҪֵַƥдʣܷʽǾȷִʣȥֱ,ӢͳһСд - */ -public class IKAnalyzer { - - /** - * @param args - */ - public static void main(String[] args) { - // TODO Auto-generated method stub - Configuration cfg = DefaultConfig.getInstance(); - System.out.println("main dic:"+cfg.getMainDictionary()); - System.out.println("ext dic:"+cfg.getExtDictionarys()); - System.out.println("stopword dic:"+cfg.getExtStopWordDictionarys()); - - - IKSegmenter ik = new IKSegmenter(new StringReader("a Hello " + - " л񹲺͹ 'world java('"2013꣨,: 19:28 " + - "Ansjķִһictʵ.ҼԼһЩݽṹ㷨ķִ.ʵ˸Чʺ͸׼ȷʵ!" 
),true); - Lexeme le = null; - - try { - while((le=ik.next())!=null){ - System.out.print(le.getLexemeText()+"|" ); - } - } catch (IOException e) { - // TODO Auto-generated catch block - e.printStackTrace(); - } - - System.out.println(ik.toString()); - } - public static String segment(String str,boolean bsmart){ - - return segment(str,bsmart,""); - } - public static String segment(String str,boolean bsmart,String split){ - if(str!=null){ - IKSegmenter ik = new IKSegmenter(new StringReader(str),bsmart); - Lexeme le = null; - StringBuffer sb = new StringBuffer(); - try { - while((le=ik.next())!=null){ - sb.append(le.getLexemeText()+split); - } - } catch (IOException e) { - // TODO Auto-generated catch block - e.printStackTrace(); - } - return sb.toString(); - } - return null; - } -} diff --git a/src/preprocess/plag/edu/TextExtractor.java b/src/preprocess/plag/edu/TextExtractor.java index 4f566e5..bbd33ea 100644 --- a/src/preprocess/plag/edu/TextExtractor.java +++ b/src/preprocess/plag/edu/TextExtractor.java @@ -37,7 +37,11 @@ public static String getTxt(File f) { try { is = new FileInputStream(f); Tika tika = new Tika(); - String str = tika.parseToString(new FileInputStream(f)); + Metadata metadata = new Metadata(); + metadata.set(Metadata.RESOURCE_NAME_KEY, f.getName()); //gbktxtıȡ + String str = tika.parseToString(new FileInputStream(f),metadata); + // System.out.println(f.getName()); + // System.out.println(str); return str; } catch (FileNotFoundException e) { @@ -100,7 +104,10 @@ public static String fileToTxt(File f,Metadata metadata) { */ public static void main(String[] args) { // TODO Auto-generated method stub - File f = new File("D:\\fh\\ѧ\\201302\\\\ѧύҵ\\һҵ\\sunxiaolei-1.doc"); + // File f = new File("./testdata/doccn/dongxiao-2.doc"); + File f = new File("./testdata/doccn/dongxiao-2.pdf"); + // File f = new File("./testdata/doccn/dongxiaogbk.txt"); + // File f = new File("./testdata/doccn/dongxiaoutf8-2.txt"); System.out.println(TextExtractor.getTxt(f)); 
         Metadata metadata = new Metadata();
         System.out.println(TextExtractor.fileToTxt(f,metadata));
diff --git a/src/preprocess/plag/edu/Tokenizer.java b/src/preprocess/plag/edu/Tokenizer.java
new file mode 100644
index 0000000..7d9037b
--- /dev/null
+++ b/src/preprocess/plag/edu/Tokenizer.java
@@ -0,0 +1,54 @@
+package preprocess.plag.edu;
+
+import java.util.List;
+
+import com.hankcs.hanlp.HanLP;
+import com.hankcs.hanlp.dictionary.CustomDictionary;
+import com.hankcs.hanlp.seg.common.Term;
+import com.hankcs.hanlp.tokenizer.NotionalTokenizer;
+
+public class Tokenizer {
+    // Segment the text into words and join them with the given separator
+    public static String segment(String text,String sep) {
+        StringBuilder sb = new StringBuilder();
+        HanLP.Config.Normalization = true; // traditional -> simplified, full-width -> half-width, uppercase -> lowercase
+        List<Term> tokens = NotionalTokenizer.segment(text); // segment and remove stop words
+        for(Term token : tokens) {
+            sb.append(token.word+sep);
+        }
+        return sb.toString();
+    }
+
+    public static void main(String[] args) {
+        // TODO Auto-generated method stub
+        HanLP.Config.Normalization = true; // traditional -> simplified, full-width -> half-width, uppercase -> lowercase
+        CustomDictionary.insert("4G", "nz 1000");
+        String text = "i am from china.Сеķιèеľȴ޳ɡιЩС,i will go back HomeҐ ";
+        System.out.println(text);
+        // precise segmentation
+        List<Term> tokens = HanLP.segment(text);
+        System.out.println(tokens); // the stop-word dictionary is at data/dictionary/stopwords.txt and can be modified
+        for (Term token : tokens) {
+            System.out.print("("+token.word+","+token.offset+","+token.length()+")");
+
+        }
+        System.out.println();
+        // Զȥͣô,ᶪʧԭļеλϢ
+        tokens = NotionalTokenizer.segment(text);
+        System.out.println(tokens); // the stop-word dictionary is at data/dictionary/stopwords.txt and can be modified
+        for (Term token : tokens) {
+            System.out.print("("+token.word+","+token.offset+","+token.length()+")");
+
+        }
+        System.out.println();
+        // automatic sentence splitting + stop-word removal
+        for (List<Term> sentence : NotionalTokenizer.seg2sentence(text))
+        {
+            System.out.println(sentence);
+        }
+        //ӢеͣôҲᱻȥ
+        String str = Tokenizer.segment(text," ");
+        System.out.println(str);
+    }
+
+}
diff --git a/src/shingle/plag/edu/ShingleSim.java b/src/shingle/plag/edu/ShingleSim.java
index c9424bd..5b1c776 100644
--- a/src/shingle/plag/edu/ShingleSim.java
+++ b/src/shingle/plag/edu/ShingleSim.java
@@ -14,26 +14,16 @@ import java.util.Collections;
 import java.util.List;
 
-import preprocess.plag.edu.IKAnalyzer;
 import preprocess.plag.edu.TextExtractor;
+import preprocess.plag.edu.Tokenizer;
 import utils.edu.FileIO;
-//import sim.edu.TestWinnowing.Fileter;
-//import preprocess.plag.edu;
 import data.plag.edu.SimData;
-import de.tud.kom.stringmatching.gst.GST;
-import de.tud.kom.stringmatching.gst.GSTTile;
-import de.tud.kom.stringmatching.gst.utils.GSTHighlighter;
 import de.tud.kom.stringmatching.shinglecloud.ShingleCloud;
-import de.tud.kom.stringmatching.shinglecloud.ShingleCloudMatch;
-import de.tud.kom.stringutils.preprocessing.WhiteSpaceRemovalPreprocessing;
-import de.tud.kom.stringutils.tokenization.CharacterTokenizer;
-import de.tud.kom.stringutils.tokenization.WordTokenizer;
-//import fengci.edu.IKAnalyzer;
 
 public class ShingleSim {
     String dic = null; // path of the assignments
-    float threshold = 0.3f; //0.3
+    float threshold = 0.3f; // default 0.3
     List filels = new ArrayList(); // files to compare
     List listsd = new ArrayList(); // file comparison results
 
@@ -53,14 +43,17 @@ public void explore(File file) {
         }
     }
 
-    // Implements the file-filter interface as an inner class; only doc, txt, docx files and directories are accepted
+    // Implements the file-filter interface as an inner class; only doc, txt, docx, pdf files and directories are accepted
     class Fileter implements FileFilter {
         @Override
         public boolean accept(File arg0) {
             // TODO Auto-generated method stub
-            if (arg0.getName().endsWith(".doc") //
-                    || arg0.getName().endsWith(".txt")
-                    || arg0.getName().endsWith(".docx") || arg0.isDirectory())
+            String fn = arg0.getName().toLowerCase();
+            if (fn.endsWith(".doc") //
+                    || fn.endsWith(".txt")
+                    || fn.endsWith(".docx")
+                    || fn.endsWith(".pdf")
+                    || arg0.isDirectory())
                 return true;
             return false;
         }
@@ -70,7 +63,8 @@ public String processZHText(File file){
         String resstr=null;
         try {
             String str = TextExtractor.getTxt(file);
-            resstr = IKAnalyzer.segment(str,true," "); // smart segmentation, stop-word filtering, space-separated
+            //resstr = IKAnalyzer.segment(str,true," "); // smart segmentation, stop-word filtering, space-separated
+            resstr = Tokenizer.segment(str," ");
         } catch (Exception e) {
             // TODO
Auto-generated catch block e.printStackTrace(); diff --git a/testdata/doccn/dongxiao-2.docx b/testdata/doccn/dongxiao-2.docx new file mode 100644 index 0000000..f858587 Binary files /dev/null and b/testdata/doccn/dongxiao-2.docx differ diff --git a/testdata/doccn/dongxiao-2.pdf b/testdata/doccn/dongxiao-2.pdf new file mode 100644 index 0000000..dbed521 Binary files /dev/null and b/testdata/doccn/dongxiao-2.pdf differ diff --git a/testdata/doccn/dongxiaogbk.txt b/testdata/doccn/dongxiaogbk.txt new file mode 100644 index 0000000..e5c91dc --- /dev/null +++ b/testdata/doccn/dongxiaogbk.txt @@ -0,0 +1,41 @@ +ʶ壨ƸӢļij12x2 +1.ԣSoftware Testinghttps://zh.wikipedia.org/wiki/%E8%BD%AF%E4%BB%B6%E6%B5%8B%E8%AF%95 壺ҲΪ˶ж P18 + 2. Ԫ: unit testing http://www.igsgroup.com.cn/common/ISTQB%E8%BD%AF%E4%BB%B6%E6%B5%8B%E8%AF%95%E4%B8%93%E4%B8%9A%E6%9C%AF%E8%AF%AD%E5%AF%B9%E7%85%A7%E8%A1%A8v2.1.pdf + 壺ϸƹ˵飬ģҪ·Ʋģڲ P94 + 3. ɲ: integration testing ͬ 壺ڵԪԵĻϣгģ򡢵IJԣԪ򲿼Ľӿڹϵʹ֮Ҫ P25 + 4. ϵͳԣsystem testing ͬ 壺ԼɵӲϵͳеIJ P26 + 5. ղ: acceptance testing ͬ 壺ĿҪͺͬ˫ǩĵеIJԺ P26 + 6. ܲԣfunctional testing ͬ 壺ܲԾǶԲƷĸ֤ܽݹܲԣƷǷﵽûҪĹܡ http://baike.baidu.com/view/651435.htm + 7. ںвԣblack-box testing ͬ 壺δ֪ڲṹеIJ P26 + 8. ׺вԣwhite-box testing ͬ 壺֪ڲṹеIJ P26 + 9. ܲԣperformance testing ͬ 壺ڼϵͳеܡP135 + 10. ԣtesting 壺ԼеƷв P158 + 11.CMMCapability Maturity Model for Software ģ http://baike.baidu.com/view/8110.htm 壺֯ڶ塢ʵʩƺ͸̵ʵиչ׶ε http://baike.baidu.com/view/8110.htm + 12. 
ISO9000ϵ׼ 壺TC176ϵίԱᣩƶйʱ׼ http://baike.baidu.com/view/9486.htm +⣺2x12 +1 ںвԺͰ׺вԵЩʹúںвԸ׷?Щʹð׺вԸ׷֣2 +ںвDz֪ڲṹ׺в֪ڲṹ +ںвԱڷ1Ƿвȷ©Ĺܣ2ڽӿϣǷȷĽܣܷȷĽ +׺вڷ֣1е߼жȡ桱ȡ١ٲһ顣2ѭı߽еĽִѭ +http://zhidao.baidu.com/question/13988876.html +2 ɲԺϵͳԵϵ +P132 ɲԶģĽӿڣϵͳԶϵͳɲԺϵͳԶõںв +ʴ⣺(52) +1 (10)ٲģͣԼľĿش⣺ + ٲģͣоͼƻơ롪ԡά http://baike.baidu.com/view/551037.htm +1 ʵĿЩ׶Σȼ򵥲Ŀ + һƱϵͳһʼʦ˵Ҫ󣨿оͷͬѧ󣬿ʼʦҪʲôȻиӦ뷨ƣʼ루룩ûбܲУԣ +2 ΪԱдΪҪ3׶Σ˵ԭ + ƣ롣ֻ֪ԼҪʲô֪ԼҪʲôƣиģӣ֪ôŪ룬ȻdzԱܽгԱ + +2 (12)дԵ2ֲͬ壬ָǵϲһ֣Ϊʲô +һ֣P18 Bill Hetzel ԵĿIJΪ˷ȱݺʹҲǶж +ڶ֣ P18 Grenford J.Myers Ϊ֤д֤޴ +ڶƬ㡣ԭ򣺵һȫ㣬Ϊǿ϶дģûbugģҸϲڶ + +3 (30)Vģͣ˵ԹǴĸ׶οʼģϾĿʵĿоЩԽ׶ΣЩ͵IJԣ繦ܡڰ׺еȣΪĸԽ׶ҪΪʲôP30 + ûҪϸ롪Ԫԡɲԡϵͳԡղ + Ʊϵͳÿ࣬Ū֮󣬿϶ȼûдл᲻ᱨԪԣһЩࡢĵãܱܲãɲԣһԺƱIJԣܲܳɹϵͳԣʦղԣ + ׺вԣ֮һûʲô + ںвԣʦʱûЧ + Ҿû󣬾ͺҪΪԽ緢֣ʧԽС diff --git a/testdata/doccn/dongxiaoutf8-2.txt b/testdata/doccn/dongxiaoutf8-2.txt new file mode 100644 index 0000000..952f2ff --- /dev/null +++ b/testdata/doccn/dongxiaoutf8-2.txt @@ -0,0 +1,36 @@ +姓名:董晓 学号:112127130103 + +名词定义(中文名称给出英文及定义的出处12x2) +1.软件测试:Software Testing——出处:https://zh.wikipedia.org/wiki/%E8%BD%AF%E4%BB%B6%E6%B5%8B%E8%AF%95 定义:发现软件错误,也是为了对软件质量进行度量和评估 出处: P18 + 2. 单元测试: unit testing 出处:http://www.igsgroup.com.cn/common/ISTQB%E8%BD%AF%E4%BB%B6%E6%B5%8B%E8%AF%95%E4%B8%93%E4%B8%9A%E6%9C%AF%E8%AF%AD%E5%AF%B9%E7%85%A7%E8%A1%A8v2.1.pdf + 定义:依据详细设计规格说明书,对模块内所有重要控制路径设计测试用例,来发现模块内部错误 P94 + 3. 集成测试: integration testing 出处同上 定义:在单元测试的基础上,将所有程序模块进行有序、递增的测试,检验程序单元或部件的接口关系,使之符合要求 P25 + 4. 系统测试:system testing 出处同上 定义:对集成的软件和硬件系统进行的测试 P26 + 5. 验收测试: acceptance testing 出处同上 定义:按照项目要求和合同,供需双方签订的验收文档进行的测试和评审 P26 + 6. 功能测试:functional testing 出处同上 定义:功能测试就是对产品的各功能进行验证,根据功能测试用例,逐项测试,检查产品是否达到用户要求的功能。 出处: http://baike.baidu.com/view/651435.htm + 7. 黑盒测试:black-box testing 出处同上 定义:未知程序内部结构进行的测试 P26 + 8. 白盒测试:white-box testing 出处同上 定义:已知程序内部结构进行的测试 P26 + 9. 性能测试:performance testing 出处同上 定义:用来测试软件在集成系统中的运行性能。P135 + 10. 
α测试:αtesting 定义:对即将面市的软件产品进行测试 P158 + 11.CMM:Capability Maturity Model for Software 能力成熟度模型 http://baike.baidu.com/view/8110.htm 定义:对于软件组织在定义、实施、度量、控制和改善其软件过程的实践中各个发展阶段的描述 http://baike.baidu.com/view/8110.htm + 12. ISO9000:质量管理体系标准 定义:由TC176(质量管理体系技术委员会)制定的所有国际标准。 http://baike.baidu.com/view/9486.htm +简答题:(2x12) +1 黑盒测试和白盒测试的区别?哪些错误使用黑盒测试更容易发现?哪些错误使用白盒测试更容易发现?各举2例。 +黑盒测试是不知道软件程序内部结构,白盒测试是知道软件程序内部结构。 +黑盒测试便于发现1、是否有不正确或遗漏的功能?2、在接口上,输入是否能正确的接受?能否输出正确的结果? +白盒测试易于发现:1、对所有的逻辑判定,取“真”与取“假”的两种情况都能至少测一遍。2、在循环的边界和运行的界限内执行循环体 +http://zhidao.baidu.com/question/13988876.html +2 集成测试和系统测试的区别和联系? +P132 集成测试对象是模块间的接口,系统测试对象是整个系统。集成测试和系统测试都用到黑盒测试 +问答题:(52) +1 (10)描述软件开发的瀑布模型,并结合自己参与的具体项目,回答以下问题: + 瀑布模型:可行性研究和计划—需求分析—设计—编码—测试—运行维护 http://baike.baidu.com/view/551037.htm +(1) 实际项目开发经历了哪些阶段?(先简单阐述所做的项目) + 做一个航空售票系统。一开始老师说要求(可行性研究和分析),同学们听见后,开始分析老师想要什么东西(需求分析),然后脑子里大概有个相应的想法(设计),开始打代码(编码),最后检查有没有报错,看能不能运行(测试) +(2) 作为程序员,依次写出你认为最重要的3个阶段,并说明原因? + 需求分析,设计,编码。需求分析,只有知道自己想要什么,才知道自己要做成什么东西;设计,有个大体的模子,才能知道该怎么弄;编码,既然是程序员,不编码能叫程序员吗。 + +2 (12)写出软件测试的2种不同定义,指出它们的区别,你喜欢哪一种?为什么? +第一种:P18 Bill Hetzel 提出测试的目的不仅仅是为了发现软件缺陷和错误,也是对软件质量进行度量和评估。以提高软件质量。 +第二种: P18 Grenford J.Myers 测试是为了证明程序有错,而不是证明程序无错误 +第二种片面点。原因:第一种提出更加全面点,因为软件是肯定有错的,不可能软件是没bug的,所以我更喜欢第二种
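Review note: the diff widens the accepted document types in two places, the Ant-style `filter` array in `FileConvertFrame` (pdf, html, htm added) and `ShingleSim.Fileter` (only pdf added, together with a lower-cased file name). A minimal sketch of keeping both in step from one shared list could look like the following. `DocTypes`, `EXTENSIONS`, `isSupported`, `accept`, and `antPatterns` are hypothetical names that do not exist in the repository, and the sketch assumes Java 9+ for `List.of`.

```java
import java.io.File;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of the repository: a single list of
// supported document extensions shared by the GUI and the comparer.
final class DocTypes {
    static final List<String> EXTENSIONS =
            List.of(".doc", ".docx", ".txt", ".pdf", ".html", ".htm");

    private DocTypes() {}

    // Case-insensitive suffix match, mirroring the toLowerCase() call
    // that the diff adds to ShingleSim.Fileter.
    static boolean isSupported(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        for (String ext : EXTENSIONS) {
            if (lower.endsWith(ext)) {
                return true;
            }
        }
        return false;
    }

    // Directories are accepted so that a recursive scan can descend into them.
    static boolean accept(File f) {
        return f.isDirectory() || isSupported(f.getName());
    }

    // Ant-style patterns in the form used by FileConvertFrame, e.g. "**/*.pdf".
    static String[] antPatterns() {
        return EXTENSIONS.stream().map(e -> "**/*" + e).toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(isSupported("dongxiao-2.PDF"));
        System.out.println(String.join(" ", antPatterns()));
    }
}
```

With such a helper, `Fileter.accept` could delegate to `DocTypes.accept(arg0)`, and `FileConvertFrame` could pass `DocTypes.antPatterns()` to `AntFile.scanFiles`, assuming that method keeps taking a `String[]` of patterns as it does in the diff.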