add jplag
fanghon committed Oct 18, 2019
1 parent 0c83a11 commit d16047d
Showing 91 changed files with 455 additions and 51 deletions.
7 changes: 6 additions & 1 deletion .classpath
@@ -1,11 +1,16 @@
 <?xml version="1.0" encoding="UTF-8"?>
 <classpath>
 <classpathentry kind="src" path="src"/>
-<classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER"/>
+<classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER">
+<attributes>
+<attribute name="module" value="true"/>
+</attributes>
+</classpathentry>
 <classpathentry kind="lib" path="lib/IKAnalyzer2012_u6.jar"/>
 <classpathentry kind="lib" path="lib/shinglecloud-0.51.jar"/>
 <classpathentry kind="lib" path="lib/tika-app-1.4.jar"/>
 <classpathentry kind="lib" path="lib/ant.jar"/>
 <classpathentry kind="lib" path="lib/substance-5.3.jar"/>
+<classpathentry kind="lib" path="lib/jplag-2.12.1-SNAPSHOT-jar-with-dependencies.jar"/>
 <classpathentry kind="output" path="bin"/>
 </classpath>
13 changes: 7 additions & 6 deletions README.md
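@@ -2,7 +2,7 @@
 The software checks and compares the textual similarity of electronic documents submitted by students. It can compare and analyze the similarity of program code (e.g. Java or C) and of Chinese and English documents (e.g. lab reports), output the documents with high similarity, and thereby help uncover students copying from one another.
 
 ## Requirements
-JDK 1.6 or later
+JDK 11
 
 ## Installation
 Download or clone the project source code directly, or download a release build from [releases](https://github.com/fanghon/antiplag/releases)
@@ -13,11 +13,12 @@ JDK 1.6 or later
 ![Main program window](./maingui.png)
 
 ## How it works
-The core technique of the system is text similarity computation from natural language processing (NLP). Similarity comparison of program code is based on two open systems:
-* one is the web-service-based [MOSS system](http://theory.stanford.edu/~aiken/moss/) (a system made available by Stanford University that compares code similarity in many programming languages);
-* the other is the locally run [sim system](https://dickgrune.com/Programs/similarity_tester/) (supports similarity comparison for languages such as Java and C).
+The core technique of the system is text similarity computation from natural language processing (NLP). Similarity comparison of program code is based on three open systems:
+* first, the web-service-based [MOSS system](http://theory.stanford.edu/~aiken/moss/) (a system made available by Stanford University that compares code similarity in many programming languages);
+* second, the locally run [sim system](https://dickgrune.com/Programs/similarity_tester/) (supports similarity comparison for languages such as Java and C);
+* third, the locally run [jplag system](https://github.com/jplag/jplag/) (supports similarity comparison for languages such as Java, C/C++ and Python).
 
-This system builds on them with further development and wrapping. For MOSS, a client access module was developed that submits code files, fetches and parses the results, and sorts them; sim is integrated into the system and can serve as a substitute when MOSS is unavailable, for example because of network failures.
+This system builds on them with further development and wrapping. For MOSS, a client access module was developed that submits code files, fetches and parses the results, and sorts them; sim and jplag are integrated into the system and can serve as substitutes when MOSS is unavailable, for example because of network failures.
 
 Similarity comparison of Chinese and English document assignments is based on the [shinglecloud algorithm](https://www.kom.tu-darmstadt.de/de/research-results/0/1/shinglecloud/) (a fast, language-independent similarity method based on text fingerprints). The main processing steps are:
 1. Use tika to read documents in different formats (txt, doc, docx, etc.) and convert them into text that can be processed uniformly;
@@ -26,7 +27,7 @@ JDK 1.6 or later
 4. Sort by similarity and output the comparison results.
 
 ## TODO
-1. Integrate jplag into the system.
+1. Integrate jplag into the system. Done.
 2. Support storing past assignment documents and database-based duplicate checking of assignments.
 2. Develop a web version of the assignment plagiarism checker.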

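The README only names shinglecloud for the document pipeline. As a rough illustration of the general idea of fingerprint-based similarity, here is a minimal Java sketch of plain word w-shingling with Jaccard similarity. It is a stand-in technique, not the shinglecloud algorithm and not the project's `ShingleSim` class; `ShingleSketch` and its methods are hypothetical, and the tokenizer assumes ASCII, space-separated text such as English (Chinese text would first need word segmentation, for example with the IKAnalyzer library already on the project's classpath).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch: word w-shingling + Jaccard similarity (a stand-in, not shinglecloud). */
final class ShingleSketch {
    private ShingleSketch() {}

    /** Collects every run of w consecutive lower-case word tokens ("shingles") in the text. */
    static Set<String> shingles(String text, int w) {
        if (w < 1) {
            throw new IllegalArgumentException("shingle size must be >= 1");
        }
        // \W+ splits on anything that is not [A-Za-z0-9_], so non-ASCII letters act as separators.
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        Set<String> result = new HashSet<>();
        for (int i = 0; i + w <= tokens.size(); i++) {
            result.add(String.join(" ", tokens.subList(i, i + w)));
        }
        return result;
    }

    /** Jaccard similarity: size of the intersection divided by size of the union, 0.0 to 1.0. */
    static double jaccard(String a, String b, int w) {
        Set<String> sa = shingles(a, w);
        Set<String> sb = shingles(b, w);
        Set<String> union = new HashSet<>(sa);
        union.addAll(sb);
        if (union.isEmpty()) {
            return 0.0; // neither text is long enough to produce a single shingle
        }
        Set<String> common = new HashSet<>(sa);
        common.retainAll(sb);
        return (double) common.size() / union.size();
    }
}
```

A larger `w` makes the measure stricter, because one changed word breaks every shingle that spans it.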
2 changes: 2 additions & 0 deletions bin/.gitignore
@@ -0,0 +1,2 @@
/utils/
/gui/
Binary file modified bin/gui/plag/edu/CompareResultFrame$1.class
Binary file not shown.
Binary file modified bin/gui/plag/edu/CompareResultFrame$2.class
Binary file not shown.
Binary file modified bin/gui/plag/edu/CompareResultFrame$3.class
Binary file not shown.
Binary file modified bin/gui/plag/edu/CompareResultFrame.class
Binary file not shown.
Binary file modified bin/gui/plag/edu/FileConvertFrame$1.class
Binary file not shown.
Binary file modified bin/gui/plag/edu/FileConvertFrame$2.class
Binary file not shown.
Binary file modified bin/gui/plag/edu/FileConvertFrame$3.class
Binary file not shown.
Binary file modified bin/gui/plag/edu/FileConvertFrame$4.class
Binary file not shown.
Binary file modified bin/gui/plag/edu/PlagGUI$1.class
Binary file not shown.
Binary file modified bin/gui/plag/edu/PlagGUI$2.class
Binary file not shown.
Binary file modified bin/gui/plag/edu/PlagGUI$3.class
Binary file not shown.
Binary file modified bin/gui/plag/edu/PlagGUI$4.class
Binary file not shown.
Binary file modified bin/gui/plag/edu/PlagGUI$5.class
Binary file not shown.
Binary file modified bin/gui/plag/edu/PlagGUI$6.class
Binary file not shown.
Binary file modified bin/gui/plag/edu/PlagGUI$7.class
Binary file not shown.
Binary file modified bin/gui/plag/edu/PlagGUI$8.class
Binary file not shown.
Binary file modified bin/gui/plag/edu/PlagGUI$9.class
Binary file not shown.
Binary file modified bin/gui/plag/edu/PlagGUI.class
Binary file not shown.
Binary file modified bin/moss/plag/edu/DataBase.class
Binary file not shown.
Binary file modified bin/moss/plag/edu/Http.class
Binary file not shown.
Binary file modified bin/moss/plag/edu/Moss.class
Binary file not shown.
Binary file modified bin/moss/plag/edu/Text.class
Binary file not shown.
Binary file modified bin/preprocess/plag/edu/IKAnalyzer.class
Binary file not shown.
Binary file modified bin/preprocess/plag/edu/TextExtractor.class
Binary file not shown.
Binary file modified bin/shingle/plag/edu/ShingleSim.class
Binary file not shown.
Binary file modified bin/utils/edu/AntFile.class
Binary file not shown.
Binary file modified bin/utils/edu/FileIO.class
Binary file not shown.
Binary file modified bin/utils/edu/MossClient.class
Binary file not shown.
Binary file modified bin/utils/edu/StreamGobbler.class
Binary file not shown.
Binary file modified bin/utils/edu/WinCMD.class
Binary file not shown.
Binary file not shown.
39 changes: 5 additions & 34 deletions mossout.txt
@@ -1,35 +1,6 @@
-Uploading .\testdata\wpsdoc\bixinghui.doc...done
-Uploading .\testdata\wpsdoc\chengxi.doc...done
-Uploading .\testdata\wpsdoc\chenxiaofeng.doc...done
-Uploading .\testdata\wpsdoc\chenyufan.doc...done
-Uploading .\testdata\wpsdoc\gaoming.doc.docx...done
-Uploading .\testdata\wpsdoc\gezhongqi.doc...done
-Uploading .\testdata\wpsdoc\huangkaiming.doc...done
-Uploading .\testdata\wpsdoc\huangzhi.doc...done
-Uploading .\testdata\wpsdoc\jihua.docx...done
-Uploading .\testdata\wpsdoc\lichenguang.doc...done
-Uploading .\testdata\wpsdoc\litao.docx...done
-Uploading .\testdata\wpsdoc\majunxian.doc...done
-Uploading .\testdata\wpsdoc\nijinhua.doc...done
-Uploading .\testdata\wpsdoc\shaohaohao.doc...done
-Uploading .\testdata\wpsdoc\shaoyuanxu.doc...done
-Uploading .\testdata\wpsdoc\shenjie.doc...done
-Uploading .\testdata\wpsdoc\sunshangxing.docx...done
-Uploading .\testdata\wpsdoc\tangwenyuan.doc...done
-Uploading .\testdata\wpsdoc\wangjingxuan.doc...done
-Uploading .\testdata\wpsdoc\wangpeng.doc...done
-Uploading .\testdata\wpsdoc\wangwei.docx...done
-Uploading .\testdata\wpsdoc\wuhang.doc...done
-Uploading .\testdata\wpsdoc\xutianxiu.doc...done
-Uploading .\testdata\wpsdoc\yanghao.docx...done
-Uploading .\testdata\wpsdoc\yangweichao.doc...done
-Uploading .\testdata\wpsdoc\yankai.docx...done
-Uploading .\testdata\wpsdoc\zhangsheng.doc...done
-Uploading .\testdata\wpsdoc\zhangshuyang.doc...done
-Uploading .\testdata\wpsdoc\zhaoxingyi.doc...done
-Uploading .\testdata\wpsdoc\zhenglinpeng.doc...done
-Uploading .\testdata\wpsdoc\zhengxianyang.doc...done
-Uploading .\testdata\wpsdoc\zhuangyu.doc...done
-Uploading .\testdata\wpsdoc\zhuchengpeng.docx...done
+Uploading .\testdata\python\demo.py...done
+Uploading .\testdata\python\demo1.py...done
+Uploading .\testdata\python\lprcmd.py...done
+Uploading .\testdata\python\lprcmd2.py...done
 Query submitted. Waiting for the server's response.
-http://moss.stanford.edu/results/306585337
+http://moss.stanford.edu/results/116291522
2 changes: 0 additions & 2 deletions out.txt
@@ -1,2 +0,0 @@
1 12.066752% chengxi.doc zhuchengpeng.docx
from fh Sun Sep 22 10:06:33 CST 2019
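The `mossout.txt` log above ends with the MOSS result URL, and this commit keeps the `Moss` object around so the GUI can later open that URL through `getMoss().getUrl()`. How `Moss.analyMoss` actually extracts the URL is not shown in this diff; the following is a hypothetical Java sketch (`MossLogSketch` is not part of the project) that simply takes the last line starting with the `http://moss.stanford.edu/results/` prefix seen in the log.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch: pull the MOSS result URL out of the lines of a log like mossout.txt. */
final class MossLogSketch {
    private MossLogSketch() {}

    /** Prefix observed in this commit's mossout.txt; other MOSS deployments may differ. */
    static final String RESULT_PREFIX = "http://moss.stanford.edu/results/";

    /** Returns the last line that starts with the result prefix, or empty if there is none. */
    static Optional<String> resultUrl(List<String> logLines) {
        String found = null;
        for (String line : logLines) {
            String trimmed = line.trim();
            if (trimmed.startsWith(RESULT_PREFIX)) {
                found = trimmed; // keep scanning: a later URL replaces an earlier one
            }
        }
        return Optional.ofNullable(found);
    }
}
```

The lines could come from `java.nio.file.Files.readAllLines(Paths.get("mossout.txt"))`, which throws `IOException` if the file cannot be read.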
53 changes: 48 additions & 5 deletions src/gui/plag/edu/PlagGUI.java
@@ -1,6 +1,7 @@
 package gui.plag.edu;
 
 import java.awt.BorderLayout;
+import java.awt.Desktop;
 import java.awt.EventQueue;
 
 import javax.swing.JFrame;
@@ -23,6 +24,9 @@
 import java.awt.event.ActionListener;
 import java.awt.event.ActionEvent;
 import java.io.File;
+import java.io.IOException;
+import java.net.URI;
+import java.net.URISyntaxException;
 
 import javax.swing.event.ChangeListener;
 import javax.swing.event.ChangeEvent;
@@ -47,6 +51,8 @@ public class PlagGUI extends JFrame {
 private JComboBox combMethod;
 private JComboBox combLang;
 
+
+WinCMD cmd;
 /**
 * Launch the application.
 */
@@ -166,7 +172,7 @@ public void stateChanged(ChangeEvent arg0) {
 panel_1.add(label_1);
 
 combLang = new JComboBox();
-combLang.setModel(new DefaultComboBoxModel(new String[] {"java", "c", "csharp", "javascript"}));
+combLang.setModel(new DefaultComboBoxModel(new String[] {"java", "c", "python", "csharp", "javascript"}));
 combLang.setBounds(220, 51, 75, 21);
 panel_1.add(combLang);
 
@@ -182,17 +188,24 @@ public void itemStateChanged(ItemEvent arg0) {
 combLang.removeAllItems();
 combLang.addItem("java");
 combLang.addItem("c");
+combLang.addItem("python");
 combLang.addItem("csharp");
 combLang.addItem("javascript");
 
 }else if("sim".equals(method)){
 combLang.removeAllItems();
 combLang.addItem("java");
 combLang.addItem("c");
+}else if("jplag".equals(method)) {
+combLang.removeAllItems();
+combLang.addItem("java");
+combLang.addItem("c/c++");
+combLang.addItem("python3");
+combLang.addItem("text");
 }
 }
 });
-combMethod.setModel(new DefaultComboBoxModel(new String[] {"moss", "sim"}));
+combMethod.setModel(new DefaultComboBoxModel(new String[] {"moss", "jplag", "sim"}));
 combMethod.setBounds(80, 51, 70, 21);
 panel_1.add(combMethod);

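The hunks above keep each method's language list in two places: the initial combo-box model and the `itemStateChanged` branches. One possible alternative is a single table keyed by method name, sketched here with a hypothetical `LanguageChoices` class that is not part of the project; the language strings are copied from this commit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: one table of selectable languages per comparison method. */
final class LanguageChoices {
    private LanguageChoices() {}

    // LinkedHashMap keeps insertion order, matching the method combo box order in this commit.
    private static final Map<String, List<String>> BY_METHOD = new LinkedHashMap<>();

    static {
        BY_METHOD.put("moss", Arrays.asList("java", "c", "python", "csharp", "javascript"));
        BY_METHOD.put("jplag", Arrays.asList("java", "c/c++", "python3", "text"));
        BY_METHOD.put("sim", Arrays.asList("java", "c"));
    }

    /** Method names in display order. */
    static List<String> methods() {
        return new ArrayList<>(BY_METHOD.keySet());
    }

    /** Languages offered for a method; an empty list for unknown methods. */
    static List<String> languagesFor(String method) {
        List<String> langs = BY_METHOD.get(method);
        return langs == null ? Collections.<String>emptyList() : Collections.unmodifiableList(langs);
    }
}
```

In the item listener, `combLang.setModel(new DefaultComboBoxModel<>(LanguageChoices.languagesFor(method).toArray(new String[0])))` would then replace the per-branch `addItem` calls, so adding a language means editing one line.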
@@ -233,14 +246,14 @@ public void actionPerformed(ActionEvent arg0) {
 }
 
 }
-WinCMD cmd = new WinCMD();
+cmd = new WinCMD();
 int res = cmd.exec(methodtype, lang, value, f.getAbsolutePath());
 if(res==0){
-JOptionPane.showMessageDialog(PlagGUI.this, "Finished. Please check the results.");
+JOptionPane.showMessageDialog(PlagGUI.this, "Finished. Please check the results. If they are empty, try lowering the similarity threshold.");
 }else if(res<0){
 JOptionPane.showMessageDialog(PlagGUI.this, "Execution failed. Please try again.");
 }else if(res>0){
-JOptionPane.showMessageDialog(PlagGUI.this, "Finished. No results met the threshold.");
+JOptionPane.showMessageDialog(PlagGUI.this, "Finished. No results met the threshold; try lowering the similarity threshold.");
 }
 }

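`exec` reports its outcome as a bare `int`, and the dialogs above branch on `res==0`, `res<0` and `res>0`; the comments on `execMossJava` in `WinCMD.java` describe 0 as success, -1 as failure and 1 as "no result meets the threshold". A small enum, hypothetical and not part of the project, could give those codes names:

```java
/** Hypothetical sketch: names for the int codes returned by WinCMD.exec in this commit. */
enum ExecOutcome {
    FINISHED,  // 0: the comparison ran (the result list may still be empty)
    FAILED,    // negative, e.g. -1: the comparison could not be run
    NO_MATCH;  // positive, e.g. 1: the comparison ran, but nothing met the threshold

    /** Maps a raw return code to an outcome using the sign convention above. */
    static ExecOutcome fromCode(int res) {
        if (res == 0) {
            return FINISHED;
        }
        return res < 0 ? FAILED : NO_MATCH;
    }
}
```

A `switch` on `ExecOutcome.fromCode(res)` would then choose the dialog text.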
@@ -264,7 +277,37 @@ public void actionPerformed(ActionEvent arg0) {
 button_1.addActionListener(new ActionListener() {
 public void actionPerformed(ActionEvent arg0) {
 //View results
+String methodtype = (String)combMethod.getSelectedItem();
+String lang = (String)combLang.getSelectedItem();
+
 CompareResultFrame crf = new CompareResultFrame();
+if(radBntProgram.isSelected()) {
+if("jplag".equals(methodtype)) {
+File rf = new File("jplagresult/matches_avg.csv");
+crf.setResfile(rf);
+
+rf = new File("jplagresult/index.html");
+try { //Open the result page in the default browser
+Desktop.getDesktop().browse(rf.toURI());
+} catch (Exception e) {
+// TODO Auto-generated catch block
+e.printStackTrace();
+}
+}else if("moss".equals(methodtype)) {
+try { //Open the result page in the default browser
+String url = cmd.getMoss().getUrl();
+if(url!=null) {
+Desktop.getDesktop().browse(new URI(url));
+}
+} catch (Exception e) {
+// TODO Auto-generated catch block
+e.printStackTrace();
+}
+}
+
+
+}
+
 crf.setVisible(true);
 }
});
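The new "view results" handler opens `jplagresult/index.html` or the MOSS URL with `Desktop.getDesktop().browse(...)` and catches any exception, including the one raised when `cmd` is still null because no comparison has run yet. A hedged variant, with hypothetical class and method names, checks `Desktop.isDesktopSupported()` and `isSupported(Desktop.Action.BROWSE)` first and treats a missing URL as "nothing to open":

```java
import java.awt.Desktop;
import java.io.File;
import java.io.IOException;
import java.net.URI;

/** Hypothetical sketch: open a JPlag report or MOSS URL in the default browser, if possible. */
final class ResultViewerSketch {
    private ResultViewerSketch() {}

    /** URI of the HTML report inside a JPlag result directory such as "jplagresult". */
    static URI jplagReport(File resultDir) {
        return new File(resultDir, "index.html").toURI();
    }

    /** URI of a MOSS result page, or null when no URL is available yet. */
    static URI mossReport(String url) {
        return url == null ? null : URI.create(url); // URI.create throws IllegalArgumentException on bad syntax
    }

    /** Opens the URI in the default browser; returns false if there is nothing to open or no browser support. */
    static boolean open(URI uri) throws IOException {
        if (uri == null || !Desktop.isDesktopSupported()) {
            return false;
        }
        Desktop desktop = Desktop.getDesktop();
        if (!desktop.isSupported(Desktop.Action.BROWSE)) {
            return false;
        }
        desktop.browse(uri);
        return true;
    }
}
```

A `false` return gives the GUI a clear point to show a message instead of only printing a stack trace.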
70 changes: 67 additions & 3 deletions src/utils/edu/WinCMD.java
@@ -11,11 +11,17 @@
 import java.io.*;
 
 import data.plag.edu.SimData;
+import jplag.ExitException;
+import jplag.JPlag;
+import jplag.Program;
+import jplag.options.CommandLineOptions;
 import moss.plag.edu.*;
 
 public class WinCMD {
-String outfile = "out.txt";
+String outfile = "out.txt";
 String mossoutfile = "mossout.txt";
+Moss moss = null;
+
 public static void main(String args[]) {
 /*
 * if (args.length < 1) {
@@ -65,6 +71,12 @@ public static void main(String args[]) {
 }
 
 
 }
+public Moss getMoss() {
+return moss;
+}
+public void setMoss(Moss moss) {
+this.moss = moss;
+}
 //Clear the out file
 public void clearOut(File f){
@@ -95,6 +107,8 @@ public int exec(String methodtype,String lang,int threshold,String files){
 res = execMossJava(lang,threshold, files, lists);
 }else if("sim".equals(methodtype)){
 res = this.execSim(lang, threshold, files, lists);
+}else if("jplag".equals(methodtype)) {
+res = this.execJplag(lang, threshold, files, lists);
 }
 return res;
 }
@@ -117,6 +131,56 @@ String pathconvert(String path){
 
 return res;
 }
+
+//Calls JPlag to compare the code; returns 0 on success, -1 on failure
+public int execJplag(String lang,float threshold,String files,List<SimData> lists){
+int res = -1;
+String INPUT_FILE_FOLDER_NAME=files ; //input file directory
+String jplagResultsFolderName="./jplagresult/"; //results are written to a subdirectory of the project
+float MINIMUM_FILE_SIMILARITY = threshold ;
+String EXCLUDE_FILES = null ;
+ArrayList<String> args = new ArrayList<String>();
+
+args.add("-l");
+if(!"java".equals(lang)) {
+args.add(lang); //set the language option; if it is omitted, the default (java19) is used
+}else {
+args.add("java19");
+}
+args.add("-s"); //recurse into subdirectories of the input directory
+args.add("-r"); //set the directory where results are stored
+args.add(jplagResultsFolderName);
+args.add("-m"); //set the similarity threshold
+args.add((int) (MINIMUM_FILE_SIMILARITY) + "%");
+if (EXCLUDE_FILES!=null) { // set the files to exclude
+args.add("-x");
+args.add(EXCLUDE_FILES);
+}
+args.add(INPUT_FILE_FOLDER_NAME);
+String[] toPass = new String[args.size()];
+toPass = args.toArray(toPass);
+// System.out.println(toPass.toString());
+// JPlag.main(toPass);
+try {
+CommandLineOptions options = new CommandLineOptions(toPass, null);
+Program program = new Program(options);
+
+System.out.println("jplag initialize ok "+program.get_commandLine());
+program.run();
+res = 0; //success
+}
+catch(ExitException ex) {
+System.out.println("Error: "+ex.getReport());
+
+}
+
+return res ;
+}
+
+
+
+
+
 //Runs MOSS through the Java client. lang: language; threshold: similarity threshold; files: directory containing the files to compare; lists: comparison results. Returns 0 on success, -1 on failure,
 //and 1 if no result meets the threshold
public int execMossJava(String lang,float threshold,String files,List<SimData> lists){
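`execJplag` above assembles a JPlag command line (`-l`, `-s`, `-r`, `-m`, an optional `-x`, then the input directory) and hands it to `CommandLineOptions`. Pulling that assembly into a pure function would make it testable without running JPlag. The sketch below mirrors the options and the `java` → `java19` mapping exactly as this commit uses them with the bundled JPlag 2.12.1 snapshot; `JplagArgsSketch` is a hypothetical name, and other JPlag versions may accept different options.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: build the JPlag argument list the way execJplag does in this commit. */
final class JplagArgsSketch {
    private JplagArgsSketch() {}

    static List<String> build(String lang, float threshold, String inputDir,
                              String resultDir, String excludeFile) {
        List<String> args = new ArrayList<>();
        args.add("-l");
        // The GUI offers "java"; this commit passes JPlag's "java19" front end for it.
        args.add("java".equals(lang) ? "java19" : lang);
        args.add("-s");                  // recurse into subdirectories of the input directory
        args.add("-r");
        args.add(resultDir);             // directory that receives the report
        args.add("-m");
        args.add((int) threshold + "%"); // minimum similarity; the cast truncates 35.7 to 35
        if (excludeFile != null) {
            args.add("-x");
            args.add(excludeFile);       // file passed to JPlag's -x (exclude) option
        }
        args.add(inputDir);              // root directory of the submissions, always last
        return args;
    }
}
```

`String[] toPass = JplagArgsSketch.build(lang, threshold, files, "./jplagresult/", null).toArray(new String[0]);` would then produce the same array that `execJplag` passes to `new CommandLineOptions(toPass, null)`.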
@@ -128,7 +192,7 @@ public int execMossJava(String lang,float threshold,String files,List<SimData> l
 File dir = new File(files);
 res = mc.sendMoss(dir,lang);
 if(res==0){ //upload succeeded
-Moss moss = new Moss();
+moss = new Moss();
 res = moss.analyMoss(mossoutfile,threshold, lists);
 if(res==0 && lists.size()>0){ //valid data found
 FileIO.saveFile(new File(outfile), lists,2,"from stanford:"+moss.getUrl()); //save the results to out.txt
@@ -170,7 +234,7 @@ public int execMoss(String lang,float threshold,String files,List<SimData> lists
 // File file = new File("mossout.txt");
 // analySim(file,lang,lists);
 if(res==0){ //upload and execution succeeded
-Moss moss = new Moss();
+moss = new Moss();
 res = moss.analyMoss(mossoutfile,threshold, lists);
 if(res==0 && lists.size()>0){ //valid data found; note: size is also 0 when no value exceeds the threshold
 FileIO.saveFile(new File(outfile), lists,2,"from stanford:"+moss.getUrl()); //save the results to out.txt
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
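Several hunks in `PlagGUI.java` and `WinCMD.java` above follow one pattern: `WinCMD cmd = new WinCMD();` becomes `cmd = new WinCMD();` and `Moss moss = new Moss();` becomes `moss = new Moss();`, with matching new fields. Assigning the field keeps the object reachable for the later "view results" click. Had the fields been added but the local declarations kept, each local would shadow its field and the field would stay null. A minimal, hypothetical illustration (the URL is a placeholder):

```java
/** Hypothetical sketch: why `moss = new Moss();` rather than `Moss moss = new Moss();`. */
final class ShadowingSketch {
    private String url; // field read later, like WinCMD.moss through getMoss()

    /** Declares a local that hides the field; the value is lost when the method returns. */
    void runWithLocal() {
        String url = "https://example.invalid/result"; // placeholder; assigns the local, not the field
    }

    /** Assigns the field, so the value is still available to later callers. */
    void runWithField() {
        url = "https://example.invalid/result"; // placeholder
    }

    String getUrl() {
        return url;
    }
}
```

After `runWithLocal()`, `getUrl()` still returns null; after `runWithField()`, it returns the stored value.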
8 changes: 8 additions & 0 deletions testdata/docen/1.txt
@@ -0,0 +1,8 @@
My Hobbies and Interests
From Monday until Friday most people are busyworking or studying, but in the evenings and off weekends they are free to relax and enjoy themselves. Some watch television or go to the movies;others participate in sports.It depends on individual interests. There are many different ways to spend our spare time.

Almost everyone has some kind of hobby. It may be anything from collecting stamps to making model airplanes.Some hobbies are worth a lot of money; others are valuable only to their owners.

I know a man Who has a coin collection worth several thousand yuan. A short time ago he bought a rare ten-yuan piece worth 250 yuan. He was very happy about the purchase and thought the price was reasonable, on the other hand, my son collects match boxes. He has almost 600 of them but I doubt if they are wortfi any money. However, to my son they are extremely valuable. Nothing makes him happier than to find a new match-box for his collection.

That's what a hobby means, i guess. It is something we like to do in our spare time simply for the fun of. it. The value in money is not important, but the pleasure it gives us is.
5 changes: 5 additions & 0 deletions testdata/docen/2.txt
@@ -0,0 +1,5 @@
From Monday until Friday most people are busyworking or studying, but in the evenings and off weekends they are free to relax and enjoy themselves. Some watch television or go to the movies;others participate in sports.It depends on individual interests. There are many different ways to spend our spare time.

Almost everyone has some kind of hobby. It may be anything from collecting stamps to making model airplanes.Some hobbies are worth a lot of money; others are valuable only to their owners.

I know a man Who has a coin collection worth several thousand yuan. A short time ago he bought a rare ten-yuan piece worth 250 yuan. He was very happy about the purchase and thought the price was reasonable, on the other hand, my son collects match boxes. He has almost 600 of them but I doubt if they are wortfi any money. However, to my son they are extremely valuable. Nothing makes him happier than to find a new match-box for his collection.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
26 changes: 26 additions & 0 deletions testdata/javaabctograde/gaoxinjian.java
@@ -0,0 +1,26 @@
public class gaoxinjian{
public static void main(String[] args){
char grade='b';
char a;
char b;
char c;
char d;
if(grade=='a')
{
System.out.println("90~100");
}else{
if(grade=='b')
{
System.out.println("70~90");
}else{
if(grade=='c'){
System.out.println("60~70");
}else{
if(grade=='D'){
System.out.println("0~60");
}
}
}
}
}
}