Merge branch 'main' into dependabot/maven/com.google.errorprone-error_prone_core-2.25.0
Gram21 authored Mar 7, 2024
2 parents 36d27d0 + d62217d commit d3cd5a3
Showing 22 changed files with 364 additions and 259 deletions.
4 changes: 0 additions & 4 deletions .github/workflows/deploy.yml
@@ -8,10 +8,6 @@ on:
- '**/src/**'
- '**/pom.xml'
- 'pom.xml'

# Publish `v1.2.3` tags as releases.
tags:
- v*

# Allows you to run this workflow manually from the Actions tab
workflow_dispatch:
68 changes: 11 additions & 57 deletions README.md
@@ -6,37 +6,16 @@
[![Latest Release](https://img.shields.io/github/release/ArDoCo/Core.svg)](https://github.com/ArDoCo/Core/releases/latest)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7274034.svg)](https://doi.org/10.5281/zenodo.7274034)

The goal of this project is to connect architecture documentation and models with Traceability Link Recovery (TLR) while identifying missing or deviating
elements (inconsistencies).
The goal of the ArDoCo project is to connect architecture documentation and models with Traceability Link Recovery (TLR) while identifying missing or deviating elements (inconsistencies).
An element can be any representable item of the model, like a component or a relation.
To do so, we first create trace links and then make use of them and other information to identify inconsistencies.

ArDoCo is actively developed by researchers of
the _[Modelling for Continuous Software Engineering (MCSE) group](https://mcse.kastel.kit.edu)_
of _[KASTEL - Institute of Information Security and Dependability](https://kastel.kit.edu)_ at
the [KIT](https://www.kit.edu).
ArDoCo is actively developed by researchers of the _[Modelling for Continuous Software Engineering (MCSE) group](https://mcse.kastel.kit.edu)_ of _[KASTEL - Institute of Information Security and Dependability](https://kastel.kit.edu)_ at the [KIT](https://www.kit.edu).

## User Interfaces
This **Core** repository contains the framework and core definitions for the other approaches.
As such, there is the definition of our pipeline and the data handling as well as the definitions for the various pipeline steps, inputs, outputs, etc.

To be able to execute the core algorithms from this repository, you can write own user interfaces that (should) use
the [ArDoCoRunner](https://github.com/ArDoCo/Core/blob/main/pipeline/pipeline-core/src/main/java/edu/kit/kastel/mcse/ardoco/core/execution/runner/ArDoCoRunner.java).

We provide an example Command Line Interface (CLI) at [ArDoCo/CLI](https://github.com/ArDoCo/CLI) as well as a simple Graphical User Interface (GUI)
at [ArDoCo/GUI](https://github.com/ArDoCo/GUI).

Future user interfaces like an enhanced GUI or a web interface are planned.

## Documentation

For more information about the setup or the architecture have a look on the [Wiki](https://github.com/ArDoCo/Core/wiki).
The docs are at some points deprecated, the general overview and setup should still hold.

## Case Studies / Benchmarks

To test the Core, you could use case studies and benchmarks provided in ..

* [ArDoCo Benchmark](https://github.com/ArDoCo/Benchmark)
* [SWATTR](https://github.com/ArDoCo/SWATTR)
For more information about the setup, the project structure, or the architecture, please have a look at the [Wiki](https://github.com/ArDoCo/Core/wiki).

## Maven

@@ -45,7 +24,7 @@ To test the Core, you could use case studies and benchmarks provided in ..
<dependencies>
<dependency>
<groupId>io.github.ardoco.core</groupId>
<artifactId>pipeline</artifactId> <!-- or any other subproject -->
<artifactId>framework</artifactId> <!-- or any other subproject -->
<version>VERSION</version>
</dependency>
</dependencies>
@@ -69,33 +48,8 @@ For snapshot releases, make sure to add the following repository
</repositories>
```

## Microservice for text preprocessing

Text preprocessing works locally, but there is also the option to host a microservice for this.
The benefit is that the models do not need to be loaded each time, saving some runtime (and local memory).

The microservice can be found at [ArDoCo/StanfordCoreNLP-Provider-Service](https://github.com/ArDoCo/StanfordCoreNLP-Provider-Service/).

The microservice is secured with credentials and the usage of the microservice needs to be activated and the URL of the microservice configured.
These settings can be provided to the execution via environment variables.
To do so, set the following variables:

```env
NLP_PROVIDER_SOURCE=microservice
MICROSERVICE_URL=[microservice_url]
SCNLP_SERVICE_USER=[your_username]
SCNLP_SERVICE_PASSWORD=[your_password]
```

The first variable `NLP_PROVIDER_SOURCE=microservice` activates the microservice usage.
The next three variables configure the connection, and you need to provide the configuration for your deployed microservice.

## Attribution

The initial version of this project is based on the master
thesis [Linking Software Architecture Documentation and Models](https://doi.org/10.5445/IR/1000126194).

## Acknowledgements

This work was supported by funding from the topic Engineering Secure Systems of the Helmholtz Association (HGF) and by
KASTEL Security Research Labs (46.23.01).
## Relevant repositories
The following is an excerpt of repositories that use this framework and implement the different approaches and pipelines of ArDoCo:
* [ArDoCo/TLR](https://github.com/ArDoCo/TLR): implementing different traceability link recovery approaches
* [ArDoCo/InconsistencyDetection](https://github.com/ArDoCo/InconsistencyDetection): implementing inconsistency detection approaches
* [ArDoCo/LiSSA](https://github.com/ArDoCo/LiSSA): implementing processing of sketches and diagrams for, e.g., TLR
78 changes: 53 additions & 25 deletions docs/Home.md
@@ -1,49 +1,77 @@
# ArDoCo

<p align="center">
<img alt="ArDoCo" src="https://github.com/ArDoCo/.github/raw/main/profile/logo.png" height="210"/>
</p>

ArDoCo (Architecture Documentation Consistency) is a framework to connect architecture documentation and models while
identifying missing or deviating elements (inconsistencies). An element can be any representable item of the model, like
a component or a relation. To do so, ArDoCo first creates trace links and then makes use of them and other information
to identify inconsistencies.

You can find [ArDoCo on GitHub](https://github.com/ArDoCo).
You can find ArDoCo on the [website](https://ardoco.de) and [on GitHub](https://github.com/ArDoCo).

Before contributing, please read the [Quickstart Guide](quickstart).

JavaDocs can be found [here](https://ardoco.github.io/Core-Docs/).
<!-- JavaDocs can be found [here](https://ardoco.github.io/Core-Docs/). -->

To get to know the project, please read the following pages:

* [Core Pipeline Definition](pipeline)
* [Intermediate Artifacts](intermediate-artifacts)
* [Text Preprocessing Microservice](Text-Preprocessing-Microservice)
* [Traceability Link Recovery (TLR)](traceability-link-recovery)
* [Inconsistency Detection (ID)](inconsistency-detection)
* [Linking Sketches and Software Architecture (LiSSA)](LiSSA)

## Project Structure

* [Core](https://github.com/ArDoCo/Core): Core framework with framework and API definitions
* Pipelines
* [TLR](https://github.com/ArDoCo/TLR): Traceability Link Recovery (TLR) Modules
* [StanfordCoreNLP-Provider-Service](https://github.com/ArDoCo/StanfordCoreNLP-Provider-Service): RESTful web service for text preprocessing
* [InconsistencyDetection](https://github.com/ArDoCo/InconsistencyDetection): Inconsistency Detection (ID) Modules
* [LiSSA](https://github.com/ArDoCo/LiSSA): Linking Sketches and Software Architecture Modules
* Testing and Evaluation
* [IntegrationTests](https://github.com/ArDoCo/IntegrationTests): Integration Tests
* [Benchmark](https://github.com/ArDoCo/Benchmark): Benchmarks
* [Evaluator](https://github.com/ArDoCo/Evaluator): Evaluation code that compares CSVs (e.g., output and gold standard)
* [SimpleTracelinkDiscovery](https://github.com/ArDoCo/SimpleTracelinkDiscovery): Baseline approach
* GUIs, CLIs, etc.
* [TraceView](https://github.com/ArDoCo/TraceView): WIP visualisation of the outputs for TLR and ID
* [CLI](https://github.com/ArDoCo/CLI): Command Line Interface (*outdated*)
* [actions](https://github.com/ArDoCo/actions): Reusable GitHub Actions

## System Requirements

The `complete` profile includes all the requirements that the special profiles also need. This profile is activated by
default.
The project requires **JDK 21**.
Furthermore, we advise at least **4 GB of RAM**.

All profiles require JDK 21.
## Benchmarks

The dependencies of the other profiles at a glance:
You can test ArDoCo using the projects provided in our [Benchmark repository](https://github.com/ArDoCo/Benchmark).

* tlr: -
* inconsistency: -
* lissa (LInking Sketches and Software Architecture): Docker (local
or [remote](https://github.com/ArDoCo/Core/blob/lissa/stages/diagram-recognition/src/main/kotlin/edu/kit/kastel/mcse/ardoco/lissa/diagramrecognition/informants/DockerInformant.kt#L20-L23))
## Related Publications

## Case Studies & Benchmarks
* J. Keim, S. Corallo, D. Fuchß, T. Hey, T. Telge and A. Koziolek. "Recovering Trace Links Between Software Documentation And Code". 2024. In: Proceedings of the 46th IEEE International Conference on Software Engineering (ICSE 2024). [doi:10.5445/IR/1000165692](https://doi.org/10.5445/IR/1000165692/post)

You can test ArDoCo using our case studies and benchmarks provided in ...
* J. Keim, S. Corallo, D. Fuchß and A. Koziolek. "Detecting Inconsistencies in Software Architecture Documentation Using Traceability Link Recovery". 2023. In: IEEE 20th International Conference on Software Architecture (ICSA 2023). [doi:10.1109/ICSA56044.2023.00021](https://doi.org/10.1109/ICSA56044.2023.00021)

* [Case Studies](https://github.com/ArDoCo/SWATTR)
* [Benchmarks](https://github.com/ArDoCo/Benchmark)
* D. Fuchß, S. Corallo, J. Keim, J. Speit and A. Koziolek. "Establishing a Benchmark Dataset for Traceability Link Recovery between Software Architecture Documentation and Models". 2022. In: 2nd International Workshop on Mining Software Repositories for Software Architecture - Co-located with 16th European Conference on Software Architecture.

## Publications
* J. Keim, S. Schulz, D. Fuchß, C. Kocher, J. Speit, A. Koziolek. "Trace Link Recovery for Software Architecture Documentation". 2021. In: Software Architecture: 15th European Conference (ECSA 2021). [doi:10.1007/978-3-030-86044-8_7](https://doi.org/10.1007/978-3-030-86044-8_7)

Trace Link Recovery for Software Architecture Documentation Keim, J.; Schulz, S.; Fuchß, D.; Kocher, C.; Speit, J.;
Koziolek, A. 2021. Software Architecture: 15th European Conference, ECSA 2021, Virtual Event, Sweden, September 13-17,
2021, Proceedings. Ed.: S. Biffl, 101–116, Springer
Verlag. [doi:10.1007/978-3-030-86044-8_7](https://doi.org/10.1007/978-3-030-86044-8_7)
* J. Keim and A. Koziolek. "Towards Consistency Checking Between Software Architecture and Informal Documentation". 2019. In: IEEE 16th International Conference on Software Architecture Companion (ICSA-C). [doi:10.1109/ICSA-C.2019.00052](https://doi.org/10.1109/ICSA-C.2019.00052)

The initial version of ArDoCo is based on the master
thesis [Linking Software Architecture Documentation and Models](https://publikationen.bibliothek.kit.edu/1000126194).

The initial version of ArDoCo is based on the master thesis [Linking Software Architecture Documentation and Models](https://publikationen.bibliothek.kit.edu/1000126194).

## Contact

This project is currently developed by researchers of the Karlsruhe Institute of Technology.
This project is currently developed by researchers of the Karlsruhe Institute of Technology (KIT).

You can find us on our websites:

You find us on our
websites: [Jan Keim](https://mcse.kastel.kit.edu/staff_Keim_Jan.php), [Sophie Corallo](https://mcse.kastel.kit.edu/staff_sophie_corallo.php),
and [Dominik Fuchß](https://mcse.kastel.kit.edu/staff_dominik_fuchss.php)
* [Jan Keim](https://mcse.kastel.kit.edu/staff_Keim_Jan.php),
* [Sophie Corallo](https://mcse.kastel.kit.edu/staff_sophie_corallo.php), and
* [Dominik Fuchß](https://mcse.kastel.kit.edu/staff_dominik_fuchss.php)
12 changes: 12 additions & 0 deletions docs/Inconsistency-Detection.md
@@ -0,0 +1,12 @@

Currently, there are two kinds of inconsistencies that are supported by the approach: Missing Model Elements (MMEs) and Undocumented Model Elements (UMEs).

Undocumented Model Elements (UMEs) are elements within the Software Architecture Model (SAM) that are not documented in the natural language Software Architecture Documentation (SAD).
Our heuristic looks for model elements that have fewer trace links than a configurable threshold (1 by default, i.e., elements without any trace link).
In the configuration options, you can fine-tune the threshold and set up a regex-based whitelist.
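The UME heuristic can be sketched in Java as follows. This is a minimal illustration, not ArDoCo's implementation: the types `SimpleModelElement` and `UmeDetector`, the method `findUndocumented`, and the map of trace-link counts per element ID are hypothetical. The sketch assumes that "below the threshold" means strictly fewer links than the threshold and that a whitelist pattern must match an element's full name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical element type: just an identifier and a name.
record SimpleModelElement(String id, String name) {}

// Hypothetical sketch of the UME heuristic; not ArDoCo's actual implementation.
final class UmeDetector {
    private final int threshold;           // report elements with fewer trace links than this (default 1)
    private final List<Pattern> whitelist; // elements whose name fully matches a pattern are never reported

    UmeDetector(int threshold, List<Pattern> whitelist) {
        this.threshold = threshold;
        this.whitelist = List.copyOf(whitelist);
    }

    UmeDetector() {
        this(1, List.of());
    }

    // linkCounts maps element ids to their number of trace links; missing ids count as 0.
    List<SimpleModelElement> findUndocumented(List<SimpleModelElement> elements, Map<String, Integer> linkCounts) {
        List<SimpleModelElement> undocumented = new ArrayList<>();
        for (SimpleModelElement element : elements) {
            boolean whitelisted = whitelist.stream().anyMatch(p -> p.matcher(element.name()).matches());
            int links = linkCounts.getOrDefault(element.id(), 0);
            if (!whitelisted && links < threshold) {
                undocumented.add(element);
            }
        }
        return undocumented;
    }
}
```

With the default threshold of 1, only elements without any trace link are reported; a higher threshold also flags elements that are mentioned only rarely in the SAD.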

Missing Model Elements (MMEs) are architecture elements that are described in the SAD but cannot be traced to the SAM.
For this, we make use of the recommendations from the Recommendation Generator within the [Traceability Link Recovery (TLR)](traceability-link-recovery).
Each recommendation that is not linked to a model element is a potential inconsistency.
To further increase precision, we apply filters.
For example, one filter removes commonly used software (development) terms that look similar to, e.g., components but are rarely model elements.
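A minimal Java sketch of the MME step follows, assuming that each recommendation already records whether it is linked to a model element. The `Recommendation` record, the `MmeDetector` class, and the term list passed to it are hypothetical; ArDoCo's actual recommendation types and filters differ.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical recommendation: a name the TLR step recommends as an architecture element.
record Recommendation(String name, boolean linkedToModelElement) {}

// Hypothetical sketch of MME detection; not ArDoCo's actual implementation.
final class MmeDetector {
    // Common software terms that resemble components but rarely are model elements (stored lower-case).
    private final Set<String> commonTerms;

    MmeDetector(Set<String> commonTerms) {
        this.commonTerms = commonTerms.stream()
                .map(term -> term.toLowerCase(Locale.ROOT))
                .collect(Collectors.toUnmodifiableSet());
    }

    List<Recommendation> findMissingModelElements(List<Recommendation> recommendations) {
        return recommendations.stream()
                // a recommendation without a trace link is a potential inconsistency
                .filter(r -> !r.linkedToModelElement())
                // the terminology filter drops generic terms (case-insensitive) to increase precision
                .filter(r -> !commonTerms.contains(r.name().toLowerCase(Locale.ROOT)))
                .toList();
    }
}
```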
128 changes: 128 additions & 0 deletions docs/Intermediate-Artifacts.md
@@ -0,0 +1,128 @@

Currently, there are three kinds of intermediate artifacts.
First, the input text has an internal representation (cf. [edu/kit/kastel/mcse/ardoco/core/api/text/Text.java](https://github.com/ArDoCo/Core/blob/main/framework/common/src/main/java/edu/kit/kastel/mcse/ardoco/core/api/text/Text.java)) to cover all the annotations from the preprocessing.
Second, there is the intermediate representation of software architecture models (SAMs) that we cover [below](#software-architecture-models).
Third, we create a uniform representation for code that we also explain [below](#code).

```mermaid
classDiagram
class ModelElement
class Model
class Entity
class CodeModel
class ArchitectureModel
ModelElement <|-- Entity
ModelElement <|-- Model
Model <|-- CodeModel
Model <|-- ArchitectureModel
Model "0..1" o--"*" Entity: elements
```
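The base hierarchy from the diagram could be expressed in Java roughly as follows. This is a hypothetical sketch: the field names, the generic parameter on `Model`, and placing the identifier in `ModelElement` are assumptions, not taken from the ArDoCo sources. The diagram's "0..1" multiplicity (an entity belongs to at most one model) is not enforced here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical common supertype: every model element has an identifier.
abstract class ModelElement {
    private final String id;

    protected ModelElement(String id) {
        this.id = id;
    }

    String getId() {
        return id;
    }
}

// Entities are named model elements, e.g., components or classes.
abstract class Entity extends ModelElement {
    private final String name;

    protected Entity(String id, String name) {
        super(id);
        this.name = name;
    }

    String getName() {
        return name;
    }
}

// A model (e.g., an architecture or code model) aggregates any number of entities.
abstract class Model<E extends Entity> extends ModelElement {
    private final List<E> elements = new ArrayList<>();

    protected Model(String id) {
        super(id);
    }

    void addElement(E element) {
        elements.add(element);
    }

    // Returns an unmodifiable snapshot of the contained entities.
    List<E> getElements() {
        return List.copyOf(elements);
    }
}
```

In this sketch, `ArchitectureModel` and `CodeModel` would be concrete subclasses of `Model`, parameterized with their respective entity types.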

## Software Architecture Models

```mermaid
classDiagram
class Entity
class ArchitectureItem
class Component
class Interface
class Signature
Entity <|-- ArchitectureItem
ArchitectureItem <|-- Component
ArchitectureItem <|-- Interface
ArchitectureItem <|-- Signature
Interface o-- "*" Signature: signatures
Interface "*" <-- "*" Component: provided
Interface "*" <-- "*" Component: required
Component "*" <-- Component: subcomponents
```

In this software model, each class is categorized as an ArchitectureItem, which inherits properties from Entity, including a name and identifier.
There are three types of ArchitectureItems: Component, Interface, and Signature.

A Component represents various architectural elements in different modeling languages.
For instance, it corresponds to a UML Component.
In the PCM context, it encompasses both BasicComponent and CompositeComponent.
BasicComponents do not contain sub-components, while CompositeComponents may have sub-components.

Components can either require or provide Interfaces.
Provided Interfaces are implemented by the Component, while Required Interfaces specify the functionality required by a Component.

An Interface contains multiple method Signatures.
Signatures are linked to Interfaces in a composite relationship, meaning each Signature is associated with an Interface.
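A hypothetical Java sketch of the architecture items and their relations is shown below. Mutable package-private lists keep it short; the helper `requiredFrom` is an illustrative addition, not part of ArDoCo, and shows how required and provided interfaces connect two components.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class: every architecture item has an identifier and a name (from Entity in the diagram).
abstract class ArchitectureItem {
    final String id;
    final String name;

    ArchitectureItem(String id, String name) {
        this.id = id;
        this.name = name;
    }
}

// A method signature, owned by an interface.
final class Signature extends ArchitectureItem {
    Signature(String id, String name) {
        super(id, name);
    }
}

// An interface groups method signatures.
final class Interface extends ArchitectureItem {
    final List<Signature> signatures = new ArrayList<>();

    Interface(String id, String name) {
        super(id, name);
    }
}

// A component provides and requires interfaces and may contain subcomponents.
final class Component extends ArchitectureItem {
    final List<Interface> provided = new ArrayList<>();
    final List<Interface> required = new ArrayList<>();
    final List<Component> subcomponents = new ArrayList<>();

    Component(String id, String name) {
        super(id, name);
    }

    // Hypothetical helper: interfaces this component requires that the other component provides.
    List<Interface> requiredFrom(Component other) {
        return required.stream().filter(other.provided::contains).toList();
    }
}
```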


## Code

```mermaid
classDiagram
class Entity
class CodeItem
class Module
class Package
class CompilationUnit
class CodeAssembly
class ComputationalObject
class ControlElement
class Datatype
class ClassUnit
class InterfaceUnit
Entity <|-- CodeItem
CodeItem <|-- ComputationalObject
CodeItem <|-- Module
CodeItem <|-- Datatype
ComputationalObject <|-- ControlElement
Module <|-- Package
Module <|-- CompilationUnit
Module <|-- CodeAssembly
Datatype <|-- ClassUnit
Datatype <|-- InterfaceUnit
Module "0..1" o--> "*" CodeItem: codeElements
ClassUnit "0..1" o--> "*" CodeItem: codeElements
InterfaceUnit "0..1" o--> "*" CodeItem: codeElements
Datatype "*" <-- "*" Datatype: implementedTypes
Datatype "*" <-- "*" Datatype: extendedTypes
```

The intermediate model for code is based on the source code package of the [Knowledge Discovery Metamodel (KDM)](https://www.omg.org/spec/KDM/1.3/PDF).

The different classes in the code model inherit from CodeItem, which itself is a specialized Entity.
Thus, each class has a name and identifier.

There are three kinds of source code elements: Module, Datatype, and ComputationalObject.

Modules are typically logical components of the system with a certain level of abstraction.
A Module can contain CodeItems, and there are three specializations of Modules: CompilationUnit, Package, and CodeAssembly.

A CompilationUnit represents a source file where code is stored.
It includes a relative path to the file's location on disk and its programming language.
The CompilationUnit is partly based on the InventoryModel from KDM.

A Package is a logical collection of source code elements (i.e., CodeItems).
Packages can also contain sub-Packages, similar to the structure commonly found in Java.

A CodeAssembly consists of source code artifacts linked together to make them runnable.
For example, source code files together with their headers are grouped in a CodeAssembly.
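The Module hierarchy can be sketched in Java as below. Field names such as `codeElements`, `relativePath`, and `language` follow the descriptions above but are assumptions, as is the helper class `Modules`. Within this file, the names `Module` and `Package` shadow `java.lang.Module` and `java.lang.Package`.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class: every code item has an identifier and a name.
abstract class CodeItem {
    final String id;
    final String name;

    CodeItem(String id, String name) {
        this.id = id;
        this.name = name;
    }
}

// A module can contain arbitrary code items ("codeElements" in the diagram).
abstract class Module extends CodeItem {
    final List<CodeItem> codeElements = new ArrayList<>();

    Module(String id, String name) {
        super(id, name);
    }
}

// A logical collection of code items; packages can contain sub-packages.
final class Package extends Module {
    Package(String id, String name) {
        super(id, name);
    }
}

// A source file, with its path relative to the project and its programming language.
final class CompilationUnit extends Module {
    final Path relativePath;
    final String language;

    CompilationUnit(String id, String name, Path relativePath, String language) {
        super(id, name);
        this.relativePath = relativePath;
        this.language = language;
    }
}

// Artifacts linked together to be runnable, e.g., source files with their headers.
final class CodeAssembly extends Module {
    CodeAssembly(String id, String name) {
        super(id, name);
    }
}

final class Modules {
    // Hypothetical helper: collects all compilation units reachable from root, depth-first.
    static List<CompilationUnit> compilationUnits(Module root) {
        List<CompilationUnit> result = new ArrayList<>();
        for (CodeItem item : root.codeElements) {
            if (item instanceof CompilationUnit unit) {
                result.add(unit);
            }
            if (item instanceof Module child) {
                result.addAll(compilationUnits(child));
            }
        }
        return result;
    }
}
```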

There are two kinds of Datatypes: ClassUnit and InterfaceUnit.
A ClassUnit is akin to a class in Java and can contain other CodeItems like methods and inner classes.
Similarly, an InterfaceUnit can also contain code elements like methods.

The relationships implementedTypes and extendedTypes from the KDM are also present in the intermediate model.
A Datatype can have an arbitrary number of extendedTypes relations, which represent inheritance in object-oriented programming languages, as well as an arbitrary number of implementedTypes relations.

The construction around extendedTypes and implementedTypes also enables interfaces to extend other interfaces, akin to Java.
Interfaces can also extend classes, a feature present in some programming languages like TypeScript.
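As a hypothetical Java sketch, both relations can be modeled as lists on `Datatype`, and a helper can follow them transitively. The fields and the method `allSupertypes` are illustrative assumptions, not ArDoCo's API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical datatype with the two KDM-style relations.
abstract class Datatype {
    final String name;
    final List<Datatype> extendedTypes = new ArrayList<>();
    final List<Datatype> implementedTypes = new ArrayList<>();

    Datatype(String name) {
        this.name = name;
    }

    // All direct and indirect supertypes via both relations; the seen-set also guards against cycles.
    Set<Datatype> allSupertypes() {
        Set<Datatype> seen = new LinkedHashSet<>();
        Deque<Datatype> work = new ArrayDeque<>();
        work.push(this);
        while (!work.isEmpty()) {
            Datatype current = work.pop();
            for (Datatype parent : current.parents()) {
                if (seen.add(parent)) {
                    work.push(parent);
                }
            }
        }
        return seen;
    }

    private List<Datatype> parents() {
        List<Datatype> parents = new ArrayList<>(extendedTypes);
        parents.addAll(implementedTypes);
        return parents;
    }
}

// Akin to a class in Java.
final class ClassUnit extends Datatype {
    ClassUnit(String name) {
        super(name);
    }
}

// Akin to an interface in Java.
final class InterfaceUnit extends Datatype {
    InterfaceUnit(String name) {
        super(name);
    }
}
```

For example, an `InterfaceUnit` may list a `ClassUnit` in its `extendedTypes`, which models the TypeScript case described above; `allSupertypes` then returns that class.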

The KDM includes several primitive datatypes like boolean, which are not realized within this model as they are not currently needed.
If future work extends the approaches with a thorough comparison of datatypes, the intermediate model may need further subclasses based on the KDM.

Currently, there is only one type of ComputationalObject: the ControlElement.
The ControlElement represents callable parts with specific behaviors, such as functions, procedures, or methods.
Unlike the KDM, this work does not make a further distinction between CallableUnits and MethodUnits.
Additionally, it does not utilize parameters, return types, or similar elements of the KDM and therefore does not model them.
5 changes: 3 additions & 2 deletions docs/LiSSA.md
@@ -1,7 +1,8 @@
# Linking Sketches and Software Architecture (LiSSA)

The LiSSA approach aims to connect sketches and informal diagrams (such as class diagrams, component diagrams, ...) with
formal models like component models.

## Linking Sketches and Software Architecture (LiSSA)
The following diagram shows the pipeline that is planned for the LiSSA approach.

```mermaid
@@ -14,7 +15,7 @@ stateDiagram-v2
RecommendationGeneration
ConnectionGeneration
InconsistencyDetection
DiagramDetection --> RecommendationGeneration
TextPreprocessing --> TextExtraction
ArchitectureModel --> RecommendationGeneration