Observers-based Data Modeling. CN contact: Fares Meghdouri

License

Notifications You must be signed in to change notification settings

CN-TU/pyodm

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

26 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

pyODM

Contact: Fares Meghdouri - fares.meghdouri@tuwien.ac.at

Paper: "Modeling Data with Observers"

@article{meghdouri2022modeling,
  title={Modeling data with observers},
  author={Meghdouri, Fares and Iglesias V{\'a}zquez, F{\'e}lix and Zseby, Tanja},
  journal={Intelligent Data Analysis},
  volume={26},
  number={3},
  pages={785--803},
  year={2022},
  publisher={IOS Press}
}

Installation

pyodm can be installed with pip by running: pip install git+https://github.com/CN-TU/pyodm

Note that for ODM to work with an M-Tree core, the M-Trees package must also be installed. Its repository is currently private and will be made available soon.

Usage

Many parameters can be adjusted to build a representative model; refer to the paper for more information.

Create a model

import pyodm

# create a new model with default parameters
model = pyodm.ODM(random_state=1)

Construct a coreset

import numpy as np
#import pandas

# read the data
X = np.load('my_dataset.npy')

#or
#X = pandas.read_csv('my_dataset.csv').values

# model the data
model.fit(X)

# access the array of observers
print(model.observers)

# access the array of radii
print(model.radius)

# access the array of populations
print(model.population)
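
For readers without a dataset at hand, the following is a minimal sketch of building a small synthetic array to try the steps above. The two Gaussian blobs, their parameters, and the (n_samples, n_features) layout are illustrative assumptions and do not come from the pyodm repository.

```python
import numpy as np

# Hypothetical stand-in data: two Gaussian blobs in 2-D.
# Assumption: the model expects an array of shape (n_samples, n_features).
rng = np.random.default_rng(1)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(500, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(500, 2))

# Stack the blobs into a single dataset, one row per data point.
X = np.vstack([blob_a, blob_b])
print(X.shape)  # (1000, 2)
```

The resulting X can be used in place of the loaded array, or written to disk with np.save('my_dataset.npy', X) so that the np.load call above finds it.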

Outlierness scores

To get the outlierness scores of a set of points (based on an ODM model), run the following after fitting a model:

# read the data
X_test = np.load('my_test_dataset.npy')

# get the outlierness scores
outlierness_scores = model.outlierness(X_test)

Anomaly detection

The outlierness scores can be converted into binary labels (outlier/inlier) as follows:

# read the data
X_test = np.load('my_test_dataset.npy')

# convert the outlierness scores into binary labels using a contamination threshold
predictions = model.predict(X_test) 
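
As an illustration of what a contamination threshold means, here is a minimal NumPy sketch that flags the highest-scoring fraction of points as outliers. It is not taken from pyodm: the helper name scores_to_labels, the 1/0 label convention, and the assumption that higher scores mean more outlying points are all hypothetical, and model.predict may behave differently.

```python
import numpy as np

def scores_to_labels(scores, contamination=0.1):
    """Hypothetical helper: mark the top `contamination` fraction of scores as outliers.

    Returns (labels, threshold), with labels 1 = outlier and 0 = inlier.
    Assumes that a higher score means a more outlying point.
    """
    scores = np.asarray(scores, dtype=float)
    # The threshold is the (1 - contamination) quantile of the scores.
    threshold = np.quantile(scores, 1.0 - contamination)
    labels = (scores > threshold).astype(int)
    return labels, threshold

# Made-up scores: nine ordinary values and one clearly larger one.
scores = np.array([0.1, 0.2, 0.15, 0.3, 0.25, 0.12, 0.18, 0.22, 0.28, 5.0])
labels, threshold = scores_to_labels(scores, contamination=0.1)
print(labels)  # [0 0 0 0 0 0 0 0 0 1]
```

With ten scores and a contamination of 0.1, only the single largest score exceeds the threshold.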

Points labeling

To get the label of the closest observer for each point in a set, use:

# read the data
X_test = np.load('my_test_dataset.npy')

# return the predicted label of each test point (referring to `model.observers`)
predicted_labels = model.labels(X_test)
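
The idea of labeling a point by its closest observer can be sketched in plain NumPy. This is an illustration only: the helper closest_observer is hypothetical, Euclidean distance is an assumption, and the exact return format of model.labels is not confirmed here.

```python
import numpy as np

def closest_observer(points, observers):
    """Hypothetical helper: index of the nearest observer for each point.

    Assumes Euclidean distance; pyodm may use a different metric.
    """
    points = np.asarray(points, dtype=float)
    observers = np.asarray(observers, dtype=float)
    # Pairwise distances via broadcasting, shape (n_points, n_observers).
    distances = np.linalg.norm(points[:, None, :] - observers[None, :, :], axis=2)
    # For each point, pick the observer with the smallest distance.
    return distances.argmin(axis=1)

# Made-up observers and test points for illustration.
observers = np.array([[0.0, 0.0], [5.0, 5.0]])
points = np.array([[0.2, -0.1], [4.8, 5.3], [1.0, 1.0]])
nearest = closest_observer(points, observers)
print(nearest)  # [0 1 0]
```

Each entry is a row index into the observers array, so observers[nearest] gives the coordinates of the observer assigned to each point.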

Get parameters

# return a dictionary of the current parameters
model.get_params()

This returns a dictionary of the parameters used to build the model.

Visual Examples

Example 1: Three datasets in which data points are shown in gray and the ODM model in red, each with a different configuration.

Example 2: Convergence path of an observer.

Example 3: Two-cluster dataset with two observers.

Example 4: Five-cluster dataset with six observers.
