
Getting Started

Gabriel Iuhasz edited this page Nov 14, 2016 · 18 revisions

To run ADT, execute the following command:

python dmonadp.py <args>

There are currently two ways of configuring ADT: command line arguments and a configuration file.

# Command Line Arguments

$ python dmonadp.py -h

h -> Prints a short help message describing the basic usage of ADT.

$ python dmonadp.py -f <file_location>

f -> Loads the specified configuration file.

$ python dmonadp.py -e <es_endpoint>

e -> Sets the Elasticsearch endpoint.

NOTE: In future versions ADT will be integrated with DMon and will be able to query the DMon query endpoint, not just the Elasticsearch one.

$ python dmonadp.py -a <es_query> -t -m <method_name> -v <folds> -x <model_name>

a -> The query to be issued to Elasticsearch; the resulting data is used for training. It is a standard Elasticsearch query that also specifies the desired timeframe for the data.

t -> This instructs ADT to initiate the training of the predictive model.

m -> This represents the name of the method used to create the predictive model.

v -> Instructs ADT to run cross validation on the selected model using the specified number of folds.

x -> This allows the exporting of the predictive model in PMML format.

NOTE: The last two arguments, v and x, are optional.
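For illustration, the kind of query body that could be passed with -a is sketched below as a Python dict in the Elasticsearch query DSL: a bool query restricted to a time window. The helper name build_training_query, the @timestamp field and the query_string clause are assumptions, not taken from ADT; the epoch-millisecond timestamps are copied from the configuration example further down this page.

```python
import json


def build_training_query(start_ms, end_ms, query_string="*", time_field="@timestamp"):
    """Build an Elasticsearch bool query limited to a time window.

    Hypothetical helper: the "@timestamp" field name and the query_string
    clause are illustrative assumptions, not ADT's actual query.
    """
    return {
        "query": {
            "bool": {
                "must": [
                    # Match everything by default; narrow this as needed.
                    {"query_string": {"query": query_string}},
                    # Restrict results to the [start_ms, end_ms] timeframe.
                    {
                        "range": {
                            time_field: {
                                "gte": start_ms,
                                "lte": end_ms,
                                "format": "epoch_millis",
                            }
                        }
                    },
                ]
            }
        }
    }


if __name__ == "__main__":
    # Timestamps taken from the From/To values of the configuration example.
    body = build_training_query(1479105362284, 1479119769978)
    print(json.dumps(body, indent=2))
```

How ADT expects this JSON to be quoted on the command line is not documented on this page.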

$ python dmonadp.py -a <query> -d <model_name>

d -> This enables the detection of anomalies using a specified pre-trained predictive model (identified by its name).
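The flags above can be summarised as a hypothetical argparse parser. It is only a sketch showing which options take a value and which one is a plain switch (-t); it is not the parser used inside dmonadp.py, and the destination names (config_file, folds, detect_model and so on) are invented for readability.

```python
import argparse


def build_parser():
    """Hypothetical argparse equivalent of the documented ADT flags.

    Illustration only; dmonadp.py may parse its arguments differently.
    argparse adds -h automatically.
    """
    parser = argparse.ArgumentParser(prog="dmonadp.py")
    parser.add_argument("-f", dest="config_file", help="configuration file to load")
    parser.add_argument("-e", dest="es_endpoint", help="Elasticsearch endpoint")
    parser.add_argument("-a", dest="query", help="Elasticsearch query selecting the data")
    parser.add_argument("-t", dest="train", action="store_true", help="train a predictive model")
    parser.add_argument("-m", dest="method", help="name of the training method")
    parser.add_argument("-v", dest="folds", type=int, help="cross validation folds (optional)")
    parser.add_argument("-x", dest="export", help="export the model to PMML under this name (optional)")
    parser.add_argument("-d", dest="detect_model", help="pre-trained model to use for detection")
    return parser


if __name__ == "__main__":
    # Equivalent of: python dmonadp.py -a <es_query> -t -m skm -v 5 -x model1
    example = ["-a", "<es_query>", "-t", "-m", "skm", "-v", "5", "-x", "model1"]
    print(build_parser().parse_args(example))
```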

# Configuration File

The configuration file can define all of the arguments listed above. Here is an example:

[Connector]
ESEndpoint:85.120.206.27
ESPort:9200
DMonPort:5001
From:1479105362284
To:1479119769978
Query:yarn:cluster, nn, nm, dfs, dn, mr;system
Nodes:
QSize:0
QInterval:10s

[Mode]
Training:true
Validate:False
Detect:false

[Filter]
#Columns:colname;colname2;colname3
#Rows:ld:145607979;gd:145607979
#DColumns:colname;colname2;colname3


[Detect]
Method:skm
Type:clustering
Export:test1
Load:test1

[MethodSettings]
n:10
s:10


[Point]
Memory: cached:gd:231313;buffered:ld:312123;used:ld:12313;free:gd:23123
Load: shortterm:gd:2.0;midterm:ld:0.1;longterm:gd:1.0
Network: tx:gd:34344;rx:ld:323434

[Misc]
heap:512m
checkpoint:false
delay:2m
interval:15m
resetindex:false
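Because the example uses INI-style sections, ":" as the key/value delimiter and "#" for commented-out lines, it can be read with Python's standard configparser module. The sketch below assumes that; ADT's own loader may behave differently, for example in how it treats the mixed-case true/False values. The sample string is an abbreviated copy of the example above.

```python
import configparser

# Abbreviated copy of the example above ([Filter], [Detect], [Point] and
# [Misc] omitted for brevity).
EXAMPLE = """
[Connector]
ESEndpoint:85.120.206.27
ESPort:9200
From:1479105362284
To:1479119769978
Query:yarn:cluster, nn, nm, dfs, dn, mr;system
Nodes:
QSize:0
QInterval:10s

[Mode]
Training:true
Validate:False
Detect:false

[MethodSettings]
n:10
s:10
"""


def load_config(text):
    """Parse an ADT-style configuration string with the standard library.

    configparser splits each line on the first ":" (so the Query value keeps
    its own colons) and skips lines starting with "#".
    """
    config = configparser.ConfigParser()
    config.optionxform = str  # keep key case, e.g. "ESEndpoint"
    config.read_string(text)
    return config


if __name__ == "__main__":
    cfg = load_config(EXAMPLE)
    print(cfg["Connector"]["Query"])           # yarn:cluster, nn, nm, dfs, dn, mr;system
    print(cfg.getboolean("Mode", "Training"))  # True
    print(cfg.getint("Connector", "QSize"))    # 0
```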

The Connector section sets the parameters used to connect to and query DMON:

  • ESEndpoint -> sets the current endpoint for DMON; it can also be a list if DMON uses more than one Elasticsearch instance
  • ESPort -> sets the port for the Elasticsearch instances (NOTE: only used for development and testing)
  • DMonPort -> sets the port for DMON
  • From -> sets the start timestamp of the query (NOTE: time arithmetic of the form "now-2h" can be used)
  • To -> sets the end timestamp of the query
  • Query -> defines which metric contexts to query from DMON. Each metric context is divided into subfields as follows:
      ◦ yarn -> cluster, nn, nm, dfs, dn, mr
      ◦ system -> memory, load, network
      ◦ spark -> not supported in this version (v0.1.0)
      ◦ storm -> not supported in this version (v0.1.0)
    NOTE: Contexts are delimited by ";", while subfields are separated by ", ". A context name is followed by ":" and then its subfields, as in the example above.
  • Nodes -> list of desired nodes; if nothing is specified, all available nodes are used
  • QSize -> sets the query size (number of instances); if set to 0, no limit is applied
  • QInterval -> sets the aggregation interval
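The Query syntax can be illustrated with a small, hypothetical parser that turns the value into a mapping from each context to its subfields. Mapping a bare context such as system to an empty list is an assumption made here; the page does not state whether a context without subfields means "all subfields".

```python
def parse_query(query):
    """Split a Query value into {context: [subfields]}.

    Hypothetical helper. Contexts are separated by ";"; a context's subfields
    follow a ":" and are separated by ",". A context with no subfields maps
    to an empty list (its meaning is an assumption, see above).
    """
    contexts = {}
    for part in query.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing ";"
        name, _, fields = part.partition(":")
        contexts[name.strip()] = [f.strip() for f in fields.split(",") if f.strip()]
    return contexts


if __name__ == "__main__":
    print(parse_query("yarn:cluster, nn, nm, dfs, dn, mr;system"))
    # {'yarn': ['cluster', 'nn', 'nm', 'dfs', 'dn', 'mr'], 'system': []}
```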

The MethodSettings section of the configuration file sets the parameters of the chosen training method. These parameters cannot be set using command line arguments.
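As a hypothetical sketch, the MethodSettings section could be collected into a dictionary of typed values before being handed to the training method. The page does not say what n and s mean for the skm method, so the helper below (method_settings is an invented name) only converts the strings and leaves their interpretation to ADT.

```python
import configparser


def method_settings(config):
    """Collect [MethodSettings] into a dict, converting numeric strings.

    Hypothetical helper: how ADT maps these keys onto a training method is
    not described on this page.
    """
    settings = {}
    if not config.has_section("MethodSettings"):
        return settings
    for key, raw in config.items("MethodSettings"):
        for convert in (int, float):
            try:
                settings[key] = convert(raw)
                break
            except ValueError:
                continue
        else:
            settings[key] = raw  # keep non-numeric values as strings
    return settings


if __name__ == "__main__":
    cfg = configparser.ConfigParser()
    cfg.read_string("[MethodSettings]\nn:10\ns:10\n")
    print(method_settings(cfg))  # {'n': 10, 's': 10}
```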

NOTE: This tool is still a work in progress. All commands and their behaviours are subject to change. Please consult the repository changelog for any significant changes.
