
[API-SERVER] New DB inherritance approach #280

Merged: 99 commits, Mar 10, 2024
96f823f
[API-SERVER] New DB inherritance approach
leondavi Jan 16, 2024
959df0a
[DB] Add DB implementation
leondavi Jan 17, 2024
d3fe059
[APIServer] WIP DB Concept
leondavi Jan 17, 2024
acbaf7a
[API Server] Pass tests
leondavi Jan 17, 2024
9624c7e
Comments on formats
GuyPerets106 Jan 23, 2024
9f2398a
[ApiServerDB] WIP decodeHttpMainServer
ohad123 Jan 23, 2024
4f004fc
[ApiServerDB] Changed worker seperators in stats
GuyPerets106 Jan 24, 2024
5c19127
[Api Server] change NerlComDB functions
ohad123 Jan 24, 2024
05854e6
[ApiServerDB] Fixed stats request issues
GuyPerets106 Jan 24, 2024
1a7568b
[ApiServerDB] WIP NerlcomDB - stats from experiment
ohad123 Jan 25, 2024
9a6fcfb
[ApiServerDB] Changes to Routing IMPORTANT
GuyPerets106 Jan 25, 2024
b3ad95b
Merge branch 'ApiServerDB' of github.com:leondavi/NErlNet into ApiSer…
GuyPerets106 Jan 25, 2024
a873c97
[ApiServerDB] Changed Encoded String Format
GuyPerets106 Jan 25, 2024
799ea7b
[ApiServerDB] fix decoderHttpMainServer
ohad123 Jan 25, 2024
eb08392
[ApiServer] add comments to decoder
ohad123 Jan 25, 2024
2ecec82
[ApiServerDB] Doubled client name bug fixed
GuyPerets106 Jan 25, 2024
d6be794
[ApiServerDB] WIP fix NerlComDb
ohad123 Jan 25, 2024
e4d5b30
[ApiServerDB] fix comment - use defintions for separators
ohad123 Jan 26, 2024
c4f6712
[ApiServerDB] Removed prints
GuyPerets106 Jan 30, 2024
0271514
Added Example Notebook
GuyPerets106 Jan 30, 2024
a2b684f
[ApiServerDB] CommStats changes , Tasks in comments
GuyPerets106 Jan 31, 2024
984777c
[ApiServerDB] WIP
NoaShapira8 Feb 2, 2024
18b3e27
[ApiServerDB] Parsing trainRes completed
GuyPerets106 Feb 5, 2024
dd71d09
[ApiServer] WIP rebuild of experiment flow
ohad123 Feb 8, 2024
0903058
Merge branch 'master' of https://github.com/leondavi/NErlNet into Api…
ohad123 Feb 8, 2024
adcec42
[Api Server] add new json for experiment flow
ohad123 Feb 11, 2024
1174e8f
[ApiServerDB] WIP experiment flow
ohad123 Feb 11, 2024
ab220ef
[ApiServer]WIP
NoaShapira8 Feb 16, 2024
5ee34a6
[ApiServer] WIP finish experiment json parser
ohad123 Feb 16, 2024
09b53d0
[ApiServer] WIP fix parse_experiment_flow_json
ohad123 Feb 16, 2024
36b56a0
[ApiServerDB] WIP
ohad123 Feb 16, 2024
2252b78
[ApiServerDB] WIP
leondavi Feb 16, 2024
f3e7553
[ApiServerDB] WIP: run_current_experiment_phase
NoaShapira8 Feb 16, 2024
e14489c
[ApiServerDB] WIP
leondavi Feb 16, 2024
a7a4278
[API_SERVER] Add terminate action and new send jsons action
leondavi Feb 17, 2024
fb898a4
[ApiServerDB] Ack Updates
GuyPerets106 Feb 17, 2024
6571a9f
[ApiServer] WIP
ohad123 Feb 18, 2024
00033fa
[ApiServerDB] WIP
ohad123 Feb 18, 2024
6bce283
[ApiServerDB]WIP
ohad123 Feb 18, 2024
ef0c6b5
[ApiServerDB] WIP
ohad123 Feb 18, 2024
29a063e
[ApiServerDB] JsonReceivedAck
GuyPerets106 Feb 18, 2024
9e0f78b
[ApiServerDB] WIP
ohad123 Feb 18, 2024
f0b8e55
[ApiServerDB] WIP
ohad123 Feb 20, 2024
389da10
[ApiServerDB] WIP
ohad123 Feb 22, 2024
5b4c966
[ApiServerDB] WIP
ohad123 Feb 22, 2024
b68b109
[ApiServerDB] Updated predictRes Erlang Side
GuyPerets106 Feb 22, 2024
1fbb24f
[ApiServerDB] WIP
ohad123 Feb 22, 2024
ad46793
[APiServerDB] WIP
ohad123 Feb 22, 2024
af29fa2
[ApiServerDB] WIP
ohad123 Feb 22, 2024
dcf90c5
[ApiServerDB] WIP
ohad123 Feb 22, 2024
10eec93
[ApiServerDB] Fixed predictRes erlang-side
GuyPerets106 Feb 23, 2024
2ca8950
[ApiServerDB] WIP
ohad123 Feb 24, 2024
567b4d1
[ApiServerDB] WIP
ohad123 Feb 24, 2024
767823e
[ApiServerDB] Added Batch Timestamp
GuyPerets106 Feb 24, 2024
e3daca7
[ApiServerDB] Removed Deprecated "NumOfSamples"
GuyPerets106 Feb 24, 2024
dbc6b11
[ApiServerDB] Added Batch Timestamp to train phase
GuyPerets106 Feb 24, 2024
36d03b7
[ApiServerDB] WIP
ohad123 Feb 24, 2024
428c85c
[ApiServerDB] WIP
ohad123 Feb 24, 2024
2f8a120
[ApiServerDB] Added tensor format to trainRes
GuyPerets106 Feb 24, 2024
bcf6611
Merge branch 'ApiServerDB' of github.com:leondavi/NErlNet into ApiSer…
GuyPerets106 Feb 24, 2024
1d2d545
[ApiServerDB] Removed print
GuyPerets106 Feb 24, 2024
3575408
[ApiServerDB[WIP
ohad123 Feb 24, 2024
6256e0f
[ApiServerDB] WIP
ohad123 Feb 24, 2024
7ea6274
[ApiServerDB] WIP
ohad123 Feb 25, 2024
dc92323
[ApiServerDB] WIP
ohad123 Feb 25, 2024
5f96b99
[ApiServerDB]WIP
ohad123 Feb 25, 2024
81864af
[ApiServerDB] WIP
NoaShapira8 Feb 27, 2024
8eab3ed
[ApiServerDB] Build data frame for loss by ts
NoaShapira8 Feb 28, 2024
0010b96
[ApiServerDB] add func get_mean_loss_list
NoaShapira8 Feb 28, 2024
7dc342e
[ApiServerDB] WIP
NoaShapira8 Feb 29, 2024
652913c
[ApiServerDB] WIP
NoaShapira8 Feb 29, 2024
876e7d9
[ApiServerDB] WIP
ohad123 Feb 29, 2024
1ac898e
[ApiServerDB] WIP
ohad123 Feb 29, 2024
ebca014
[ApiServerDB]WIP
ohad123 Mar 2, 2024
cde6f76
[ApiServerDB] Nerlnet Restart Erlang Side
GuyPerets106 Mar 3, 2024
d415fce
Merge branch 'ApiServerDB' of github.com:leondavi/NErlNet into ApiSer…
GuyPerets106 Mar 3, 2024
0900145
[ApiServerDB] add confusion matrix
ohad123 Mar 4, 2024
a1a1982
Merge branch 'ApiServerDB' of github.com:leondavi/NErlNet into ApiSer…
ohad123 Mar 4, 2024
85477f4
[ApiServerDB] WIP
ohad123 Mar 4, 2024
b7724c1
[ApiServerDB] NerlnetGraph visualization in apiServer
GuyPerets106 Mar 5, 2024
9786692
[ApiServerDB] Nerlnet Graph visualization completed
GuyPerets106 Mar 5, 2024
7b6ecc3
[ApiServerDB] Finish get_confusion_matrices
NoaShapira8 Mar 6, 2024
3d8d32b
[ApiServerDB] finish get_model_performence_stats
NoaShapira8 Mar 6, 2024
a6aa35f
[ApiServerDB] new experiment test
NoaShapira8 Mar 6, 2024
487cf22
Merge branch 'master' of github.com:leondavi/NErlNet into ApiServerDB
leondavi Mar 7, 2024
7c6a4a2
tmp
leondavi Mar 7, 2024
d0d6ac5
[CI] New ApiServer DP interface
leondavi Mar 7, 2024
88bbcee
[CI] Issue with experiment
leondavi Mar 7, 2024
8747dde
[ApiServerDB] fix get_min_loss to be OrderDict
NoaShapira8 Mar 7, 2024
1a268ce
[ApiServerDB] fix assert issue
NoaShapira8 Mar 7, 2024
423a045
[CI] Issue with assert 1K batches train and predict
leondavi Mar 7, 2024
8460d29
[ApiServerDB] Fix assert issue
NoaShapira8 Mar 10, 2024
4d6d78f
[CI] Baseline generate
leondavi Mar 10, 2024
e87eb91
[ApiServerDB] get communication stats
NoaShapira8 Mar 10, 2024
3780f20
[CI] Issue with confusion matrix test
leondavi Mar 10, 2024
7c354ba
Merge branch 'ApiServerDB' of github.com:leondavi/NErlNet into ApiSer…
leondavi Mar 10, 2024
57cfdb7
Add comm stats to test - TODO missed batches
leondavi Mar 10, 2024
8f56b3e
[ApiServerDB] WIP stats
NoaShapira8 Mar 10, 2024
41ece72
[ApiServerDB] flow_test
NoaShapira8 Mar 10, 2024
2 changes: 2 additions & 0 deletions NerlnetJupyterLaunch.sh
Original file line number Diff line number Diff line change
@@ -139,5 +139,7 @@ cd $JUPDIR
 generate_set_jupyter_env
 generate_readme_md

+# TODO add networkx and pygraphviz installations!
+
 jupyter-lab

596 changes: 596 additions & 0 deletions examples/NerlnetExperiment.ipynb

Large diffs are not rendered by default.

@@ -0,0 +1,9 @@
{
"connectionsMap":
{
"r1":["mainServer", "r2"],
"r2":["r3", "s1"],
"r3":["r4", "c1" , "s2"],
"r4":["r1", "c2"]
}
}
124 changes: 124 additions & 0 deletions inputJsonsFiles/DistributedConfig/dc_test_synt_1d_2c_2s_4r_4w.json
@@ -0,0 +1,124 @@
{
"nerlnetSettings": {
"frequency": "60",
"batchSize": "50"
},
"mainServer": {
"port": "8081",
"args": ""
},
"apiServer": {
"port": "8099",
"args": ""
},
"devices": [
{
"name": "pc1",
"ipv4": "10.0.0.11",
"entities": "c1,c2,r2,r1,r3,r4,s1,s2,apiServer,mainServer"
}
],
"routers": [
{
"name": "r1",
"port": "8086",
"policy": "0"
},
{
"name": "r2",
"port": "8087",
"policy": "0"
},
{
"name": "r3",
"port": "8088",
"policy": "0"
},
{
"name": "r4",
"port": "8089",
"policy": "0"
}
],
"sources": [
{
"name": "s1",
"port": "8085",
"frequency": "300",
"policy": "0",
"epochs": "1",
"type": "0"
},
{
"name": "s2",
"port": "8090",
"frequency": "300",
"policy": "0",
"epochs": "1",
"type": "0"
}
],
"clients": [
{
"name": "c1",
"port": "8083",
"workers": "w1,w2"
},
{
"name": "c2",
"port": "8084",
"workers": "w3,w4"
}
],
"workers": [
{
"name": "w1",
"model_sha": "d8df752e0a2e8f01de8f66e9cec941cdbc65d144ecf90ab7713e69d65e7e82aa"
},
{
"name": "w2",
"model_sha": "d8df752e0a2e8f01de8f66e9cec941cdbc65d144ecf90ab7713e69d65e7e82aa"
},
{
"name": "w3",
"model_sha": "d8df752e0a2e8f01de8f66e9cec941cdbc65d144ecf90ab7713e69d65e7e82aa"
},
{
"name": "w4",
"model_sha": "d8df752e0a2e8f01de8f66e9cec941cdbc65d144ecf90ab7713e69d65e7e82aa"
}
],
"model_sha": {
"d8df752e0a2e8f01de8f66e9cec941cdbc65d144ecf90ab7713e69d65e7e82aa": {
"modelType": "0",
"_doc_modelType": " nn:0 | approximation:1 | classification:2 | forecasting:3 | image-classification:4 | text-classification:5 | text-generation:6 | auto-association:7 | autoencoder:8 | ae-classifier:9 |",
"layersSizes": "5,10,5,3,3",
"_doc_layersSizes": "List of postive integers [L0, L1, ..., LN]",
"layerTypesList": "1,3,3,3,5",
"_doc_LayerTypes": " Default:0 | Scaling:1 | CNN:2 | Perceptron:3 | Pooling:4 | Probabilistic:5 | LSTM:6 | Reccurrent:7 | Unscaling:8 |",
"layers_functions": "1,6,6,11,4",
"_doc_layers_functions_activation": " Threshold:1 | Sign:2 | Logistic:3 | Tanh:4 | Linear:5 | ReLU:6 | eLU:7 | SeLU:8 | Soft-plus:9 | Soft-sign:10 | Hard-sigmoid:11 |",
"_doc_layer_functions_pooling": " none:1 | Max:2 | Avg:3 |",
"_doc_layer_functions_probabilistic": " Binary:1 | Logistic:2 | Competitive:3 | Softmax:4 |",
"_doc_layer_functions_scaler": " none:1 | MinMax:2 | MeanStd:3 | STD:4 | Log:5 |",
"lossMethod": "2",
"_doc_lossMethod": " SSE:1 | MSE:2 | NSE:3 | MinkowskiE:4 | WSE:5 | CEE:6 |",
"lr": "0.01",
"_doc_lr": "Positve float",
"epochs": "1",
"_doc_epochs": "Positve Integer",
"optimizer": "5",
"_doc_optimizer": " GD:0 | CGD:1 | SGD:2 | QuasiNeuton:3 | LVM:4 | ADAM:5 |",
"optimizerArgs": "",
"_doc_optimizerArgs": "String",
"infraType": "0",
"_doc_infraType": " opennn:0 | wolfengine:1 |",
"distributedSystemType": "0",
"_doc_distributedSystemType": " none:0 | fedClientAvg:1 | fedServerAvg:2 |",
"distributedSystemArgs": "",
"_doc_distributedSystemArgs": "String",
"distributedSystemToken": "none",
"_doc_distributedSystemToken": "Token that associates distributed group of workers and parameter-server"
}
}
}
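The distributed-config file above ties every worker to a `model_sha` entry and assigns each entity its own port. A minimal sketch of a validator for those two invariants follows; the helper name `validate_dc` and the abbreviated embedded excerpt are illustrative only, not part of NErlNet.

```python
import json

# Abbreviated excerpt of the distributed-config schema shown above
# (shortened sha and entity lists for illustration).
dc = json.loads("""
{
  "routers":  [{"name": "r1", "port": "8086"}, {"name": "r2", "port": "8087"}],
  "clients":  [{"name": "c1", "port": "8083", "workers": "w1,w2"}],
  "workers":  [{"name": "w1", "model_sha": "d8df752e"},
               {"name": "w2", "model_sha": "d8df752e"}],
  "model_sha": {"d8df752e": {"modelType": "0"}}
}
""")

def validate_dc(dc):
    # Ports must be unique across routers and clients
    ports = [e["port"] for section in ("routers", "clients") for e in dc[section]]
    assert len(ports) == len(set(ports)), "port collision"
    # Every worker's model_sha must resolve to a model definition
    for w in dc["workers"]:
        assert w["model_sha"] in dc["model_sha"], "unknown model_sha"
    return True

print(validate_dc(dc))  # True
```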
64 changes: 64 additions & 0 deletions inputJsonsFiles/experimentsFlow/exp_new_arc.json
@@ -0,0 +1,64 @@
{

"experimentName": "synthetic_3_gausians",
"batchSize": 50,
"csvFilePath": "/tmp/nerlnet/data/NerlnetData-master/nerlnet/synthetic/synthetic_full.csv",
"numOfFeatures": "5",
"numOfLabels": "3",
"headersNames": "Norm(0:1),Norm(4:1),Norm(10:3)",
"Phases":
[
{
"phaseName": "training1",
"phaseType": "training",
"sourcePieces":
[
{
"sourceName": "s1",
"startingSample": "0",
"numOfBatches": "10",
"workers": "w1,w2,w3,w4"
},
{
"sourceName": "s2",
"startingSample": "500",
"numOfBatches": "10",
"workers": "w1,w2,w3,w4"
}
]
},
{
"phaseName": "validation1",
"phaseType": "prediction",
"sourcePieces":
[
{
"sourceName": "s1",
"startingSample": "1000",
"numOfBatches": "5",
"workers": "w1,w2,w3,w4"
},
{
"sourceName": "s2",
"startingSample": "1250",
"numOfBatches": "5",
"workers": "w1,w2,w3,w4"
}
]
},
{
"phaseName": "prediction1",
"phaseType": "prediction",
"sourcePieces":
[
{
"sourceName": "s1",
"startingSample": "1500",
"numOfBatches": "5",
"workers": "w1,w2,w3,w4"
}
]

}
]
}
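Each `sourcePieces` entry in the experiment-flow JSON above selects a contiguous slice of the CSV: `numOfBatches` batches of the top-level `batchSize`, starting at `startingSample`. A hedged sketch of that arithmetic (the helper and the trimmed `flow` dict below are illustrative, not NErlNet code):

```python
# Compute the half-open sample range [start, end) each source piece covers,
# using the experiment's top-level batchSize. Field names follow the JSON
# shown above; note the numeric fields are JSON strings and need int().
def piece_ranges(flow):
    batch = flow["batchSize"]
    out = []
    for phase in flow["Phases"]:
        for piece in phase["sourcePieces"]:
            start = int(piece["startingSample"])
            end = start + int(piece["numOfBatches"]) * batch
            out.append((phase["phaseName"], piece["sourceName"], start, end))
    return out

# Trimmed excerpt of the flow above, for illustration
flow = {
    "batchSize": 50,
    "Phases": [
        {"phaseName": "training1", "phaseType": "training",
         "sourcePieces": [{"sourceName": "s1", "startingSample": "0", "numOfBatches": "10"}]},
        {"phaseName": "prediction1", "phaseType": "prediction",
         "sourcePieces": [{"sourceName": "s1", "startingSample": "1500", "numOfBatches": "5"}]},
    ],
}
print(piece_ranges(flow))  # training1 s1 covers [0, 500); prediction1 s1 covers [1500, 1750)
```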
@@ -0,0 +1,38 @@
{
"experimentName": "synthetic_3_gausians",
"batchSize": 50,
"csvFilePath": "/tmp/nerlnet/data/NerlnetData-master/nerlnet/synthetic/synthetic_full.csv",
"numOfFeatures": "5",
"numOfLabels": "3",
"headersNames": "Norm(0:1),Norm(4:1),Norm(10:3)",
"Phases":
[
{
"phaseName": "training_phase",
"phaseType": "training",
"sourcePieces":
[
{
"sourceName": "s1",
"startingSample": "0",
"numOfBatches": "40",
"workers": "w1,w2,w3,w4"
}
]
},
{
"phaseName": "prediction_phase",
"phaseType": "prediction",
"sourcePieces":
[
{
"sourceName": "s1",
"startingSample": "2000",
"numOfBatches": "40",
"workers": "w1,w2,w3,w4"
}
]
}
]
}

1 change: 1 addition & 0 deletions src_cpp/common/common_definitions.h
@@ -7,5 +7,6 @@ namespace nerlnet
 #define DIM_Z_IDX 2

 #define NERLNIF_ATOM_STR "nerlnif"
+#define NERLNIF_NAN_ATOM_STR "nan"

 }
1 change: 1 addition & 0 deletions src_cpp/common/nerlWorkerFunc.h
@@ -71,6 +71,7 @@ static void parse_layer_sizes_str(std::string &layer_sizes_str, std::vector<int>
 case LAYER_TYPE_DEFAULT:
 case LAYER_TYPE_SCALING:
 case LAYER_TYPE_UNSCALING:
+case LAYER_TYPE_PROBABILISTIC:
 case SIMPLE_PARSING:{
     out_layer_sizes_params[i].dimx = std::stoi(layer_sizes_strs_vec[i]);
     break;
8 changes: 5 additions & 3 deletions src_cpp/opennnBridge/openNNnif.cpp
@@ -37,15 +37,17 @@ void* trainFun(void* arg)
 // Stop the timer and calculate the time took for training
 high_resolution_clock::time_point stop = high_resolution_clock::now();
 auto duration = duration_cast<microseconds>(stop - TrainNNptr->start_time);
+ERL_NIF_TERM loss_val_term;

 if(isnan(loss_val) )
 {
-    loss_val = -1.0;
+    loss_val_term = enif_make_atom(env , NERLNIF_NAN_ATOM_STR);
     cout << NERLNIF_PREFIX << "loss val = nan , setting NN weights to random values" <<std::endl;
     neural_network_ptr->set_parameters_random();
 }
-//cout << "returning training values"<<std::endl;
-ERL_NIF_TERM loss_val_term = enif_make_double(env, loss_val);
+else {
+    loss_val_term = enif_make_double(env, loss_val);
+}
 ERL_NIF_TERM train_time = enif_make_double(env, duration.count());
 ERL_NIF_TERM nerlnif_atom = enif_make_atom(env, NERLNIF_ATOM_STR);
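The `openNNnif.cpp` diff above changes how a NaN training loss is reported: instead of overwriting it with the magic value `-1.0`, the NIF now returns a dedicated `nan` atom and re-randomizes the network weights. A minimal Python sketch of that control flow (illustrative only; the function name is hypothetical and this is not the NIF code):

```python
import math

def report_loss(loss_val, reset_weights):
    # Mirrors the isnan(loss_val) branch in trainFun above
    if math.isnan(loss_val):
        reset_weights()   # mirrors neural_network_ptr->set_parameters_random()
        return "nan"      # mirrors enif_make_atom(env, NERLNIF_NAN_ATOM_STR)
    return loss_val       # mirrors enif_make_double(env, loss_val)

print(report_loss(float("nan"), lambda: None))  # nan
print(report_loss(0.25, lambda: None))          # 0.25
```

The benefit of the atom over a sentinel float is that the Erlang side can pattern-match on `nan` explicitly rather than treating a legal-looking negative loss as an error code.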
16 changes: 8 additions & 8 deletions src_erl/NerlnetApp/src/Bridge/nerlNIF.erl
@@ -2,7 +2,7 @@
-include_lib("kernel/include/logger.hrl").
-include("nerlTensor.hrl").

--export([init/0,nif_preload/0,get_active_models_ids_list/0, train_nif/3,update_nerlworker_train_params_nif/6,call_to_train/3,predict_nif/3,call_to_predict/6,get_weights_nif/1,printTensor/2]).
+-export([init/0,nif_preload/0,get_active_models_ids_list/0, train_nif/3,update_nerlworker_train_params_nif/6,call_to_train/5,predict_nif/3,call_to_predict/5,get_weights_nif/1,printTensor/2]).
-export([call_to_get_weights/2,call_to_set_weights/2]).
-export([decode_nif/2, nerltensor_binary_decode/2]).
-export([encode_nif/2, nerltensor_encode/5, nerltensor_conversion/2, get_all_binary_types/0, get_all_nerltensor_list_types/0]).
@@ -46,7 +46,7 @@ train_nif(_ModelID,_DataTensor,_Type) ->
update_nerlworker_train_params_nif(_ModelID,_LearningRate,_Epochs,_OptimizerType,_OptimizerArgs,_LossMethod) ->
exit(nif_library_not_loaded).

-call_to_train(ModelID, {DataTensor, Type}, WorkerPid)->
+call_to_train(ModelID, {DataTensor, Type}, WorkerPid , BatchID , SourceName)->
% io:format("before train ~n "),
% io:format("DataTensor= ~p~n ",[nerltensor_conversion({DataTensor, Type}, erl_float)]),
%{FakeTensor, Type} = nerltensor_conversion({[2.0,4.0,1.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0], erl_float}, float),
@@ -56,29 +56,29 @@ call_to_train(ModelID, {DataTensor, Type}, WorkerPid)->
{nerlnif , LossValue , TrainTime}->
% io:format("Ret= ~p~n ",[Ret]),
%io:format("WorkerPid,{loss, Ret}: ~p , ~p ~n ",[WorkerPid,{loss, Ret}]),
-        gen_statem:cast(WorkerPid,{loss, LossValue , TrainTime}) % TODO @Haran - please check what worker does with this Ret value
+        LossTensor = nerltensor_encode(1.0,1.0,1.0,[LossValue], erl_float), %% ALWAYS {[1.0,1.0,1.0,LOSS_VALUE] , <TYPE>}
+        gen_statem:cast(WorkerPid,{loss, LossTensor , TrainTime , BatchID , SourceName}) % TODO @Haran - please check what worker does with this Ret value
after ?TRAIN_TIMEOUT -> %TODO inspect this timeout
?LOG_ERROR("Worker train timeout reached! setting loss = -1~n "),
-        gen_statem:cast(WorkerPid,{loss, timeout}) %% Define train timeout state
+        gen_statem:cast(WorkerPid,{loss, timeout , SourceName}) %% Define train timeout state
end.
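Per the comment in `call_to_train` above, the raw loss float is now wrapped as a nerltensor whose first three entries encode a 1x1x1 shape before being cast to the worker. A hedged sketch of that list layout, assuming the flat `[DimX, DimY, DimZ, value...]` encoding the comment describes (the Python helper itself is illustrative, not NErlNet code):

```python
def encode_loss_tensor(loss_value):
    # ALWAYS [1.0, 1.0, 1.0, LOSS_VALUE]: three leading dimension entries
    # describing a 1x1x1 tensor, followed by the single data element
    return [1.0, 1.0, 1.0, float(loss_value)]

print(encode_loss_tensor(0.37))  # [1.0, 1.0, 1.0, 0.37]
```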

-call_to_predict(ModelID, BatchTensor, Type, WorkerPid,CSVname, BatchID)->
-    % io:format("satrting pred_nif~n"),
+call_to_predict(ModelID, {BatchTensor, Type}, WorkerPid, BatchID , SourceName)->
ok = predict_nif(ModelID, BatchTensor, Type),
receive

{nerlnif , PredNerlTensor, NewType, TimeTook}-> %% nerlnif atom means a message from the nif implementation
% io:format("pred_nif done~n"),
% {PredTen, _NewType} = nerltensor_conversion({PredNerlTensor, NewType}, erl_float),
% io:format("Pred returned: ~p~n", [PredNerlTensor]),
-        gen_statem:cast(WorkerPid,{predictRes,PredNerlTensor, NewType, TimeTook,CSVname, BatchID});
+        gen_statem:cast(WorkerPid,{predictRes,PredNerlTensor, NewType, TimeTook, BatchID , SourceName});
Error ->
?LOG_ERROR("received wrong prediction_nif format: ~p" ,[Error]),
throw("received wrong prediction_nif format")
after ?PREDICT_TIMEOUT ->
% worker miss predict batch TODO - inspect this code
?LOG_ERROR("Worker prediction timeout reached! ~n "),
-        gen_statem:cast(WorkerPid,{predictRes, nan, CSVname, BatchID})
+        gen_statem:cast(WorkerPid,{predictRes, nan, BatchID , SourceName})
end.

call_to_get_weights(ThisEts, ModelID)->