Recent developments in science and technology have driven rapid growth in the volume of data, as well as in its availability. From 2005 to 2020, the digital universe is expected to grow by a factor of 300, from 130 exabytes to 40,000 exabytes, or 40 trillion gigabytes (more than 5,200 gigabytes for every man, woman and child). Part of this growth comes from the large volume of real-time data generated by computer networks, smartphones, wearables and a wide range of sensors. All of this data is only useful if it can be processed efficiently, so that people can make timely decisions based on it.
The problem with raw data is that patterns cannot be identified in it directly, and for that purpose machine learning (ML) techniques are widely used. ML techniques are applied in many sectors of society, from daily life to national security. Besides generating models efficiently, ML is used because people are prone to mistakes when performing certain analyses, such as trying to find relationships among many features at once, a task in which ML algorithms are typically far less error-prone.
Last year, Apple announced CoreML, a framework that allows developers to embed pre-trained ML models into their own projects. The biggest drawback of this framework was that developers still needed external software to collect their data, process it to make it suitable for mining, and tune and train a model that fits their needs, before finally exporting the model to CoreML. This pipeline has led most iOS developers to use only pre-trained models for image recognition, and almost none of them have created their own models for prediction. As a result, most of the power of this framework has been underused, since one of the major goals of ML is to provide custom solutions.
A year later, Apple announced CreateML. This time, part of the aforementioned pipeline was simplified, allowing most of the work to be done locally and natively in Swift. Apple did a very good job with image and text recognition, allowing iOS developers to train a model with just a few “clicks”. Still, little was improved for general-purpose ML tasks, which, as the name suggests, are the most common. iDevices are constantly with their users and produce a lot of data that can be used to learn the users' habits and provide custom functionality.
Our framework is open source and written entirely in Swift. It covers the whole pipeline for generating custom ML models directly within an iOS developer's project: data processing and selection, training and tuning of ML algorithms, validation protocols and evaluation metrics suited to each situation, and, finally, exporting the model to CoreML. The framework is modeled on Scikit-learn, one of the most widely used ML libraries today. One of Scikit-learn's major advantages is that it is mostly written in Python, a language that is accessible even to people with little programming experience. Likewise, the data processing in our framework is modeled on Pandas. Our main goal is to bring the benefits of machine learning to every iOS developer with a framework focused on productivity, enabling richer and more immersive experiences for end users.
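Scikit-learn models share a common estimator convention: a model is trained with `fit` and queried with `predict`, which is the same shape as the `clf.fit(...)` and `clf.predict(...)` calls in the playground example later in this section. The sketch below expresses that convention in plain Swift. The `Classifier` protocol and the `MajorityClassifier` type are hypothetical names used only for illustration; they are not the framework's actual API.

```swift
// Hypothetical protocol mirroring Scikit-learn's fit/predict convention.
// It is NOT the framework's actual API.
protocol Classifier {
    mutating func fit(x: [[Double]], y: [String])
    func predict(x: [[Double]]) -> [String]
}

// Baseline model that always predicts the most frequent training label,
// similar in spirit to Scikit-learn's DummyClassifier(strategy="most_frequent").
struct MajorityClassifier: Classifier {
    private var majority: String? = nil

    mutating func fit(x: [[Double]], y: [String]) {
        var counts: [String: Int] = [:]
        for label in y { counts[label, default: 0] += 1 }
        // Highest count wins; ties go to the alphabetically first label.
        majority = counts.max(by: { a, b in
            a.value != b.value ? a.value < b.value : a.key > b.key
        })?.key
    }

    func predict(x: [[Double]]) -> [String] {
        // One prediction per input row; "" if fit has not been called yet.
        return x.map { _ in majority ?? "" }
    }
}

// Usage: the same fit/predict calls would work for any Classifier.
var model = MajorityClassifier()
model.fit(x: [[1.0], [2.0], [3.0]], y: ["good", "bad", "good"])
print(model.predict(x: [[4.0], [5.0]]))  // ["good", "good"]
```

Expressing the convention as a protocol lets any model be dropped into the same training and evaluation code, which is much of what makes the Scikit-learn style productive.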
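The following Swift playground example loads german.csv, splits it into training and test sets, trains a k-nearest neighbors classifier with k = 7 and evaluates its predictions with a confusion matrix.
import PlaygroundSupport  // provides playgroundSharedDataDirectory; the framework's own module must be imported as well
let datasetName = "german.csv"
let testFileUrl = playgroundSharedDataDirectory.appendingPathComponent(datasetName)
let dataset = ReadCsv(path: testFileUrl, separator: ",")
var dfHeader = Header(features: dataset.getCsvHeader())
var df = DataFrame(inputData: dataset.getData(), header: dfHeader, metaAttributeIndex: 24)  // column 24 holds the meta (target) attribute
var data = df.trainTestSplit(percent: 0.8)  // presumably 80% of rows for training, 20% for testing
var train = data["train"]!, test = data["test"]!
var treino = train.splitMetaAttribute(dataframe: train), teste = test.splitMetaAttribute(dataframe: test)  // "x" = features, "y" = target; treino/teste are Portuguese for training/test
let clf = Knn(k: 7)  // k-nearest neighbors classifier with k = 7
clf.fit(x_train: treino["x"]!, y_train: treino["y"]!)
let pred = clf.predict(x_test: teste["x"]!, y_test: teste["y"]!)
let cf = ConfusionMatrix(y_pred: pred, y_real: teste["y"]!.showDf()[teste["y"]!.getMetaIndex()]!)  // compare predictions with the real target column
cf.matrix()  // display the confusion matrix
teste["x"]!.shape()  // dimensions of the test feature set
teste["x"]!.getValues(index: 1)["values"]  // raw values at index 1
teste["x"]!.getValues(index: 1, normalize: true)  // the same values, normalized
teste["x"]!.getValues(index: 1)["values"]  // raw values again, for comparison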
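To make the playground example more concrete, here is a self-contained sketch, using only the Swift standard library, of what a k-nearest neighbors classifier and a confusion matrix compute. It does not use or reproduce the framework's `Knn` and `ConfusionMatrix` types: `KNNSketch`, `euclidean` and `confusionMatrix` are hypothetical names, and the use of Euclidean distance, majority voting with alphabetical tie-breaking, and rows = real labels / columns = predicted labels (Scikit-learn's convention) are assumptions of this sketch.

```swift
// Euclidean distance between two feature vectors of equal length.
func euclidean(_ a: [Double], _ b: [Double]) -> Double {
    precondition(a.count == b.count, "feature vectors must have the same length")
    var sum = 0.0
    for (u, v) in zip(a, b) {
        sum += (u - v) * (u - v)
    }
    return sum.squareRoot()
}

// Minimal k-nearest neighbors classifier: fit only stores the training data;
// predict labels each query by majority vote among its k closest training rows.
struct KNNSketch {
    let k: Int
    private var xTrain: [[Double]] = []
    private var yTrain: [String] = []

    init(k: Int) {
        precondition(k > 0, "k must be positive")
        self.k = k
    }

    mutating func fit(x: [[Double]], y: [String]) {
        precondition(x.count == y.count, "one label per training row")
        xTrain = x
        yTrain = y
    }

    func predict(x: [[Double]]) -> [String] {
        precondition(!xTrain.isEmpty, "call fit before predict")
        return x.map { (query: [Double]) -> String in
            // Pair each training label with its distance to the query, keep the k closest.
            let nearest = zip(xTrain.map { euclidean($0, query) }, yTrain)
                .sorted { $0.0 < $1.0 }
                .prefix(k)
            var votes: [String: Int] = [:]
            for (_, label) in nearest { votes[label, default: 0] += 1 }
            // Most votes wins; ties go to the alphabetically first label.
            return votes.max(by: { a, b in
                a.value != b.value ? a.value < b.value : a.key > b.key
            })!.key
        }
    }
}

// Confusion matrix: rows are real labels, columns are predicted labels, both in
// sorted order; matrix[i][j] counts samples of class labels[i] predicted as labels[j].
func confusionMatrix(real: [String], predicted: [String]) -> (labels: [String], matrix: [[Int]]) {
    precondition(real.count == predicted.count, "one prediction per real label")
    let labels = Set(real + predicted).sorted()
    var index: [String: Int] = [:]
    for (i, label) in labels.enumerated() { index[label] = i }
    var matrix = Array(repeating: Array(repeating: 0, count: labels.count), count: labels.count)
    for (r, p) in zip(real, predicted) { matrix[index[r]!][index[p]!] += 1 }
    return (labels: labels, matrix: matrix)
}

// Usage on a toy two-class dataset.
var knn = KNNSketch(k: 3)
knn.fit(x: [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]],
        y: ["good", "good", "good", "bad", "bad", "bad"])
let predicted = knn.predict(x: [[1.5, 1.5], [8.5, 8.5], [6, 6]])
print(predicted)  // ["good", "bad", "bad"]
let cm = confusionMatrix(real: ["good", "bad", "good"], predicted: predicted)
print(cm.labels)  // ["bad", "good"]
print(cm.matrix)  // [[1, 0], [1, 1]]
```

Because k-NN votes on raw distances, features with larger numeric ranges dominate the result, so normalizing features before training usually matters for this kind of model.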