docs(glossary) Create a glossary folder with the .mdx files (#4165)
Julianrussmeyer authored Sep 13, 2024
1 parent 486ca70 commit 2105a33
Showing 16 changed files with 414 additions and 0 deletions.
18 changes: 18 additions & 0 deletions glossary/aggregation.mdx
@@ -0,0 +1,18 @@
---
title: "Aggregation"
description: "Combine model weights from sampled clients to update the global model. This process enables the global model to learn from each client's data."
date: "2024-05-23"
author:
name: "Charles Beauville"
position: "Machine Learning Engineer"
website: "https://www.linkedin.com/in/charles-beauville/"
github: "github.com/charlesbvll"
related:
- text: "Federated Learning"
link: "/glossary/federated-learning"
- text: "Tutorial: What is Federated Learning?"
link: "/docs/framework/tutorial-series-what-is-federated-learning.html"
---

During each Federated Learning round, the server receives model weights from the sampled clients and needs a function to improve its global model using those weights. This function is called `aggregation`. It can be a simple weighted average (like `FedAvg`), or something more complex (e.g. incorporating server-side optimization techniques). Aggregation is where FL's magic happens: it allows the global model to learn and improve from each client's particular data distribution using only their trained weights.
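As a minimal sketch of the weighted average described above (the function name and data layout are illustrative, not the Flower API):

```python
import numpy as np

def fed_avg(client_weights, client_num_examples):
    """FedAvg-style aggregation sketch: average each layer across clients,
    weighted by how many training examples each client used."""
    total = sum(client_num_examples)
    num_layers = len(client_weights[0])
    aggregated = []
    for layer in range(num_layers):
        # Weight each client's layer by its share of the total examples.
        layer_sum = sum(
            w[layer] * (n / total)
            for w, n in zip(client_weights, client_num_examples)
        )
        aggregated.append(layer_sum)
    return aggregated

# Two clients with a single-layer "model"; client 2 holds 3x the data,
# so the aggregate lands at 0 * 1/4 + 4 * 3/4 = 3.
agg = fed_avg([[np.array([0.0])], [np.array([4.0])]], [1, 3])
```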

17 changes: 17 additions & 0 deletions glossary/client.mdx
@@ -0,0 +1,17 @@
---
title: "Client"
description: "A client is any machine with local data that connects to a server, trains on received global model weights, and sends back updated weights. Clients may also evaluate global model weights."
date: "2024-05-23"
author:
name: "Charles Beauville"
position: "Machine Learning Engineer"
website: "https://www.linkedin.com/in/charles-beauville/"
github: "github.com/charlesbvll"
related:
- text: "Federated Learning"
link: "/glossary/federated-learning"
- text: "Tutorial: What is Federated Learning?"
link: "/docs/framework/tutorial-series-what-is-federated-learning.html"
---

A client is any machine with access to some data that connects to a server to perform Federated Learning. During each round of FL, if it is sampled, a client receives the global model weights from the server, trains on the data it has access to, and sends the resulting updated weights back to the server. Clients can also be sampled to evaluate the global model weights on their local data; this is called federated evaluation.
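The client's role described above can be sketched as a small class; `LocalClient`, its linear model, and its method names are illustrative stand-ins, not the Flower API:

```python
import numpy as np

class LocalClient:
    """Illustrative client: holds local data and trains on received weights."""

    def __init__(self, features, labels):
        self.features = features
        self.labels = labels

    def fit(self, global_weights, lr=0.1, epochs=10):
        # Local training step: gradient descent on a linear model,
        # starting from the global weights received from the server.
        w = np.array(global_weights, dtype=float)
        for _ in range(epochs):
            preds = self.features @ w
            grad = self.features.T @ (preds - self.labels) / len(self.labels)
            w -= lr * grad
        # Return updated weights plus the local sample count, so the
        # server can weight this client's contribution during aggregation.
        return w, len(self.labels)

    def evaluate(self, global_weights):
        # Federated evaluation: score the global weights on local data.
        preds = self.features @ np.array(global_weights, dtype=float)
        return float(np.mean((preds - self.labels) ** 2))

client = LocalClient(np.array([[1.0], [2.0]]), np.array([2.0, 4.0]))
new_w, n = client.fit([0.0])  # moves toward the true weight 2.0
```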
22 changes: 22 additions & 0 deletions glossary/docker.mdx
@@ -0,0 +1,22 @@
---
title: "Docker"
description: "Docker is a containerization tool that allows for consistent and reliable deployment of applications across different environments."
date: "2024-07-08"
author:
name: "Robert Steiner"
position: "DevOps Engineer at Flower Labs"
website: "https://github.com/Robert-Steiner"
---

Docker is an open-source containerization tool for deploying and running applications. Docker
containers encapsulate an application's code, dependencies, and configuration files, allowing
for consistent and reliable deployment across different environments.

In the context of federated learning, Docker containers can be used to package the entire client
and server application, including all the necessary dependencies, and then deployed on various
devices such as edge devices, cloud servers, or even on-premises servers.

In Flower, Docker containers are used to containerize various applications like `SuperLink`,
`SuperNode`, and `SuperExec`. Flower's Docker images allow users to quickly get Flower up and
running, reducing the time and effort required to set up and configure the necessary software
and dependencies.
40 changes: 40 additions & 0 deletions glossary/edge-computing.mdx
@@ -0,0 +1,40 @@
---
title: "Edge Computing"
description: "Edge computing is a distributed computing concept of bringing compute and data storage as close as possible to the source of data generation and consumption by users."
date: "2024-09-10"
author:
name: "Chong Shen Ng"
position: "Research Engineer @ Flower Labs"
website: "https://discuss.flower.ai/u/chongshenng"
github: "github.com/chongshenng"
related:
- text: "IoT"
link: "/glossary/iot"
- text: "Run Flower using Docker"
link: "/docs/framework/docker/index.html"
- text: "Flower Clients in C++"
link: "/docs/examples/quickstart-cpp.html"
- text: "Federated Learning on Embedded Devices with Flower"
link: "/docs/examples/embedded-devices.html"
---

### Introduction to Edge Computing

Edge computing is a distributed computing concept of bringing compute and data storage as close as possible to the source of data generation and consumption by users. By performing computation close to the data source, edge computing aims to address limitations typically encountered in centralized computing, such as bandwidth, latency, privacy, and autonomy.

Edge computing works alongside cloud and fog computing, but each serves different purposes. Cloud computing delivers on-demand resources like data storage, servers, analytics, and networking via the Internet. Fog computing, however, brings computing closer to devices by distributing communication and computation across clusters of IoT or edge devices. While edge computing is sometimes used interchangeably with fog computing, edge computing specifically handles data processing directly at or near the devices themselves, whereas fog computing distributes tasks across multiple nodes, bridging the gap between edge devices and the cloud.

### Advantages and Use Cases of Edge Computing

The key benefit of edge computing is that the volume of data moved is significantly reduced, because computation runs directly on the device where the data is acquired. This reduces long-distance communication between machines, which improves latency and reduces transmission costs. Examples of edge computing that benefit from offloading computation include:
1. Smart watches and fitness monitors that measure live health metrics.
2. Facial recognition and wake word detection on smartphones.
3. Real-time lane departure warning systems in road transport that detect lane lines using on-board videos and sensors.

### Federated Learning in Edge Computing

When deploying federated learning systems, edge computing is an important component to consider. Edge devices typically take the role of "clients" in federated learning. In a healthcare use case, servers in different hospitals can train models on their local data. In mobile computing, smartphones perform local training (and inference) on user data, for example for next-word prediction.

### Edge Computing with Flower

With the Flower framework, you can easily deploy federated learning workflows and maximise the use of edge computing resources. Flower provides the infrastructure to perform federated learning, federated evaluation, and federated analytics, all in an easy, scalable, and secure way. Start with our tutorial on running Federated Learning on Embedded Devices (link [here](https://github.com/adap/flower/tree/main/examples/embedded-devices)), which shows you how to run Flower on NVIDIA Jetson devices and Raspberry Pis as your edge compute.
19 changes: 19 additions & 0 deletions glossary/evaluation.mdx
@@ -0,0 +1,19 @@
---
title: "Evaluation"
description: "Evaluation measures how well the trained model performs by testing it on each client's local data, providing insights into its generalizability across varied data sources."
date: "2024-07-08"
author:
name: "Heng Pan"
position: "Research Scientist"
website: "https://discuss.flower.ai/u/pan-h/summary"
github: "github.com/panh99"
related:
- text: "Server"
link: "/glossary/server"
- text: "Client"
link: "/glossary/client"
---

Evaluation in machine learning is the process of assessing a model's performance on unseen data to determine its ability to generalize beyond the training set. This typically involves using a separate test set and various metrics like accuracy or F1-score to measure how well the model performs on new data, ensuring it isn't overfitting or underfitting.

In federated learning, evaluation (or distributed evaluation) refers to the process of assessing a model's performance across multiple clients, such as devices or data centers. Each client evaluates the model locally using its own data and then sends the results to the server, which aggregates all the evaluation outcomes. This process allows for understanding how well the model generalizes to different data distributions without centralizing sensitive data.
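The server-side aggregation of client evaluation results can be sketched as an example-weighted average (the function name and tuple layout are illustrative):

```python
def aggregate_evaluate(results):
    """Aggregate (num_examples, loss, accuracy) tuples reported by clients
    into example-weighted averages, so larger local datasets count for more."""
    total = sum(n for n, _, _ in results)
    loss = sum(n * l for n, l, _ in results) / total
    acc = sum(n * a for n, _, a in results) / total
    return loss, acc

# Three clients report (num_examples, local loss, local accuracy).
loss, acc = aggregate_evaluate(
    [(100, 0.40, 0.90), (300, 0.60, 0.80), (100, 0.50, 0.85)]
)
# → loss 0.54, accuracy 0.83
```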
14 changes: 14 additions & 0 deletions glossary/federated-learning.mdx
@@ -0,0 +1,14 @@
---
title: "Federated Learning"
description: "Federated Learning is a machine learning approach where model training occurs on decentralized devices, preserving data privacy and leveraging local computations."
date: "2024-05-23"
author:
name: "Julian Rußmeyer"
position: "UX/UI Designer"
website: "https://www.linkedin.com/in/julian-russmeyer/"
related:
- text: "Tutorial: What is Federated Learning?"
link: "/docs/framework/tutorial-series-what-is-federated-learning.html"
---

Federated learning is an approach to machine learning in which the model is trained on multiple decentralized devices or servers with local data samples without exchanging them. Instead of sending raw data to a central server, updates to the model are calculated locally and only the model parameters are aggregated centrally. In this way, user privacy is maintained and communication costs are reduced, while collaborative model training is enabled.
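One round of the process described above can be sketched in a few lines; all names here are illustrative, and the "local training" is a toy stand-in for a real optimizer step:

```python
import numpy as np

def local_update(weights, data, lr=0.5):
    # Stand-in for local training: nudge the weights toward the
    # mean of the client's local data, which never leaves the device.
    return weights + lr * (np.mean(data) - weights)

def federated_round(global_weights, client_datasets):
    # 1. Each sampled client trains locally on its own data.
    updates = [local_update(global_weights, d) for d in client_datasets]
    sizes = [len(d) for d in client_datasets]
    # 2. Only the updated parameters travel back to the server,
    #    which averages them weighted by local dataset size.
    total = sum(sizes)
    return sum(u * n / total for u, n in zip(updates, sizes))

w = np.array(0.0)
datasets = [np.array([1.0, 1.0]), np.array([3.0, 3.0, 3.0, 3.0])]
for _ in range(5):
    w = federated_round(w, datasets)
# w converges toward the example-weighted mean (2 * 1 + 4 * 3) / 6 ≈ 2.33
```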
44 changes: 44 additions & 0 deletions glossary/grpc.mdx
@@ -0,0 +1,44 @@
---
title: "gRPC"
description: "gRPC is an inter-process communication technology for building distributed apps. It allows developers to connect, invoke, operate, and debug apps as easily as making a local function call."
date: "2024-09-10"
author:
name: "Chong Shen Ng"
position: "Research Engineer @ Flower Labs"
website: "https://discuss.flower.ai/u/chongshenng"
github: "github.com/chongshenng"
related:
- text: "Federated Learning"
link: "/glossary/federated-learning"
- text: "Tutorial: What is Federated Learning?"
link: "/docs/framework/tutorial-series-what-is-federated-learning.html"
- text: "Protocol Buffers"
link: "/glossary/protocol-buffers"
- text: "Google: gRPC - A true internet scale RPC framework"
link: "https://cloud.google.com/blog/products/gcp/grpc-a-true-internet-scale-rpc-framework-is-now-1-and-ready-for-production-deployments"
---

### Introduction to gRPC

gRPC is an inter-process communication technology for building distributed applications. It allows you to connect, invoke, operate, and debug these applications as easily as making a local function call. It can efficiently connect services in and across data centers. It is also applicable in the last mile of distributed computing to connect devices, mobile applications, and browsers to backend services. Supporting various languages like C++, Go, Java, and Python, and platforms like Android and the web, gRPC is a versatile framework for any environment.

Google first [open-sourced gRPC in 2016](https://cloud.google.com/blog/products/gcp/grpc-a-true-internet-scale-rpc-framework-is-now-1-and-ready-for-production-deployments), basing it on its internal remote procedure call (RPC) framework, Stubby, which was designed to handle tens of billions of requests per second. Built on HTTP/2 and protocol buffers, gRPC is a popular high-performance framework for building microservices. Notable early adopters of gRPC include Square, Netflix, CockroachDB, Cisco, and Juniper Networks.

By default, gRPC uses protocol buffers - Google's language-neutral and platform-neutral mechanism for efficiently serializing structured data - as its interface definition language and its underlying message interchange format. The recommended protocol buffer version as of writing is `proto3`, though other formats like JSON can also be used.

### How does it work?

gRPC operates similarly to many RPC systems. First, you specify the methods that can be called remotely in the server application, along with their parameters and return types. Then, with the appropriate generated code (more on this below), a gRPC client application can directly call these methods on the gRPC server application on a different machine as if it were a local object. Note that the definitions of client and server in gRPC are different from those in federated learning. For clarity, we will refer to client (server) applications in gRPC as gRPC client (server) applications.

To use gRPC, follow these steps:
1. Define the structure of the data you want to serialize in a proto file definition (`*.proto`).
2. Run the protocol buffer compiler `protoc` to generate data access classes in your preferred language from the `*.proto` service definitions. This step generates the gRPC client and server code, as well as the regular protocol buffer code for handling your message types.
3. Use the generated classes in your application to populate, serialize, and retrieve protocol buffer messages.
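As a sketch of step 1, a minimal `proto3` service definition might look like the following; the service and message names are illustrative, not taken from any real API:

```protobuf
syntax = "proto3";

// An illustrative service that a gRPC client application can call
// as if it were a local object.
service ModelService {
  rpc GetWeights (WeightsRequest) returns (WeightsReply) {}
}

message WeightsRequest {
  string model_id = 1;
}

message WeightsReply {
  repeated float weights = 1;
}
```

Running `protoc` on this file (step 2) produces the gRPC client and server stubs plus the message classes used in step 3.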

### Use cases in Federated Learning

There are several reasons why gRPC is particularly useful in federated learning. First, the clients and the server in a federation rely on stable and efficient communication. Using Protobuf, a highly efficient binary serialization format, gRPC helps overcome bandwidth limitations in federated learning, such as low-bandwidth mobile connections. Second, gRPC's language-independent communication allows developers to use a variety of programming languages, enabling broader adoption for on-device execution.

### gRPC in Flower

gRPC's benefits for distributed computing make it a natural choice for the Flower framework. Flower uses gRPC as its primary communication protocol. To make it easier to build your federated learning systems, we have introduced high-level APIs to take care of the serialization and deserialization of the model parameters, configurations, and metrics. For more details on how to use Flower, follow our "Get started with Flower" tutorial here.
21 changes: 21 additions & 0 deletions glossary/inference.mdx
@@ -0,0 +1,21 @@
---
title: "Inference"
description: "Inference is the phase in which a trained machine learning model applies its learned patterns to new, unseen data to make predictions or decisions."
date: "2024-07-12"
author:
name: "Yan Gao"
position: "Research Scientist"
website: "https://discuss.flower.ai/u/yan-gao/"
github: "github.com/yan-gao-GY"
related:
- text: "Federated Learning"
link: "/glossary/federated-learning"
- text: "Server"
link: "/glossary/server"
- text: "Client"
link: "/glossary/client"
---

Inference, also known as model prediction, is the stage in the machine learning workflow where a trained model is used to make predictions on new, unseen data. In a typical machine learning setting, model inference involves the following steps: model loading, where the trained model is loaded into the application or service where it will be used; data preparation, which preprocesses the new data in the same way as the training data; and model prediction, where the prepared data is fed into the model to compute outputs using the patterns learned during training.

In the context of federated learning (FL), inference can be performed locally on the user's device. A global model produced by the FL process is deployed and loaded on individual nodes (e.g., smartphones, hospital servers) for local inference. This keeps all data on-device, enhancing privacy and reducing latency.
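The three steps above can be sketched for a simple linear model; the weights, normalization statistics, and function names are illustrative stand-ins for a real trained model:

```python
import numpy as np

# 1. Model loading: in practice, read trained weights from disk or a
#    model registry; here we use fixed illustrative values.
weights = np.array([0.5, -0.2])
bias = 0.1

def preprocess(raw_sample, mean, std):
    # 2. Data preparation: apply the same normalization used at training time.
    return (np.asarray(raw_sample) - mean) / std

def predict(sample):
    # 3. Model prediction: feed the prepared data through the model.
    return float(sample @ weights + bias)

x = preprocess([2.0, 4.0], mean=np.array([1.0, 3.0]), std=np.array([1.0, 1.0]))
y = predict(x)  # ≈ 0.5 * 1 + (-0.2) * 1 + 0.1 = 0.4
```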
48 changes: 48 additions & 0 deletions glossary/iot.mdx
@@ -0,0 +1,48 @@
---
title: "IoT"
description: "The Internet of Things (IoT) refers to devices with sensors, software, and tech that connect and exchange data with other systems via the internet or communication networks."
date: "2024-09-10"
author:
name: "Chong Shen Ng"
position: "Research Engineer @ Flower Labs"
website: "https://discuss.flower.ai/u/chongshenng"
github: "github.com/chongshenng"
related:
- text: "Edge Computing"
link: "/glossary/edge-computing"
- text: "Run Flower using Docker"
link: "/docs/framework/docker/index.html"
- text: "Flower Clients in C++"
link: "/docs/examples/quickstart-cpp.html"
- text: "Federated Learning on Embedded Devices with Flower"
link: "/docs/examples/embedded-devices.html"
- text: "Cisco: Redefine Connectivity by Building a Network to Support the Internet of Things"
link: "https://www.cisco.com/c/en/us/solutions/service-provider/a-network-to-support-iot.html"
---

### Introduction to IoT

The Internet of Things (IoT) describes devices with sensors, processing ability, software, and other technologies that connect and exchange data with other devices and systems over the Internet or other communication networks. IoT connections are often also referred to as Machine-to-Machine (M2M) connections. Examples of IoT include embedded systems, wireless sensor networks, control systems, and home and building automation. In the consumer market, IoT technology is synonymous with smart home products. The IoT architecture resembles edge computing, but more broadly encompasses edge devices, gateways, and the cloud.

### Use cases in Federated Learning

From the perspective of federated learning, IoT systems provide two common configurations: first as a data source for training, and second as a point for running inference/analytics.

Cisco's Global Cloud Index estimated that nearly 850 Zettabytes (ZB) of data would be generated by all people, machines, and things in 2021 ([link](https://www.cisco.com/c/en/us/solutions/service-provider/a-network-to-support-iot.html) to article). What makes IoT data different is that not all of it needs to be stored; the most impactful business value comes from running computations on the data. This positions IoT as an ideal candidate for implementing federated learning systems: a model trained on the data stream of a single device may not be useful, but when trained collaboratively on hundreds or thousands of devices, it yields a better-performing and more generalisable model. The key benefit is that the generated data remains local on the device and can even be offloaded after multiple rounds of federated learning. Some examples are presented below.

Once a model is trained (e.g. in a federated way), it can be put into production. This means deploying the model on the IoT device and computing predictions on newly generated or acquired data.

Federated learning in IoT can be organized on two axes: by industry and by use cases.

For industry applications, examples include:
1. Healthcare - e.g. vital sign, activity levels, or sleep pattern monitoring using fitness trackers.
2. Transportation - e.g. trajectory prediction, object detection, driver drowsiness detection using on-board sensors and cameras.

For use cases, examples include:
1. Predictive maintenance - e.g. using data acquired from physical sensors (impedance, temperature, vibration, pressure, viscosity, etc.).
2. Anomaly detection - e.g. using environmental monitoring sensors to predict air, noise, or water pollution; using internet network traffic data for network intrusion detection; using fiber optic sensors for remote sensing and monitoring; etc.
3. Quality assurance and quality control - e.g. using in-line optical, acoustic, or sensor data during manufacturing processes to identify faulty products.

### Using Flower for Federated Learning with IoT

Flower comes with a deployment engine that allows you to easily deploy your federated learning system on IoT devices. As a Data Scientist or ML Engineer, you only need to write ClientApps and deploy them to IoT devices, without needing to deal with the infrastructure and networking. To further help deployment, we provide [Docker images](https://hub.docker.com/u/flwr) for the SuperLink, SuperNode, and ServerApp, so that you can easily ship the requirements of your Flower applications in containers in a production environment. Lastly, Flower supports the development of both Python and C++ clients, giving developers flexible ways of building ClientApps for resource-constrained devices.