SIP: 6
Title: Farmer Load Balancing Based on Reputation
Author: Braydon Fuller <braydon@storj.io>
Moby von Briesen <moby@storj.io>
Status: Active
Type: Standard
Created: 2017-06-08
This document details a proposal to change how to select farmers to store data. The new method of publishing storage contracts takes advantage of farmers' reliability and optional qualitative metrics while simultaneously ensuring that every farmer gets a chance to store files for the network.
- There is currently a lot of congestion on the Storj network caused by communication between bridges and farmers with respect to publishing storage offers. This communication can be simplified.
- Farmers are not selected to store shards based on reputation at the moment. A farmer that is less reliable currently has (roughly) an equal chance of being selected to store a shard as a farmer that is more reliable.
- Farmers are not selected to store shards based on geolocation at the moment. By implementing geolocation, file upload and download speeds could be noticeably improved for individual users. In addition, geolocation implementation would incentivise new farmers in areas where, for instance, there are a lot of users uploading files to the storage network, but where there are not a lot of farmers.
- Storj farmers are not currently load balanced. One farmer that is just as reputable as another does not necessarily have an equal chance of receiving a storage offer, since offers are propagated based on the contacts of a particular bridge, and the contacts of that bridge's contacts (this is related to the first point)
There are two groups of farmers for the purpose of selecting a farmer to store data:
- Active Farmers - Farmers that are considered fast and reliable
- Benchmarking Farmers - Farmers that are going through a benchmarking phase
Publication of contracts will be sent to each of these groups in different proportions. For example, 75% of the contracts could be sent to active farmers and 25% to benchmarking farmers. This provides an opportunity for farmers with poor metrics to improve, while farmers with better metrics would receive more contracts for a more stable network. To make sure that there is equal distribution of requests, farmers will be selected randomly by having a nodeID closest to 20 random bytes.
There is one metric that can be used to divide an active vs. a benchmarking farmer. This implementation and metric can change over time and could be based on:
- A farmer's estimated moving average response time (currently available)
- A reputation field based on several metrics
The exact threshold used to divide the two groups can use a constant BENCHMARK_THRESHOLD
that is updated periodically to represent the 25th percentile of all of the known contacts based on the metrics above. This way, the best 75% of farmers are considered active and the remainder are considered benchmarking. This is to keep the two pools of farmers proportional to the number of contract publications to the them, as described earlier.
The response time for a contact is calculated as an estimated moving average. Previously unknown contacts start at 10 seconds and build improved times with each response, this identify cost can be further expanded by ideas discussed in SIP2 [1].
The estimated moving average is calculated by:
// The number of requests used in calculating the moving average
const p = 1000;
const lastResponseTime = contact.responseTime || 10000;
// Calculate the exponential moving average
const k = 2 / (p + 1);
const newResponseTime = responseTime * k + lastResponseTime * (1 - k);
The following equations could be used to calculate such a reputation:
For fields i = 1...N
If higher = better reputation:
boundedFieldI = (fieldI - LOWER_BOUND_I)/(UPPER_BOUND_I - LOWER_BOUND_I)
If lower = better reputation:
boundedFieldI = 1 - (fieldI - LOWER_BOUND_I)/(UPPER_BOUND_I - LOWER_BOUND_I)
0 <= FIELD_I_WEIGHT <= 1
FIELD_1_WEIGHT + ... + FIELD_N_WEIGHT = 1
reputation = FIELD_1_WEIGHT * boundedField1 + ... + FIELD_N_WEIGHT * boundedFieldN
For instance, if a contact's reputation was based on responseTime
and timeoutRate
, reputation might look like this:
boundedResponseTime = 1 - responseTime / 10000
boundedTimeoutRate = 1 - timeoutRate
reputation = 0.75 * boundedResponseTime + 0.25 * boundedTimeoutRate
This specification adds a new command to the farmer JSON-RPC API called ALLOC
, that will replace the existing message flow for contract publication, including; PUBLISH
with a contract to farmers over quasar, OFFER
a farmer sends a signed contract back to a renter, and CONSIGNMENT
where the renter gets a token to upload the data. All of these will be encapsulated in a single request and response to a farmer from a renter (bridge).
The bridge will have a database of contacts and use the reputation algorithm described above to pick contacts, the bridge will then send a ALLOC
message with the format (example):
{
"method": "ALLOC",
"params": {
"contract": {
"version": 0,
"renter_id": "608939ffcaac7d8019ab43bd097124025d3c47e9",
"renter_hd_key": "xpub6AMzSjSPt3Dc3Z1nziingqQcbHFLtKMsVjYXW1HSPDFihcj...",
"renter_hd_index": 0,
"renter_signature": "3045022100de2e162d017a1e9d0ebfe2a94df3fc847b6828...",
"farmer_id": "",
"farmer_signature": "...",
"data_size": 8388608,
"data_hash": "e6114b9763d433808f6fdf632c00b919a439bc97",
"store_begin": 1477505878758,
"store_end": 1485281902724,
"audit_count": 3,
"payment_storage_price": 0,
"payment_download_price": 0,
"payment_destination": "1DL5N2ayoUxHwgD2NEZ4ifq1wVRVLVQyk2"
},
"contact": {
"address": "10.0.0.2",
"port": 1337,
"nodeID": "89cc3ddb4209c6e7e301c10c0257adf4fd85f253",
"hdKey": "xpub...",
"hdIndex": 12,
"protocol": "1.1.0"
}
}
}
The farmer should then respond with an error to reject the contract or response with a signed contract and a token to authorize the client to upload data (example):
{
"token": "7f518cb18f0e530f8719791d8219d239fc66fd72ece7e3f25de82274aa1e35de",
"contract": {
"version": 0,
"renter_id": "608939ffcaac7d8019ab43bd097124025d3c47e9",
"renter_hd_key": "xpub6AMzSjSPt3Dc3Z1nziingqQcbHFLtKMsVjYXW1HSPDFihcj...",
"renter_hd_index": 0,
"renter_signature": "3045022100de2e162d017a1e9d0ebfe2a94df3fc847b6828...",
"farmer_id": "8d225ed9aec6e2dfb74d51c84c58ff13d11cfa79",
"farmer_signature": "3045022100de2e162d017a1e9d0ebfe2a94df3fc847b6828...",
"data_size": 8388608,
"data_hash": "e6114b9763d433808f6fdf632c00b919a439bc97",
"store_begin": 1477505878758,
"store_end": 1485281902724,
"audit_count": 3,
"payment_storage_price": 0,
"payment_download_price": 0,
"payment_destination": "1DL5N2ayoUxHwgD2NEZ4ifq1wVRVLVQyk2"
}
}
Bridges (renters) will need to build a database of contacts to send contracts, this has previously been handled by the publication of contracts via quasar and anyone who joins the network. As detailed earlier there are issues with the reliability and efficiency in using this method. So instead farmers will contact bridges directly to update their current contact information and preferences.
There is currently a trusted keys configuration for each farmer that defines which bridges to accept contracts, this configuration can be extended to include a bridge URL to notify changes to a farmers contact data. This directory can be discovered via an Ethereum contract as described in SIP7.
A farmer will then be responsible for updating their contact details and preferences to each bridge that it wishes to receive contracts. To discourage a bridge from being Sybil attacked, any new contacts will be given a proof-of-work challenge to add their contact. Once added to the contacts of a bridge, the bridge will then start sending contracts to store data to build up a reputation.
A bridge will define a target difficulty to match an expected rate of total number of new contacts being added in a time period. The target can be recalculated based on the actual performance compared to the target performance, similar to:
target_contacts_per_minute = 100;
target = 0000ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff;
actual_contacts_per_minute = 200;
new_target = target / actual_contacts_per_minute / target_contacts_per_minute;
The farmer will first send an HTTP POST request to a bridge /contacts/challenges
endpoint, the response will include (example):
{
"challenge": "b8ef2693002ecdb838a2512fdee4281a84a94f50f339e9a7786ba648a5161546",
"target": "0000ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff"
}
The farmer will then use challenge
and generate a hash with a nonce
as the salt until the resulting value is less than the target hash using the scrypt key derivation function. The specific parameters for the scrypt function are: N: 1024, r: 1, p: 1
. This will then be included with further requests to create contacts along with a signature to verify updates in future requests.
The signature should be an ECDSA signature of the sha256 hash of the following concatenated data:
<http-method><bridge-url><timestamp><http-body>
The farmer will then send an HTTP POST request to a bridge at /contacts
with the headers (example):
"x-challenge": "b8ef2693002ecdb838a2512fdee4281a84a94f50f339e9a7786ba648a5161546"
"x-challenge-nonce": "9ce38b16"
"x-node-id: "1bcb6e955f486780bae7eba12ea75175191fe599"
"x-node-timestamp": 1502150944050
"x-node-pubkey": "03919b139ed36a0ad2b709fc639d70c59f75445c0ac6b4adf7ad52b7321761d115"
"x-node-signature: "fd62894f0074c1c6ee56c7ec5e28a0d686e122f388610ddb8a7229e98d2052bc397333..."
"content-type": "application/json"
And with the JSON body (example):
{
"address": "2001:0db8:0000:0042:0000:8a2e:0370:7334",
"port": 4005,
"spaceAvailable": true
}
A bridge will then verify that:
- The challenge exists (used challenges are removed)
- The challenge and nonce meets the expected target difficulty
- The pubkey and node id correspond
- The timestamp is within an error threshold
- The signature is valid
- A contact does not already exist or the effects would be identical
Note: These contact details can be updated in the future to include other preferences such as STORJ prices. Another later improvement could be made to use an Ethereum contract as a farmer directory to share this state between decentralized applications, as described in #20
Farmers can later update their contact details, without a proof-of-work challenge, by sending an HTTP PATCH request to /contacts/<node-id>
with the fields that should be updated, these requests should also include signature headers (example):
{
"address": "2001:0db8:0000:0042:0000:8a2e:0370:7334",
"port": 4006,
"spaceAvailable": false
}
There can be other enhancements to farmer selection to give bias to some farmers over others to narrow the active and benchmarking pools.
One flaw with calculating reputation based on response time or similar metrics will give bias to farmers geographically closer to Storj bridge services, especially prevalent when clients are also further away from the bridge.
There are services that allow for converting IP addresses to approximate location (latitude/longitude) [2]. In addition, MongoDB has support for geospatial indexes and queries [3]. Using these services, the database queries described above can be refined to prioritize farmers that are geographically closer to the user uploading a file.
This benefits users, as upload and download speed will be better for geographically closer farmers. It also benefits farmers, since they will be competing on a more local scale for reputation. In addition, the existence of regions that are abundant in users but scarce in farmers will provide incentives for farmers to pop up in those regions.
At the moment, there is no specific distinction between mirrors (i.e. there is no distinction between a "main shard" and a "mirrored shard"). Contracts are awarded in the order that they arrive, meaning the ranking of different mirrors is based on responseTime
. In other words, the fastest responding farmer is awarded the first mirror, and that will be the first mirror the client attempts to use to download the corresponding shard.
While responseTime
is starting point metric to determine a history of successful responses, as discussed above, this might not be the case in the future, especially as the details of what defines the reputation of a farmer evolve. A useful optimization in the future would be ordering mirrors by reputation so that upon downloading a shard, the client is downloading it from the most reputable farmer available.
One useful metric to factor into a farmer selection is shard transfer speed, or bandwidth. Because of the decentralized nature of the network, this information requires some extra work to attain. Reports sent from both the client and the farmer with times of the transfers can be used to establish this information, a.k.a. Exchange Reports.
An exchange report for the purposes of this optimization would contain a client ID, a farmer ID, a reporter ID (to distinguish whether it was submitted by the farmer or the client), a shard hash, and the transfer time of the shard associated with the exchange report.
The trustworthiness of a farmer or client can be determined by grouping exchange reports associated with that farmer or client into pairs (each exchange report from a client will have a corresponding exchange report from a farmer). By calculating the number of exchange report pairs where a particular reporter (farmer or client) disagrees with its counterpart, and dividing this number by the total number of exchange report pairs, a trustworthiness percentage can be found for a particular farmer or client.
Once farmers and clients have a trustworthiness metric associated with them based on exchange reports, the bandwidth of a specific farmer can be determined based on its exchange reports. Transfer time and shard size can be combined to produce a bandwidth metric for each exchange report pair for a farmer. In cases where the farmer and client are in agreement about transfer time, that transfer time will be used. Otherwise, the transfer time reported by the more trustworthy of the two will be used.
This idea still needs a lot of development, especially since the bandwidth of a farmer should not be absolute. Bandwidth largely depends on physical distance between a client and a farmer. In addition, the method of calculating bandwidth described above is computationally heavy, so should not be done very frequently. With more research, these problems can likely be overcome.
In the future there may be additional transfer protocols supported. When these protocols are introduced, clients may wish to give bias to require the use of specific protocols. This could range from the added support of the Micro Transport Protocol or µTP as popularized by BitTorrent, to other protocols yet to exist or have more advance erasure encoding schemes. This will allow for the evolution and growth of protocol development without breaking compatibility.
- storj-archived/integration#5
- storj-archived/core#704
- storj-archived/complex#71
- storj-archived/service-storage-models#117
- storj-archived/bridge#464
- https://github.com/Storj/sips/blob/main/sip-0002.md
- https://www.maxmind.com/en/geoip2-databases
- https://docs.mongodb.com/manual/applications/geospatial-indexes/
- https://medium.com/@storjproject/how-to-ddos-yourself-dbcdc3625bd0
- https://en.wikipedia.org/wiki/Scrypt
- #20
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.