FIL Spark is a trustless proof of retrievability protocol for verifying the retrievability of public data stored with Filecoin Storage Providers.
Background
Historically, when Filecoin Storage Providers (SPs) accepted deals from the FIL+ program for large data sets (FIL+ LDN), there was an agreement to make all the data readily retrievable on the network.
More recently, with the addition of allocator pathways, certain allocators are requiring Storage Providers to provide retrievability for data in deals to which they allocate datacap.
In both of these cases, SPs are expected to store a “hot” unsealed copy of the deal data for fast retrievals, alongside the sealed copy.
However, the core Filecoin protocol does not enforce this agreement to make data readily retrievable. As a result, before Spark, only a tiny fraction of data stored in LDN FIL+ deals or through allocator pathways that required retrievability was retrievable. There needed to be a way to verify the retrievability once the relevant deals had been made.
Problem Statement
To improve this situation, we first and foremost needed more data about content retrievability. You can’t improve what you can’t measure. However, measuring whether Filecoin data was publicly retrievable posed several tough challenges:
- Large scale: Filecoin has over 1 million gigabytes worth of storage deals, over 47 million active deals, and over 15 million Piece CIDs. Measuring retrievals of a meaningful subset of such a large dataset requires a lot of computing power and network bandwidth.
- Trust: How can the community trust the measurements? Who operates the checker nodes, what protocol is followed by the network, and how well can the system detect fraud?
- Funding: Who is going to:
  - pay for running the nodes orchestrating the work?
  - compensate for the resources consumed by the nodes performing retrieval checks?
  - fund the teams building these tools?
Spark aims to solve these challenges, provide trusted metrics about retrievals performed from real user networks, and ultimately drive improvements in the retrievability of data stored on Filecoin.
Spark
Handle Large Scale
Spark leans on a decentralised network of permissionless checker nodes. Anybody can run a Spark node, perform retrieval checks and earn rewards for doing so. Running a Spark node is easy thanks to Filecoin Station, a desktop app aimed at non-technical users. There's nothing to configure: just launch the app and you're all set.
Instil Trust
We built many anti-fraud features into the Spark protocol. Each one makes it harder for node operators to cheat.
- We use IPv4 addresses as a scarce resource, making Sybil attacks too expensive.
- The network performs retrieval checks with high redundancy. This allows us to compare results submitted by different nodes and build committees large enough to give us strong confidence in an honest majority.
- The orchestration layer has a decentralised design, allowing multiple interested parties to run the network components.
- The Spark protocol is highly verifiable. The data used for evaluation and fraud detection is publicly available and committed to on chain. The evaluation and fraud-detection process is also deterministic: anybody can re-run the algorithm on the public data and get the same results as we did.
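To make the committee idea concrete, here is a minimal sketch of how redundant measurements for one retrieval task could be reduced to a single verdict. It is not the actual Spark evaluation algorithm. The `Measurement` shape, the one-vote-per-subnet rule, the `/24` subnet granularity, and the `min_committee_size` threshold are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One retrieval check result reported by a checker node (hypothetical shape)."""
    task_id: str      # identifies the retrieval task being checked
    ipv4_subnet: str  # subnet of the reporting node, e.g. "203.0.113.0/24" (assumed granularity)
    retrieved: bool   # whether the node reports a successful retrieval


def evaluate_task(measurements, min_committee_size=3):
    """Return the committee's majority result for one task, or None if there is no verdict."""
    # Sort first so the outcome never depends on submission order: anybody
    # re-running this on the same public data gets the same answer.
    ordered = sorted(measurements, key=lambda m: (m.ipv4_subnet, m.retrieved))

    # Count at most one vote per IPv4 subnet, so adding more nodes behind the
    # same addresses does not add votes. Because False sorts before True, a
    # subnet with conflicting reports is counted as a failed retrieval.
    votes_by_subnet = {}
    for m in ordered:
        votes_by_subnet.setdefault(m.ipv4_subnet, m.retrieved)

    votes = list(votes_by_subnet.values())
    if len(votes) < min_committee_size:
        return None  # committee too small to trust

    result, count = Counter(votes).most_common(1)[0]
    if count * 2 <= len(votes):
        return None  # no strict majority
    return result
```

With three distinct subnets reporting success, success and failure, `evaluate_task` returns `True`; with only two subnets reporting, it returns `None` because the committee is below the assumed minimum size.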
Provide Funding
Spark uses the Meridian Framework to handle rewards for node operators. Meridian is a smart contract running on FEVM, which verifiably evaluates the impact of individual nodes. To fund Spark, funders can drop rewards into the Spark smart contract, governed by the Meridian Framework.
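As a rough illustration of impact-based rewards, the sketch below splits a reward pool among nodes in proportion to an impact score. It does not reproduce the Meridian contract's logic. The function name `split_rewards`, the use of non-negative integer scores (for example, a count of accepted measurements), and the rounding rule are assumptions.

```python
def split_rewards(pool_atto_fil, scores):
    """Split a reward pool proportionally to each node's impact score.

    pool_atto_fil: total reward in attoFIL, as an integer (hypothetical input).
    scores: mapping of node id -> non-negative integer impact score (hypothetical).
    Integer arithmetic keeps the result exactly reproducible; any remainder
    left by rounding down stays undistributed in the pool.
    """
    total = sum(scores.values())
    if total == 0:
        return {node: 0 for node in scores}
    return {node: pool_atto_fil * score // total for node, score in scores.items()}
```

For example, `split_rewards(1000, {"a": 3, "b": 1})` gives node `a` 750 and node `b` 250. Rounding down means the payouts never exceed the pool.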
Funding is arguably the most challenging part. There have been many attempts to build a permissionless network of incentivised nodes. Most (if not all) of them got minimal traction because they could not secure sustainable funding.
The Station (Spark) Team is looking into ways to link Spark to datacap allocation in order to incentivise Storage Providers to provide a good retrieval service, as well as to create a long-term, sustainable future for Spark funding. The team is also exploring the possibility of a retrieval SLA fund, where Storage Providers can stake to earn rewards if they meet certain retrieval SLAs.
There are also early discussions about a Spark token.