From c0662a8bba6e480d74d43f0acb386e958e6607c1 Mon Sep 17 00:00:00 2001
From: Rebecca Valentine
Date: Sat, 29 Jan 2022 20:08:29 -0800
Subject: [PATCH 1/8] Adds a preliminary guide content

---
 docs/guide/guide.css  |  23 ++++++
 docs/guide/guide.html | 163 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 186 insertions(+)
 create mode 100644 docs/guide/guide.css
 create mode 100644 docs/guide/guide.html

diff --git a/docs/guide/guide.css b/docs/guide/guide.css
new file mode 100644
index 00000000..51197c57
--- /dev/null
+++ b/docs/guide/guide.css
@@ -0,0 +1,23 @@
+* {
+    font-family: sans-serif;
+}
+
+body {
+    display: flex;
+    justify-content: center;
+}
+
+#content {
+    width: 500px;
+    background-color: #eeeeee;
+    padding: 5px;
+}
+
+.section-toc, .subsection-toc {
+    list-style: none;
+}
+
+.section-name {
+    font-weight: 600;
+    margin-bottom: 5px;
+}
\ No newline at end of file
diff --git a/docs/guide/guide.html b/docs/guide/guide.html
new file mode 100644
index 00000000..6132ab96
--- /dev/null
+++ b/docs/guide/guide.html
@@ -0,0 +1,163 @@
+ + + + + + + Veilid Architecture Guide + + + + + + + + +
+ +

Veilid Architecture Guide

+ +
+ + + +
+ +

From Orbit

+ +

+ The first matter to address is the question "What is Veilid?" The highest-level description is that Veilid is a peer-to-peer network for easily sharing various kinds of data. +

+ +

+ Veilid is designed with a social dimension in mind, so that each user can have their personal content stored on the network, but also can share that content with other people of their choosing, or with the entire world if they want. +

+ +

+ The primary purpose of the Veilid network is to provide the infrastructure for a specific kind of shared data: social media in various forms. That includes light-weight content such as Twitter's tweets or Mastodon's toots, medium-weight content like images and songs, and heavy-weight content like videos. Meta-content such as personal feeds, replies, private messages, and so forth are also intended to run atop Veilid. +

+ +

Bird's Eye View

+ +

+ Now that we know what Veilid is and what we intend to put on it, the second order of business is to address how Veilid achieves that. We won't go into fine detail yet (that will come later), but will stay at a middle level of detail, such that all of it can fit in your head at the same time. +

+ +

Peer Network for Data Storage

+ +

+ The bottom-most level of Veilid is a network of peers communicating to one another over the internet. Peers send each other messages (remote procedure calls) about the data being stored on the network, and also messages about the network itself. For instance, one peer might ask another for some file, or it might ask for info about what other nodes exist in the network. +

+ +

+ The data stored in the network is segmented into two kinds of data: file-like data, which typically is large, and textual data, which typically is small. Each kind of data is stored in its own subsystem specifically chosen to optimize for that kind of data. +

+ +

Blockstore

+ +

+ File-like content is stored in a content-addressable block store. Each block is just some arbitrary blob of data (for instance, a JPEG or an MP4) of whatever size. The hash of that block acts as the unique identifier for the block, and can be used by peers to request particular blocks. Technically, textual data can be stored as a block as well, and this is expected to be done when the textual data is thought of as a document or file of some sort. +

+ +

Distributed Hash Table

+ +

+ Smaller, more ephemeral textual content, however, is generally stored in a distributed hash table (DHT). Things like status updates, blog posts, user bios, etc. are all thought of as being suited for storage in this part of the data store. DHT data is not simply "on the Veilid network", but also owned/controlled by peers, and identified by an arbitrary name chosen by the peers who own the data. Any group of peers can add data, but can only change the data they've added. +

+ +

+ For instance, we might talk about Boone's bio vs. Boone's blogpost titled "Hi, I'm Boone!", which are two things owned by the same peer but with different identifiers, or about Boone's bio vs. Marquette's bio, which are two things owned by distinct peers but with the same identifier. +

+ +

+ DHT data is also versioned, so that updates to it can be made. Boone's bio, for instance, would not be fixed in time, but rather is likely to vary over time as he changes jobs, picks up new hobbies, etc. Versioning, together with arbitrary peer-chosen identifiers instead of content hashes, means that we can talk about "Boone's Bio" as an abstract thing, and subscribe to updates to it. +

+ +

Structuring Data

+ +

+ The combination of block storage and DHT storage together makes it possible to have higher-level concepts as well. A song, for instance, might be represented in two places in Veilid: the blockstore would hold the raw data, while the DHT would store a representation of the idea of the song. Maybe that would consist of a JSON object with metadata about the song, like the title, composer, date, encoding information, etc. as well as the ID of the blockstore data. We can then also store different versions of that JSON data, as the piece is updated, upsampled, remastered, or whatever, each one pointing to a different block in the blockstore. It's still "the same song", at a conceptual level, so it has the same identifier in the DHT, but the raw bits associated with each version differ. +
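The song example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two dictionaries stand in for the blockstore and the DHT, SHA-256 is an assumed hash, and the names `put_block`, `publish_version`, the `block_id` field, and the sample owner and song name are all hypothetical rather than part of Veilid.

```python
import hashlib
import json

blocks = {}  # stand-in for the blockstore: block ID -> raw bytes
dht = {}     # stand-in for the DHT: (owner, name) -> list of JSON versions


def put_block(data: bytes) -> str:
    # The block's identifier is the hash of its content (SHA-256 assumed).
    block_id = hashlib.sha256(data).hexdigest()
    blocks[block_id] = data
    return block_id


def publish_version(owner: str, name: str, metadata: dict, raw: bytes) -> int:
    # Store the raw bits as a block, then append a JSON record that points
    # to that block under the same stable (owner, name) identifier.
    record = dict(metadata, block_id=put_block(raw))
    versions = dht.setdefault((owner, name), [])
    versions.append(json.dumps(record))
    return len(versions) - 1  # version number of the new record


publish_version("boone", "my-song", {"title": "My Song", "encoding": "mp3"},
                b"original audio bytes")
publish_version("boone", "my-song", {"title": "My Song", "encoding": "flac"},
                b"remastered audio bytes")

first = json.loads(dht[("boone", "my-song")][0])
latest = json.loads(dht[("boone", "my-song")][-1])
```

Both versions live under the same identifier, while `first` and `latest` point at different blocks, which is the "same song, different bits" relationship described above.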

+ +

+ Another example of this, but with an even more tenuous connection to the blockstore data, is the notion of a profile picture. "Marquette's Profile Picture" is a really abstracted notion, and precisely which bits it corresponds to can vary wildly over time, not just being different versions of the picture but completely different pictures entirely. Maybe one day it's a photo of Marquette and the next day it's a photo of a flower. +

+ +

+ Social media offers many examples of these concepts. Friends lists, block lists, post indexes, favorites. These are all stateful notions, in a sense: a stable reference to a thing, but the precise content of the thing changes over time. These are exactly what we would store in the DHT, as opposed to in the blockstore, even if this data makes reference to content in the blockstore. +

+ +

Identity

+ +

+ As discussed above, peers talk to one another with RPCs, talk about one another by referencing each other on the network, and own content stored in the DHT. This raises the question of how peers are identified and distinguished from one another. If the network were just an immutable blockstore, we could say that identity is just the IP address of the machine the peer is running on, since all that really matters is being able to get data from the peer. This would be like what BitTorrent or IPFS do, since they don't really have any concept of ownership and mutability of data. +

+ +

+ But because Veilid cares deeply about ownership of data and change over time, we chose a different approach: identity is a cryptographic keypair. This allows a peer to access the Veilid network from arbitrarily many different computers and IP addresses, over any communication medium. In practice, this means different devices (e.g. home machine vs smart phone), but in principle it could mean word of mouth and sneakernet. Veilid is agnostic to the particular substrate and communication medium. +

+ +

+ On the network, within the datastore, this means that a peer is identified by a public key, or a hash thereof. Changes to a peer's data in the DHT require that the peers attempting to make the change verify their identity as owners. Data can also, of course, be encrypted so that it can only be accessed by the owners, or by anyone else they choose. +

+ +

Privacy

+ +

+ In order to ensure that peers can participate in Veilid with some amount of privacy, we need to address the fact that being connected to Veilid entails communicating with other peers, and therefore sharing IP addresses. +

+ +

+ The approach that Veilid takes to privacy is two-sided: privacy of the sender of a message, and privacy of the receiver of a message. Either or both sides can want privacy or opt out of privacy. To achieve sender privacy, we use something called a Safety Route: a sequence of two other peers, chosen by the sender, who will relay messages. The sequence of addresses is put into a nesting doll of encryption, so that the first hop can see the second hop, but not the final destination, while the second hop can see the final destination. This is similar to a 2-hop Tor route, except only the addresses are hidden from view. Additionally, the route can be chosen at random for each send. +

+ +

+ Receiver privacy is similar, in that we have a nesting doll of encrypted peer addresses, except that because it's for incoming messages, the various addresses have to be shared ahead of time. We call such things Private Routes, and they are published to the DHT as part of a peer's public data. For full privacy on both ends, a Private Route will be used as the final destination of a Safety Route, so that a total of four intermediate hops are used to send a message and neither the sender nor the receiver knows the IP address of the other. +

+ +

On The Ground

+ +

+ The bird's eye view of things makes it possible to hold it all in mind at once, but leaves out lots of information about implementation choice. It's now time to come down to earth and get our hands dirty. +

+ +

+ TODO +

+ +
+ +
\ No newline at end of file
From 3ff46565fe5d2aa0b9df6053c4cc197314f6fb11 Mon Sep 17 00:00:00 2001
From: Rebecca Valentine
Date: Sat, 29 Jan 2022 23:40:15 -0800
Subject: [PATCH 2/8] Removes detail about the DHT at the middle level of detail, makes some naming changes

---
 docs/guide/guide.html | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/docs/guide/guide.html b/docs/guide/guide.html
index 6132ab96..424c2910 100644
--- a/docs/guide/guide.html
+++ b/docs/guide/guide.html
@@ -30,10 +30,10 @@
 Peer Network for Data Storage
  • - Blockstore + Block Store
  • - Distributed Hash Table + Key-Value Store
  • Structuring Data @@ -86,16 +86,16 @@ The data stored in the network is segmented into two kinds of data: file-like data, which typically is large, and textual data, which typically is small. Each kind of data is stored in its own subsystem specifically chosen to optimize for that kind of data.

    -

    Blockstore

    +

    Block Store

    File-like content is stored in a content-addressable block store. Each block is just some arbitrary blob of data (for instance, a JPEG or an MP4) of whatever size. The hash of that block acts as the unique identifier for the block, and can be used by peers to request particular blocks. Technically, textual data can be stored as a block as well, and this is expected to be done when the textual data is thought of as a document or file of some sort.

    -

    Distributed Hash Table

    +

    Key-Value Store

    - Smaller, more ephemeral textual content generally, however, is stored in a distributed hash table (DHT). Things like status updates, blog posts, user bios, etc. are all thought of as being suited for storage in this part of the data store. DHT data is not simply "on the Veilid network", but also owned/controlled by peers, and identified by an arbitrary name chosen by the peers which owns the data. Any group of peers can add data, but can only change the data they've added. + Smaller, more ephemeral textual content generally, however, is stored in a key-value-store (KV store). Things like status updates, blog posts, user bios, etc. are all thought of as being suited for storage in this part of the data store. KV store data is not simply "on the Veilid network", but also owned/controlled by peers, and identified by an arbitrary name chosen by the peers which owns the data. Any group of peers can add data, but can only change the data they've added.

    @@ -103,27 +103,27 @@

    - DHT data is also versioned, so that updates to it can be made. Boone's bio, for instance, would not be fixed in time, but rather is likely to vary over time as he changes jobs, picks up new hobbies, etc. Versioning, together with arbitrary peer-chosen identifiers instead of content hashes, means that we can talk about "Boone's Bio" as an abstract thing, and subscribe to updates to it. + KV store data is also versioned, so that updates to it can be made. Boone's bio, for instance, would not be fixed in time, but rather is likely to vary over time as he changes jobs, picks up new hobbies, etc. Versioning, together with arbitrary peer-chosen identifiers instead of content hashes, means that we can talk about "Boone's Bio" as an abstract thing, and subscribe to updates to it.

    Structuring Data

    - The combination of block storage and DHT storage together makes it possible to have higher-level concepts as well. A song, for instance, might be represented in two places in Veilid: the blockstore would hold the raw data, while the DHT would store a representation of the idea of the song. Maybe that would consist of a JSON object with metadata about the song, like the title, composer, date, encoding information, etc. as well as the ID of the blockstore data. We can then also store different versions of that JSON data, as the piece is updated, upsampled, remastered, or whatever, each one pointing to a different block in the blockstore. It's still "the same song", at a conceptual level, so it has the same identifier in the DHT, but the raw bits associated with each version differ. + The combination of block storage and key-value storage together makes it possible to have higher-level concepts as well. A song, for instance, might be represented in two places in Veilid: the block store would hold the raw data, while the KV store would store a representation of the idea of the song. Maybe that would consist of a JSON object with metadata about the song, like the title, composer, date, encoding information, etc. as well as the ID of the block store data. We can then also store different versions of that JSON data, as the piece is updated, upsampled, remastered, or whatever, each one pointing to a different block in the block store. It's still "the same song", at a conceptual level, so it has the same identifier in the KV store, but the raw bits associated with each version differ.

    - Another example of this, but with even more tenuous connection between the blockstore data, is the notion of a profile picture. "Marquette's Profile Picture" is a really abstracted notion, and precisely which bits it corresponds to can vary wildly over time, not just being different versions of the picture but completely different pictures entirely. Maybe one day its a photo of Marquette and the next day it's a photo of a flower. + Another example of this, but with even more tenuous connection between the block store data, is the notion of a profile picture. "Marquette's Profile Picture" is a really abstracted notion, and precisely which bits it corresponds to can vary wildly over time, not just being different versions of the picture but completely different pictures entirely. Maybe one day its a photo of Marquette and the next day it's a photo of a flower.

    - Social media offers many examples of these concepts. Friends lists, block lists, post indexes, favorites. These are all stateful notions, in a sense: a stable reference to a thing, but the precise content of the thing changes over time. These are exactly what we would store in the DHT, as opposed to in the blockstore, even if this data makes reference to content in the blockstore. + Social media offers many examples of these concepts. Friends lists, block lists, post indexes, favorites. These are all stateful notions, in a sense: a stable reference to a thing, but the precise content of the thing changes over time. These are exactly what we would put in the KV store, as opposed to in the block store, even if this data makes reference to content in the block store.

    Identity

    - As discussed above, peers talk to one another with RPCs, talk about one another by referencing each other on the network, own content stored in the DHT. This raises the question of how peers are identified and distinguished from one another. If the network was just an immutable blockstore, we could say that identity is just the IP address of the machine the peer is running on, since all that really matters is being able to get data from the peer. This would be like what BitTorrent or IPFS do, since they don't really have any concept of ownership and mutability of data. + As discussed above, peers talk to one another with RPCs, talk about one another by referencing each other on the network, own content that's in the KV store. This raises the question of how peers are identified and distinguished from one another. If the network was just an immutable block store, we could say that identity is just the IP address of the machine the peer is running on, since all that really matters is being able to get data from the peer. This would be like what BitTorrent or IPFS do, since they don't really have any concept of ownership and mutability of data.

    @@ -131,7 +131,7 @@

    - On the network, within the datastore, this means that a peer is identified by a public key, or a hash thereof. Changes to a peer's data in the DHT require that the peers attempting to make the change verify their identity as owners. Data can also, of course, be encrypted so that it can only be accessed by the owners, or by anyone else they choose. + On the network, within the datastore, this means that a peer is identified by a public key, or a hash thereof. Changes to a peer's data in the KV store require that the peers attempting to make the change verify their identity as owners. Data can also, of course, be encrypted so that it can only be accessed by the owners, or by anyone else they choose.

    Privacy

    @@ -145,7 +145,7 @@

    - Receiver privacy is similar, in that we have a nesting doll of encrypted peer address, except because it's for incoming messages, the various addresses have to be shared ahead of time. We call such things Private Routes, and they are published to the DHT as part of a peer's public data. For full privacy on both ends, a Private Route will be used as the final destination of a Safety Route, so that a total of four intermediate hops are used to send a message so that neither the sender nor receiver knows the IP address of the other. + Receiver privacy is similar, in that we have a nesting doll of encrypted peer address, except because it's for incoming messages, the various addresses have to be shared ahead of time. We call such things Private Routes, and they are published to the key-value store as part of a peer's public data. For full privacy on both ends, a Private Route will be used as the final destination of a Safety Route, so that a total of four intermediate hops are used to send a message so that neither the sender nor receiver knows the IP address of the other.

    On The Ground

From 3ad59b05969f3612fe1a89da2e02c4559066f940 Mon Sep 17 00:00:00 2001
From: Rebecca Valentine
Date: Tue, 1 Feb 2022 11:32:20 -0800
Subject: [PATCH 3/8] Fixes some conceptual errors, adds minor improvements to organization and markup

---
 docs/guide/guide.html | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/docs/guide/guide.html b/docs/guide/guide.html
index 424c2910..3ebfbfb5 100644
--- a/docs/guide/guide.html
+++ b/docs/guide/guide.html
@@ -39,10 +39,10 @@
 Structuring Data
  • - Identity + Peer and User Identity
  • - Privacy + Peer Privacy
  • @@ -70,6 +70,8 @@ The primary purpose of the Veilid network is to provide the infrastructure for a specific kind of shared data: social media in various forms. That includes light-weight content such as Twitter's tweets or Mastodon's toots, medium-weight content like images and songs, and heavy-weight content like videos. Meta-content such as personal feeds, replies, private messages, and so forth are also intended to run atop Veilid.

    +
    +

    Bird's Eye View

    @@ -95,15 +97,15 @@

    Key-Value Store

    - Smaller, more ephemeral textual content generally, however, is stored in a key-value-store (KV store). Things like status updates, blog posts, user bios, etc. are all thought of as being suited for storage in this part of the data store. KV store data is not simply "on the Veilid network", but also owned/controlled by peers, and identified by an arbitrary name chosen by the peers which owns the data. Any group of peers can add data, but can only change the data they've added.
    + Smaller, more ephemeral textual content generally, however, is stored in a key-value-store (KV store). Things like status updates, blog posts, user bios, etc. are all thought of as being suited for storage in this part of the data store. KV store data is not simply "on the Veilid network", but also owned/controlled by users, and identified by an arbitrary name chosen by the owner of the data. Any group of users can add data, but can only change the data they've added.

    - For instance, we might talk about Boone's bio vs. Boone's blogpost titled "Hi, I'm Boone!", which are two things owned by the same peer but with different identifiers, or on Boone's bio vs. Marquette's bio, which are two things owned by distinct peers but with the same identifier. + For instance, we might talk about Boone's bio vs. Boone's blogpost titled "Hi, I'm Boone!", which are two things owned by the same user but with different identifiers, or on Boone's bio vs. Marquette's bio, which are two things owned by distinct users but with the same identifier.

    - KV store data is also versioned, so that updates to it can be made. Boone's bio, for instance, would not be fixed in time, but rather is likely to vary over time as he changes jobs, picks up new hobbies, etc. Versioning, together with arbitrary peer-chosen identifiers instead of content hashes, means that we can talk about "Boone's Bio" as an abstract thing, and subscribe to updates to it. + KV store data is also versioned, so that updates to it can be made. Boone's bio, for instance, would not be fixed in time, but rather is likely to vary over time as he changes jobs, picks up new hobbies, etc. Versioning, together with arbitrary user-chosen identifiers instead of content hashes, means that we can talk about "Boone's Bio" as an abstract thing, and subscribe to updates to it.

    Structuring Data

    @@ -120,34 +122,36 @@ Social media offers many examples of these concepts. Friends lists, block lists, post indexes, favorites. These are all stateful notions, in a sense: a stable reference to a thing, but the precise content of the thing changes over time. These are exactly what we would put in the KV store, as opposed to in the block store, even if this data makes reference to content in the block store.

    -

    Identity

    +

    Peer and User Identity

    - As discussed above, peers talk to one another with RPCs, talk about one another by referencing each other on the network, own content that's in the KV store. This raises the question of how peers are identified and distinguished from one another. If the network was just an immutable block store, we could say that identity is just the IP address of the machine the peer is running on, since all that really matters is being able to get data from the peer. This would be like what BitTorrent or IPFS do, since they don't really have any concept of ownership and mutability of data. + Two notions of identity are at play in the above network: peer identity and user identity. Peer identity is simple enough: each peer has a cryptographic key pair that it uses to communicate securely with other peers, both through traditional encrypted communication, and also through the various encrypted routes. Peer identity is just the identity of the particular instance of the Veilid software running on a computer.

    - But because Veilid cares deeply about ownership of data and change over time, we chose a different approach: identity is a cryptographic keypair. This allows a peer to access the Veilid network from arbitrarily many different computers and IP addresses, over any communication medium. In practice, this means different devices (e.g. home machine vs smart phone), but in principle it could mean word of mouth and sneakernet. Veilid is agnostic to the particular substrate and communication medium. + User identity is a slightly richer notion. Users, that is to say, *people*, will want to access the Veilid network in a way that has a consistent identity across devices and apps. But since Veilid doesn't have servers in any traditional sense, we can't have a normal notion of "account". Doing so would also introduce points of centralization, which federated systems have shown to be a source of trouble. Many Mastodon users have found themselves in a tricky situation when their instance sysadmins burned out and suddenly shut down the instance without enough warning.

    - On the network, within the datastore, this means that a peer is identified by a public key, or a hash thereof. Changes to a peer's data in the KV store require that the peers attempting to make the change verify their identity as owners. Data can also, of course, be encrypted so that it can only be accessed by the owners, or by anyone else they choose.
    + To avoid this re-centralization of identity, we use cryptographic identity for users as well. The user's key pair is used to sign and encrypt their content as needed for publication to the data store. A user is said to be "logged in" to a client app whenever that app has a copy of their private key. When logged in, a client app acts like any other of the user's client apps, able to decrypt and encrypt content, sign messages, and so forth. Keys can be added to new apps to sign in on them, allowing the user to have any number of clients they want, on any number of devices they want.

    -

    Privacy

    +

    Peer Privacy

    In order to ensure that peers can participate in Veilid with some amount of privacy, we need to address the fact that being connected to Veilid entails communicating with other peers, and therefore sharing IP addresses.

    - The approach that Veilid takes to privacy is two sided: privacy of the sender of a message, and privacy of the receiver of a message. Either or both sides can want privacy or opt out of privacy. To achieve sender privacy, we use something called a Safety Route: a sequence of two other peers, chosen by the sender, who will relay messages. The sequence of addresses is put into a nesting doll of encryption, so that the first hop can see the second hop, but not the final destination, while the second hope can see the final destination. This is similar to a 2-hop Tor route, except only the addresses are hidden from view. Additionally, the route can be chosen at random for each send. + The approach that Veilid takes to privacy is two sided: privacy of the sender of a message, and privacy of the receiver of a message. Either or both sides can want privacy or opt out of privacy. To achieve sender privacy, we use something called a Safety Route: a sequence of any number of other peers, chosen by the sender, who will relay messages. The sequence of addresses is put into a nesting doll of encryption, so that each hope can see the previous and next hops, while no hop can see the whole route. This is similar to a Tor route, except only the addresses are hidden from view. Additionally, the route can be chosen at random for each send.

    Receiver privacy is similar, in that we have a nesting doll of encrypted peer address, except because it's for incoming messages, the various addresses have to be shared ahead of time. We call such things Private Routes, and they are published to the key-value store as part of a peer's public data. For full privacy on both ends, a Private Route will be used as the final destination of a Safety Route, so that a total of four intermediate hops are used to send a message so that neither the sender nor receiver knows the IP address of the other.

    +
    +

    On The Ground

From d8144143355d540757eb1d473d6dab81df646b18 Mon Sep 17 00:00:00 2001
From: Rebecca Valentine
Date: Wed, 2 Feb 2022 09:30:50 -0800
Subject: [PATCH 4/8] Fixes minor typo

---
 docs/guide/guide.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/guide/guide.html b/docs/guide/guide.html
index 3ebfbfb5..14714922 100644
--- a/docs/guide/guide.html
+++ b/docs/guide/guide.html
@@ -143,7 +143,7 @@

    - The approach that Veilid takes to privacy is two sided: privacy of the sender of a message, and privacy of the receiver of a message. Either or both sides can want privacy or opt out of privacy. To achieve sender privacy, we use something called a Safety Route: a sequence of any number of other peers, chosen by the sender, who will relay messages. The sequence of addresses is put into a nesting doll of encryption, so that each hope can see the previous and next hops, while no hop can see the whole route. This is similar to a Tor route, except only the addresses are hidden from view. Additionally, the route can be chosen at random for each send.
    + The approach that Veilid takes to privacy is two sided: privacy of the sender of a message, and privacy of the receiver of a message. Either or both sides can want privacy or opt out of privacy. To achieve sender privacy, we use something called a Safety Route: a sequence of any number of other peers, chosen by the sender, who will relay messages. The sequence of addresses is put into a nesting doll of encryption, so that each hop can see the previous and next hops, while no hop can see the whole route. This is similar to a Tor route, except only the addresses are hidden from view. Additionally, the route can be chosen at random for each send.

From 2e76e54f59824e4d9ac260f15cfef24de54a8453 Mon Sep 17 00:00:00 2001
From: Beka Valentine
Date: Sat, 23 Apr 2022 23:40:31 -0700
Subject: [PATCH 5/8] Adds ground level view

---
 docs/guide/guide.css  |   7 +-
 docs/guide/guide.html | 178 +++++++++++++++++++++++++++++-------------
 2 files changed, 130 insertions(+), 55 deletions(-)

diff --git a/docs/guide/guide.css b/docs/guide/guide.css
index 51197c57..c583a4bf 100644
--- a/docs/guide/guide.css
+++ b/docs/guide/guide.css
@@ -20,4 +20,9 @@ body {
 .section-name {
     font-weight: 600;
     margin-bottom: 5px;
-}
\ No newline at end of file
+}
+
+code {
+    font-family: monospace;
+    font-weight: bold;
+}
diff --git a/docs/guide/guide.html b/docs/guide/guide.html
index 14714922..3ef9d5b2 100644
--- a/docs/guide/guide.html
+++ b/docs/guide/guide.html
@@ -3,12 +3,12 @@
- + Veilid Architecture Guide - +
@@ -16,9 +16,9 @@

    Veilid Architecture Guide

    - +
    - + - +
    - +

    From Orbit

    - +

    The first matter to address is the question "What is Veilid?" The highest-level description is that Veilid is a peer-to-peer network for easily sharing various kinds of data.

    - +

    Veilid is designed with a social dimension in mind, so that each user can have their personal content stored on the network, but also can share that content with other people of their choosing, or with the entire world if they want.

    - +

    The primary purpose of the Veilid network is to provide the infrastructure for a specific kind of shared data: social media in various forms. That includes light-weight content such as Twitter's tweets or Mastodon's toots, medium-weight content like images and songs, and heavy-weight content like videos. Meta-content such as personal feeds, replies, private messages, and so forth are also intended to run atop Veilid.

    - +
    - +

    Bird's Eye View

    - +

    Now that we know what Veilid is and what we intend to put on it, the second order of business is to address the parts of the question of how Veilid achieves that. Not at a very detailed level, of course, that will come later, but rather at a middle level of detail such that all of it can fit in your head at the same time.

    - +

    Peer Network for Data Storage

    - +

    - The bottom-most level of Veilid is a network of peers communicating to one another over the internet. Peers send each other messages (remote procedure calls) about the data being stored on the network, and also messages about the network itself. For instance, one peer might ask another for some file, or it might ask for info about what other nodes exist in the network. + The bottom-most level of Veilid is a network of peers communicating to one another over the internet. Peers send each other messages (remote procedure calls) about the data being stored on the network, and also messages about the network itself. For instance, one peer might ask another for some file, or it might ask for info about what other peers exist in the network.

    - +

    The data stored in the network is segmented into two kinds of data: file-like data, which typically is large, and textual data, which typically is small. Each kind of data is stored in its own subsystem specifically chosen to optimize for that kind of data.

    - +

    Block Store

    - +

    File-like content is stored in a content-addressable block store. Each block is just some arbitrary blob of data (for instance, a JPEG or an MP4) of whatever size. The hash of that block acts as the unique identifier for the block, and can be used by peers to request particular blocks. Technically, textual data can be stored as a block as well, and this is expected to be done when the textual data is thought of as a document or file of some sort.

    - +

    Key-Value Store

    - +

    Smaller, more ephemeral textual content, however, is generally stored in a key-value store (KV store). Things like status updates, blog posts, user bios, etc. are all thought of as being suited for storage in this part of the data store. KV store data is not simply "on the Veilid network", but also owned/controlled by users, and identified by an arbitrary name chosen by the owner of the data. Any group of users can add data, but each user can only change the data they've added.

    - +

    For instance, we might talk about Boone's bio vs. Boone's blog post titled "Hi, I'm Boone!", which are two things owned by the same user but with different identifiers, or about Boone's bio vs. Marquette's bio, which are two things owned by distinct users but with the same identifier.

    - +

    - KV store data is also versioned, so that updates to it can be made. Boone's bio, for instance, would not be fixed in time, but rather is likely to vary over time as he changes jobs, picks up new hobbies, etc. Versioning, together with arbitrary user-chosen identifiers instead of content hashes, means that we can talk about "Boone's Bio" as an abstract thing, and subscribe to updates to it. + KV store data is also stateful, so that updates to it can be made. Boone's bio, for instance, would not be fixed in time, but rather is likely to vary over time as he changes jobs, picks up new hobbies, etc. Statefulness, together with arbitrary user-chosen identifiers instead of content hashes, means that we can talk about "Boone's Bio" as an abstract thing, and subscribe to updates to it.

    - +

    Structuring Data

    - +

    The combination of block storage and key-value storage together makes it possible to have higher-level concepts as well. A song, for instance, might be represented in two places in Veilid: the block store would hold the raw data, while the KV store would store a representation of the idea of the song. Maybe that would consist of a JSON object with metadata about the song, like the title, composer, date, encoding information, etc. as well as the ID of the block store data. We can then also store different versions of that JSON data, as the piece is updated, upsampled, remastered, or whatever, each one pointing to a different block in the block store. It's still "the same song", at a conceptual level, so it has the same identifier in the KV store, but the raw bits associated with each version differ.

    - +

    Another example of this, but with an even more tenuous connection to the block store data, is the notion of a profile picture. "Marquette's Profile Picture" is a really abstracted notion, and precisely which bits it corresponds to can vary wildly over time, not just being different versions of the picture but completely different pictures entirely. Maybe one day it's a photo of Marquette and the next day it's a photo of a flower.

    - +

    Social media offers many examples of these concepts. Friends lists, block lists, post indexes, favorites. These are all stateful notions, in a sense: a stable reference to a thing, but the precise content of the thing changes over time. These are exactly what we would put in the KV store, as opposed to in the block store, even if this data makes reference to content in the block store.

    - +

    Peer and User Identity

    - +

    Two notions of identity are at play in the above network: peer identity and user identity. Peer identity is simple enough: each peer has a cryptographic key pair that it uses to communicate securely with other peers, both through traditional encrypted communication, and also through the various encrypted routes. Peer identity is just the identity of the particular instance of the Veilid software running on a computer.

    - +

    - User identity is a slightly richer notion. Users, that is to say, *people*, will want to access the Veilid network in a way that has a consistent identity across devices and apps. But since Veilid doesn't have servers in any traditional sense, we can't have a normal notion of "account". Doing so would also introduce points of centralization, which federated systems have shown to be a source of trouble. Many Mastodon users have found themselves in a tricky situation when their instance sysadmins burned out and suddenly shut down the instance without enough warning. + User identity is a slightly richer notion. Users, that is to say, people, will want to access the Veilid network in a way that has a consistent identity across devices and apps. But since Veilid doesn't have servers in any traditional sense, we can't have a normal notion of "account". Doing so would also introduce points of centralization, which federated systems have shown to be a source of trouble. Many Mastodon users have found themselves in a tricky situation when their instance sysadmins burned out and suddenly shut down the instance without enough warning.

    - +

    To avoid this re-centralization of identity, we use cryptographic identity for users as well. The user's key pair is used to sign and encrypt their content as needed for publication to the data store. A user is said to be "logged in" to a client app whenever that app has a copy of their private key. When logged in, a client app acts like any other of the user's client apps, able to decrypt and encrypt content, sign messages, and so forth. Keys can be added to new apps to sign in on them, allowing the user to have any number of clients they want, on any number of devices they want.

    - -

    Peer Privacy

    - -

    - In order to ensure that peers can participate in Veilid with some amount of privacy, we need to address the fact that being connected to Veilid entails communicating with other peers, and therefore sharing IP addresses. -

    - -

    - The approach that Veilid takes to privacy is two sided: privacy of the sender of a message, and privacy of the receiver of a message. Either or both sides can want privacy or opt out of privacy. To achieve sender privacy, we use something called a Safety Route: a sequence of any number of other peers, chosen by the sender, who will relay messages. The sequence of addresses is put into a nesting doll of encryption, so that each hop can see the previous and next hops, while no hop can see the whole route. This is similar to a Tor route, except only the addresses are hidden from view. Additionally, the route can be chosen at random for each send. -

    - -

    - Receiver privacy is similar, in that we have a nesting doll of encrypted peer address, except because it's for incoming messages, the various addresses have to be shared ahead of time. We call such things Private Routes, and they are published to the key-value store as part of a peer's public data. For full privacy on both ends, a Private Route will be used as the final destination of a Safety Route, so that a total of four intermediate hops are used to send a message so that neither the sender nor receiver knows the IP address of the other. -

    - +
    - +

    On The Ground

    - +

    - The bird's eye view of things makes it possible to hold it all in mind at once, but leaves out lots of information about implementation choice. It's now time to come down to earth and get our hands dirty. + The bird's eye view of things makes it possible to hold it all in mind at once, but leaves out lots of information about implementation choice. It's now time to come down to earth and get our hands dirty. In principle, this should be enough information to implement a system very much like Veilid, with the exception perhaps of the specific details of the APIs and data formats. This section won't have code; it's not documentation of the codebase, but rather is intended to form the meat of a whitepaper.

    - + +

    Peer Network, Revisited

    +

    - TODO + First, let's look at the peer network, since its structure forms the basis for the remainder of the data storage approach. Veilid's peer network is similar to other peer-to-peer systems in that it's overlaid on top of other protocols. Veilid tries to be somewhat protocol-agnostic, however, and currently is designed to use TCP, UDP, WebSockets, and WebRTC, as well as various methods of traversing NATs so that Veilid peers can be smartphones, personal computers on hostile ISPs, etc. To facilitate this, peers are identified not by some network identity like an IP address, but instead by peer-chosen cryptographic key pairs. Each peer also advertises a variety of options for how to communicate with it, called dial info, and when one peer wants to talk to another, it gets the dial info for that peer from the network and then uses it to communicate.

    - + +

    + When a peer first connects to Veilid, it does so by contacting bootstrap peers, which have simple IP address dial info that is guaranteed to be stable by the maintainers of the network. These bootstrap peers are the first entries in the peer's routing table -- an address book of sorts, which it uses to figure out how to talk to a peer. The routing table consists of a mapping from peer public keys to prioritized choices for dial info. To populate the routing table, the peer asks other peers what its neighbors are in the network. The notion of neighbor here is defined by a similarity metric on peer IDs, in particular an XOR metric like many DHTs use. Over the course of interacting with the network, the peer will keep dial info up to date when it detects changes. It may also add dial info for peers it discovers along the way, depending on the peer ID. +

    + +

    + To talk to a specific peer, its dial info is looked up in the routing table. If there is dial info present, then the options are attempted in order of the priority specified in the routing table. Otherwise, the peer has to request the dial info from the network, so it looks through its routing table to find the peer whose ID is nearest the target peer according to the XOR metric, and sends it an RPC call with a procedure named find_node. Given any particular peer ID, the receiver of a find_node call returns dial info for the peers in its routing table that are nearest the given ID. This gets the peer closer to its destination, at least in the direction of the other peer it asked. If the desired peer's information was in the result of the call, then it's done, otherwise it calls find_node again to get closer. It iterates in this way, possibly trying alternate peers, as necessary, in a nearest-first fashion until it either finds the desired peer's dial info, has exhausted the entire network, or gives up. +
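The nearest-first lookup described above can be illustrated with a small sketch. This is not Veilid's implementation: the integer peer IDs, the dial info strings, the in-memory `network` dictionary standing in for real RPCs, and the `k` and `max_queries` parameters are all assumptions made for the example.

```python
from typing import Dict, Optional

# Hypothetical toy network: peer ID -> that peer's routing table
# (peer ID -> dial info). Real peer IDs are public keys, not small ints.
Network = Dict[int, Dict[int, str]]

network: Network = {
    0b0001: {0b0011: "udp://peer-3", 0b0100: "tcp://peer-4"},
    0b0011: {0b0001: "tcp://peer-1", 0b0111: "ws://peer-7"},
    0b0100: {0b0001: "tcp://peer-1", 0b0110: "tcp://peer-6"},
    0b0110: {0b0100: "tcp://peer-4", 0b0111: "ws://peer-7"},
    0b0111: {0b0110: "tcp://peer-6"},
}


def xor_distance(a: int, b: int) -> int:
    """XOR metric on peer IDs: smaller means nearer."""
    return a ^ b


def find_node(net: Network, receiver: int, target: int, k: int = 2) -> Dict[int, str]:
    """Stand-in for the find_node RPC: the receiver returns dial info for
    the k peers in its own routing table that are nearest the target."""
    table = net.get(receiver, {})
    nearest = sorted(table, key=lambda pid: xor_distance(pid, target))[:k]
    return {pid: table[pid] for pid in nearest}


def lookup(net: Network, start: int, target: int, max_queries: int = 16) -> Optional[str]:
    """Iterate find_node in nearest-first order until the target's dial
    info is found, every known peer has been asked, or we give up."""
    known = dict(net.get(start, {}))
    queried = {start}
    for _ in range(max_queries):
        if target in known:
            return known[target]
        candidates = [pid for pid in known if pid not in queried]
        if not candidates:
            return None  # exhausted every peer we know about
        nearest = min(candidates, key=lambda pid: xor_distance(pid, target))
        queried.add(nearest)
        known.update(find_node(net, nearest, target))
    return known.get(target)  # gave up after max_queries
```

Calling `lookup(network, 0b0001, 0b0111)` walks from peer 1 toward peer 7 by asking whichever known peer is nearest the target at each step. A lookup for an ID that no peer knows ends once every reachable peer has been queried.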

    + +

    User Privacy

    + +

    + In order to ensure that users can participate in Veilid with some amount of privacy, we need to address the fact that being connected to Veilid entails communicating with other peers, and therefore sharing IP addresses. +

    + +

    + The approach that Veilid takes to privacy is two sided: privacy of the sender of a message, and privacy of the receiver of a message. Either or both sides can want privacy or opt out of privacy. To achieve sender privacy, we use something called a Safety Route: a sequence of any number of peers, chosen by the sender, who will relay messages. The sequence of addresses is put into a nesting doll of encryption, so that each hop can see the previous and next hops, while no hop can see the whole route. This is similar to a Tor route, except only the addresses are hidden from view. Additionally, the route can be chosen at random for each message being sent. +

    + +

    + Receiver privacy is similar, in that we have a nesting doll of encrypted peer addresses, except because it's for incoming messages, the various addresses have to be shared ahead of time. We call such things Private Routes, and they are published to the key-value store as part of a user's public data. For full privacy on both ends, a Private Route will be used as the final destination of a Safety Route, and the total route is the composition of the two, so that neither the sender nor receiver knows the IP address of the other. +

    + +

    + Note that the routes are user oriented. They should be understood as a way to talk to a particular user's peer, wherever that may be. Each peer of course has to know about the actual IP addresses of the peers, otherwise it couldn't communicate, but safety and private routes make it hard to associate the user's identity with their peer's identity. You know that the user is somewhere on the network, but you don't know which IP address is theirs, even if you do in fact have their peer's dial info stored in the routing table. +

    + +

    Block Store

    + +

    + As mentioned in the Bird's Eye View, the block store is intended to store content-addressed blocks of data. Like many other peer-to-peer systems for storing data, Veilid uses a distributed hash table as the core of the block store. The block store DHT has as keys BLAKE3 hashes of block content. For each key the DHT associates a list of peer IDs for peers that have declared to the network that they can supply the block. +

    + +

    + If a peer wishes to supply a block, it makes a supply_block RPC call to the network with the ID of the block. The receiver of the call can then store the information that the peer supplies the designated block if it wants, and can also return other peers nearer to the block's ID that should also store the information. A receiver decides whether to store this information based on how close its own ID is to the block's ID. It may also choose to cache the block, possibly declaring itself to be a supplier as well. +

    + +

    + Supplier records are potentially brittle because peers leave the network, making their information unavailable. Because of this, any peer that wishes to supply a block will periodically send supply_block messages to refresh the records. Peers that are caching blocks determine when to stop caching based on how popular a block is, how much space or bandwidth it can spare, etc. +

    + +

    + To retrieve a block that has been stored in the block store, a peer makes a find_block RPC. The receiver will then either return the block, or possibly return a list of suppliers for the block that it knows about, or return a list of peers that are closer to the block. +
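As a rough illustration of the three possible find_block outcomes, here is a sketch of a single receiving peer. It is only a model under stated assumptions: Python's standard library has no BLAKE3, so `hashlib.blake2b` is swapped in as a stand-in hash, and the `BlockStorePeer` class, its fields, and the tagged-tuple return shape are hypothetical rather than Veilid's actual API.

```python
import hashlib
from typing import Dict, List, Set, Tuple, Union


def block_id(content: bytes) -> str:
    """Content address of a block. BLAKE2b stands in for BLAKE3 here
    because BLAKE3 is not available in the standard library."""
    return hashlib.blake2b(content, digest_size=32).hexdigest()


class BlockStorePeer:
    """Hypothetical receiver of supply_block and find_block calls."""

    def __init__(self, closer_peers: List[str]) -> None:
        self.blocks: Dict[str, bytes] = {}        # blocks cached locally
        self.suppliers: Dict[str, Set[str]] = {}  # block ID -> supplier peer IDs
        self.closer_peers = list(closer_peers)    # peers to suggest otherwise

    def supply_block(self, bid: str, supplier: str) -> None:
        """Record that `supplier` has declared it can supply block `bid`."""
        self.suppliers.setdefault(bid, set()).add(supplier)

    def cache_block(self, content: bytes) -> str:
        """Cache a block locally, keyed by its content hash."""
        bid = block_id(content)
        self.blocks[bid] = content
        return bid

    def find_block(self, bid: str) -> Tuple[str, Union[bytes, List[str]]]:
        """Return the block, else known suppliers, else closer peers."""
        if bid in self.blocks:
            return ("block", self.blocks[bid])
        if self.suppliers.get(bid):
            return ("suppliers", sorted(self.suppliers[bid]))
        return ("closer_peers", list(self.closer_peers))


def verify_block(expected_id: str, content: bytes) -> bool:
    """A requester can check a returned block against the ID it asked for."""
    return block_id(content) == expected_id
```

Because the identifier is a hash of the content, the requester does not need to trust whichever peer answered: `verify_block` fails for any block that does not match the requested ID.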

    + +

    + Unlike BitTorrent, blocks are not inherently part of a larger file. A block can be just a single file, and often that will be the case for small files. Large files can be broken up into smaller blocks, however, and then an additional block with a list of those component blocks can be stored in the block store. Veilid itself, however, would treat this like any other block, and there are no built-in mechanisms for determining which blocks to download first, which to share first, etc. like there are in BitTorrent. These features would be dependent on the peer software's implementation and could vary. Different clients will also be able to decide how they want to download such "compound" blocks -- automatically, via a prompt to the user, or something else. +

    + +

    + The mechanism of having blocks that refer to other blocks also enables IPFS-style DAGs of hierarchical data as one mode of use of the block store, allowing entire directory structures to be stored, not just files. However, as with sub-file blocks, this is not a built-in part of Veilid but rather a mode of use, and how they're downloaded and presented to the user is up to the client program. +

    + +

    Key-Value Store

    + +

    + The key-value store is a DHT similar to the block store. However, rather than using content hashes as keys, the KV store uses user IDs as keys (note: not peer IDs). At a given key, the KV store has a hierarchical key-value map that associates in-principle arbitrary strings with values, which themselves can be numbers, strings, datetimes, or other key-value maps. The specific value stored at a user's ID is versioned, so that particular schemas of subkeys and values can be defined and handled appropriately by different versions of clients. +

    + +

    + When a user wishes to store data under their key, they send a set_value RPC to the peers whose IDs are closest by the XOR metric to their own user ID. The value provided to the RPC is a signed value, so that the network can ensure only the designated user is storing data at their key. Those peers may return other peer IDs, and so on, similar to how the block store handles supply_block calls. Eventually, some peers will store the data. The user's peer should periodically refresh the stored data, to ensure that it persists. It's also good practice for the user's own peer to cache the data, so that client programs can use the user's own peer as a canonical source of the most up-to-date value. +

    + +

    + Retrieval is similar to block store retrieval. The desired key is provided to a get_value call, which may return the value, or a list of other peers that are closer to the key. Eventually the signed data is returned, and the recipient can verify that it does indeed belong to the specified user by checking the signature. +

    + +

    + When storing and retrieving, the key provided to the RPCs is not required to be only the user's ID. It can include a list of strings which act as a path into the data stored at the user's key, targeting it specifically for update or retrieval. This lets the network minimize data transfer, because only the relevant information has to move around. +
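The idea of a subkey path can be sketched with plain nested dictionaries. The helper names `get_at_path` and `set_at_path` and the sample record below are hypothetical, chosen only to show how a list of strings selects one part of the map stored at a user's key.

```python
from typing import Any, Dict, List


def get_at_path(value: Dict[str, Any], path: List[str]) -> Any:
    """Follow a list of subkeys into a hierarchical key-value map.
    An empty path selects the whole map."""
    node: Any = value
    for part in path:
        node = node[part]
    return node


def set_at_path(value: Dict[str, Any], path: List[str], new_value: Any) -> None:
    """Update only the entry the path points at, creating intermediate
    maps as needed, and leave the rest of the map untouched."""
    if not path:
        raise ValueError("path must name at least one subkey")
    node = value
    for part in path[:-1]:
        node = node.setdefault(part, {})
    node[path[-1]] = new_value


# Hypothetical data stored at one user's key.
record: Dict[str, Any] = {
    "profile": {"bio": "Hi, I'm Boone!", "picture": "block-id-1"},
    "posts": {"count": 2},
}
set_at_path(record, ["profile", "picture"], "block-id-2")
```

Only the value under `["profile", "picture"]` changes here, which is the property that lets an update or retrieval move a small piece of data instead of the user's entire map.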

    + +

    + The specific content of the user's keys is determined partially by the protocol and partially by the client software. Early versions of the protocol use a DHT schema version that defines a fairly simple social network oriented schema. Later versions will enable a more generic schema so that client plugins can store and display richer information. +

    + +

    + TODO How to avoid replay updates?? maybe via a sequence number in the signed patch? +

    + +

    Appendix 1: Dial Info

    + +

    Appendix 2: RPC Listing

    - \ No newline at end of file + From 949fc85a45c9669e8e32e37acd48b3374da29ac0 Mon Sep 17 00:00:00 2001 From: Beka Valentine Date: Sun, 24 Apr 2022 09:22:55 -0700 Subject: [PATCH 6/8] Adds some move info about routes, and about KV store RPCs --- docs/guide/guide.html | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/docs/guide/guide.html b/docs/guide/guide.html index 3ef9d5b2..49ce5df5 100644 --- a/docs/guide/guide.html +++ b/docs/guide/guide.html @@ -161,17 +161,21 @@

    User Privacy

    - In order to ensure that users can participate in Veilid with some amount of privacy, we need to address the fact that being connected to Veilid entails communicating with other peers, and therefore sharing IP addresses. + In order to ensure that users can participate in Veilid with some amount of privacy, we need to address the fact that being connected to Veilid entails communicating with other peers, and therefore sharing IP addresses. A user's peer will therefore be frequently issuing RPCs in a way that directly associates the user's identifying information with their peer's ID. Veilid provides privacy by allowing the use of an RPC relay mechanism that uses cryptography similar to onion routing in order to hide the path that a message takes between its actual originating peer and its actual destination peer, by hopping between additional relay peers.

    - The approach that Veilid takes to privacy is two sided: privacy of the sender of a message, and privacy of the receiver of a message. Either or both sides can want privacy or opt out of privacy. To achieve sender privacy, we use something called a Safety Route: a sequence of any number of peers, chosen by the sender, who will relay messages. The sequence of addresses is put into a nesting doll of encryption, so that each hop can see the previous and next hops, while no hop can see the whole route. This is similar to a Tor route, except only the addresses are hidden from view. Additionally, the route can be chosen at random for each message being sent. + The specific approach that Veilid takes to privacy is two sided: privacy of the sender of a message, and privacy of the receiver of a message. Either or both sides can want privacy or opt out of privacy. To achieve sender privacy, Veilid uses something called a Safety Route: a sequence of any number of peers, chosen by the sender, who will relay messages. The sequence of addresses is put into a nesting doll of encryption, so that each hop can see the previous and next hops, while no hop can see the whole route. This is similar to a Tor route, except only the addresses are encrypted for each hop. The route can be chosen at random for each message being sent.

    Receiver privacy is similar, in that we have a nesting doll of encrypted peer addresses, except because it's for incoming messages, the various addresses have to be shared ahead of time. We call such things Private Routes, and they are published to the key-value store as part of a user's public data. For full privacy on both ends, a Private Route will be used as the final destination of a Safety Route, and the total route is the composition of the two, so that neither the sender nor receiver knows the IP address of the other.

    +

    + Each peer in the route, including the initial peer, sends a route RPC to the next peer in the route, with the remainder of the full route (safety + private), forwarding the data along. The final peer decrypts the remainder of the route, which is now empty, and then can inspect the relayed RPC to act on it. The RPC itself doesn't need to be encrypted, but it's good practice to encrypt it for the final receiving peer so that the intermediate peers can't de-anonymize the sending user from traffic analysis. +

    +

    Note that the routes are user oriented. They should be understood as a way to talk to a particular user's peer, wherever that may be. Each peer of course has to know about the actual IP addresses of the peers, otherwise it couldn't communicate, but safety and private routes make it hard to associate the user's identity with their peer's identity. You know that the user is somewhere on the network, but you don't know which IP address is theirs, even if you do in fact have their peer's dial info stored in the routing table.

    @@ -209,7 +213,7 @@

    - When a user wishes to store data under their key, they send a set_value RPC to the peer's whose IDs are closest by the XOR metric to their own user ID. The value provided to the RPC is a signed value, so that the network can ensure only the designated user is storing data at their key. Those peers may return other peer IDs, and so on, similar to how the block store handles supply_block calls. Eventually, some peers will store the data. The user's peer should periodically refresh the stored data, to ensure that it persists. It's also good practice for the user's own peer to cache the data, so that client programs can use the user's own peer as a canonical source of the most-up-to-date value. + When a user wishes to store data under their key, they send a set_value RPC to the peers whose IDs are closest by the XOR metric to their own user ID. The value provided to the RPC is a signed value, so that the network can ensure only the designated user is storing data at their key. The peers that receive the RPC may return other peer IDs closer to the key, and so on, similar to how the block store handles supply_block calls. Eventually, some peers will store the data. The user's own peer should periodically refresh the stored data, to ensure that it persists. It's also good practice for the user's own peer to cache the data, so that client programs can use the user's own peer as a canonical source of the most up-to-date value, but doing so would require a route to be published that lets other peers send the user's own peer messages. A private route suffices for this.

    @@ -224,6 +228,10 @@ The specific content of the user's keys is determined partially by the protocol and partially by the client software. Early versions of the protocol use a DHT schema version that defines a fairly simple social network oriented schema. Later versions will enable a more generic schema so that client plugins can store and display richer information.

    +

    + The stateful nature of the key-value store means that values will change over time, and actions may need to be taken in response to those changes. A polling mechanism could be used to periodically check for new values, but this will lead to lots of unnecessary traffic in the network, so to avoid this, Veilid allows peers to send watch_value RPCs, with a DHT key (with subkeys) as its argument. The receiver would then store a record that the sender of the RPC wants to be alerted when the receiver gets subsequent set_value calls, at which time the receiver sends the sending peer a value_changed RPC to push the new value. As with other RPC calls, watch_value needs to be periodically re-sent to refresh the subscription to the value. Additionally, also as with other calls, watch_value may not succeed on the receiver, which instead might return other peers closer to the value, or might return other peers that have successfully subscribed to the value and thus might act as a source for it. +
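The subscription flow can be modeled with a small in-memory sketch of the receiving peer. Everything here is an assumption made for illustration: the `WatchingPeer` class, the fixed `ttl` after which an unrefreshed watch lapses, the explicit `now` timestamps, and the `pushed` list that stands in for sending value_changed RPCs over the network.

```python
from typing import Any, Dict, List, Tuple

Key = Tuple[str, ...]  # user ID followed by optional subkeys


class WatchingPeer:
    """Hypothetical receiver of watch_value and set_value calls."""

    def __init__(self, ttl: float = 60.0) -> None:
        self.ttl = ttl                                  # lifetime of one watch
        self.values: Dict[Key, Any] = {}
        self.watchers: Dict[Key, Dict[str, float]] = {}  # key -> watcher -> expiry
        self.pushed: List[Tuple[str, Key, Any]] = []     # value_changed stand-in

    def watch_value(self, key: Key, watcher: str, now: float) -> None:
        """Record (or refresh) a watcher's interest in a key."""
        self.watchers.setdefault(key, {})[watcher] = now + self.ttl

    def set_value(self, key: Key, value: Any, now: float) -> None:
        """Store the value and push it to watchers whose watch is still live."""
        self.values[key] = value
        live = {w: exp for w, exp in self.watchers.get(key, {}).items() if exp > now}
        self.watchers[key] = live  # drop watches that were never refreshed
        for watcher in sorted(live):
            self.pushed.append((watcher, key, value))


peer = WatchingPeer(ttl=60.0)
bio: Key = ("boone", "profile", "bio")
peer.watch_value(bio, "marquette-peer", now=0.0)
peer.set_value(bio, "New job!", now=10.0)     # watcher is notified
peer.set_value(bio, "New hobby!", now=100.0)  # watch lapsed, nobody notified
```

The second update is not pushed because the watch was never re-sent, which is why the watcher has to refresh its watch_value periodically, as the paragraph above notes.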

    +

    TODO How to avoid replay updates?? maybe via a sequence number in the signed patch?

    From afd5eeb080b81515e1fa58465086112179ce0ad3 Mon Sep 17 00:00:00 2001 From: Beka Valentine Date: Sun, 24 Apr 2022 09:28:26 -0700 Subject: [PATCH 7/8] Removes use of the term "relay" to avoid conflation with various RPC relaying stuff at the lower network level --- docs/guide/guide.html | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/guide/guide.html b/docs/guide/guide.html index 49ce5df5..fbc9af68 100644 --- a/docs/guide/guide.html +++ b/docs/guide/guide.html @@ -161,11 +161,11 @@

    User Privacy

    - In order to ensure that users can participate in Veilid with some amount of privacy, we need to address the fact that being connected to Veilid entails communicating with other peers, and therefore sharing IP addresses. A user's peer will therefore be frequently issuing RPCs in a way that directly associates the user's identifying information with their peer's ID. Veilid provides privacy by allowing the use of an RPC relay mechanism that uses cryptography to similar to onion routing in order to hide the path that a message takes between its actual originating peer and its actual destination peer, by hopping between additional relay peers. + In order to ensure that users can participate in Veilid with some amount of privacy, we need to address the fact that being connected to Veilid entails communicating with other peers, and therefore sharing IP addresses. A user's peer will therefore be frequently issuing RPCs in a way that directly associates the user's identifying information with their peer's ID. Veilid provides privacy by allowing the use of an RPC forwarding mechanism that uses cryptography similar to onion routing in order to hide the path that a message takes between its actual originating peer and its actual destination peer, by hopping between additional intermediate peers.

    - The specific approach that Veilid takes to privacy is two sided: privacy of the sender of a message, and privacy of the receiver of a message. Either or both sides can want privacy or opt out of privacy. To achieve sender privacy, Veilid use something called a Safety Route: a sequence of any number of peers, chosen by the sender, who will relay messages. The sequence of addresses is put into a nesting doll of encryption, so that each hop can see the previous and next hops, while no hop can see the whole route. This is similar to a Tor route, except only the addresses are encrypted for each hop. The route can be chosen at random for each message being sent. + The specific approach that Veilid takes to privacy is two sided: privacy of the sender of a message, and privacy of the receiver of a message. Either or both sides can want privacy or opt out of privacy. To achieve sender privacy, Veilid uses something called a Safety Route: a sequence of any number of peers, chosen by the sender, who will forward messages. The sequence of addresses is put into a nesting doll of encryption, so that each hop can see the previous and next hops, while no hop can see the whole route. This is similar to a Tor route, except only the addresses are encrypted for each hop. The route can be chosen at random for each message being sent.

    @@ -173,14 +173,14 @@

    - Each peer in the hop, including the initial peer, sends a route RPC to the next peer in the hop, with the remainder of the full route (safety + private), forwarding the data along. The final peer decrypts the remainder of the route, which is now empty, and then can inspect the relayed RPC to act on it. The RPC itself doesn't need to be encrypted, but it's good practice to encrypt it for the final receiving peer so that the intermediate peers can't de-anonymize the sending user from traffic analysis. + Each peer in the route, including the initial peer, sends a route RPC to the next peer in the route, with the remainder of the full route (safety + private), forwarding the data along. The final peer decrypts the remainder of the route, which is now empty, and then can inspect the forwarded RPC to act on it. The RPC itself doesn't need to be encrypted, but it's good practice to encrypt it for the final receiving peer so that the intermediate peers can't de-anonymize the sending user from traffic analysis.

    Note that the routes are user oriented. They should be understood as a way to talk to a particular user's peer, wherever that may be. Each peer of course has to know about the actual IP addresses of the peers, otherwise it couldn't communicate, but safety and private routes make it hard to associate the user's identity with their peer's identity. You know that the user is somewhere on the network, but you don't know which IP address is theirs, even if you do in fact have their peer's dial info stored in the routing table.

    - Block Store
    + Block Store Revisited

    As mentioned in the Bird's Eye View, the block store is intended to store content-addressed blocks of data. Like many other peer-to-peer systems for storing data, Veilid uses a distributed hash table as the core of the block store. The block store DHT has as keys BLAKE3 hashes of block content. For each key the DHT associates a list of peer IDs for peers that have declared to the network that they can supply the block. @@ -206,7 +206,7 @@ The mechanism of having blocks that refer to other blocks also enables IPFS-style DAGs of hierarchical data as one mode of use of the block store, allowing entire directory structures to be stored, not just files. However, as with sub-file blocks, this is not a built-in part of Veilid but rather a mode of use, and how they're downloaded and presented to the user is up to the client program.

    - Key-Value Store
    + Key-Value Store Revisited

    The key-value store is a DHT similar to the block store. However, rather than using content hashes as keys, the KV store uses user IDs as keys (note: not peer IDs). At a given key, the KV store has a hierarchical key-value map that associates in-principle arbitrary strings with values, which themselves can be numbers, strings, datetimes, or other key-value maps. The specific value stored at a user's ID is versioned, so that particular schemas of subkeys and values can be defined and handled appropriately by different versions of clients. @@ -236,7 +236,7 @@ TODO How to avoid replay updates?? maybe via a sequence number in the signed patch?

    - Appendix 1: Dial Info
    + Appendix 1: Dial Info and Signaling

    Appendix 2: RPC Listing

    Date: Thu, 28 Apr 2022 19:10:58 -0700 Subject: [PATCH 8/8] Adds headers and TOC links --- docs/guide/guide.html | 22 +++++++++++++++------- 1 file changed, 15 insertions(+), 7 deletions(-) diff --git a/docs/guide/guide.html b/docs/guide/guide.html index fbc9af68..d1740cb0 100644 --- a/docs/guide/guide.html +++ b/docs/guide/guide.html @@ -41,15 +41,23 @@
  • Peer and User Identity
  • -
  • - Peer Privacy -
  • On The Ground
  • @@ -144,7 +152,7 @@ The bird's eye view of things makes it possible to hold it all in mind at once, but leaves out lots of information about implementation choices. It's now time to come down to earth and get our hands dirty. In principle, this should be enough information to implement a system very much like Veilid, with the exception perhaps of the specific details of the APIs and data formats. This section won't have code; it's not documentation of the codebase, but rather is intended to form the meat of a whitepaper.

    - Peer Network, Revisited
    + Peer Network, Revisited

    First, let's look at the peer network, since its structure forms the basis for the remainder of the data storage approach. Veilid's peer network is similar to other peer-to-peer systems in that it's overlaid on top of other protocols. Veilid tries to be somewhat protocol-agnostic, however, and currently is designed to use TCP, UDP, WebSockets, and WebRTC, as well as various methods of traversing NATs so that Veilid peers can be smartphones, personal computers on hostile ISPs, etc. To facilitate this, peers are identified not by some network identity like an IP address, but instead by peer-chosen cryptographic key-pairs. Each peer also advertises a variety of options for how to communicate with it, called dial info, and when one peer wants to talk to another, it gets the dial info for that peer from the network and then uses it to communicate. @@ -180,7 +188,7 @@ Note that the routes are user-oriented. They should be understood as a way to talk to a particular user's peer, wherever that may be. Each peer of course has to know about the actual IP addresses of the peers, otherwise it couldn't communicate, but safety and private routes make it hard to associate the user's identity with their peer's identity. You know that the user is somewhere on the network, but you don't know which IP address is theirs, even if you do in fact have their peer's dial info stored in the routing table.

    - Block Store Revisited
    + Block Store Revisited

    As mentioned in the Bird's Eye View, the block store is intended to store content-addressed blocks of data. Like many other peer-to-peer systems for storing data, Veilid uses a distributed hash table as the core of the block store. The block store DHT has as keys BLAKE3 hashes of block content. For each key the DHT associates a list of peer IDs for peers that have declared to the network that they can supply the block. @@ -206,7 +214,7 @@ The mechanism of having blocks that refer to other blocks also enables IPFS-style DAGs of hierarchical data as one mode of use of the block store, allowing entire directory structures to be stored, not just files. However, as with sub-file blocks, this is not a built-in part of Veilid but rather a mode of use, and how they're downloaded and presented to the user is up to the client program.
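
    The block store mapping described in this section, from a content hash to the list of peers that have declared they can supply the block, can be sketched in a few lines. This is an illustrative model only, not Veilid's implementation: the names `block_key`, `BlockIndex`, `declare`, `lookup`, and `verify` are hypothetical, a plain in-memory dictionary stands in for the distributed hash table, and BLAKE2b from the Python standard library is swapped in for BLAKE3, which the guide specifies but which the standard library does not provide.

    ```python
    import hashlib
    from collections import defaultdict


    def block_key(content: bytes) -> str:
        """Content address of a block. Stand-in hash: BLAKE2b here, BLAKE3 in the guide."""
        return hashlib.blake2b(content, digest_size=32).hexdigest()


    def verify(key: str, content: bytes) -> bool:
        """A downloader can check that a supplied block really matches its key."""
        return block_key(content) == key


    class BlockIndex:
        """In-memory stand-in for the DHT: key -> peer IDs that declared the block."""

        def __init__(self):
            self.providers = defaultdict(list)

        def declare(self, peer_id: str, content: bytes) -> str:
            """Record that peer_id can supply this block; returns the block's key."""
            key = block_key(content)
            if peer_id not in self.providers[key]:
                self.providers[key].append(peer_id)
            return key

        def lookup(self, key: str) -> list:
            """Peers that have declared they can supply the block (possibly none)."""
            return list(self.providers.get(key, []))


    # Hypothetical peer IDs, purely for illustration.
    index = BlockIndex()
    key_a = index.declare("peer-1", b"hello world")
    key_b = index.declare("peer-2", b"hello world")
    ```

    Because the key is derived from the content, two peers holding the same bytes end up listed under the same key, and a block that refers to other blocks only needs to carry their keys.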

    - Key-Value Store Revisited
    + Key-Value Store, Revisited

    The key-value store is a DHT similar to the block store. However, rather than using content hashes as keys, the KV store uses user IDs as keys (note: not peer IDs). At a given key, the KV store has a hierarchical key-value map that associates in-principle arbitrary strings with values, which themselves can be numbers, strings, datetimes, or other key-value maps. The specific value stored at a user's ID is versioned, so that particular schemas of subkeys and values can be defined and handled appropriately by different versions of clients.
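
    As a rough picture of the shape of this data, the sketch below models a versioned, hierarchical map stored under a user ID. It is an assumption-laden illustration rather than Veilid's data format: `UserRecord`, `KVStore`, `put`, and `get` are hypothetical names, subkeys are addressed by a list of strings, and an in-memory dictionary stands in for the DHT.

    ```python
    from dataclasses import dataclass, field
    from datetime import datetime


    @dataclass
    class UserRecord:
        """Value stored at one user ID: a schema version plus a nested key-value map."""
        version: int
        data: dict = field(default_factory=dict)


    class KVStore:
        """In-memory stand-in for the KV DHT, keyed by user ID (not peer ID)."""

        def __init__(self):
            self.records = {}

        def put(self, user_id: str, path: list, value, version: int = 1) -> None:
            """Set a value at a nested path, creating intermediate maps as needed."""
            record = self.records.setdefault(user_id, UserRecord(version))
            node = record.data
            for part in path[:-1]:
                node = node.setdefault(part, {})
            node[path[-1]] = value

        def get(self, user_id: str, path: list):
            """Walk the nested maps; an empty path returns the whole map."""
            node = self.records[user_id].data
            for part in path:
                node = node[part]
            return node


    # Hypothetical user ID and subkeys, purely for illustration.
    store = KVStore()
    store.put("user-abc", ["profile", "name"], "Alice")
    store.put("user-abc", ["profile", "joined"], datetime(2022, 1, 29))
    store.put("user-abc", ["stats", "posts"], 3)
    ```

    A client that understands a given version can then interpret the subkeys it knows about and ignore or reject the rest.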