LoFiRe: Local-First Repositories for Asynchronous Collaboration over Community Overlay Networks

Initial design proposal & protocol specification

Introduction

LoFiRe is a decentralized, collaborative data repository with authentication, access control, and change validation. It is built on local-first data storage, synchronization, and change notification protocols that aim to protect privacy by minimizing metadata exposed to intermediaries. It enables local-first, asynchronous, collaboration and data storage within communities while respecting privacy and maintaining data ownership, and provides foundations for developing decentralized, local-first applications, data stores, and protocols (including semantic graph data models, and local-first search & discovery protocols).

Community members use local-first software [1] to collaborate around a partition-tolerant permissioned tamper-proof data repository that contains Directed Acyclic Graphs (DAG) of causally related transactions with operations on Conflict-free Replicated Data Types (CRDT) [2]. The DAG encodes a partial order of transactions through causality relations, and together with a reliable, causal publish-subscribe (pub/sub) protocol for change notifications and a DAG synchronization protocol, it provides strong eventual consistency of replicas, with persistence of transactions through a lightweight, quorum-based acknowledgement mechanism.

Each repository is synchronized within a private community overlay network that offers immutable block storage, data synchronization, and asynchronous publish-subscribe change notification services.

The two-tier network architecture consists of a stable core network and ephemeral edge networks. On edge networks, edge nodes synchronize locally and directly between each other, while when communicating with remote participants, they connect to a core node that stores and forwards encrypted objects and change notifications for them, thus acting as a pub/sub broker and object store for the edge nodes.

The system is composed of the following components:

Repository
Data structures, encryption, permissions, authentication and access control.
Network
Data synchronization, publish-subscribe change notification.
Applications
CRDT state machine & change validation.

In the following we describe the repository and network components, and the application protocol for interacting with the system.

Requirements

We set the following requirements to guide our design:

Local first
Store data locally, use local networks when available, and allow working even offline.c
Decentralized
No centralized services are required to access and modify data.
Asynchronicity & partition tolerance
Allow collaboration between users even if they are not online at the same time.
Identity & data ownership
Users should have full control over their data and identities.
Causality
Changes have causal relationships and are delivered in causal order.
Authenticity
Changes are signed by their author.
Access control
Rules that specify which users can perform what operations on which part of the data.
Strong eventual consistency
Replicas reach the same state after receiving the same set of changes.
Privacy
Respect user privacy and minimize the amount of user data and metadata exposed.
End-to-end security
Only the intended recipients should be able to read data stored and transmitted in the network.
Ephemerality
Data can have expiration date after which it should be deleted from all replicas.
Multiple devices per user
Allow users to use multiple devices, sync data, and share an identity among them

Local-first software [1] stores data locally, allows offline data access and modification, and uses the network for data synchronization with local and remote nodes. Conflict-free Replicated Data Types (CRDTs) allow concurrent changes and conflict-free merges of a shared data structure.

Local-first networking allows local collaboration on edge networks among local nodes on the LAN or other kind of proximity network, while also allows remote collaboration over the internet when needed.

Data locality ensures control over data location and distribution, such that data shared in a repository remains private and only distributed to authorized peers. Data should not be dispersed over the network to arbitrary nodes, even in encrypted form, and nodes should only store and forward data they are interested in, providing incentives for participating in the network, and limiting damages in the event of an encryption key or algorithm compromise.

End-to-end security ensures only the intended recipients can decrypt data, i.e. intermediary nodes used for routing and storage cannot see the contents of communications, which is only visible to end-users who hold encryption keys on their personal devices.

Identity and data ownership entails affording users full control over their data and identities, by using cryptographic identities and allowing them to store local copies of their data and private keys. This is not possible in centralized systems that use centralized identifiers such as user names or phone numbers, neither in federated communication services that use DNS-based identities tied to a provider and keep user data on the provider's servers with no easy way for users to acquire a full copy of their data, or to migrate their data and identities to another service provider.

Asynchronicity is a common requirement for communication, and it's provided by both centralized (e.g. CryptPad, Nextcloud) and federated services (e.g. SMTP, XMPP, Matrix). However P2P systems often fail to offer this functionality and only provide synchronous means of communication or implement asynchronicity by storing data at arbitrary nodes in the network that violates the data locality requirement (e.g. libp2p, Holochain, GNUnet).

Protocol design

Data repositories

A data repository contains branches, each with a Directed Acyclic Graph of commits linked by causal dependencies among them.

Fig. 1: Repository R with branches X, Y, Z, commits R*, X*, Y*, acks of X3, Y3, and data D*. Branch Y was forked from X3, and Y3 merges X6. deps are solid lines, acks are dashed.

The repository definition is the root of trust for the repository, and contains a list of published branches. Similarly, the branch definition is the root of trust for the commits in a branch. It specifies the permissions and validation rules that every replica has to follow when applying commits for a given branch.

Each commit refers to a list of earlier commits it depends on, and it is signed by its author, that we also refer to as the publisher. CRDT operations are grouped in a transaction, which is then placed inside a commit. Edge nodes of users validate all incoming commits according to the permissions and validation rules set in the branch definition, and apply the changes to their local replica of the repository only if they are valid. Valid commits are acknowledged by a quorum of publishers in subsequent commits. The causal dependencies between commits form a DAG that enables partial ordering of operations, and provide strong eventual consistency guarantees of valid commits in a branch.

Public keys are used to identify repositories, branches, and members, which allows the owner of a repository to grant permissions for adding & removing branches, and the owner of a branch to grant permissions for publishing commits.

The repository structure and storage objects are specified in the Data structures section.

Data storage

Data is chunked to max. 1 MB chunks and stored encrypted in a Merkle tree. Merkle tree nodes use convergent encryption with a convergence secret that allows deduplication inside the repository, while defending against confirmation attacks of stored objects.

Both commits of a branch and files outside branches are stored in immutable objects. An Object contains either a leaf node of the Merkle tree with an encrypted data chunk, or an internal node with an unencrypted list of ObjectID references to child nodes in the Merkle tree, and an encrypted list of keys for the referenced children. The unencrypted object header at the root of the Merkle tree further contains an optional list of dependencies of the object, which is used to list dependencies and acknowledgements of commit objects to allow efficient DAG synchronization, and an optional expiry time when replicas should delete the object and its children recursively, enabling ephemeral object storage.

Objects are stored in a content-addressed immutable object store where the object ID (ObjectId) is the BLAKE3 hash of the entire serialized object. In order to be able to decrypt and read an object's content, the user needs not only the ObjectId (to retrieve it) but also the object's key. These are combined in an ObjectRef, which enables its holder to retrieve and read the referenced object. To the contrary, sharing only the ID of an object does not provide a read capability for the recipient.

Implementations should employ an additional layer of encryption and padding before storing objects on disk in order to provide metadata-protection for data at rest, i.e. to hide stored object IDs and object sizes. Objects can be packed together into larger storage blocks before applying padding in order to reduce storage space requirements.

Branches & Commits

The repository is organized in branches, which consist of a DAG of commits that contain transactions of CRDT operations. Commits are stored chunked in encrypted objects, the same way as files. The commit body is stored in a separate object, in order to be able to reference it directly, and to allow deduplication of commit bodies.

The first Commit in a branch contains the Branch definition that specifies the commit validation rules and permissions for the branch, as well as the public key for the pub/sub topic used for publishing commits of the branch. A member list defines the public keys of members and their permissions for publishing commits in the branch. Each Commit references other commits it directly depends on (via deps) and acknowledges non-dependent branch heads (via acks) known by the publisher in order to reduce branching in the DAG, and may reference files it depends on (via refs). The referenced dependencies (deps) and acknowledgements (acks) in a commit implies that the publisher has validated those commits and all their dependencies recursively. The object IDs (but not the keys) of deps & acks of the Commit are listed in the unencrypted Object header in order to allow storage nodes to traverse branches for efficient synchronization, without letting them to decrypt and read the content of the objects.

To ensure durability of commits, a quorum of acknowledgements is needed before a commit can be considered valid. The quorum is specified in the branch definition, and is calculated as the number of commits depending on, or acknowledging the commit either directly or indirectly. Acknowledgements can be either implicit (dependencies and acknowledgements listed in a subsequent commit) or explicit (a CommitAck is sent, with a list of dependencies). Explicit acknowledgements are only needed if a commit hasn't been acknowledged implicitly within a certain time frame from reception, which is specified in the branch definition (ackDelay).

The end of a branch is marked with an EndOfBranch commit, after which no more commits are accepted, except the acknowledgements of this last commit. This commit can reference a fork where the branch continues. This allows for changing the branch definition, migration to a new data schema, history truncation by making a new snapshot that does not depend on any previous operations (compaction), changing permissions, and removing access from past members by excluding those members from the newly encrypted branch definition.

A forked branch can either depend on the previous branch with the heads listed in its deps, or on a Snapshot listed in its refs. A Snapshot contains the data structures resulting from applying all the operations contained in the commits reachable from the heads the snapshot refers to. A Snapshot can also be part of a Commit for the purpose of exporting the current branch state but not for compaction, in which case other publishers validate it via acks, but no other commits would depend on it.

The branch definition also allows adding application-specific metadata about each member and about the branch itself. This enables applications to specify roles and permissions for members (who can change which part of the data structure), define custom validation rules for commits in the branch (or reference an executable validation code that runs in a sandboxed environment, such as WebAssembly), and reference an application that provides a user interface to display and interact with the data.

The branch definition cannot be changed after it has been commited, except for adding new members and adding new permissions for an existing member, along with an adjusted quorum and ackDelay, via the AddMember commit. For any other changes the branch needs to be forked. After forking, the new branch can either depend on the commits of the old branch, or the branch owner can make a snapshot and compact the commits of the previous branch and thus remove dependencies on earlier commits, after which the commits of the previous branch can be garbage collected.

The root branch is identified by the public key of the repository, and has a pub/sub topic assigned that corresponds to the overlay ID. The root branch contains the Repository definition, which references the branches available in the repository. Branches can be added and removed via the AddBranch and RemoveBranch commit types. After a branch is removed, its commits can be garbage collected, unless it was replaced with a new branch that depends on commits from the removed branch. When forking the root branch, the fork reference must be present unencrypted in the EndOfBranch commit, and the forked branch must depend on the previous branch, since the root branch serves as the entry point to the repository, and applications need to be able to find the latest fork of it. It is not possible to publish commits containing transactions in the root branch, as the root branch is only used for managing the other branches. A branch can remain private if it is never added to the root branch.

Deletion of data

Data deletion is possible by setting an expiry time for the storage objects. The expiry field and child node references are not encrypted in order to allow storage nodes to perform garbage collection without being able to decrypt object contents. In case of expiring commits, in order to keep the DAG intact, the commit object itself never expires, only the object in the commit body, and all commits depending on an expiring commit must expire at the same time as, or earlier than the one they're depending on.

Data that has been removed by a commit remains in the branch, since all commits are kept in the branch. Permanently deleting data from a branch is possible by making a snapshot, compacting the operations from all the commits in the branch, then ending the current branch with an EndOfBranch commit, and creating a forked branch that depends only on the snapshot. After this the old branch can be removed from the published list of branches in the root branch, and its commits can be deleted at the expiry time set in the EndOfBranch commit.

User repositories

Each user has a repository shared among their devices that stores the addresses and encryption keys for repositories, branches, and user identities, as well as brokers to connect to for each repository. This allows a user to share configuration, data and identities across their different devices.

To access this repository, the Argon2 key-derivation function is used to derive the encryption key from a password. This key is used to encrypt a RepoKeys data structure that contains the private key for the user repository.

Network architecture

Fig. 2: Core and edge networks of a two-tier community overlay network

The network is organized as an independent overlay network for each community, composed of community members' edge and core nodes (we use the terms nodes and peers interchangeably). A data repository is associated with each community overlay network, that is used by community members to share and collaborate on data. Each community is responsible for their own networking and storage, as opposed to storing data and relaying messages via non-interested nodes, as it is the case when a global overlay is used. Relying on community members' nodes instead of arbitrary non-interested and untrusted nodes increases efficiency, reliability, privacy, and security of the network, and establishes incentives for node participation, since nodes only contribute resources towards the communities they participate in.

The network architecture follows a two-tier peer-to-peer design that consists of local edge networks composed of end-user devices, and a core overlay network composed of brokers. The core network facilitates communication among remote nodes in different edge networks, and enables asynchronous communication via store-and-forward message brokers that provide routing and storage services. Each user is responsible for acquiring access to one or more core nodes of their choice, which can be self-hosted or offered by service providers. Each core node provides services only to the communities their users are part of.

The two-tier design also enables location privacy and reduces load on end-user devices, since they only connect to their brokers of choice in the core network and do not establish direct connections with remote nodes. This allows mobile devices to participate in the network without quickly draining their batteries, since they only have to maintain a single connection to a message broker, and they're not responsible for routing or storing messages for other nodes, while they can still participate in P2P protocols on edge networks on an on-demand basis.

The overlay messages are specified in the Data structures section.

Global overlay

For community overlays that want to be discoverable by ID, we employ a global overlay among core nodes, which consists of a trust-aware peer sampling protocol and a privacy-preserving interest clustering protocol based on community overlay membership, as described in [11]. The global overlay is optional. In the absence of it, peers rely on an explicit list of community overlay peers in order to join the overlay for the first time (see OverlayJoin and RepoLink). Peers store the list of connected peers for each community overlay, including the ones later discovered, and use it to rejoin the overlay the next time.

Community overlay networks

Each community overlay hosts a data repository and provides content-addressed object storage and pub/sub event dissemination services for their members. The pub/sub service is used for publishing commits in branches, and the object store allows synchronization between replicas.

The overlay network is identified by a BLAKE3 keyed hash over the repository public key, using a BLAKE3-derived key from the repository public key & secret. This helps to avoid correlation of overlays across edge networks.

Messages in the overlay (OverlayMessage) are first padded then encrypted using ChaCha20 with a key derived from the repository public key & secret and a per-peer random session ID, such that only peers in the overlay can decrypt them. Message padding provides additional metadata protection against observing object sizes transmitted over the network.

Peers establish TLS or QUIC connections among each other when connecting over IP. However, the system does not depend on IP, and can function over other transports as well, such as overlay networks that provide public key routing.

Overlay construction & maintenance

A peer first establishes a connection following the reception of a RepoLink message, which contains the repository public key and secret, and the network addresses of one or more overlay peers to connect to.

Peers in the overlay discover each other via peer advertisement messages (PeerAdvert) that are sent periodically and follow random walks across the overlay with limited hops. Each node strives to reduce the number of connections by preferring to connect to peers with overlapping pub/sub topic subscriptions (interest clustering).

The PeerAdvert message contains a Bloom filter of a peer's topic subscriptions for the overlay it is sent to. This allows peers to find others with a similar subscription set, making pub/sub event dissemination in the overlay more efficient by reducing the number of connections required for a peer to cover all of its topic subscriptions, and by reducing the number of forwarding hops necessary from a publisher to subscribers.

Interest clustering also works across different overlays: when nodes receive multiple PeerAdvert messages from the same peer in different overlays, the subscriptions from all overlays are considered together in the interest clustering algorithm.

Publish-subscribe protocol

Pub/sub is used for sending low-latency push notifications about changes in a branch to subscribers of the corresponding pub/sub topic. Each commit and its referenced files are published via pub/sub.

The pub/sub protocol should satisfy the following requirements:

Causal delivery
Events should be delivered to the application in causal order.
Reliability
All published events should be delivered exactly once to subscribers (by detecting and retransmitting missed events and storing delivered events)
Fault tolerance
The system should tolerate and recover from faults in the event forwarding path (by establishing redundant paths)

An overview and detailed analysis of causal pub/sub can be found in [9].

A pub/sub topic is identified by a public key. Each published event must be signed with the corresponding private key in order to be routed by the pub/sub broker network. Events without a valid signature get dropped. This is necessary to enforce publishing permissions, to eliminate unsolicited events sent to a topic, and to avoid amplification attacks by unauthorized publishers that may result in denial of service.

Branches are mapped to pub/sub topics by the Branch definition. The topic private key is shared among all publishers, who publish each new commit as a signed event in the pub/sub topic. This is necessary in order to restrict the publishing of events to authorized publishers only, and at the same time hide the publisher's public key identities from the pub/sub brokers.

For each commit, change notifications are sent to the appropriate pub/sub topic as a Change inside a signed Event message. Each Change contains an encrypted object, which is either a chunk of a commit, or a chunk of a file that a commit references. In case of the first chunk of a commit, it also contains the encryption key for the commit object, encrypted with a key derived from the branch public key & secret, and the publisher's public key. Brokers cannot decrypt this object, only subscribers, who need to look up the branch secret and the publisher's public key matching the publisher hash in the branch definition.

The Event is signed by the topic's private key, and contains a publisher hash that is a BLAKE3 keyed hash over the commit author's public key, the publisher's event sequence number used as encryption nonce and to detect missed events.

A pub/sub broker when subscribing to a topic first issues a BranchHeadsReq request to its upstream peers in the topic, then uses the synchronization protocol to synchronize the DAG for the branch based on the object IDs of the heads. Pub/sub brokers store and forward events for their subscribers, and recover missing events when they notice a missing sequence number via an EventReq request sent to upstream peers in the pub/sub topic.

We use a pub/sub subscription and event routing protocol inspired by LoCaPS [9]. Here follows a description of the base protocol without optimizations for subscription coverage and interest clustering that we leave for future work.

Publishers flood topic advertisements (TopicAdvert) to the network creating subscription routing table entries. A subscription request is forwarded from a subscriber to all publishers along subscription routing table entries, and creates an event routing table entry at each broker on the path. In response, each publisher sends an acknowledgement (SubAck) to all subscribers in an Event.

Events are forwarded from publishers to subscribers along event routing table entries. The subscription algorithm can establish multiple event routing paths when a subscriber sends subscription requests to multiple peers.

Branch synchronization

Replicas perform branch synchronization using a DAG synchronization protocol described in [5][6], by exchanging branch heads and a Bloom filter of known commits since the last synchronization with the given peer (BranchSyncReq & BranchSyncRes).

Object requests

Objects requests either follow the reverse path of a pub/sub topic from a subscriber to publishers (ObjectReqTopic), or a random walk (ObjectReqRandom). The response (ObjectResponse) contains one or more objects, and are sent either directly to the requesting node, or along the reverse path of the request.

Requests along a pub/sub topic are effective when a publisher refers to an object in a commit that some subscribers want to fetch, which is likely hosted by the publisher and its broker. The response gets cached on the way back to the requestor, in order to serve subsequent requests from other subscribers.

The other method involves issuing multiple requests along random walks across the overlay, and routing the response back to the requestor.

Application protocol

The application protocol is used by applications to interact with the local node, and by edge nodes to interact with brokers in the core network.

It allows applications to request brokers to join & leave an overlay (OverlayJoin & OverlayLeave) , subscribe to & unsubscribe from topics (TopicSubReq & TopicUnsubReq), publish & receive events (Event), synchronize branches (BranchHeadsReq & BranchSynchReq), as well as interact with the object store to: download & upload objects (ObjectGet & ObjectPut), pin objects for permanent storage (ObjectPin), copy objects with a different expiry time before sharing (ObjectCopy), and delete objects from storage (ObjectDel).

External requests

Sharing data from a repository for external clients not part of the overlay and not members of the repository is possible by including a Message Authentication Code (MAC) in the request (ExtRequest). For this purpose we use a BLAKE3 keyed hash over the request content, keyed with the repository public key and secret. This functionality is optional and can be enabled by setting the allowExtRequests flag in the Repository definition.

Object requests from external clients (ExtObjectReq) contain a list of object IDs to request, and a flag whether or not to include all children dependencies of the object, which allows cloning a branch at a specific commit, and an expiry time of the request after which it becomes invalid and overlay peers won't serve the request anymore.

Branch synchronization for external clients is provided via ExtBranchHeadsReq and ExtBranchSync that work the same way as the similarly named requests in the application protocol.

Future work

As future work we intend to:

Data structures

In this section we document the data structures for repository objects and overlay messages as BARE [12] message schema. BARE is a compact binary encoding that does not encode schema information. An external schema is used instead that provides a human and machine-readable specification, which is the source of code generation for various programming language targets. Data structures are versioned in order to allow schema evolution with forwards and backwards compatibility of messages.

### COMMON DATA TYPES ###

# 32-byte BLAKE3 hash
type Blake3Hash32 data[32]

# 32-byte hash
type Hash32 union { Blake3Hash32 }

# 32-byte symmetric key
type Key32 data[32]

# Curve25519 public key
type Curve25519PubKey data[32]

# Curve25519 private key
type Curve25519PrivKey data[32]

# Public key
type PubKey union { Curve25519PubKey }

# Private key
type PrivKey union { Curve25519PrivKey }

# Ed25519 signature
type Ed25519Sig data[64]

# Signature
type Signature union { Ed25519Sig }

# Timestamp: absolute time in minutes since 2022-02-22 22:22 UTC
type Timestamp u32

type Seconds u8
type Minutes u8
type Hours u8
type Days u8

 # Relative time (e.g. delay from current time)
type RelTime union {
  Seconds | Minutes | Hours | Days
}

### STORAGE OBJECTS ###

# Object ID
# BLAKE3 hash over the serialized Object with encrypted content
type ObjectId Hash32

# Object reference
type ObjectRef struct {
  ObjectId                 # Object ID
  Key32                   # Encryption key
}

# Internal node of the Merkle tree
type InternalNode list<Key32>

# Data chunk at the leaf of the Merkle tree
type DataChunk data

# List of ObjectId dependencies
type DepList list<ObjectId>

# Immutable object with an encrypted chunk of data.
# Data is chunked and stored in a Merkle tree.
# - expiry: expiry time when the object should be deleted
#           by all replicas that store a copy of it
# - content: encrypted using convergent encryption with ChaCha20:
#   - convergence_key: BLAKE3 derive_key ("LoFiRe Data BLAKE3 key",
#                                         repo_pubkey + repo_secret)
#   - key: BLAKE3 keyed hash (convergence_key, plain_object)
#   - nonce: 0
type ObjectV1 struct {
  list<ObjectId>     # Objects IDs for child nodes in the Merkle tree
  union {                # Other objects this object depends on (e.g. Commit deps & acks)
    list<ObjectId> | ObjectRef # Either a list of Object IDs (max. 8)
  }                            # or an ObjectRef that contains a DepList
  optional<Timestamp>  # Expiry time of this object and all of its children
  union {
    InternalNode |             # Internal node with references to children
    DataChunk                  # Leaf node with encrypted data chunk
  }
}
type Object union { ObjectV1 }

# Repository definition
# Published in root branch, where:
# - branch_pubkey: repo_pubkey
# - branch_secret: BLAKE3 derive_key ("LoFiRe Root Branch secret",
#                                     repo_pubkey + repo_secret)
type RepositoryV1 struct {
  PubKey                   # Repo public key ID
  list<ObjectRef>    # List of branches
  bool       # Whether or not to allow external requests
  data               # App-specific metadata
}
type Repository union { RepositoryV1 }

# Add a branch to the repository
type AddBranchV1 ObjectRef
type AddBranch union { AddBranchV1 }

# Remove a branch from the repository
type RemoveBranchV1 ObjectRef
type RemoveBranch union { RemoveBranchV1 }

# Commit object types
type CommitType enum {
  REPOSITORY ADD_BRANCH REMOVE_BRANCH
  BRANCH ADD_MEMBERS END_OF_BRANCH
  TRANSACTION SNAPSHOT COMMIT_ACK
}

# Member of a branch
type MemberV1 struct {
  PubKey                    # Member public key ID
  list<CommitType> # Commit types the member is allowed to publish in the branch
  data                # Application-specific metadata (role, permissions, cryptographic material, etc)
}
type Member union { MemberV1 }

# Branch definition
# First commit in a branch, signed by branch key
# In case of a fork, the commit deps indicate the previous branch heads
type BranchV1 struct {
  PubKey                   # Branch public key ID
  PubKey                # Pub/sub topic for publishing events
  Key32                # Branch secret key
  list<Member>        # Members with permissions
  map<CommitType><u32> # Number of acks required for a commit to be valid
  RelTime            # Delay to send explicit acks if not enough implicit acks arrived by then
  list<data>             # Tags for organizing branches within the repository
  data               # App-specific metadata (validation rules, etc)
}
type Branch union { BranchV1 }

# Add members to an existing branch
# If a member already exists, it overwrites the previous definition,
# in that case this can only be used for adding new permissions,
# not to remove existing ones.
# The quorum and ackDelay can be changed as well
type AddMembersV1 struct {
  list<Member>        # Members to add, with permissions
  optional<map<CommitType><u32>> # New quorum
  optional<RelTime>  # New ackDelay
 }
type AddMembers union { AddMembersV1 }

# End of branch
# No more commits accepted afterwards, only acks of this commit
# May reference a fork where the branch continues
# with possibly different members, permissions, validation rules.
type EndOfBranchV1 struct {
  fork: optional<ObjectRef>    # (Encrypted) reference to forked branch (optional)
  Timestamp            # Expiry time when all commits in the branch should be deleted
}
type EndOfBranch union { EndOfBranchV1 }

# Transaction with CRDT operations
type TransactionV1 data
type Transaction union { TransactionV1 }

# Snapshot of a branch
# with a data structure computed from the commits at the specified head
type SnapshotV1 struct {
  list<ObjectId>        # Branch heads the snapshot was made from
  data                # Snapshot data structure
}
type Snapshot union { SnapshotV1 }

# Acknowledgement of another commit
type CommitAckV1 void
type CommitAck union { CommitAckV1 }

# Commit body, corresponds to CommitType
type CommitBody union {
  Repository | AddBranch | RemoveBranch |
  Branch | AddMembers | EndOfBranch |
  Transaction | Snapshot | CommitAck
}

# Commit object
# Signed by branch key, or one of the members authorized to publish this commit type
type CommitV1 struct {
  struct {
    PubKey             # Commit author
    u32                   # Author's commit sequence number in this branch
    ObjectRef          # Branch the commit belongs to
    list<ObjectRef>      # Direct dependencies of this commit
    list<ObjectRef>      # Not directly dependent heads to acknowledge
    list<ObjectRef>      # Files the commit references
    data             # App-specific metadata (commit message, creation time, etc)
    ObjectRef            # Object with a CommitBody inside
    optional<Timestamp># Expiry time of the body object
  }
  Signature               # Signature over the content by the author
}
type Commit union { CommitV1 }

# A file stored in an Object
type FileV1 struct {
  data
  data
  data
}
type File union { FileV1 }

type Data union { Commit | CommitBody | File }


### COMMON DATA TYPES FOR MESSAGES ###

type PeerId PubKey

# IP address
type IPv4 data[4]
type IPv6 data[16]
type IP union { IPv4 | IPv6 }
type MacAddr data[6]

type IPTransportProtocol enum { TLS QUIC }

type IPTransportAddr struct {
  IP
  u16
  IPTransportProtocol
}

# Network address
type NetAddr union { IPTransportAddr }

# Bloom filter (variable size)
type BloomFilter struct {
  u8                        # Number of hash functions
  data                      # Filter
}

# Bloom filter (128 B)
# (m=1024; k=7; p=0.01; n=107)
type BloomFilter128 data[128]

# Bloom filter (1 KiB)
# (m=8192; k=7; p=0.01; n=855)
type BloomFilter1K data[1024]

### OVERLAY MESSAGES ###

# Overlay connection request
# Sent to an existing overlay member to initiate a session
type OverlayConnectV1 void
type OverlayConnect union { OverlayConnectV1 }

# Overlay disconnection request
# Sent to a connected overlay member to terminate a session
type OverlayDisconnectV1 void
type OverlayDisconnect union { OverlayDisconnectV1 }

# Topic advertisement by a publisher
# Flooded to all peers in overlay
# Creates subscription routing table entries
type TopicAdvert struct {
  struct {
    PubKey              # Topic public key
    PeerId               # Peer public key
  }
  Signature               # Signature over content by topic key
}

# Topic subscription request by a peer
# Forwarded towards all publishers along subscription routing table entries
# that are created by TopicAdverts
# Creates event routing table entries along the path
type SubReq struct {
  u64                      # Random ID generated by the subscriber
  PubKey                # Topic public key
}

# Topic subscription acknowledgement by a publisher
# Sent to all subscribers in an Event
type SubAck struct {
  u64                     # SubReq ID to acknowledge
}

# Topic unsubscription request by a subscriber
# A broker unsubscribes from upstream brokers
# when it has no more subscribers left
type UnsubReq struct {
  PubKey                # Topic public key
}

# Topic unsubscription acknowledgement
# Sent to the requestor in response to an UnsubReq
type UnsubAck struct {
  PubKey                # Topic public key
}

# Branch change notification
# Contains a chunk of a newly added Commit or File referenced by a commit.
#
# - key: encryption key for the Commit object in payload,
#   the key is encrypted using ChaCha20:
#   - key: BLAKE3 derive_key ("LoFiRe Event ObjectRef ChaCha20 key",
#                             branch_pubkey + branch_secret +
#                             publisher_pubkey)
type ChangeV1 struct {
  Object            # Object with encrypted content
  optional<Key32>       # Encrypted key for the Commit object in content
}
type Change union { ChangeV1 }

type EventBody union {
  SubAck |
  Change
}

# Pub/sub event published in a topic
# Forwarded along event routing table entries
#
# - publisher: BLAKE3 keyed hash over branch member pubkey
#   - key: BLAKE3 derive_key ("LoFiRe Event publisher BLAKE3 key",
#                             repo_pubkey + repo_secret +
#                             branch_pubkey + branch_secret)
# - payload: Object with encrypted content
# - nonce: commit_seq
type EventV1 struct {
  struct {
    PubKey              # Pub/sub topic
    Hash32          # Publisher pubkey hash
    u32                   # Commit sequence number of publisher
    EventBody            # Event body
  }
  Signature               # Signature over content by topic key
}
type Event union { EventV1 }

# Object request by ID along the reverse path of a pub/sub topic
# from a subscriber to all publishers
type ObjectReqTopicV1 struct {
  PubKey                # Topic to forward the request in
  list<ObjectId>          # List of Object IDs to request
  bool              # Whether or not to include all children recursively in the response
  list<PeerId>           # List of Peer IDs the request traversed so far
}
type ObjectReqTopic union { ObjectReqTopicV1 }

# Object request by ID using a random walk
type ObjectReqRandomV1 struct {
  list<ObjectId>          # List of Object IDs to request
  bool              # Whether or not to include all children recursively in the response
  u8                   # Number of random nodes to forward the request to at each step
  list<PeerId>           # List of Peer IDs the request traversed so far
}
type ObjectReqRandom union { ObjectReqRandomV1 }

# Response to an Object request
# Follows request path with possible shortcuts
type ObjectResponse struct {
  list<PeerId>           # Response path
  list<Object>        # Resulting Object(s)
}

# Request latest events corresponding to the branch heads in a pub/sub topic
# In response an Event is sent for each commit chunk that belong to branch heads
# that are not present in the requestor's known heads
type BranchHeadsReqV1 struct {
  PubKey                # Topic public key of the branch
  list<ObjectId>   # Known heads
}
type BranchHeadsReq union { BranchHeadsReqV1 }

# Branch synchronization request
# In response a stream of Objects are sent
# that are not present in the requestor's known heads and commits
type BranchSyncReqV1 struct {
  list<ObjectId>        # Heads to request, including all their deps
  list<ObjectId>   # Fully synchronized up until these commits
  BloomFilter    # Known commit IDs since knownHeads
}
type BranchSyncReq union { BranchSyncReqV1 }

# Request missed events for a pub/sub topic
# for the specified range of publisher sequence numbers
# In response an EventRes then a stream of Events are sent
type EventReqV1 struct {
  PubKey                # Topic public key
  list<struct {
    Hash32          # Publisher ID
    u32                  # First sequence number to request
    u32                    # Last sequence number to request
  }>
}
type EventReq union { EventReqV1 }

# Response to an EventReq
type EventResV1 struct {
  list<struct {
    Hash32          # Publisher ID
    u32                  # First sequence number to send
    u32                    # Last sequence number to send
  }>
}
type EventRes union { EventResV1 }

type OverlayRequestV1 struct {
  u64                      # Request ID
  union {
    EventReq |
    BranchHeadsReq |
    BranchSyncReq
  }
}
type OverlayRequest union { OverlayRequestV1 }

type OverlayResponseV1 struct {
  u64                      # Request ID
  u8                   # Result
  optional<union {
    EventRes |
    Object
  }>
}
type OverlayResponse union { OverlayResponseV1 }

# Peer advertisement
# Sent periodically across the overlay along random walks
type PeerAdvertV1 struct {
  struct {
    PeerId               # Peer ID
    BloomFilter128       # Topic subscriptions
    list<NetAddr>     # Network addresses
    u16               # Version number
    data             # App-specific metadata (profile, cryptographic material, etc)
  }
  Signature               # Signature over content by peer's private key
  u8                      # Time-to-live, decremented at each hop
}
type PeerAdvert union { PeerAdvertV1 }

# Overlay ID
# - for public overlays that need to be discovered by public key:
#   BLAKE3 hash over the repository public key
# - for private overlays:
#   BLAKE3 keyed hash over the repository public key
#   - key: BLAKE3 derive_key ("LoFiRe OverlayId BLAKE3 key", repo_secret)
type OverlayId Hash32

# Overlay session ID
# Used as a component for key derivation.
# Each peer generates it randomly when (re)joining the overlay network.
type SessionId u64

type OverlayMessageContent struct {
  union {
    OverlayConnect | OverlayDisconnect |
    PeerAdvert | TopicAdvert |
    SubReq | UnsubReq | UnsubAck |
    Event |
    ObjectReqTopic | ObjectReqRandom |
    OverlayRequest | OverlayResponse
  }
  data                # Optional padding
}

# Overlay message
# - overlay_secret: BLAKE3 derive_key ("LoFiRe Overlay BLAKE3 key",
#                                      repo_pubkey + repo_secret)
# - content: encrypted with ChaCha20
#   - key: BLAKE3 derive_key ("LoFiRe OverlayMessage ChaCha20 key",
#                             overlay_secret + session_id)
#   - nonce: per-session message sequence number of sending peer
# - mac: BLAKE3 keyed hash over the encrypted content
#   - key: BLAKE3 derive_key ("LoFiRe OverlayMessage BLAKE3 key",
#                             overlay_secret + session_id)
type OverlayMessageV1 struct {
  OverlayId           # Overlay ID
  SessionId           # Session ID
  OverlayMessageContent # Encrypted content
  Hash32                  # BLAKE3 MAC
}
type OverlayMessage union { OverlayMessageV1 }


### APPLICATION PROTOCOL ###

# Server hello sent upon a client connection
type ServerHelloV1 struct {
  data                  # Nonce for ClientAuth
}
type ServerHello union { ServerHelloV1 }

# Client authentication by a user's device
type ClientAuthV1 struct {
  struct {
    PubKey               # User pub key
    PubKey             # Device pub key
    data                # Nonce from ServerHello
  }
  Signature               # Signature by device key
}
type ClientAuth union { ClientAuthV1 }


# Add user account
type AddUserV1 struct {
  struct {
    PubKey               # User pub key
  }
  Signature               # Signature by admin key
}
type AddUser union { AddUserV1 }

# Remove user account
type DelUserV1 struct {
  struct {
    PubKey               # User pub key
  }
  Signature               # Signature by admin key
}
type DelUser union { DelUserV1 }

# Authorize device key
type AuthorizeDeviceKeyV1 struct {
  struct {
    PubKey             # Device pub key
  }
  Signature               # Signature by user key
}
type AuthorizeDeviceKey union { AuthorizeDeviceKeyV1 }

# Revoke device key
type RevokeDeviceKeyV1 struct {
  struct {
    PubKey             # Device pub key
  }
  Signature               # Signature by user key
}
type RevokeDeviceKey union { RevokeDeviceKeyV1 }

# Request to join an overlay
type OverlayJoinV1 struct {
  Key32                # Overlay secret
  list<PeerAdvert>      # Peers to connect to
}
type OverlayJoin union { OverlayJoinV1 }

# Request to leave an overlay
type OverlayLeaveV1 void
type OverlayLeave union { OverlayLeaveV1 }

# Request an object by ID
type ObjectGetV1 struct {
  list<ObjectId>          # List of Object IDs to request
  bool        # Whether or not to include all children recursively
  optional<PubKey>      # Topic the object is referenced from
}
type ObjectGet union { ObjectGetV1 }

# Request to store an object
type ObjectPutV1 struct {
  Object
}
type ObjectPut union { ObjectPutV1 }

# Request to pin an object
# Brokers maintain an LRU cache of objects,
# where old, unused objects might get deleted to free up space for new ones.
# Pinned objects are retained, regardless of last access.
# Note that expiry is still observed in case of pinned objects.
# To make an object survive its expiry it needs to be copied with a different expiry time.
type ObjectPinV1 struct {
  ObjectId
}
type ObjectPin union { ObjectPinV1 }

# Request to unpin an object
type ObjectUnpinV1 struct {
  ObjectId                 # Object ID
}
type ObjectUnpin union { ObjectUnpinV1 }

# Request to copy an object with a different expiry time
type ObjectCopyV1 struct {
  ObjectId                 # Object ID
  optional<Timestamp>  # Expiry
}
type ObjectCopy union { ObjectCopyV1 }

# Request to delete an object
type ObjectDelV1 struct {
  ObjectId                 # Object ID
 }
type ObjectDel union { ObjectDelV1 }

# Request subscription to a topic
# For publishers a private key also needs to be provided
type TopicSubV1 struct {
  PubKey                # Topic public key
  optional<PrivKey>       # Topic private key for publishers
}
type TopicSub union { TopicSubV1 }

# Request unsubscription from a topic
type TopicUnsubV1 struct {
  PubKey
}
type TopicUnsub union { TopicUnsubV1 }

# Application request
type AppRequest struct {
  u64                      # Request ID
  union {
    OverlayJoin | OverlayLeave |
    TopicSub | TopicUnsub |
    Event |
    ObjectGet | ObjectPut |
    ObjectPin | ObjectUnpin |
    ObjectCopy | ObjectDel |
    BranchHeadsReq | BranchSyncReq
  }
}

# Result codes
type Result enum { OK ERROR }

# Response to a request
type AppResponse struct {
  u64                      # Request ID
  u8                   # Result (incl but not limited to Result)
  optional<union {
    Object
  }>
}


# Application message for an overlay
type AppOverlayMessage struct {
  OverlayId
  union {
    AppRequest | AppResponse |
    Event
  }
}

# Application message
type AppMessage struct {
  union {
    ServerHello | ClientAuth |
    AddUser | DelUser |
    AuthorizeDeviceKey | RevokeDeviceKey |
    AppOverlayMessage
  }
  data                # Optional padding
}

### EXTERNAL REQUESTS ###

# Link to object(s) by ID from a repository that can be shared to non-members
# The request is sent to a node with a replica of the repository,
# The response includes the requested objects and all their children recursively,
# and optionally all object dependencies recursively.
#
# - mac: BLAKE3 keyed hash over content
#   - key: BLAKE3 derive_key ("LoFiRe ExtObjectReq BLAKE3 key",
#                             repo_pubkey + repo_secret)
type ExtObjectReqV1 struct {
  PubKey                 # Repository to request the objects from
  list<ObjectId>          # List of Object IDs to request, including their children
  bool        # Whether or not to include all children recursively
  optional<Timestamp>  # Expiry time after which the link becomes invalid
}
type ExtObjectReq union { ExtObjectReqV1 }

# Branch heads request
type ExtBranchHeadsReq union { BranchHeadsReqV1 }

# Branch synchronization request
type ExtBranchSyncReq union { BranchSyncReqV1 }

# External request authenticated by a MAC
# - mac: BLAKE3 keyed hash over content
#   - key: BLAKE3 derive_key ("LoFiRe ExtRequest BLAKE3 key",
#                             repo_pubkey + repo_secret)
type ExtRequest struct {
  u64                      # Request ID
  union {
     ExtObjectReq |
     ExtBranchHeadsReq |
     ExtBranchSyncReq
  }
  Hash32                  # BLAKE3 MAC over content
}

# Response to an external request
type ExtResponse struct {
  u64                      # Request ID
  u8                   # Result
 }


### DIRECT MESSAGES ###

# Link/invitation to the repository
type RepoLinkV1 struct {
  PubKey                   # Repository public key ID
  Key32                # Repository secret
  list<PeerAdvert>      # Peers to connect to
}
type RepoLink union { RepoLinkV1 }

# Owned repository with private key
type RepoKeysV1 struct {
  PrivKey                 # Repository private key
  Key32                # Repository secret
  list<PeerAdvert>      # Peers to connect to
}
type RepoKeys union { RepoKeysV1 }

# Link to object(s) or to a branch from a repository
# that can be shared to non-members
type ObjectLinkV1 struct {
  ExtRequest              # Request to send to an overlay peer
  list<ObjectRef>        # Keys for the requested objects
}
type ObjectLink union { ObjectLinkV1 }

### BROKER STORAGE ###

# A topic this node subscribed to in an overlay
type TopicV1 struct {
  PubKey                   # Topic public key ID
  optional<PrivKey>   # Topic private key for publishers
  list<ObjectId>        # Set of branch heads
  u32                   # Number of local users that subscribed to the topic
}
type Topic union { TopicV1 }

# An overlay this node joined
type OverlayV1 struct {
  OverlayId                # Overlay ID
  Key32                # Overlay secret
  list<PeerAdvert>      # Known peers with connected flag
  list<Topic>          # Topics this node subscribed to in the overlay
  u32                   # Number of local users that joined the overlay
  Timestamp        # Last access by any user
}
type Overlay union { OverlayV1 }

# User accounts
# stored as user_pubkey -> Account
type AccountV1 struct {
  list<PubKey> # Authorized device pub keys
  bool                  # Admins can add/remove user accounts
  list<Overlay>      # Overlays joined
  list<Topic>         # Topics joined, with publisher flag
}
type Account union { AccountV1 }

Acknowledgements

Thank you to Niko Bonnieure for feedback on a draft of this document.

References

[1]
M. Kleppmann, A. Wiggins, P. Van Hardenberg, and M. McGranaghan, “Local-first software: You own your data, in spite of the cloud,” in Proceedings of the 2019 ACM SIGPLAN international symposium on new ideas, new paradigms, and reflections on programming and software, 2019, pp. 154–178 [Online]. Available: https://www.inkandswitch.com/local-first/
[2]
C. Baquero, P. S. Almeida, and A. Shoker, “Making operation-based CRDTs operation-based,” in IFIP international conference on distributed applications and interoperable systems, 2014, pp. 126–140 [Online]. Available: https://haslab.uminho.pt/sites/default/files/ashoker/files/opbaseddais14.pdf
[3]
K. Karlsson et al., “Vegvisir: A partition-tolerant blockchain for the internet-of-things,” in 2018 IEEE 38th international conference on distributed computing systems (ICDCS), 2018, pp. 1150–1158 [Online]. Available: https://vegvisir.cs.cornell.edu/html/files/papers/vegvisir-paper.pdf
[4]
M. Kleppmann and A. R. Beresford, “Automerge: Real-time data sync between edge devices,” in 1st UK mobile, wearable and ubiquitous systems research symposium (MobiUK 2018). Https://mobiuk. Org/abstract/S4-P5-kleppmann-automerge. pdf, 2018, pp. 101–105 [Online]. Available: https://mobiuk.org/abstract/S4-P5-Kleppmann-Automerge.pdf
[5]
M. Kleppmann and H. Howard, “Byzantine eventual consistency and the fundamental limits of peer-to-peer databases,” CoRR, vol. abs/2012.00472, 2020 [Online]. Available: https://arxiv.org/abs/2012.00472
[6]
M. Kleppmann, “Using bloom filters to efficiently synchronise hash graphs,” 2020 [Online]. Available: https://martin.kleppmann.com/2020/12/02/bloom-filter-hash-graph-sync.html
[7]
pukkamustard, “Distributed mutable containers” [Online]. Available: https://inqlab.net/projects/dmc/
[8]
pukkamustard, “Encoding for robust immutable storage (ERIS)” [Online]. Available: https://inqlab.net/projects/eris/
[9]
F. S. R. Pedrosa, “LoCaPS: Localized causal publish-subscribe,” 2020 [Online]. Available: https://www.gsd.inesc-id.pt/~ler/reports/filipapedrosamsc.pdf
[10]
V. Santos and L. Rodrigues, “Localized reliable causal multicast,” in 2019 IEEE 18th international symposium on network computing and applications (NCA), 2019, pp. 1–10 [Online]. Available: https://ieeexplore.ieee.org/abstract/document/8935065
[11]
T. G. x Thoth, “UPSYCLE: Ubiquitous publish-subscribe infrastructure for collaboration on edge networks,” 2021 [Online]. Available: https://p2pcollab.net/design/upsycle
[12]
“BARE message encoding.” [Online]. Available: https://baremessages.org/