
Modernizing from PostgreSQL to Serverless with Fauna Part 3

Brecht De Rooms & Shadid Haque | Feb 22nd, 2021

Categories:

Serverless

Introduction

In part one and part two, we looked at how Fauna supports basic and advanced data modeling when migrating an existing Postgres system to Fauna. This chapter covers more general concepts such as referential integrity in Fauna, trade-offs in data modeling strategies, query optimization, and indexes. We'll take a deeper look at Fauna's ability to support both document and relational patterns, and at how Fauna's zero-downtime schema migrations, user-defined functions, indexes, and hybrid schema enforcement enable developers to build modern enterprise applications with both flexibility and data integrity.

PostgreSQL Modernization Series

Catalogue

Explore the other parts of the PostgreSQL to Fauna Modernization Series below:

Modernizing from PostgreSQL Part One

Modernizing from PostgreSQL Part Two

Referential integrity

Referential integrity ensures the validity and consistency of relationships between tables in a relational database. It is a constraint that requires any foreign key in a table to reference a valid, existing primary key in another table or be null. Referential integrity ensures that all references between tables are valid and prevents data inconsistencies and orphan records, which reference non-existent entries in another table.

Fauna supports several methods for enforcing referential integrity; we'll discuss each in turn.

Unique constraints

Fauna can add unique constraints on single fields, multiple fields, or computed fields.
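As a sketch (the collection and field names here are illustrative, not from this article), unique constraints in Fauna Schema Language look like this:

```
collection Customer {
  // Each customer must have a distinct email address.
  unique [.email]

  // Composite uniqueness across two fields.
  unique [.firstName, .lastName]
}
```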

Foreign keys

Fauna not only supports foreign keys but goes one step further. In Fauna, the foreign keys are direct document references that allow the engine to navigate directly to the foreign document without requiring an optimizer or custom index. This is at the base of how joins work in Fauna and how Fauna can deliver low and predictable latency.
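To make this concrete, here is a hedged sketch in FQL (the collection and field names are assumptions): the `category` field holds a direct document reference, and projecting through it navigates straight to the foreign document.

```
// Store a direct reference to a Category document in the product.
let category = Category.byId("<CATEGORY_DOCUMENT_ID>")
let product = Product.create({
  name: "Desk Lamp",
  category: category
})

// Projecting through the reference reaches the foreign
// document with no optimizer or custom index required.
Product.byId(product.id) {
  name,
  category {
    name
  }
}
```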

Cascading deletes in Fauna

Any operation that could interfere with other operations or result in large-scale work must be done with intent. The following example demonstrates how to accomplish a cascading delete by combining the parent and child deletes in one transaction, using a looping function designed to handle large-scale processing.

collection Product {
  ...
  category: Ref<Category>
  ...

  // Defines the `byCategory()` index.
  // Use the index to get `Product` collection
  // documents by `category` value. In this case,
  // `category` contains `Category` collection documents.
  index byCategory {
    terms [.category]
  }
}
// Use the index and forEach() to delete the category and any related products:

// Gets a `Category` collection document.
let category = Category.byId("<CATEGORY_DOCUMENT_ID>")
// Gets `Product` collection documents that
// contain the `Category` document in the `category` field.
let products = Product.byCategory(category)

// Deletes the `Category` collection document.
category?.delete()
// Deletes `Product` collection documents that
// contain the `Category` document in the `category` field.
products.forEach(.delete()) // Returns `null`

Advocates for scalable databases typically say that referential integrity doesn't scale, while advocates for traditional databases will strongly advise you to use it, since integrity is more important than speed. Performance in traditional databases can even benefit from foreign keys, since the query planner can use them to optimize execution plans.

The reality is more subtle. In most cases, the overhead of foreign keys is offset by the optimization advantages. Still, if that’s not the case, the performance impact can be noticeable (especially if one forgets to place indexes on foreign keys). A potential pitfall where it can become a bottleneck is when a highly connected item has to be deleted. Having cascading deletes in place could trigger a chain reaction that results in a very slow query and a big transaction; the bigger your data grows, the bigger the risk.

First, Fauna would not benefit from the potential optimizer gains anyway: in Fauna, the procedural query is the query plan, which is what makes price and performance predictable. Second, supporting cascading referential actions in a scalable database is delicate because of the potential chain reaction. Fauna ensures that transactions stay relatively small, and introducing cascading deletes could circumvent these efforts. Although Fauna could prematurely kill a transaction that runs too long, it would be a poor user experience if a delete that works fine today suddenly starts failing as data grows. Therefore, at this point, Fauna does not provide native cascading referential actions; Fauna is designed to always work predictably at scale. Does that mean you need to abandon data integrity? No! Data integrity is crucial, as you would expect from a database that provides ACID guarantees. Let's look at ways you can ensure integrity within FQL:

  • Verify the existence of a reference
  • Check references in transactions
  • Denormalization

Verify the existence of a reference

Verifying the existence of a reference or set of references is easy in FQL. You can do this with a simple if statement and exists() function.

let category = Category.byId('111') 

// check if the category document exists
if (category.exists()) {
  Product.create({
    name: "New Product",
    category: category,
    stock: 12,
    description: "some description",
    price: 23
  })
} else {
  "Category does not exist"
}

Check references in transactions

Instead of creating all documents separately, we can write a transaction that creates a product and links the new product to the corresponding category. If a category of that name doesn’t exist, it creates a new category.

let category = Category.byName("<CATEGORY_NAME>").first() ??
 Category.create({
   name: "<CATEGORY_NAME>",
   description:  "some description"
 })

Product.create({
 name: "New Product Name",
 category: category,
 stock: 12,
 description: "some description",
 price: 23
})

Denormalization

In document databases, related data is often embedded directly within a document rather than stored in separate tables (as in relational databases). For example, in Fauna, an order document might include an embedded array of items rather than referencing a separate items collection via foreign keys. Normalizing is excellent when the child data changes often or would otherwise be repeated a lot, but embedding is fantastic when the data is small, won't change, and is commonly read when querying the parent. Since Fauna supports both patterns, you can choose what works best for each situation.
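As a sketch of the embedded side of this trade-off (the field names are illustrative), an order with its items denormalized into a single document might be created like this:

```
Order.create({
  customer: Customer.byId("<CUSTOMER_ID>"),
  status: "cart",
  // Items are embedded directly instead of referenced
  // through a separate collection and foreign keys.
  items: [
    { name: "Desk Lamp", quantity: 2, price: 40 },
    { name: "Bookshelf", quantity: 1, price: 120 }
  ]
})
```

Reading the order then returns its items in a single operation, with no join.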

Alternative modeling strategies

Fauna is a document-relational database that can handle patterns from both types. This flexibility allows you to choose the patterns that best fit your usage patterns and priorities.

In the previous article, we listed a variety of different implementation strategies for many-to-many relations but only implemented the association table as it was the most likely pattern deployed in Postgres.

Here, we’ll go into those additional patterns, their use cases, and their pros and cons. The remaining modeling options fall under the “embedded” category and are common in document databases. You’ll note as we go through them that Fauna’s support for indexing embedded objects and arrays allows for the implementation of any chosen pattern, or even a hybrid.

Data modeling options for many-to-many relationships:

  1. Embedding documents
  2. Embedding an array of objects on one side of the relation
  3. Embedding an array of references on one side of the relation
  4. Embedding on both sides of the relation
  5. Hybrid embedding

Considerations

As we go through the additional options in modeling many-to-many patterns, there are some considerations to help drive decisions. The following are the key items to evaluate and prioritize to help determine which modeling pattern is best for each of your application’s relationships:

  1. Read versus write optimization. Based on your application’s usage patterns, which operations are a higher priority? Fauna allows for read-optimized, write-optimized, and hybrid approaches.
  2. Query flexibility. What are all the ways you require to query the related data? Will the joins always go one way? Will the related data frequently be returned with any query of the parent document? Are the access patterns dynamic?
  3. Number of operations. How many read and write operations will be required to store, query, and maintain the data? For example, normalized joins are resolved on read and require reading both documents to return the requested data. While this allows for dynamic queries, potentially storing less data, and so on, it also requires two I/O operations per query, which could impact performance and cost.
  4. Size of data. How large is the data being related? If the data is small, then it’s possible that it’s fine to just store it inside the parent document. If it is large and the same data is stored repeatedly, it could be better to store a foreign key to reduce the amount of net overall data. Since data storage has become cheaper over the years, this has become less of a concern. But it’s still on the list of influential items to consider.
  5. Duplication of data. How often is the related data repeated (the same data stored in different parent documents)? If the same data is stored a lot, then it could be better to store it just once and put foreign keys inside the parent documents.
  6. Frequency of change. If the related data’s value changes after it is initially written, keeping it in a separate document lets you make the change once. If it is embedded, you could have to update every copy of the data, leading to lengthy, costly, or delayed operations.
  7. Concurrency of changes. If both sides of a relationship are independently changing with high concurrency, then isolating them into separate documents could be a good idea.

Given the above concepts, as we go through the embedded options, we’ll cover how each modeling option affects these considerations as an alternative to using an association table.

[Diagram: a many-to-many relationship between actors and films]

Let’s observe the diagram above, which represents a many-to-many relationship. In the above data relationship, actors perform in many films, and films have many actors. As discussed in the Considerations section, it is very important when choosing the implementation model to review things like the access patterns, how often the data changes, how large the data sets are, etc.

Given this example, let’s go over various strategies to model these relationships in Fauna.

Embedding documents

When modeling relationships, instead of placing related data in separate tables as in Postgres, Fauna enables storing data directly inside the parent record (document). This “embedding” pattern is made possible by Fauna’s full-featured JSON, array, and index support. Several options exist for building complex document shapes that can optimize access patterns, performance, and cost.

Embed array of objects on one side of the relation:

The first option for embedding is the most straightforward implementation. Here, you embed the related data straight into the parent document. The data can be a complex object, an array, or a simple field.

An example would be modeling the relationship of a film’s actors. With the basic embedded pattern, we would store the actors' names directly in the parent document in an array, with each array element being an object containing the actor’s first and last names.

Here is an example of this pattern:

film.create({
  title: "Academy Dinosaur",
  actors: [
      { 
        name: {
          first: "Penelope",
          last: "Guinness"
        }
      },
      { 
          name: {
            first: "Johnny",
            last: "Lollobrigida"
          }
      },
  ]
})

By replacing an association table with an embedded array, the querying of the data becomes rather simple:

film.byTitle("Academy Dinosaur").first() {
  actors
}

Going further with the basic embedded pattern’s ability to satisfy a many-to-many relationship in Fauna, you can also optimize queries that start from the other side of the join (starting with actors, not films). Fauna allows ad-hoc filtering on the fields inside the array of objects by introducing an index on a computed field.

Unoptimized query (no index):

film.where(.actors.map(.name.first).includes("Penelope"))

Optimized query (indexes on computed field):

  • Add computed field & index to film collection:
collection film {

  compute actorsByFirstName = (.actors.map(item => item.name.first))

  index filmsByActor {
    terms [mva(.actorsByFirstName)]
  }
}

Query to find all the films by an actor:

film.filmsByActor("Penelope") {
  title
}

The basic embedded modeling pattern is efficient for reads (one for all data) and writes (one for all data). It allows for simple & dynamic querying as Fauna provides for indexing of nested objects and MVA indexes. Via the MVA indexes, you can also query starting from the other side of the join (with the actor’s names). But, this pattern also has drawbacks, including data duplication. This can affect the overall amount of data stored and make any future changes to actor names harder to apply. However, since the amount of data (first and last names) is small and very unlikely to change (the actors in a movie shouldn’t change in the future), this could be a good optimization to make over using association tables.

We will get into more optimized versions of the embedded pattern in later sections. But for now, hopefully, you’re starting to see that the overall concept of embedding can be a powerful modeling tool that can replace association tables as the go-to pattern for many-to-many relationships.

Embedded data modeling advantages

  • Efficient for both reads & writes. Lowest latency and simplified operations for both.
  • Querying is flexible. You can start the query from either side (film or actor) and filter by either. All fields for the result set are available from both sides.
  • The least number of operations for both reads and writes (1), and the least compute effort.

Embedded data modeling disadvantages

  • Storage likely increases due to data duplication, although storage is cheap and the duplicated fields are small.
  • Increased effort for updating values. If an actor's name were to change, you’d need to apply the change in every location.
  • Write concurrency. Updating both actors and films with high concurrency could cause contention.

While this modeling pattern has a lot of advantages, it also has a few drawbacks. Continue reviewing the additional options to see which is best for you.

Embedding an array of references on one side of the relation:

This pattern is a direct adaptation of the embedded data modeling pattern above. In this version, we still embed values directly into documents. But, this time, instead of embedding the raw data, we create a normalized table with the raw data and store references in the parent document. In the film example, this would mean moving the actors’ names into a normalized Actors document and embedding foreign key references instead of the actors’ raw data.

For example:

film.create({
  title: "Giant",
  actors: [
    actor.byId("406683323649228873"),
    actor.byId("416683323649229986")
  ]
})

With this pattern we have eliminated the association collection and moved the references directly into the film document. We can do this thanks to the excellent support for arrays and joins in Fauna.

In addition to querying based on the film, you can also query all the films based on an actor. A query would look something like this:

film.where(.actors.includes(actor.byId("406683323649228873"))) {
  title
}

When querying a film, you can get all the actors’ information with projection. The following is an example.

film.byTitle("Academy Dinosaur").first() {
  actors {
    name, 
    bio  
  }
}

Since actors is an array of document references, Fauna can project any field from the actor documents.

You can optimize this query with an index as follows. Add the following index to your schema:

collection film {
  index actorsInFilm {
    terms [mva(.actors)]
  }
}

Then query using the index:

film.actorsInFilm(actor.byId("406683323649228873"))

Advantages

  • Less modeling complexity than association tables.
  • Overall storage should be about the same as an association table.
  • Updating the foreign record (actor in this example) is independent and fast.
  • Data duplication is less than the basic embedded pattern
  • Changing the array of values in the parent document (list of actors in a film in our example) is optimized as it would be far less data to transfer and update.
  • Allows for the foreign data to change in the future (if we wanted to add fields to the actor’s data, like middle name, place of birth, etc) compared to the basic embedded pattern.

Disadvantages

  • Requires more read IOs to gather query data. In this case, each actor selected into the result set needs an additional read to gather its data, inflating the number of reads for a query from one per film to one per film plus one per actor.
  • Indexing is no longer available on the embedded items’ raw values, increasing query complexity. In this case a query starting from the actor side would need to use a nested query pattern (sub-queries).

This pattern is a great compromise between the fully document-oriented pattern of basic embedding and the relational pattern of normalization. In general, basic embedding is better when the related data is static and small, like a shipping address, while the embedded array of references is more common when one or both sides mutate frequently or the embedded elements would be large.
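To make the sub-query point above concrete, here is a hedged sketch of querying from the actor side with the reference pattern (the actor name structure is an assumption): the actor document must be resolved first, and the film filter then nests around it.

```
// Resolve the actor document, then filter films whose
// reference array contains it. Without an index on the raw
// names, this is a nested query rather than a direct lookup.
let penelope = actor.where(.name.first == "Penelope").first()
film.where(.actors.includes(penelope!)) {
  title
}
```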

Embedding on both sides of the relation

Another potential pattern for modeling many-to-many relationships in Fauna is to embed arrays of references into the documents on both sides of the relationship. An example would be to embed all actors in an array in the film document as well as all films in an array in the actor document. While this is a possible pattern, the overhead of maintaining arrays on both sides of the relationship is likely not worth the effort, especially when better options can achieve the same outcomes. With that said, the following is a walkthrough of how this pattern could be deployed in Fauna.

Your actor and film documents would potentially look like the following:

// film document
{
 "id": "12323",
 "title": "The Great Adventure",
 "release_year": 2024,
 "genre": "Adventure",
 "actors": [
   {
     "actor_id": "222",
     "name": "John Smith",
     "role": "Protagonist"
   },
   {
     "actor_id": "333",
     "name": "Jane Doe",
     "role": "Antagonist"
   }
 ]
}
// actor document
{
 "id": "222",
 "name": "John Smith",
 "birthdate": "1980-05-20",
 "films": [
   {
     "film_id": "12323",
     "title": "The Great Adventure",
     "release_year": 2024,
     "role": "Protagonist"
   },
   {
     // ... more films
   }
 ]
}

This approach is most suitable when relationships are relatively static. One main drawback of this pattern is data redundancy: data needs to be updated in multiple places, which may lead to inconsistency. A better alternative is to embed references on both sides.

// film document
{
 "id": "12323",
 "title": "The Great Adventure",
 "release_year": 2024,
 "genre": "Adventure",
 "actors": [
   Ref<actor>("222"),
   Ref<actor>("333")
 ]
}

// actor document
{
 "id": "222",
 "name": "John Smith",
 "birthdate": "1980-05-20",
 "films": [
   Ref<film>("12323"),
   Ref<film>("12324")
 ]
}
// Able to project on actor document from reference
film.byId("12323") {
   actors {
      name
   } 
}

Hybrid embedding

Note that you can use a combination of the two embedded patterns: embed the document reference along with some of the raw values. Here, you would embed one or a few frequently accessed fields to optimize the reverse query flow (accessing the film collection given an actor’s name, for example). This allows for indexing the desired fields, building computed indexes, and sophisticated filtering, while still having the reference available when you need to traverse to the rest of the foreign data.

For example, if all of the actor’s searches start with the first or last name and most queries only return their names, then you can embed the names along with a reference to their document. And then, on the infrequent times when the date of birth is also needed, you can use the reference to retrieve that data too.
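A sketch of the hybrid shape (the field names here are assumptions): the frequently searched names are embedded so they can be indexed, and the reference is kept for the rest of the actor's data.

```
film.create({
  title: "Giant",
  actors: [
    {
      // Embedded raw values support indexes and filters.
      name: { first: "Elizabeth", last: "Taylor" },
      // The reference supports traversal to the remaining
      // fields, such as date of birth.
      ref: actor.byId("<ACTOR_DOCUMENT_ID>")
    }
  ]
})
```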

Advanced Query Optimization Strategies

Let’s look at some query optimization strategies. We briefly introduced computed fields in an earlier section, but let’s discuss how they can help optimize further.

You can use computed fields to create relationships between documents based on a read-only query.

collection Customer {
  // Computed field definition for the `cart` field.
  // `cart` contains an `Order` collection document.
  // The value is computed using the `Order` collection's
  // `byCustomerAndStatus()` index to get the first order
  // for the customer with a `cart` status.
  compute cart: Order? = (customer => Order.byCustomerAndStatus(customer, 'cart').first())
  // ...
}

You call the computed field as follows:

Customer.byId("<CUSTOMER_ID>").cart

You can also use a computed field to create a relationship between a document and a set of documents.

collection Customer {
  // ...
  // Computed field definition for the `orders` field.
  // `orders` contains a set of `Order` collection documents.
  // The value is computed using the `Order` collection's
  // `byCustomer()` index to get the customer's orders.
  compute orders: Set<Order> = (customer => Order.byCustomer(customer))
  // ...
}

Computed fields in Fauna are dynamically calculated when queried but offer some key performance benefits:

  • Values are cached and only recalculated when underlying data changes.
  • They can be indexed, allowing for efficient querying and filtering.
  • They reduce the need for complex joins across collections.

Learn more about data access patterns and optimization strategies here.

User-defined functions (UDF) as equivalents to stored procedures

Fauna UDFs encapsulate business logic in the database, enabling instant, serverless execution of custom operations on your data.

For example, with Fauna, the combination of creating product and category documents can be turned into a UDF for reusability. A number of document databases have challenges with multi-document transactions or require special features to implement them. With Fauna, multi-document ACID transactions are fast and scalable without any specialized features or configuration. The following is the Fauna Schema Language (FSL) definition of the function in a Fauna schema.

function createProductWithCategory(
 productName,
 stock,
 description,
 price,
 categoryName,
 categoryDescription
) {

 let category = Category.byName(categoryName).first() ??
 Category.create({
   name: categoryName,
   description:  categoryDescription
 })

 Product.create({
   name: productName,
   category: category,
   stock: stock,
   description: description,
   price: price
 })
}

The following is an example function call for this UDF.

createProductWithCategory(
 "New Product Name",
 12,
 "some description",
 23,
 "<CATEGORY_NAME>",
 "some description"
)

You can learn more about UDFs here.

UDFs are similar to Postgres stored procedures. However, UDFs offer several advantages over Postgres stored procedures, particularly in modern, serverless, and distributed application architectures.

UDFs are optimized for Fauna's cloud-native, distributed environment, which can handle significant traffic loads and sub-second query performance. This makes them well-suited for applications that require high scalability and low latency.

In all Fauna operations, including UDFs, the identity document of the calling user is available. This allows function authors to implement deeper attribute-based access control (ABAC) logic in functions, beyond role-based access control (RBAC).
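As a hedged sketch of this idea (the function name and owner field are hypothetical, not from this article), a UDF can consult the caller's identity document to apply document-level logic:

```
function deleteOwnProduct(id) {
  // The identity document of the calling user.
  let caller = Query.identity()
  let product = Product.byId(id)!

  // Hypothetical ABAC-style rule: only the product's
  // owner may delete it, regardless of role.
  if (product.owner == caller) {
    product.delete()
  } else {
    abort("Not authorized to delete this product")
  }
}
```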

Click here to learn more about Fauna’s security model.

Fauna’s schemaless document model offers flexibility in data structure and adapts easily to changing requirements. This flexibility extends to UDFs, allowing them to handle diverse data structures without enforced schema constraints.

Hybrid schema enforcement

Fauna is natively schemaless at the document level but provides simple yet powerful tools for schema enforcement. Enforcement is thus a choice in Fauna, and a means to implement self-documenting, evolving models over an application's life.

Postgres hard-enforces schemas at runtime for each table, meaning that the structure of the data (such as the types of columns and constraints) is strictly defined and enforced. This type of schema is hard to evolve as your application grows.

Fauna allows for more flexibility in the data structure and schema. Documents start schemaless to make initial development easier, can have a mix of enforced and dynamic fields, and, via migration blocks, fields can be added, deleted, or modified without altering a predefined schema.

Fauna’s schema feature blends the flexibility of document models with the strict data integrity controls typical of relational databases. This allows you to migrate your Postgres database to Fauna and make changes online over time without application interruption.

The following are some critical aspects of enforcement in Fauna:

Field Definitions

Collection definitions in Fauna allow developers to specify fields and their data types for documents within a collection. This enforces that all documents conform to a predefined structure, enhancing data integrity and consistency.

You can use field definitions to:

  • Ensure each document in a collection contains a specific field
  • Limit a field’s values to particular types
  • Set a default value for documents missing a field
  • Enumerate accepted values
  • Add business logic for more complex control
  • Create logical computed fields using data from the document or other documents.

The following is an example.

collection Product {
  // `name` is optional (nullable).
  // Accepts `String` or `null` values.
  name: String?  // Equivalent to `name: String | Null`

  // `price` is optional (nullable).
  // Accepts `Double` or `null` values.
  price: Double?

  // `quantity` is non-nullable.
  // Accepts only `Int` values.
  // If the field is missing on create, defaults to `0`.
  quantity: Int = 0

  // `creationTime` is non-nullable.
  // Accepts only `Time` or `Number` values.
  // If `null` or missing, defaults to the current time.
  creationTime: Time | Number = Time.now()

  // `category` is non-nullable.
  // Accepts only the enumerated "grocery",
  // "pharmacy", or "home goods" values.
  // If `null` or missing, defaults to "grocery".
  category: "grocery" | "pharmacy" | "home goods" = "grocery"
}

Progressive Enforcement

Fauna enables progressive enforcement of document types. Developers can start with a permissive schema and gradually introduce stricter type controls as application requirements evolve. This allows for a smooth transition from a schemaless approach to a more structured schema without disrupting the application.
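As an illustrative sketch (the collection and field names are assumptions), a collection might start fully permissive and later adopt stricter definitions:

```
// Early in development: accept any document shape.
collection Review {
  *: Any
}

// Later: enforce the fields that have stabilized, while the
// wildcard still accepts ad hoc fields alongside them.
collection Review {
  rating: Int
  body: String?
  *: Any
}
```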

Learn more about progressive enforcement in the documentation.

Wildcard Constraints

Fauna supports wildcard constraints, which permit ad hoc fields in documents while controlling the accepted data types for these fields. This offers flexibility by allowing new field definitions to be introduced dynamically and adjusting the degree of enforcement as needed.

When you add field definitions to a collection, the documents can only contain the defined fields. To accept arbitrary ad hoc fields, add a wildcard (*) constraint:

collection Order {
  status: String?

  // Wildcard constraint.
  // Allows arbitrary ad hoc fields.
  *: Any
}

Here is a use case illustrating the application of wildcard constraints: Imagine you are developing an order management system where each order can have a variety of optional attributes that are not known in advance. For example, some orders might include special instructions, while others might have additional metadata like delivery preferences or gift messages. Instead of defining every possible field in advance, you can use a wildcard constraint to allow these ad hoc fields.


Zero-Downtime Migrations

Unlike in Postgres, schema changes over an application's life are easy to implement in Fauna. Fauna supports zero-downtime migrations, allowing developers to update field definitions and constraints without service interruption. This is crucial for maintaining high availability and ensuring that schema changes are predictable and manageable.

To handle migrations, you include a migrations block in the collection schema. The block contains one or more imperative migration statements.

The statements instruct Fauna on how to migrate from the collection’s current field definitions and wildcard constraint to the new ones.

collection Product {
  ...
  *: Any

  migrations {
    // Applied 2099-05-06
    add .typeConflicts
    add .quantity
    move_conflicts .typeConflicts
    backfill .quantity = 0
    drop .internalDesc
    move .desc -> .description
    split .creationTime -> .creationTime, .creationTimeEpoch

    // Applied 2099-05-20
    // Make `price` a required field.
    split .price -> .price, .tempPrice
    drop .tempPrice
    backfill .price = 10.00
  }
}

Learn more about zero downtime migration here.

Note that in Postgres, you typically apply schema changes to the existing database records. This usually requires additional effort: building scripts, saving backups, verifying the changes completed, syncing schema deployments with application versions, and so on. In Fauna, migration blocks are logically applied at runtime, which makes them immediately effective, requires no data to be rewritten, and is self-documenting.

Check constraints

A check constraint is a user-defined rule that constrains the values of a document field. Check constraints are defined in the schema and implemented as predicates that control whether a document is written to a collection, validating that a field’s value falls within an allowed set of values or satisfies business logic.

Let’s take a look at a simple example.

Suppose we want a hasFunds check constraint on the Customer collection that validates that a customer’s balance is greater than or equal to zero. We define this in the Customer schema as follows:

collection Customer {
  ...
  check hasFunds((doc) => doc.balance >= 0)
  ...
}

If you try to update a customer’s balance to a negative number, the query will fail.

Customer.byId("388093102421704737")!.update({ balance: -50 })

Indexes

When designing for application access patterns in Fauna, indexes are used in much the same way as in Postgres. Fauna allows you to create as many as you need and provides single-value, multi-value, range, and sorted lookups. All indexes are stored separately (like global secondary indexes), are strongly consistent, and store only the indexed values. Similar best practices apply in Fauna as in Postgres: use covering indexes, use sorted indexes to return pre-sorted results, and so on. Read here for more details.
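For example, here is a hedged sketch of a covering, sorted index in FSL (the collection, field, and index names are assumptions):

```
collection Product {
  category: Ref<Category>
  price: Double
  name: String

  // Term on category; entries ordered by price; `name` is
  // stored in the index so the query below is covered.
  index byCategorySortedByPrice {
    terms [.category]
    values [.price, .name]
  }
}
```

A lookup such as Product.byCategorySortedByPrice(category) then returns results pre-sorted by price, reading only the index entries rather than the full documents.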

Backup and restore

Backing up databases is typically done with restores in mind. Because Fauna keeps three separate full replicas of the data in separate cloud regions, the need for a full restore due to a primary server failure (a lost copy of the data) is very low. Another common reason for restores is recovering from corrupted data by restoring to a point in time. For this, Fauna leverages its MVCC storage of documents: every version of a document is kept for a configurable TTL after it has been changed. This, along with temporal querying, lets you retrieve former states of the data. Thus, a full database restore is often not only unnecessary but also much more effort. That said, Fauna does allow for configurable backups and restores.

Massive datasets in a world of pagination

We have seen that pagination is mandatory for enforcing sane limits so that transactions stay short. Let’s look at the recommended strategies in Fauna for retrieving a lot of data or dealing with big data migrations.

Retrieving or reasoning over a huge dataset

Although large data sets can be returned to clients, Fauna is optimized to process them within the database. You can efficiently step through sets using pagination or looping functions like forEach() or map(), enabling fast, incremental processing without disrupting other tasks.
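For instance, a sketch of stepping through a large set incrementally (the page size is an assumption):

```
// Fetch the set in pages of 100 documents. The response
// includes a cursor for the next page.
Product.all().pageSize(100)

// A later request continues from the returned cursor.
Set.paginate("<CURSOR>")
```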

When reasoning over a vast dataset, you are likely looking for a database focusing on analytics (OLAP). Fauna is a distributed operational database and keeps your data safe and correct with strong consistency features, but there are better choices for your analytic workloads. However, using Fauna's CDC features, we can easily stream data to a database or service that excels at analytical or other types of queries, such as ClickHouse, Rockset, or Snowflake.

In essence, you get the best of both worlds: you can use Fauna as the strongly consistent heart of your data layer alongside another database that is only eventually consistent but optimized for analytical workloads.

Conclusion

We hope you can now see that Fauna is a superset of document and relational databases. When migrating from Postgres to Fauna, you can support your existing data models and access patterns and implement several optimizations.

If you enjoyed this blog and want to work on challenges related to globally distributed systems and serverless databases, Fauna is hiring.
