Considerations when migrating from DynamoDB to Fauna
Is it time to explore DynamoDB alternatives?
Many companies have built their latency-sensitive OLTP applications on, or transitioned them to, DynamoDB, the serverless key-value database that AWS first released over a decade ago. To get the most benefit out of DynamoDB, however, many of these companies have had to completely rework how they store keys and values, and to accept significant compromises in querying, indexing, and schema evolution to fit the design paradigms the service enforces. Furthermore, most of these implementations must either sacrifice transactional guarantees to fit the DynamoDB model while distributing their data, incur significant costs to retain even partial consistency, or both.
Fauna: A document-relational database alternative
Fauna is a compelling DynamoDB alternative. Like DynamoDB, it is a distributed serverless database well-suited for latency-sensitive OLTP applications. Unlike DynamoDB, and more like the relational databases you may be familiar with, it offers significantly more flexible querying and cost-efficient indexing. And unlike traditional NoSQL databases such as DynamoDB and MongoDB, Fauna excels at providing strong data consistency, even when scaling out across global regions. This article outlines what to consider as you plan a migration from DynamoDB to Fauna, and highlights the many productivity benefits you will realize along the way.
Migrating to Fauna: What to expect
Serverless databases as managed services are a relatively recent offering, and although many providers claim to be serverless, this is where you will start seeing significant differences. Both Fauna and DynamoDB are fault-tolerant managed services, and with either you won't have to worry about instances and servers. With DynamoDB, however, you still have to decide how much capacity to provision to keep costs down, and when to balance that against auto-scaling, which is not enabled by default. Because of the costs involved, most DynamoDB users end up monitoring and planning capacity on an ongoing basis.
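As a concrete illustration of that ongoing burden, here is a minimal sketch of re-provisioning throughput with the AWS SDK for JavaScript v3; the table name and capacity numbers are hypothetical and would come from your own traffic analysis:

```typescript
import { DynamoDBClient, UpdateTableCommand } from "@aws-sdk/client-dynamodb";

const dynamo = new DynamoDBClient({ region: "us-east-1" });

// Adjust provisioned read/write capacity for a hypothetical "orders" table.
// Exercises like this recur whenever traffic patterns shift.
await dynamo.send(
  new UpdateTableCommand({
    TableName: "orders",
    ProvisionedThroughput: {
      ReadCapacityUnits: 200,
      WriteCapacityUnits: 100,
    },
  })
);
```

With that operational model in mind, the following table maps core DynamoDB concepts to their Fauna counterparts.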
| DynamoDB | Fauna | Explanation |
|---|---|---|
| Item | Document | An individual record in the database. |
| Table | Collection | A container for items/documents. |
| Partition Key | Not Applicable | DynamoDB requires users to choose a partition key that determines how data is grouped and distributed among partitions; how you choose this key affects DynamoDB's scalability. Fauna distributes data optimally for you by automatically hashing each document's ID, without impacting scale, so this is one less thing for users to worry about. |
| Partition Metadata System | Node | In DynamoDB, the Partition Metadata System contains a mapping of items to their respective partitions. In a Fauna cluster, every node has a consistent copy of this information. |
| Transaction | Transaction | DynamoDB transactions are only ACID-compliant within a single region. Fauna supports transactions across multiple partitions in all cluster configurations. |
| Read Capacity Unit (RCU) | Not Applicable | Each DynamoDB RCU allows for one strongly consistent read, or two eventually consistent reads, per second. RCUs are primarily relevant to Provisioned Mode tables; however, they still operate under the hood for On-Demand Mode tables and can limit burst scalability. |
| Write Capacity Unit (WCU) | Not Applicable | A DynamoDB WCU reserves throughput capacity for one write per second. Like RCUs, WCUs are primarily relevant to Provisioned Mode tables; however, they still operate under the hood for On-Demand Mode tables and can limit burst scalability. |
| Read Request Unit (RRU) | Read Op | While RCUs vary in relevance to both capacity modes, RRUs apply only to On-Demand Mode tables and their distinct pricing model. Simply put, RRUs are a unit of measurement for expended reads. Like RRUs, a Fauna read op is just a billing indicator and does not provision throughput. Where they differ is that RRU expenditure varies with the desired level of consistency. |
| Write Request Unit (WRU) | Write Op | WRUs, like RRUs, measure expended writes, though their usage does not vary with consistency strength. Like WRUs, a Fauna write op is just a billing indicator and does not provision throughput. |
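To make this terminology mapping concrete, here is a minimal sketch using Fauna's JavaScript driver (FQL v4); the secret, collection name, and document fields are illustrative placeholders:

```typescript
import faunadb from "faunadb";

const q = faunadb.query;
const client = new faunadb.Client({ secret: "YOUR_FAUNA_SECRET" });

// A DynamoDB table maps to a Fauna collection...
await client.query(q.CreateCollection({ name: "orders" }));

// ...and a DynamoDB item maps to a Fauna document. Note that no partition
// key is declared anywhere: Fauna distributes documents automatically by
// hashing each document's ID.
await client.query(
  q.Create(q.Collection("orders"), {
    data: { customerId: "cust-123", total: 42.5, status: "shipped" },
  })
);
```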
Painless distributed partitioning
In DynamoDB, designing and evolving your model is up to you: you must select partition and sort keys and properly lay out your primary keys, values, and indexes. Picking an ideal partition key is not trivial in DynamoDB, as covered here. With Fauna's turnkey auto-sharding, you will not have to concern yourself with partitioning at all, and your scaling is on-demand in all cases. As a managed service, Fauna takes on more of your operational and performance concerns than DynamoDB does. Beyond the capacity planning and partitioning concerns mentioned above, there are many other benefits relating to modeling, indexing, security, and global distribution. The strength of Fauna's distributed transactions will likely be a welcome relief to a team that has contended with DynamoDB's limited design and query options. We'll cover more considerations as we continue through this article.
Initial planning
Database migrations are often critical and challenging projects. They are relatively long, with a minimum duration measured in weeks and likely extending to a few months. Because of this, it makes sense to plan for an incremental migration. For example, if yours is a Software as a Service (SaaS) application, you can manage risk by first migrating some of your smaller customers to work out any issues before moving your higher-volume tenants.
Creating a proof-of-concept
A good first step is to set up a proof-of-concept with a reduced data set that covers all your entities and all your write and read access patterns. In many cases, the proof-of-concept storage requirements can be met at very low cost, since Fauna provides a generous free tier to start your development with. You will also likely want to transform your data to take advantage of Fauna's inherent flexibility as you migrate. For example, you may have had to shrink your items in DynamoDB to fit its 400 KB item size limit; with Fauna's 8 MB document size limit, you will likely store more of each entity's information together in a way that makes sense for future queries and writes. You will also have opportunities to normalize your entities so that the resulting Fauna collections are much easier to manipulate and change over time. Since you can export your DynamoDB keys and values as JSON-like documents and import them as full JSON documents into Fauna, the proof-of-concept phase is a good time to start building transformation scripts as you move the initial portion of your data to Fauna.
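As a hedged sketch of what such a transformation script might look like, the following assumes a line-delimited DynamoDB JSON export and entirely hypothetical field names; the `unmarshall` helper from the AWS SDK strips DynamoDB's type descriptors:

```typescript
import { unmarshall } from "@aws-sdk/util-dynamodb";
import faunadb from "faunadb";

const q = faunadb.query;
const fauna = new faunadb.Client({ secret: "YOUR_FAUNA_SECRET" });

// One record of a DynamoDB JSON export, e.g.
// { "Item": { "pk": { "S": "CUST#123" }, "total": { "N": "42.5" } } }
type ExportedRecord = { Item: Record<string, any> };

async function importRecord(record: ExportedRecord): Promise<void> {
  // unmarshall() converts DynamoDB's typed attribute values ({ S, N, M, ... })
  // into a plain JavaScript object.
  const plain = unmarshall(record.Item);

  // Reshape the flattened single-table item into a document that fits the
  // new Fauna model; this mapping is entirely application-specific.
  const doc = {
    customerId: String(plain.pk).replace("CUST#", ""),
    total: plain.total,
  };

  await fauna.query(q.Create(q.Collection("customers"), { data: doc }));
}
```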
Preparing your development team
With this subset in place, the next step is cataloging all your write and read access patterns so you can transition them to Fauna. These database-related persistence calls likely live in a few libraries in your codebase. Your index configurations are probably set up in the DynamoDB admin tool, so extract those configuration details from there. Also, if you perform item-related calculations inside your codebase, catalog these as well, since you will likely have opportunities to set up Fauna indexes and functions that execute those calculations on the server side in a strongly consistent way. Think of the benefits of stored procedures, which are not possible in DynamoDB.
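For instance, an order-total calculation currently done in application code could move into a user-defined function, where it runs server-side and strongly consistent. The function name, document shape, and field names below are assumptions for illustration:

```typescript
import faunadb from "faunadb";

const q = faunadb.query;
const client = new faunadb.Client({ secret: "YOUR_FAUNA_SECRET" });

// Define a UDF that sums an order's line item prices on the server,
// much like a stored procedure in a relational database.
await client.query(
  q.CreateFunction({
    name: "order_total",
    body: q.Query(
      q.Lambda(
        "orderRef",
        q.Sum(q.Select(["data", "lineItemPrices"], q.Get(q.Var("orderRef"))))
      )
    ),
  })
);

// Invoke it later with q.Call:
// await client.query(q.Call(q.Function("order_total"), someOrderRef));
```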
Tools
Like DynamoDB, Fauna is considered a NoSQL database. Unlike DynamoDB, however, it has a feature-rich, Turing-complete query language called FQL. Many code examples are available to guide you in writing FQL, properly testing your code, and leveraging Fauna's user-defined functions (UDFs) to optimize manipulations and calculations, with power equivalent to stored procedures in traditional relational databases. As an alternative, Amazon provides PartiQL, which is used by several AWS services such as DynamoDB, QLDB, S3 Select, and EMR, but it has yet to gain traction outside of AWS. It is similar in principle to Microsoft's U-SQL language, which has also not been widely adopted. In short, the DynamoDB alternatives for querying data are limited at best.

Next, set up your staff with Fauna accounts and have them create a development environment. Initially, they can get started quickly with Fauna's web-based tools and do much of the initial development there. Later on, they can run Fauna locally as a Docker container for local coding and testing.
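As a small taste of what FQL looks like through the JavaScript driver, here is a sketch that fetches a customer's orders in a single query; the index, collection, and customer ID are hypothetical:

```typescript
import faunadb from "faunadb";

const q = faunadb.query;
const client = new faunadb.Client({ secret: "YOUR_FAUNA_SECRET" });

// Resolve index matches to full documents in one round trip, a pattern
// that would take multiple request/response loops against DynamoDB.
const orders = await client.query(
  q.Map(
    q.Paginate(q.Match(q.Index("orders_by_customer"), "cust-123")),
    q.Lambda("ref", q.Get(q.Var("ref")))
  )
);
```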
Application security controls
Finally, it is not too early to start leveraging some of Fauna's fine-grained security controls. In DynamoDB, you have the granularity offered by AWS Identity and Access Management (IAM), with access scoped to specific actions (read, update, batch update, and so on) and resources (tables, indexes, and streams). Fauna also supports API keys and identity-based access tokens, with native integration options for third-party identity providers that support OAuth (such as Auth0 and Okta). You can further leverage Fauna's robust attribute-based access control (ABAC), which lets you control access to collections, indexes, and user-defined functions. This includes the capability to add custom business logic that creates dynamic rules to control resource access all the way down to specific documents in your collections, as sketched at the end of this section. As you should expect, Fauna stores and manipulates your data securely, both at rest and in transit.

These initial phases of proof-of-concept and developer immersion should give your team an idea of the flexibility they will enjoy with Fauna. Rather than being limited by single-table design concepts that make DynamoDB tables feel more like machine code than a spreadsheet, Fauna lets you regain functionality in your data models that was lost when you transitioned to DynamoDB. Even leading DynamoDB proponents warn that adding new access patterns is inflexible and slows developer productivity.
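Here is the promised ABAC sketch: a role that lets an authenticated customer read only their own orders. The role name, collections, and ownership field are assumptions for illustration:

```typescript
import faunadb from "faunadb";

const q = faunadb.query;
const client = new faunadb.Client({ secret: "YOUR_FAUNA_SECRET" });

await client.query(
  q.CreateRole({
    name: "customer_read_own_orders",
    // Tokens issued for documents in "customers" assume this role.
    membership: [{ resource: q.Collection("customers") }],
    privileges: [
      {
        resource: q.Collection("orders"),
        actions: {
          // The predicate runs per document: permit the read only when the
          // order's "customer" field matches the calling identity.
          read: q.Query(
            q.Lambda(
              "orderRef",
              q.Equals(
                q.Select(["data", "customer"], q.Get(q.Var("orderRef"))),
                q.CurrentIdentity()
              )
            )
          ),
        },
      },
    ],
  })
);
```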
DynamoDB alternatives for indexing
In both Fauna and DynamoDB, carefully designing indexes for data access is critical to your success. DynamoDB offers two types of indexes: local secondary indexes and global secondary indexes. An index is constructed over a single table and can match on only one attribute, with the ability to sort on one more. You are limited to five local secondary indexes per table, and they must be defined when the table is created; if different local indexes become necessary later, you will likely have to transition your data to a new table over time. Global secondary indexes are soft-limited to twenty per table and can become significantly expensive: every global secondary index is effectively a full replica of your table, so every write to the main table must be duplicated to each index. Beyond the extra cost of those writes, note that they are asynchronous and only eventually consistent, so additional application logic may be required to handle stale data. Finally, you also have to account for additional provisioned throughput for each global secondary index.
Indexing support: The Fauna way
Here is where your developers will start leveraging the power of Fauna indexes to run more complex queries at scale, with features like foreign keys, views, and joins. A single Fauna query can often handle, performantly, a request that would take several request-and-response loops in DynamoDB. Fauna indexes can perform and persist computations, combine data from multiple collections, ensure strict serializability for read/write transactions, handle multiple sources, sort fields, match fields, and return values. Since indexes contain data, much like consistent ordered views, you can query them directly. Beyond this flexibility, Fauna indexes cost less to store while providing superior data consistency. These savings are most apparent in complex, mission-critical use cases where strong consistency and transactional predictability are of paramount importance.
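A hedged sketch of such an index, with hypothetical collection and field names, shows how terms (match fields), values (returned and sorted fields), and direct index queries fit together:

```typescript
import faunadb from "faunadb";

const q = faunadb.query;
const client = new faunadb.Client({ secret: "YOUR_FAUNA_SECRET" });

// Match orders by status and return (total, ref) pairs sorted by total,
// descending, without reading the underlying documents.
await client.query(
  q.CreateIndex({
    name: "orders_by_status_sorted_by_total",
    source: q.Collection("orders"),
    terms: [{ field: ["data", "status"] }],
    values: [{ field: ["data", "total"], reverse: true }, { field: ["ref"] }],
  })
);

// Query the index directly, like a consistent ordered view:
// await client.query(
//   q.Paginate(q.Match(q.Index("orders_by_status_sorted_by_total"), "shipped"))
// );
```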
Considerations when converting and migrating your data
Once you have completed the proof-of-concept, including redesigning your data model, implementing your write and read access patterns, optimizing indexes, and leveraging UDFs, it is time to plan your data migration. In most cases, you will want to handle it in two phases: first, bulk-export and import the majority of your data; then, catch up on the data written in the meantime and plan your cutover with minimal or no downtime.
Bulk migration phase
For the first phase, revise and update the data transformation scripts you built for the proof-of-concept. As mentioned before, your best bet is to transform DynamoDB's JSON exports into a JSON document format that fits your new Fauna model. Here you will begin to leverage Fauna's ability to handle scale even across multiple collections. If you have multiple customers, you can use multiple databases and child databases, which may significantly improve your security posture. You may also choose to leave older data behind rather than migrating it, especially if it has already been transferred to your OLAP warehouse; Fauna's integrations with such cloud services will let you keep doing the same from Fauna.
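A minimal bulk-loading sketch might batch the transformed documents so that each batch is written in a single transaction; the collection name and batch size are assumptions to tune for your own data:

```typescript
import faunadb from "faunadb";

const q = faunadb.query;
const client = new faunadb.Client({ secret: "YOUR_FAUNA_SECRET" });

// Each q.Map call below executes as one ACID transaction in Fauna.
async function bulkImport(docs: object[], batchSize = 100): Promise<void> {
  for (let i = 0; i < docs.length; i += batchSize) {
    const batch = docs.slice(i, i + batchSize);
    await client.query(
      q.Map(
        batch,
        q.Lambda("doc", q.Create(q.Collection("customers"), { data: q.Var("doc") }))
      )
    );
  }
}
```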
High availability without global table limitations
Suppose you used global tables in DynamoDB for optimal latency across regions and could live with their compromises, since features like DynamoDB streams, time to live (TTL), and on-demand capacity mode are not available to global table users. In that case, Fauna will significantly simplify your business logic and operational concerns, because it distributes your data across regions with strong consistency out of the box, and all the features just mentioned (streaming, TTL, and on-demand global scaling) are supported. Once you have moved your databases into Fauna, you immediately benefit from auto-sharding, replication, and scale-out infrastructure in fully managed, zero-configuration clusters, without having to set anything up explicitly. Further, if some customers' data requirements dictate geographical isolation, you can leverage offerings like Virtual Private Fauna to satisfy locality constraints. Finally, since Fauna is available across all major hyperscale clouds (AWS, Google Cloud, and Microsoft Azure), you can strategically transition to those additional providers at this time.
Data persistence catch-up phase
Multiple options exist for the second phase of the migration: transferring and catching up to the latest data generated by your application's traffic. Depending on your expertise and preference, you can leverage services like AWS Lambda with DynamoDB Streams, set up data pipelines if you have the infrastructure in place to do so, or take some other custom approach. It may involve writing data to both data stores for a limited window of time, then using feature toggles or flags in your code to cut all writes over to Fauna once it has caught up. Fauna is delivered as a connectionless API, with global endpoints and intelligent routing that make accessing your new data as efficient as possible. Once you have validated that all your data has been transitioned, you can remove the duplicate code paths from your application and delete your DynamoDB tables.
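If you take the Lambda-plus-streams route, the catch-up handler could look roughly like the sketch below. It forwards inserts and updates from the table's stream into Fauna; the collection name is hypothetical, and a production version would need idempotent upserts (for example, via a UDF) rather than blind creates:

```typescript
import type { DynamoDBStreamEvent } from "aws-lambda";
import { unmarshall } from "@aws-sdk/util-dynamodb";
import faunadb from "faunadb";

const q = faunadb.query;
const fauna = new faunadb.Client({ secret: process.env.FAUNA_SECRET! });

export async function handler(event: DynamoDBStreamEvent): Promise<void> {
  for (const record of event.Records) {
    // Skip deletes and records without a new image; a real handler would
    // mirror deletes to Fauna as well.
    if (record.eventName === "REMOVE" || !record.dynamodb?.NewImage) continue;

    const item = unmarshall(record.dynamodb.NewImage as any);
    await fauna.query(q.Create(q.Collection("orders"), { data: item }));
  }
}
```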
Post-migration usage
Once migrated, your development teams will welcome immediate productivity improvements: flexibly changing their application access patterns, leveraging multiple indexing options, and no longer tolerating the complexity of eventually consistent, possibly stale data. Data stored in Fauna will be easy for your developers to understand and to flexibly manipulate as application requirements change. Fauna's serverless approach will continue to provide very low-latency reads and writes with predictable data integrity and consistency, even when your data is globally distributed, at a much more efficient cost structure and with far less ongoing capacity planning than most DynamoDB implementations require.
Summary
Fauna is a robust DynamoDB alternative: a serverless, scalable, ACID-compliant document-relational database that delivers low-latency transactional performance without compromising data integrity. Migrating from DynamoDB to Fauna will help you realize significant short- and long-term benefits while preserving the qualities that led you to consider a serverless distributed database in the first place.
Further reading
To explore more details about compromises and pain points that DynamoDB users encounter, check out DynamoDB pain points: How to address them and exploring possible alternatives. You can also learn more about how Fauna compares to DynamoDB and other competitors in this space. If you want to learn more about Fauna or start planning a migration project, get in touch with us.
About the Author

Luis Colon is a data scientist who focuses on modern database applications and best practices, as well as other serverless, cloud, and related technologies. He currently serves as a Senior Technology Evangelist at Fauna, Inc. You can reach him at @luiscolon1 on Twitter and Reddit.