PODCAST

A DynamoDB deep dive with Alex DeBrie

Welcome to Episode 76 of the Coding Over Cocktails podcast. My name is David Brown, CEO and Founder of Toro Cloud.

Our guest for today is an AWS Hero and the author of The DynamoDB Book, a comprehensive guide to data modelling with DynamoDB. He is an independent consultant who works with companies of all sizes to assist with DynamoDB data modelling and serverless AWS architecture implementations.

Joining us today for a round of cocktails is Alex DeBrie.

Transcript

Alex DeBrie

Hey, David! Thanks for having me, excited to be here.

David Brown

Yeah, great! Well look, I guess the most obvious question to begin with is what is DynamoDB?

Alex DeBrie

Sure, yeah. So DynamoDB is first of all, my favourite database. But in describing what it is, I'd say [there are] sort of three main factors that I use to describe it. So, it's a fully-managed, NoSQL database provided by Amazon Web Services.

So, NoSQL databases. A lot of people have at least heard of something like this. You can compare it in some ways to MongoDB, Cassandra, and a bunch of other ones, although there are a lot of differences between those, really. But then those two other points are kind of interesting as well. So, it's provided by Amazon Web Services. It's proprietary. You can only get it from Amazon. You can't run it yourself or anything like that. You have to get it as a service from them. But because of that, it's also fully-managed, and in a way that's unique even among databases. You know, I think more databases are being offered as "managed databases," but often that just means you pick an instance size and they install the software on it and run it for you. Whereas Dynamo is this enormous distributed system they run as a service for you, and it's managed in a different way than other databases. You don't have to think about upgrades. You don't have your own instance. It's sitting on this massive shared storage fleet that Amazon is running and operating for you behind the scenes. So, it's interesting in that way compared to even other NoSQL databases.

David Brown

Okay, well, how did you get involved with DynamoDB? What inspired you to get so interested in the product that you would write a book about it?

Alex DeBrie

Yeah, I would say it was totally backwards, and kind of an accident. I think it was around 2017 or 2018, somewhere around there, when I started working for a company called Serverless Inc. What they make is a deployment tool called the Serverless Framework that makes it easy to deploy serverless applications using AWS Lambda. There was this whole new serverless movement that was really taking off. I had used the Serverless Framework, loved it, went and joined the team, and was doing stuff there.

And a lot of people that were using the Serverless Framework, building with serverless, some of them were learning AWS for the first time, so I was helping them out a lot, just being like, "Hey, this is how you use AWS." And a lot of people in particular were using DynamoDB because of how well it worked with Lambda, because Lambda is this hyper-ephemeral compute environment that is sort of autoscaling up and down. It doesn't work well with relational databases, where you have connection limits; and when you have a VPC, there are all sorts of issues using relational databases.
So then people are using DynamoDB and they don't know how to use it. So, I'm trying to help people use it, so more people use the framework, and I just get into it. I'm using it a little bit, and then I watched a re:Invent talk in December. So AWS re:Invent, it's their giant conference where they have just a bunch of sessions and talks and things like this. And there was this talk from a guy named Rick Houlihan, who worked for AWS at the time. He had helped migrate a bunch of internal Amazon use cases to DynamoDB.

He knows all this stuff and he just talked about Dynamo in this way that just blew my mind and I realised I was using it completely wrong. I watched his talk, I don't know, five or six or seven times that holiday break and took all these notes and then I made my own website. It's called DynamoDBGuide.com and it was basically like, “Hey I tried to learn Dynamo. I couldn't learn it very well, I didn't understand it. This is what I understand from this one-hour talk from this guy Rick,” and tried to distil all that down.

And that site kind of got popular, and then people reached out to me with Dynamo questions and I would be like, "Well, I don't know," but I would sort of think about it or research it and things like that. And I sort of just by accident became a person in the Dynamo community that knew these sorts of things. And I started working with AWS and doing talks for them and things like that. And I was like, you know, people still don't get how Dynamo works and why it's different and how you need to model it, and just all the different stuff from relational databases. I was like, "I think a book can actually work here." So, I quit my job at Serverless, spent like four months writing the book, released it, and I've mostly been doing Dynamo stuff ever since.

David Brown

That's a leap! Well, I guess some of the key differences will be related to working with a NoSQL database versus a SQL database, but I'm guessing some are also proprietary to DynamoDB. So, run us through some of the unique features of using DynamoDB and the considerations. We're going to get into the data modelling aspects of it as well, but what are some of the considerations and use cases for DynamoDB?

Alex DeBrie

Yeah, so I think when DynamoDB was initially created, what AWS wanted to provide was a database with consistent performance at any scale. Basically, it doesn't matter how many concurrent operations you're running against DynamoDB. So, for example, the biggest user of DynamoDB is Amazon.com Retail and other AWS stuff. And every year, Prime Day is like their biggest traffic day of the year, and they release all the stats on how they're pushing AWS to the limits. This year for Prime Day, I think they maxed out at like 110 or 119 million requests per second for Amazon Retail use cases, right? That's an absurd number of requests per second, and it's still consistent performance that entire time.

There's also a case they talk about where Snapchat has like 400 TB of data in DynamoDB, and there are use cases much bigger than that; there are petabyte-scale tables in Dynamo. And in both of those situations, whether you have all those requests or all that data in there, it's still giving you the same consistent performance. It's not going to degrade over time. So that's the promise DynamoDB makes, across the board.
And to deliver that, they built it on really solid fundamentals and they said, "Okay, this is what we can do." There are certain things that can scale well, and there are certain things that can't scale well that you might be used to in relational databases or even other NoSQL databases. And they said, "Okay, we're not going to support those features because they don't scale well." If you have a use case that needs those features, you sort of need to model around that in different ways. But if you model it for DynamoDB, what you're going to get is that consistent performance at any scale, which is amazing.

It's really freeing as a developer to just be like, “Hey, if I make this work, it's gonna work the same in my test environment on day one when I release it and 10 years down the road.” No matter how much data is in there, it's not going to get slower and slower over time. So it's pretty interesting.

David Brown

You mentioned Amazon Retail and the number of users they were able to serve using DynamoDB. But what aspect of Amazon Retail are we talking about that's using that data repository? Because I'm imagining an ecommerce system, you have relationships: customer and order and invoice and payment and all sorts of stuff. So what data are we talking about?

Alex DeBrie

I mean, basically everything. So, just to understand how Amazon the company, Amazon Retail, and AWS use DynamoDB: there was an edict, I don't know how many years ago now, basically saying, "Hey, every single tier-zero application," which means an application where, if it goes down, Amazon is losing money, "has to use NoSQL and specifically, DynamoDB." And if you think you need an exception to that, [like] you need to use relational or some other database or whatever, you actually need to write that up and get it approved by a pretty high-level person. So they basically said everything needs to move to DynamoDB.
In terms of relationships and how that works, Amazon has a very, very microservice-oriented architecture, whatever you want to call it, where things are split up very finely. So, if you go to Amazon.com, it'll hit that frontend service, but that's going to fan out into 150 different services under the hood, all aggregating pieces of data. So, very much split out, very small services. Small in terms of functionality but, you know, large in terms of the number of requests they're handling.

And so they'll have some small relationships there, and there'll be some relationships that are even across services, which gets tricky, and you've got to model for that. So it's definitely doable. Amazon is kind of a unique place in terms of architecture and size and scale and all that stuff. But yeah, they make it work pretty well there.

David Brown

It seems like a good segue into some of the data modelling considerations we should be taking a look at. So like I said, I'm guessing some are just pure NoSQL data modelling considerations and others might be more specific to DynamoDB. So, run us through what we should be considering with our data modelling when we're designing our applications.

Alex DeBrie

Yeah, I think the biggest thing with NoSQL databases that's different from relational databases is you have to think about access patterns first, rather than thinking about your generic, abstract data model first. So, with a relational database, you often create your ERD, your entity relationship diagram, you understand how your objects relate to each other, and then you create normalised tables based on that, saying, "Here's my customers table. Here's my orders table. Here's my inventory table." Whatever it is, you hook up those relationships between them.

And then you think, “How am I going to query this? What are the queries I need to write? What are the indexes I need to add?” and things like that. With Dynamo and most NoSQL, you want to think about access patterns first. So, you think, “Hey, I need to be able to fetch this customer item by the customer email or by a username or a first name,” or something like that.

And you model that customer item for that because NoSQL databases are usually centred around like a primary key, a primary way you're going to access that data. They're gonna use that to sort of shard that across multiple machines to give you horizontal linear scaling as you grow. That gives you that more consistent performance as you scale. So you need to think like, “Hey, how am I going to actually access this data and model it and optimise it for that?” And sometimes that can mean, “How am I going to access this customer item?”

It can also mean, "How do I access related items?" because Dynamo doesn't have a join operation, but sometimes you need join-like operations. Sometimes you might need to fetch the customer's order history but also get that customer item, because you want to enrich those orders with something about the customer. So what you can do is model that data in a way that those disparate items, both the customer and the order items, are located near each other, so you can fetch them in a single efficient request rather than having them in different tables where you're doing joins or multiple requests or things like that.
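
To make that concrete, here is a minimal sketch of that layout in Python with boto3. The table name, key names, and item attributes are hypothetical, but the idea is the one Alex describes: customer and order items share a partition key so they live together.

    import boto3

    # Hypothetical single-table design with generic "PK" / "SK" key attributes.
    table = boto3.resource("dynamodb").Table("ecommerce")

    # The customer profile and its orders share a partition key, so they
    # land on the same partition, sitting next to each other by sort key.
    table.put_item(Item={
        "PK": "CUSTOMER#alex@example.com",
        "SK": "CUSTOMER",                   # the customer profile item
        "name": "Alex",
    })
    table.put_item(Item={
        "PK": "CUSTOMER#alex@example.com",
        "SK": "ORDER#2021-06-15#8f3a",      # an order item, sortable by date
        "total": 42,
    })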

David Brown

What does that mean, “near each other”?

Alex DeBrie

So, it's a little tricky without a visual; I love to do this visually. But what Dynamo is going to do is take your database and split it into what are called partitions, which have a maximum size of 10 GB. So, imagine you have a 25 to 30 GB table. Behind the scenes, you don't see this, but you're going to have three different partitions across which your data is spread, and each one is going to hold about 10 gigs of data.

So, the way it assigns data to those different partitions is based on what's called a partition key. This is something you'll specify on your table, and it's going to be required for every item on your table. It's usually going to be something that you're going to be accessing that data on. So, if you had a customers table, the email is probably going to be your partition key, because that's what you're going to access that data on. So, when that data comes into Dynamo, it's going to hit a Dynamo frontend called the "request router." It's going to see your partition key, the email address, and it's going to assign that data to the right partition based on that partition key you've given.

So now, if you have a 400 TB table or whatever, and you've got 40,000 or however many partitions that would be, and you come in and make a query, Dynamo can immediately figure out which partition it needs to go to, and now you're playing around in 10 GB of data rather than 400 TB of data. So that's how that partition scheme is working: it's partitioning your data across multiple different partitions, using that partition key to assign the items.

Now what you can do is, say I want to fetch my customer item along with the order items for that customer: I can give them the same partition key and arrange what's called a "sort key" in a particular way, because all items with the same partition key are ordered according to that sort key. I can lay them out next to each other so that it's a very efficient operation: it goes to one partition and reads a contiguous set of items. Very efficient. You can think of it like going to a dictionary and saying, "Give me all the words that start with…" whatever prefix, right? That's very efficient, rather than having to scan over all the items in your physical dictionary of words.
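
Continuing the hypothetical sketch from above, that "dictionary prefix" read is a single query on the partition key:

    import boto3
    from boto3.dynamodb.conditions import Key

    table = boto3.resource("dynamodb").Table("ecommerce")  # hypothetical table

    # One request, one partition: the customer item plus all of its orders
    # come back together, ordered by the sort key.
    resp = table.query(
        KeyConditionExpression=Key("PK").eq("CUSTOMER#alex@example.com")
    )
    for item in resp["Items"]:
        print(item["SK"])   # "CUSTOMER" first, then the ORDER#... items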

David Brown

Okay. So, where are the limits? So, we just talked about what I would think is a very relational use case in terms of ecommerce. And you're saying, “Well actually, we can break that down into distinct entities and objects and group them together with these unique keys and make sure they're on the same partition so we can query them in one payload.” So, where are the limits?

Alex DeBrie

Yeah. So, I mean, I think the biggest thing with Dynamo is you're going to spend more time upfront thinking about your data model: how to arrange your data, what your access patterns are, and how you're going to handle those access patterns. And that's going to take some time upfront. I would also say it's going to be less flexible down the line, right?

And it's weird, because people have this conception that NoSQL is way more flexible because it's schema-less and all that stuff. But if you don't have a schema, whether it's in your database or your application, you're just totally messed up. So being schema-less helps in some ways, because you don't need to do an official migration, which can be slow and costly on a relational database if you're adding a new default column or something like that.
But you need a schema somewhere. And Dynamo is not as flexible about adding new ways to access your data. You can absolutely do it, but it's a little more involved than just adding an index like you would in a relational database. It's tricky, because people ask me, "What if I want to add more access patterns? What if I want to migrate?" And you absolutely can do that. It's sort of like an in-place ETL process, where you scan your whole table, update the items that need updates, and different things like that.
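
That in-place scan-and-update pattern might look something like the following sketch, where needs_migration() and migrate() stand in for whatever application-specific reshaping you need (both hypothetical):

    import boto3

    table = boto3.resource("dynamodb").Table("ecommerce")  # hypothetical table

    # Scan the table a page at a time and rewrite items that need updating.
    resp = table.scan()
    while True:
        for item in resp["Items"]:
            if needs_migration(item):           # hypothetical predicate
                table.put_item(Item=migrate(item))  # hypothetical transform
        if "LastEvaluatedKey" not in resp:
            break   # no pagination token means the scan is complete
        resp = table.scan(ExclusiveStartKey=resp["LastEvaluatedKey"])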

So it's not terrible, but it is less flexible in the sense of, "Hey, if your data model is really shifting a lot right now, it's going to be a little more work than just adding an index or adding a new column in a relational database." So, that's some of it. There are certain access patterns that are tricky in Dynamo too. I always mention complex filtering: imagine you have 15 different columns you want to filter by and all of them are optional. That can be pretty tricky in Dynamo. Dynamo wants you to have exact matches or range queries if you can, and if you have 15 distinct things and all of them are optional, it's tricky to set up the indexes correctly to handle that. So that's a tricky one with Dynamo.

Some sorts of updates can be difficult as well. You can't change the primary key for an item. Like, if we had set up those ecommerce accounts with customer emails and now someone wants to update their email, it's not impossible to change it, but it adds more complexity than just doing a single update operation on that one item.
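
One way to handle that, sketched below with the same hypothetical key names, is to write the item under its new key and delete the old one in a single transaction, since the key itself can't be updated in place:

    import boto3

    client = boto3.client("dynamodb")

    # "Changing" the email means a new item plus a delete, done atomically.
    # (Any order items under the old partition key would need the same move.)
    client.transact_write_items(TransactItems=[
        {"Put": {
            "TableName": "ecommerce",
            "Item": {
                "PK": {"S": "CUSTOMER#new@example.com"},
                "SK": {"S": "CUSTOMER"},
                "name": {"S": "Alex"},
            },
        }},
        {"Delete": {
            "TableName": "ecommerce",
            "Key": {
                "PK": {"S": "CUSTOMER#old@example.com"},
                "SK": {"S": "CUSTOMER"},
            },
        }},
    ])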

David Brown

Right. If I'm migrating from an existing SQL-based application and I have joins and primary keys and that sort of stuff, does that help in some respects in identifying the keys you're talking about in DynamoDB?

Alex DeBrie

Potentially. I mean, I think if you have an existing application, the nice thing is you already have all your access patterns planned out. You’ve just got to look through your code and say, “Hey, where am I writing a SQL query? I gotta list that as an access pattern, then I gotta figure out how to do that in DynamoDB.”

If you're doing an actual migration, you know, you'll probably have to do some shifting and munging of your data in certain ways, and you're going to have to make code changes to make that migration happen. But yeah, the nice thing about having an existing application is you already know your access patterns; they're in your codebase at that point.

David Brown

What is consistency, in terms of a database consistency model?

Alex DeBrie

Consistency, that is super tricky. I wrote a post about this recently, because there are basically at least three different notions of consistency depending on whether you're talking about databases or distributed systems or different things like that. So, a lot of people think of consistency as in the CAP theorem: there's this tradeoff between consistency and availability if you have a distributed system with replication, and if there happens to be a network partition between some of the nodes in the distributed system, you have to choose between consistency and availability.

So that's one thing. There's also the consistency in ACID. If you hear about ACID transactions, the C in there is consistency, and that's actually a pretty weak guarantee. And then there's the consistency that Jepsen, Kyle Kingsbury, talks about; he does a lot of distributed systems testing and talks about consistency models. So these are complex, overlapping topics.

So, it's tricky to say. I think the thing that comes up the most with Dynamo and consistency is people saying, "Dynamo has eventual consistency and I can't use it because of that." And that eventual consistency almost fits into some of those categories I talked about before, but it's almost its own model as well. So, to understand eventual consistency: this is a feature of replication. With Dynamo, I talked earlier about how it's partitioning your data into all these different partitions, but for each partition it doesn't just have one copy of your data; it has three copies. So each partition is going to be a replica group of three different partition instances or something.

And those are going to be in different availability zones. So now, when a write comes into your database, a distributed system needs to make a choice. Do I write it to one node and asynchronously replicate it to the other ones? Do I write it to all the nodes and wait until they all accept it, which takes a little longer but means I know they've all committed it? Or something in between?

What Dynamo does is, when a write comes in, it's going to go to the leader for that partition. So, of those three replicas, one of them is going to be the leader. That leader is going to take the write, write it locally, and also send it off to those other two replicas. As soon as one of those replicas comes back and accepts the write, the leader is going to respond to the client that issued the write and say, "Hey, this write has been accepted. We're good to go." So at that point, there's one replica that hasn't definitively committed that write yet.

Now if a read comes in right after that, Dynamo is going to assign the read to one of those three replicas at random, and there's a chance it could hit that third replica that's lagging behind by a millisecond or two. And because of that, it might not have the very latest version of the data on this particular read operation. But it will get there. It will eventually have a consistent view of that data: if no more updates come in for that item, eventually all the replicas reflect the same values. But there's a chance you could get slightly stale data.

David Brown

How do you deal with eventual consistency?

Alex DeBrie

Yeah, so that's a great question, and there are sort of two different issues. First of all, people think, "When I'm writing to Dynamo, it's possible I'll be writing against an old version of the data and maybe there'll be conflicts between different things," something like that. That's actually not an issue with Dynamo. Dynamo, as I mentioned, has that strong leader, where one of those three nodes is the leader and all writes go through it. So, write operations will go against the latest version of that data, and you can assert conditions on that write.

So, you can say, "Hey, only write this if this is true about the existing item that's there," and that will be a strongly consistent view of that data, which in most cases is what people want. Like, the example people always give is, "I can't deal with eventual consistency because what if I'm doing a banking application and I double-deduct or give money to somebody twice," or something like that. And it's actually like, no, you can make an assertion on that write that says, "Hey, make this deduction from this account as long as the account has at least that much money in it." So you can still do that. That's going to be a strongly consistent write. You're not going to lose anything there.
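
As a sketch of that banking-style assertion, again in Python with boto3 and hypothetical key and attribute names:

    import boto3
    from botocore.exceptions import ClientError

    table = boto3.resource("dynamodb").Table("ecommerce")  # hypothetical table

    # Deduct only if the balance covers it. The condition is evaluated by
    # the partition's leader against the latest committed version of the
    # item, so the write can't double-deduct.
    try:
        table.update_item(
            Key={"PK": "ACCOUNT#123", "SK": "ACCOUNT"},
            UpdateExpression="SET balance = balance - :amt",
            ConditionExpression="balance >= :amt",
            ExpressionAttributeValues={":amt": 50},
        )
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            print("insufficient funds; the write was rejected")
        else:
            raise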
The second thing is, "Well, what if I need my reads to be strongly consistent?" And one point there is, "Hey, remember that the writes to those replicas were both sent out at the same time; the leader only waited for one, but that second replica isn't going to be that far behind," right? They were probably both acknowledged pretty close to the same time. So the chance that you'll actually get an inconsistent read is not that high. But if you do need strongly consistent reads, you can opt into them on your request, and it basically just costs more. Dynamo is going to route that read to the leader for that replica group to make sure you get a strongly consistent read in that case.
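
Opting in is a single flag on the read; a sketch with the same hypothetical keys (strongly consistent reads cost roughly twice as much as eventually consistent ones):

    import boto3

    table = boto3.resource("dynamodb").Table("ecommerce")  # hypothetical table

    # ConsistentRead routes the read to the partition's leader.
    resp = table.get_item(
        Key={"PK": "ACCOUNT#123", "SK": "ACCOUNT"},
        ConsistentRead=True,
    )
    item = resp.get("Item")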

So, I would say in those cases, eventual consistency isn't that bad. The one other thing I want to mention is that Dynamo has a feature called secondary indexes. If you have that primary access pattern on your data, like getting that user by their email address, that's great, but what if I have an additional access pattern? You can set up a secondary index on your data, and Dynamo is going to replicate data from your main table into the secondary index with a new primary key.

So, maybe you want to access that customer by their last name or something like that. Now, that's also eventually consistent replication, and it's happening a little later than the main write to your table. So a read from a secondary index is more likely to be an eventually consistent read, and you can't get a strongly consistent read on it in most cases. So there's a little bit of read lag there. But again, as long as you know and understand that, you can say, "Hey, my writes are going to be strongly consistent. If I have certain reads that need to be strongly consistent, I just put that access pattern on my main table and handle it that way." Otherwise, hey, it's usually not that big a deal. And I think when you talk people through it, it's not as bad as they think.
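
Querying that extra access pattern looks just like a normal query, pointed at the index. The index name and attribute here are hypothetical, and note that a global secondary index read is always eventually consistent, which is the read lag Alex describes:

    import boto3
    from boto3.dynamodb.conditions import Key

    table = boto3.resource("dynamodb").Table("ecommerce")  # hypothetical table

    # Query a secondary index keyed on last name instead of the main table's
    # partition key. ConsistentRead isn't available on global indexes.
    resp = table.query(
        IndexName="LastNameIndex",
        KeyConditionExpression=Key("last_name").eq("DeBrie"),
    )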

David Brown

It doesn't sound too bad. I know you're the DynamoDB guy and you've written The DynamoDB Book, but for a lot of people, when you talk about NoSQL, it'll be MongoDB which comes to mind, and you have written about the differences between the two on DynamoDBGuide.com. Run us through the differences between the two. Are there different use cases, or is it just that one is hosted and one is not?

Alex DeBrie

Yeah, it's a good question. I would say, you know, in a lot of use cases they both will work. They both build on that core concept of horizontal scaling, using something to partition or shard the data. So, in DynamoDB, it's that partition key splitting data into partitions, and in MongoDB, it's a shard key that splits it into shards; it's a very similar concept. So, a lot of the data modelling principles you use in Dynamo also work for MongoDB. I think the biggest difference between them is philosophical, and I always like to say, "Hey, DynamoDB is authoritarian, MongoDB is libertarian."

So, DynamoDB puts up all these guardrails and just says, “Hey, we don't offer this type of feature because it's not going to scale consistently. And again, what we want is consistent performance at any scale. That's our promise to you. And because of that, we're not going to  give you something you can shoot yourself in the foot with.” Mongo, on the other hand, says, “Hey, we have all these other features. You can do aggregations. You can do joins. You can do full text search.” You can do what are called “scatter gather queries,” and all sorts of different things with MongoDB.

And I'm not saying that's necessarily wrong. I think they're saying, “Hey, you know, with great power comes great responsibility. We're giving you all these things and you can decide for your use case.” Like do you want to use these features with the understanding that “Hey, some of these might not scale well.” And which one is better? You know, I like Dynamo for that. But you know, the power of Mongo is nice. The big thing I worry about there is just, I don't want to say ignorance on people's part, but just being unaware of  which things scale well and which things don't in Mongo.

And you can run something locally in Mongo, or run something that works really well the first month you deploy it. But then two years down the road, when you have a lot more data, it's potentially not going to scale as well, depending on the feature. So it's really on you to understand those scaling properties with Mongo.

You can do all sorts of things with MongoDB. And I’m not saying that’s necessarily wrong. But with great power comes great responsibility. They’re giving you all these things and you can decide for your use-case… The big thing I worry about there is just, I don't want to say ignorance on people's part, but just being unaware of  which things scale well and which things don't in Mongo. – Alex DeBrie, Author, The DynamoDB Book

David Brown

I mean, AWS does offer a MongoDB-compatible service, with the emphasis on "compatible," because it's compatible with an older version of the API and stuff like that. So presumably they're offering scalability for that as well, but in a different sense.

Alex DeBrie

In a different sense, because, you know, they're still going to support most of those features that MongoDB supports. And there are just certain things that don't scale super well, [like] aggregations. I mean, the problem with aggregations and joins or whatever is that they can be unbounded, right? And so, you know, when you have 100 records that you're aggregating, hey, that's not that much to do.

But what if you do an aggregation that runs over two million records or a billion records or something like that? It doesn't matter how well you've indexed it; that's just going to be an expensive operation. You've got to read a lot of data, you've got to assemble it in memory and send it back. And so just having those APIs results in things that can scale in unpredictable ways.

Whereas Dynamo says, "Hey, we're not going to let you scan 15 gigs of data in a single request. You can get a megabyte of data, max, in a query operation, and then we're going to send you a pagination token back, and you can get the next megabyte if you want to paginate over that." But any one single request, hey, it's not going to take longer than, you know, X milliseconds.

David Brown

I mean, you give me the impression there aren't a lot of downsides to using DynamoDB; it's just design considerations you need to have up front. Like you said, a lot of people go to NoSQL thinking they can do whatever they like and change the design as they're developing, because it's a NoSQL database. But there must be specific use cases which suit DynamoDB particularly well. What are those use cases?

Alex DeBrie

Yeah, sure. And if I'm giving the impression that there are no downsides to DynamoDB, I'm doing a bad job, because with database systems specifically, but really all software, there are different tradeoffs you're making, right? And with Dynamo, it really is taking away some features you're used to, joins and aggregations, that make your life easier in certain ways, and you need to model around that. You need to learn a new way of data modelling, because a lot of people have learned relational database modelling, so there's absolutely a learning curve there. And then, again, you lose some of that flexibility: if you want to add new access patterns, it can be more difficult than in a relational database. So there are absolutely challenges there.

In terms of what works best with Dynamo, I would say the core use case it came out for was super high-scale applications, right? They built a database that could work for Amazon.com Retail, Uber, Snapchat, Disney+, all these things. And if you're doing hundreds of thousands or millions of requests per second, with lots of data, and want consistent performance, that's what it was really aimed at. I mentioned earlier how I got into Dynamo through the serverless world, and Lambda really just works a lot better with DynamoDB than with relational databases, and I think that's where you see this second group of customers really picking it up, because now you're getting more knowledge sharing around Dynamo and what those patterns are like.

And the real thing is, I think the vast majority of OLTP applications can use DynamoDB, they can use MongoDB, they can use relational databases; they can use whichever one they want, and in certain ways you're just picking different tradeoffs. You know, which one do I already know? Which one do I like? What kind of operational burdens do I want or not want? What kind of flexibility do I want? Different things like that. And you're mostly just choosing. I would say there are certain use cases that aren't going to work well for Dynamo. Again, that complex filtering one is going to be tricky; you're probably going to need some sort of external system to augment Dynamo there, depending on your specific needs, to make that work.

David Brown

The external system you're talking about, would that be a search index or an in-memory database?

Alex DeBrie

I would say the most common one I saw early on is Elasticsearch; a lot of people feed their data into Elasticsearch. It could be for full-text search operations, but while Elasticsearch is technically search, man, it's really just a distributed compute engine, right? And you can do all kinds of things with it. You can do aggregations and time series, you can do exact-match filtering on a huge distributed set of data, you can do that full-text search type thing, so it handles a lot of access patterns. But man, I always tell people Elasticsearch is like the complete operational opposite of DynamoDB, where it can just blow up at any time. It scares the heck out of me. So I always say, if you're going to use Elastic, use as small a piece of it as you can, and don't just start dumping everything into it and querying it, because it just becomes an operational nightmare. But if you keep it small, maybe you have to overpay for the instance a little bit, but you're not pushing the limits of what Elastic can do.

So Elastic was like the default one. Another one that I love, and I point a lot of people to, is called Rockset. It's by a bunch of people from Facebook who created RocksDB, which is this embedded key-value store, super efficient, used in database engines all over the place. Basically, what Rockset does is hook into your operational database, whether that's DynamoDB Streams, whether that's Mongo, whether it's your Postgres or MySQL replication log, and it will just ingest the data coming from your database and then re-index it in a couple of different ways. You can get an inverted index, you can get a columnar index, different things like that, and then you can run SQL queries across it. It's operationally very fast, it's very low-maintenance, and it gives you a lot of flexibility. So they're really good at that complex filtering piece I talked about before, they're good at the customer-facing aggregation piece, and they do it in a way that doesn't scare me quite so much as Elastic.

David Brown

Alright, I'd like to finish off with a wish list from you. Do you have a wish list of things you would like to see incorporated into DynamoDB in the future?

Alex DeBrie

Yeah, I actually wrote something up and gave it to the Dynamo team, and they were great; a bunch of it they'd heard before. None of it is huge, and some of it is API improvements. I think there are two big feature ones for me. One would be: what if you added some of those secondary indexing features to Dynamo, like the inverted index, or maybe even a search index, just automatically on Dynamo? It's tricky, because those sorts of indexes, whatever that add-on would be, would break that promise of Dynamo's guaranteed consistent performance, so you'd have to market it in a way that makes sure people understand the differences between core Dynamo and these add-ons. So I'm torn on that, but I wish there were something easier. And again, check out Rockset; I think that's awesome too.

The other one, and I'd love to see something like this, I call "table functions." If I'm doing a migration, I often have to do that whole process where I scan my table, look at the items I'm getting in my scan, and maybe update them, and I have to maintain and operate all this stuff, and a bunch of people have this use case. I wish I could just give them a Lambda function and say, "Run a managed scan over my table and feed those items into my Lambda function, and I can operate on those items however I want: update them, send them to some other system, whatever. All you're going to do is manage the scan for me and guarantee that you're going to get every item in my table." And I'm good to go on that. I call it table functions, but it's just a way to operate on my entire table in a managed way.

David Brown

Great ideas, Alex. I understand your book is self-published, so can you tell our listeners where they can find and buy your book?

Alex DeBrie

Yeah, sure. So I publish on Gumroad, but you can search for "DynamoDB Book" or go to DynamoDBBook.com to find it and buy it there. And if you have any questions, hit me up on Twitter or email; my contact information is available.
David Brown

What is your Twitter handle?

Alex DeBrie

My Twitter is alexbdebrie: Alex, B as in "boy," DeBrie.

David Brown

Alex DeBrie, thank you so much for your time today. A really interesting talk on DynamoDB.

Alex DeBrie

Awesome, thanks for having me.

