Transcript
Kevin Montalbo
NoSQL databases are known to provide agility and flexibility. Compared to traditional SQL databases, they can be scaled across thousands of servers, making them well suited for working with large sets of data. As a result, they're often used for big data and data analytics. Deciding to use NoSQL is one thing, but choosing which one to use is a vital decision for any organization, as these databases can vary in architecture, functionality, and more. Joining us from Australia is Toro Cloud CEO and founder and Cocktails co-host David Brown. Good morning, David.
David Brown
Good morning.
Kevin Montalbo
In this episode of Cocktails, we discuss NoSQL databases with Joe Karlsson, developer advocate for MongoDB. Joe shares his insights on NoSQL and SQL databases, their similarities and differences, and which you should choose for your next project.
All right, Joe Karlsson. Welcome to the show.
Joe Karlsson
Hi, welcome, welcome. Thanks for having me.
Kevin Montalbo
Thank you very much for being here. What an energetic start to our podcast! OK, so let's jump right in. Can you tell us what a NoSQL database is and why developers should consider it?
Joe Karlsson
Yeah, absolutely. I'm just gonna speak really broadly about it. NoSQL is, what's the word, an acronym for "not only SQL." It emerged just when we were beginning to look at other database models that aren't SQL or relational. NoSQL is a massive term and encompasses basically anything that isn't a relational database. That includes time series databases, graph databases, key-value stores, document-based databases. I'm sure I'm forgetting a couple, but a lot.
David Brown
And what category does MongoDB fall into?
Joe Karlsson
Yeah, absolutely. So MongoDB is a document-based database, and basically what that means is instead of saving your data in a relational rows-and-columns, table-type format, you're saving it in documents. And the key differentiator there is, as programmers, we're used to saving and working with data in JSON-like objects or dictionaries, you know, so we can save the data the way that we think about it without having to use an ORM to map that back and forth.
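To illustrate the point about rows and columns versus documents, here is a minimal sketch in plain Python. The user, address, and language data are hypothetical, and the lists of dictionaries only stand in for tables; no database is involved. It shows the reassembly work that an ORM would normally do, next to a document that is already in the shape the application uses.

```python
# Hypothetical data: the same user stored two ways.

# Relational style: three "tables" linked by a foreign key (user_id).
users = [{"id": 1, "name": "Ada"}]
addresses = [{"user_id": 1, "city": "Minneapolis"}]
languages = [
    {"user_id": 1, "name": "Python"},
    {"user_id": 1, "name": "JavaScript"},
]


def load_user(user_id):
    """Reassemble one user from the three tables (the job an ORM usually does)."""
    user = next(u for u in users if u["id"] == user_id)
    address = next(a for a in addresses if a["user_id"] == user_id)
    langs = [row["name"] for row in languages if row["user_id"] == user_id]
    return {
        "name": user["name"],
        "address": {"city": address["city"]},
        "languages": langs,
    }


# Document style: the object is stored in the shape the application works with.
user_document = {
    "name": "Ada",
    "address": {"city": "Minneapolis"},
    "languages": ["Python", "JavaScript"],
}

# Both routes end at the same in-memory object; only the mapping work differs.
print(load_user(1) == user_document)
```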
David Brown
So why is that a good thing? Why is it a good thing to lose the concept of tables and relationships? And why would someone want to move towards a document-based database instead?
Joe Karlsson
Well, you don't have to get rid of relationships, and I think we'll talk about that a little bit more today. But yeah, there's a lot of benefits to it. Being able to work directly with the database and not using an ORM actually removes a whole layer of abstraction. It can increase query performance and make it easier to work with the data. I can save the data the way that I'm thinking about it. I don't have to map it back and forth. There's also some key performance gains you get from embedding your data directly in a document, as opposed to joins on a foreign key.
David Brown
I guess a lot of developers, you know, spend a lot of time thinking about entity design. So in a NoSQL database, does it mean they don't need to worry about design anymore, and they can, you know, basically build their contact entity or whatever on the fly and add extra fields as they need them, that sort of thing?
Joe Karlsson
Totally. Yeah. Actually, quite the opposite. I think it's a common misconception with NoSQL databases, and particularly document-based databases. I'm a developer advocate and software engineer at MongoDB, so I'll talk about it from a MongoDB, document-based perspective. But yeah, no, you still need to worry about it. Schema design, just like with SQL development, is one of the key parts of increasing query and write performance.
I think it's one of the things that people don't give enough time and energy to when they're developing or working on a NoSQL database. Actually, I just spoke at a conference today about that. Yeah, it can hurt performance. I think with a lot of people who complain about their MongoDB database not scaling well, nine times out of 10 it's a schema design problem. So it's about making sure that you consider it.
David Brown
And so how does that affect future modifications of your document design? So if you do want to change, you know, the fields in your entities and that sort of stuff, should you rebuild your documents in order to maintain performance? How does that work?
Joe Karlsson
You could. I honestly wouldn't recommend it. Data requirements and future requirements are always changing. They're always growing. Even with a SQL database, even on day one when you launch it, rarely, you know, six months to a year down the line, is that database still the exact same schema that you followed at the beginning. And it's the same thing with a MongoDB database too. Software development never ends. It's always being updated, changed, expanded on. So schema design changes are just part of life. You can't avoid it, right? Even if on day one you have the perfect schema, which doesn't exist, right, it's going to change.
So typically what we do with a SQL database is you run migrations and you make changes to it, and you can do the same thing with a NoSQL database, with MongoDB. That's definitely not an anti-pattern. Yes, you could pause it and dump it and restart it again. I wouldn't recommend that. Most of the time you'd probably have some downtime, and it's not really necessary. I would probably just copy that data over to another dev database and start running some migration queries on it, or migration updates on all your data. And then run the query. OK.
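As a rough illustration of the migration idea above, the sketch below upgrades documents from one schema version to the next in plain Python. The field names (`fullname`, `name`, `schema_version`) are hypothetical and not from the conversation. The two dictionaries at the top are the kind of filter and update documents that could be passed to an `update_many` call in a MongoDB driver; here they are only shown as data, and the migration itself runs on in-memory dictionaries.

```python
# MQL-style filter and update documents for the same change (shown as data only;
# with a driver these would be the arguments to an update_many call).
migration_filter = {"schema_version": {"$exists": False}}
migration_update = {
    "$rename": {"fullname": "name"},
    "$set": {"schema_version": 2},
}


def migrate(doc):
    """Upgrade one document to schema version 2; safe to run more than once."""
    if doc.get("schema_version", 1) >= 2:
        return doc  # already migrated, leave it alone
    migrated = {key: value for key, value in doc.items() if key != "fullname"}
    migrated["name"] = doc["fullname"]
    migrated["schema_version"] = 2
    return migrated


# Hypothetical documents copied to a dev database: one old, one already migrated.
old_docs = [
    {"_id": 1, "fullname": "Ada Lovelace"},
    {"_id": 2, "name": "Grace Hopper", "schema_version": 2},
]
new_docs = [migrate(doc) for doc in old_docs]
print(new_docs)
```

Tagging each document with a version number is one common way to let old and new shapes coexist while a migration is rolled out; it is a design choice for this sketch, not something prescribed in the discussion.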
David Brown
So what should developers consider in terms of schema design and optimizing it for performance in NoSQL? Is it the same considerations as a relational database, or is it...?
Joe Karlsson
Yeah, it's confusing. So I'll talk about it from a SQL perspective, which people might be familiar with or not. There are very prescribed and well-researched approaches to SQL schema design, and we typically do that with normalization. Most developers normalize to the third normal form. Basically what that means is that with a relational database, your concern is not how the data is gonna be used, it's what data you have. And I'm not saying that's always true. But say I have some user data and I have some professions, maybe they have a class schedule, and I should just split that up and we'll do some joins on that with a foreign key. So normalization is typically what we're doing.
With MongoDB schema design and document-based schema design, there's no rules, there's no process, there's no algorithm. The only thing that matters is designing a schema based on the needs of your database. So a schema might work for you, but, you know, in a very similar application, it may be totally different for someone else. I'll give you an example. I just recently built an IoT kitty litter box. It measures my cat's weight over time and how often he uses the bathroom. So I designed the schema based on how I'm gonna be using and reading that data. And typically what you do with IoT data is you chart sensor data over time. So I'm designing my schema as a time-series-type schema that's gonna be optimized for reading out to a chart really quickly in real time. That made sense in my application. It doesn't make sense for everyone. But yeah, you can still map relationships and model things however you want.
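One way to picture a schema designed around how the data is read, as in the litter box example, is to group sensor readings into one document per sensor per hour so a chart can fetch a few documents instead of many tiny ones. The sketch below is an assumption about what such a layout could look like, not Joe's actual schema; the field names (`sensor_id`, `weight_kg`, `measurements`) are hypothetical and everything runs on in-memory data.

```python
from collections import defaultdict
from datetime import datetime


def bucket_by_hour(readings):
    """Group raw readings into one document per sensor per hour."""
    buckets = defaultdict(list)
    for reading in readings:
        hour = reading["ts"].replace(minute=0, second=0, microsecond=0)
        buckets[(reading["sensor_id"], hour)].append(
            {"ts": reading["ts"], "weight_kg": reading["weight_kg"]}
        )
    return [
        {
            "sensor_id": sensor_id,
            "hour": hour,
            "count": len(measurements),
            "measurements": measurements,
        }
        for (sensor_id, hour), measurements in sorted(
            buckets.items(), key=lambda item: item[0]
        )
    ]


# Hypothetical readings from one litter box sensor.
readings = [
    {"sensor_id": "litter-box-1", "ts": datetime(2020, 10, 1, 8, 5), "weight_kg": 4.2},
    {"sensor_id": "litter-box-1", "ts": datetime(2020, 10, 1, 8, 40), "weight_kg": 4.3},
    {"sensor_id": "litter-box-1", "ts": datetime(2020, 10, 1, 13, 15), "weight_kg": 4.2},
]

for bucket in bucket_by_hour(readings):
    print(bucket["hour"], bucket["count"])
```

Whether hourly buckets are the right size depends entirely on how the chart queries the data, which is the point being made: the read pattern drives the schema.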
David Brown
An IoT device for kitty monitoring. Is that a new career path, a new product we're gonna see on Amazon Marketplace?
Joe Karlsson
Yeah, I haven't quit my job yet, but I was actually just looking on Wirecutter or, um, I think it was on Wired. Yeah, Wired. There's a robotic IoT litter box on the market today. Wired gave it an eight out of 10, and it's selling for $500 a pop right now. So if anyone wants to steal that idea from me, all the code is open source. You could totally go and take that and monetize it. I am not doing that. I just did it for fun, and I talk about it at conferences and whatever.
David Brown
There would be plenty of people that would be interested in knowing that sort of stuff.
Joe Karlsson
I found that to be true too. It's been my most popular talk and blog post, and by far my most popular open source project. All right.
David Brown
Let's talk about relational database design. So, yeah, relational databases are good at relationships, right? That's how they originated. So when you have a database schema design which is highly relational, like a CRM system, or perhaps transactional systems like an order management system, SQL databases are typically the database of choice. So what would you say to those that argue that a relational database is better suited to data that is highly relational?
Joe Karlsson
Yeah, totally. The two biggest misconceptions about MongoDB are, one, that it does not support ACID transactions, which is false, and two, that it doesn't support relational joins, which is also false. Actually, in our aggregation pipeline, you can do a join, or we call it a lookup, and you can join data from separate collections. No problem. Yeah, relationship building is not a problem, right? So when you're designing a schema in MongoDB, there's only two choices you have to make for every piece of data: either embed it directly in the document, or reference it using a foreign key, just like you would with a relational database. And I will admit it's a newer feature, and it's something we've listened to the community on. They've been asking for it for years.
So we built it. I think it's been out since, like, version 4.2 or whatever. But yeah, any relationship you can model with a SQL database, you can totally model with a MongoDB database. And in fact you have additional flexibility, because you can start embedding that data directly in the document, which increases performance and whatever, right? Then you don't have to do a lookup. And even in a SQL database, joins are really expensive. I don't know if you know how it works, right? But if I have data in two separate tables and I join them, it basically pulls all those tables into memory and runs a SQL query on that joined data set in memory. That's expensive time-wise and memory-wise, and it can become a blocking operation at scale. So if you don't have to do that, that's a massive gain.
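The lookup mentioned above corresponds to the `$lookup` aggregation stage, which takes `from`, `localField`, `foreignField`, and `as`. The sketch below shows such a stage as a plain dictionary, plus a small in-memory stand-in that mimics its basic equality-match behavior (every input document is kept, and matches are attached as an array). The collection and field names are hypothetical, and the helper is only an illustration of the semantics, not how the server implements it.

```python
# An aggregation stage as it would appear in a pipeline (data only, no server).
lookup_stage = {
    "$lookup": {
        "from": "professions",          # hypothetical collection to join with
        "localField": "profession_id",  # field on the input documents
        "foreignField": "_id",          # field on the "from" documents
        "as": "profession",             # output array field
    }
}


def lookup(local_docs, foreign_docs, local_field, foreign_field, as_field):
    """In-memory stand-in for an equality-match $lookup (a left outer join)."""
    joined = []
    for doc in local_docs:
        matches = [
            other
            for other in foreign_docs
            if other.get(foreign_field) == doc.get(local_field)
        ]
        joined.append({**doc, as_field: matches})
    return joined


# Hypothetical referenced data: users point at professions by id.
users = [
    {"_id": 1, "name": "Ada", "profession_id": 10},
    {"_id": 2, "name": "Grace", "profession_id": 99},
]
professions = [{"_id": 10, "title": "Engineer"}]

spec = lookup_stage["$lookup"]
for row in lookup(users, professions, spec["localField"], spec["foreignField"], spec["as"]):
    print(row["name"], row["profession"])
```

The embedded alternative would simply store the profession inside each user document, which removes the join at read time at the cost of duplicating data.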
David Brown
And I think a lot of people would be surprised that MongoDB supports ACID transactions as well.
Joe Karlsson
That is my, yeah, number one misunderstanding about MongoDB, that it doesn't support ACID. And, I think even like six months ago, you could start doing ACID transactions on sharded data clusters. You can have data distributed all over the world and still run an ACID transaction on that, and you can control the write concern, how many replicated shards it goes to or whatever. You can control that.
David Brown
I noticed on one of your blog posts you have a like-for-like in terms of terminology between MongoDB and SQL databases. So it's got a join in SQL and something else in MongoDB, and the like. So it seems that there is a like-for-like now between MongoDB and a SQL database. Is that true?
Joe Karlsson
100%. Yeah. We call ourselves a general purpose database. And I get asked all the time too, like, what do I use? Do I use this or this, this database or this database? And, like, 90%, 99% of use cases would work just fine on a MongoDB database.
David Brown
So this is probably the wrong question to ask the advocate for MongoDB. So where is the downside then? What is that 10%?
Joe Karlsson
Totally. I mean, it depends. If anyone tells you that a piece of tech is a silver bullet, they're lying, they're full of shit, right? That doesn't exist. And what I hate in tech too is when we're using this thing and someone comes in and is like, "You gotta use this language or this framework," you know, like on Stack Overflow. That's not helpful, you know what I mean? So if you're already a SQL shop? Cool, great. PostgreSQL is awesome. I've been using it for years. I still use it. I think it's great, right? There's a lot of great use cases for it. And we're a document-based database, so maybe a key-value store like Redis is a better fit for saving user session IDs, or something like Memcached for even faster lookups, right? We write to disk. So, I mean, it depends on the problem you're trying to solve, the types of data and the data structure you're working with, and what you're already using, you know what I mean?
David Brown
And the skill sets you have in house that you can already leverage.
Joe Karlsson
100%. Yeah, totally. If you're a SQL master, cool. That's great. But I think a database should help make your life easier, and if it's making it harder, that's probably a problem. But yeah, if you're already super proficient with a piece of tech, cool. Yeah, go for it.
David Brown
You mentioned Postgres. Postgres, you know, supports JSONB now. So is that seen as a threat, where relational databases are sort of crossing the boundary and supporting, you know, NoSQL-type functionality?
Joe Karlsson
Yeah, totally. And honestly, it's flattering, because I think we're seeing a lot of companies now, like Amazon has DocumentDB and Azure just dropped Cosmos DB, and they've totally ripped off MQL, the MongoDB Query Language syntax, which is great. It's super flattering, right? It's like we're doing something right, and the industry is moving towards the query language that we're designing. Awesome. And PostgreSQL is the same way. I'm actually generally seeing a trend where SQL databases are becoming more like NoSQL databases and NoSQL databases are becoming more like SQL databases, right? Like we're supporting ACID transactions, all this stuff. But JSONB, I get asked that question a lot too, and it's important to understand what working with JSONB documents looks like compared to MongoDB documents. So for example, querying a JSONB document is a lot harder to do than with MQL, the MongoDB Query Language. You have to use proprietary SQL, and it's usually a pretty complicated SQL query to get the data you could get with a much simpler MQL or MongoDB query, right? You're also gonna have all of the legacy relational overhead that you wouldn't have otherwise.
So you still have to do the mapping, you still have to have an ORM to help interact with that, which is additional abstraction, which is gonna be an additional performance hit on you too. And natively, there's no data governance within the JSONB document. So you have to have a client-side data governance model to protect what you can or cannot access, or the schema design within that JSON document. It's basically a blob, right? With MQL or MongoDB, you can enforce the schema at the database level, you can control the structure, and you can add indexes to deeply nested components of that JSON document. Yeah. So it's very similar, but the feature completeness is different, and the additional overhead you have with a SQL database is a lot bigger. That's something you should be taking into account too if you want to go that route. But I've used it, it's great, and if they're not complicated JSON blobs and you're using Postgres, that makes sense. Like, go for it.
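For readers who have not seen database-level schema enforcement, the sketch below shows what a `$jsonSchema` validator document and an index key on a nested field can look like. The field names are hypothetical. The `conforms` helper is a deliberately tiny stand-in that understands only `object`, `string`, `required`, and `properties`; it exists so the example runs without a server and is not a reimplementation of the real validator.

```python
# A validator document of the kind that can be attached to a collection.
validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["name", "address"],
        "properties": {
            "name": {"bsonType": "string"},
            "address": {
                "bsonType": "object",
                "required": ["city"],
                "properties": {"city": {"bsonType": "string"}},
            },
        },
    }
}

# Index key specification on a nested field, using dot notation.
nested_index_key = [("address.city", 1)]


def conforms(value, schema):
    """Check a value against a tiny subset of the schema above (illustration only)."""
    kind = schema.get("bsonType")
    if kind == "string":
        return isinstance(value, str)
    if kind == "object":
        if not isinstance(value, dict):
            return False
        if any(key not in value for key in schema.get("required", [])):
            return False
        return all(
            conforms(value[key], sub_schema)
            for key, sub_schema in schema.get("properties", {}).items()
            if key in value
        )
    return True  # anything this sketch does not understand is accepted


good = {"name": "Ada", "address": {"city": "Minneapolis"}}
bad = {"name": "Ada", "address": {}}
print(conforms(good, validator["$jsonSchema"]), conforms(bad, validator["$jsonSchema"]))
```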
David Brown
You go for it. You mentioned Azure and the like coming out with these, and AWS with DocumentDB. There was a bit of controversy a couple of years ago when MongoDB changed its licensing model. It was one of the first to change its license model because, as you say, the public cloud providers were using its tech without paying for it. And so it changed things to get a revenue stream from those that are gonna be using it in a public cloud environment. You've now got the Atlas service, where you can host MongoDB. Was it a good decision? Is it working out well for the company? And have you seen that as a general trend in the open source space?
Joe Karlsson
I think we're seeing that more and more. And I agree, there was a lot of backlash on that when it first came out. I also think there was a lot of misunderstanding about it. So the SSPL hasn't been endorsed by the open source foundation, but the key is, right, if you're selling MongoDB as a provider, you either have to open your stack up or pay us for the license. Everyone else is good to go. If you have an e-commerce shop or whatever and you're using MongoDB, the license does not apply to you. The open source rules are still the same. It's only if you're making money off of code that we paid to produce. And I think that the industry is softening to that too, because I think developers are weird about it. As a company, you need to make money, right? And with open source, it's hard to do. I think the SSPL is a good way to do that without having a giant like Amazon, Google, or Microsoft rip you off or make money off of your intellectual work.
I don't know, I think that there's a lot of benefits to it too, and I think if you're not one of those humongous companies trying to monetize a service, you're fine, right? And yeah, Atlas has been great for us. And it also allows us to be way more open, right? We're really one of the only databases like that. If you go to Cosmos DB or DocumentDB, they are great databases, but you're locked into that vendor, and transferring around is super hard. And we actually just unveiled multi-cloud last week, so you can install a MongoDB cluster on AWS, Azure, and GCP all at the same time and replicate across them. So if a whole data center goes down, you're totally fine. You can't do that with anyone else, because we don't care where you go.
David Brown
Right, cloud agnostic. And a lot of companies are spending a lot of effort trying to be cloud agnostic, right?
Joe Karlsson
Totally. Right. I mean, no one wants to be locked in, right? And that's how they get you: it's cheap, they pull you in, and they start jacking the prices up. But if you can be more flexible, that's a huge win.
David Brown
So where do you see the competition mostly coming from? Is it from the cloud native solutions like your DocumentDBs, or is it more your open source solutions, like a CouchDB?
Joe Karlsson
I don't even think it's either of those. The important thing to note with competition like Cosmos DB and DocumentDB is, let's just take DocumentDB with AWS. It's based on the last fully open source release we did, which is version 2.4. We're on 4.4 currently. So on independent testing, what we've seen is about 65% feature completeness with MongoDB's current releases. So you're seeing them massively short on features. The other thing too is DocumentDB is based on, I'm forgetting what it is. It's SQL, it's a relational database on the back end.
So what typically happens is they're copying the MQL query language but putting it on top of a relational database, and you're gonna get the relational cons with that, which includes not being able to shard or split up the data. And typically what you see is that it becomes massively expensive to run these databases on our competitors. And the companies are great, they're awesome products. If you're already in that ecosystem, cool, makes sense. But it's important to understand what you're sacrificing there. I think people assume they're getting the full MongoDB experience, but they're not.
David Brown
As I understand it, it's running on a Postgres database where they've replicated the MongoDB API.
Joe Karlsson
Right. Exactly. And it's a flattering copy. We're super happy about it. I mean, like I said, it means we're doing something right. The developer community loves the MQL query language. It's easy, it's intuitive. But you lose some of the benefits.
David Brown
What about the open source players, like CouchDB?
Joe Karlsson
Yeah, I mean, they're great, they're awesome. They're not seeing the mass amount of growth that we're seeing, or the mass investment in their platforms. I think that we're trying to become a data platform, not just a database. So we have a serverless platform built on top of it. We just unveiled a brand new GraphQL endpoint: because we know the JSON-type structure of your database, you can hit a button and we'll generate a serverless GraphQL endpoint on your data instantly.
David Brown
Also I wanted to ask you about that. So I, I actually did read some of your content on GraphQL and support. So what makes MongoDB a natural choice for, you know, when you're working with APIs? Yeah.
Joe Karlsson
Yeah. I mean, JSON is the payload data of the web. That's all we're sending around, right? That's how GraphQL works. We're even querying now with that same data structure, and we're sending that data around. But like I said, as developers, we think about things in terms of nested key-value pairs, in terms of documents or JSON and objects and dictionaries, whatever. So it makes sense to save the data that you're passing around, right? If I can just save a document that's the exact payload format I need to send back to a client, that makes total sense, right?
And you can be more, what's the word, detailed about declaring what it is. We don't save things as JSON, because if you know JSON, you'll see that the key-value pairs are all strings, and it's up to the browser, the client, to kind of decode whether it's a string or an int or a float or a date. But we're BSON, which is binary object notation instead of JavaScript, right? So we can save things in more detail, and we can tell clients about this more detailed information as well. So TL;DR, we're sending data on the web the way that we're saving it in MongoDB, which makes sense.
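A small standard-library sketch can make the type point concrete. JSON does carry strings, numbers, and booleans, but it has no date or binary type, so a timestamp has to travel as a string and the client has to know to decode it, whereas BSON defines dedicated date and binary types. The reading below is hypothetical, and the example only uses Python's `json` module; it does not encode BSON.

```python
import json
from datetime import datetime, timezone

# Hypothetical sensor reading with a real datetime value.
reading = {
    "weight_kg": 4.2,
    "visits": 3,
    "ts": datetime(2020, 10, 1, 8, 30, tzinfo=timezone.utc),
}

# The json module refuses the datetime because JSON has no date type.
try:
    json.dumps(reading)
    json_accepts_datetime = True
except TypeError:
    json_accepts_datetime = False

# To send it as JSON, the date has to be flattened into a string first.
as_json = json.dumps({**reading, "ts": reading["ts"].isoformat()})
decoded = json.loads(as_json)

# The client gets a string back and must know, out of band, that it is a date.
print(json_accepts_datetime, type(decoded["ts"]).__name__)
restored = datetime.fromisoformat(decoded["ts"])
print(restored == reading["ts"])
```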
David Brown
And, and that just if you could elaborate on that GraphQL support you were talking about. So, and that sort of native GraphQL query support. How does that work?
Joe Karlsson
I'm obsessed with it. I think it's a cool thing, and I don't think enough people know about it. So I'm a fte JavaScript developer. I'm gonna be honest though, I haven't really written a server for a long time. What I've been doing for the last couple of years is setting up a database in the cloud and using a service provider to send data to a static front end, right? And Jamstack's totally bringing us that way. But this just makes it even easier to set that up. You just have a database, you set up a service provider to send some data to your client, your static front end, and then transfer it around. But this GraphQL thing, I love GraphQL.
I think it's expressive. I think it's interesting. I helped implement it at a large e-commerce store at my last gig, and it saved us a ton of time. It just makes front-end development mega, mega, mega faster, right? Because I can just ask for what I'm looking for and get it back. But now it's even easier to get set up. I used to have to set up my own Node microservices to handle all the GraphQL endpoints, design the schema on there, and deploy it, and that was a massive pain in the butt. But now I don't have to do it. I literally hit a button and it generates the whole thing for me.
David Brown
So that's the difference we're talking about here. The native GraphQL support means that there's no middleware required to make that query to the database.
Joe Karlsson
Exactly. Right. We know your data better than anyone, right? We know the structure because it's stored as BSON. We know the data types. GraphQL is an opinionated, schema-based API provider. And we figured out, hey, we can actually make that for you for free, which is so cool.
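To show why known field types make a GraphQL schema easy to derive, here is a toy sketch that turns a typed sample document into a GraphQL type definition. This is an illustration under stated assumptions and not how MongoDB's hosted service generates schemas: the type mapping is hypothetical, it handles only flat documents, and `DateTime` is assumed to be a custom scalar because GraphQL has no built-in date type.

```python
from datetime import datetime

# Hypothetical mapping from Python value types to GraphQL scalar names.
# DateTime is assumed to be a custom scalar defined elsewhere in the schema.
GRAPHQL_SCALARS = {
    str: "String",
    bool: "Boolean",
    int: "Int",
    float: "Float",
    datetime: "DateTime",
}


def graphql_type(name, sample):
    """Build a GraphQL type definition from one flat, typed sample document."""
    lines = [f"type {name} {{"]
    for field, value in sample.items():
        lines.append(f"  {field}: {GRAPHQL_SCALARS[type(value)]}")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical litter box reading with typed values.
sample_reading = {
    "sensor_id": "litter-box-1",
    "weight_kg": 4.2,
    "visits": 3,
    "ts": datetime(2020, 10, 1, 8, 30),
}
print(graphql_type("Reading", sample_reading))
```

With untyped JSON blobs the same step would need guesswork or hand-written type annotations, which is the contrast being drawn in the conversation.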
David Brown
And it is so cool. I love how you get so excited about it.
Joe Karlsson
I love it. I love it. I wish more people knew about it. Just from a time-saving perspective, it's so easy.
David Brown
Now, there's still a place for REST, obviously. And I don't want to get into the whole debate of REST versus GraphQL. There's a place for each, right? And, you know, we're obviously big fans of RESTful APIs, in particular with OpenAPI support. But we are also increasingly supporting GraphQL, and you'll see some big stuff coming out from us in the GraphQL space as well. But, um...
Joe Karlsson
You can do both. We did the same thing at my last company. You have both sitting side by side and make API requests, whatever.
David Brown
So in terms of REST support in MongoDB, does there need to be some sort of in-between to make those requests?
Joe Karlsson
No, no, no. So we still have a service, it's called Realm. So we have a whole service, a service front end that basically sits in front of your MongoDB databases. You can set up triggers, so based on data changes, you want to fire off some service function or make an API request to it? No problem. We do all that for you for free. And again, those are some things that you wouldn't get with the competition. We want to make handling data ridiculously easy. We want to make it so easy. That's the thing: most developers don't care about most of this stuff. They don't care about in-depth things like replication and write concerns and sharding. No one cares, right? Some people do, and it's important, don't get me wrong, but most people just want to put some data somewhere and get it back as easily as possible. We're trying to make that as easy an experience as possible. This is my wild speculation for the future, but I think that the winner, right, the next Oracle, is gonna be the one who makes working with data easier than anyone else. We're seeing it time over time in the developer community: the tools that are easier to use win. Developers are smart, but if you make them jump through too many hoops, you're gonna lose. And I think we're gonna see increased abstraction and ease of use for just getting, retrieving, and scaling up your data with little to no downtime.
David Brown
I was gonna ask you something and I've totally lost my train of thought.
Joe Karlsson
I got one more thing to say about it while you think about it. I have another wild speculation for the future. If anyone's listening to this in like 2026 or something, I may be way off on this, but my best guesstimate for the future of data is gonna be machine learning. Like having machine learning models make automatic adjustments to indexes based on your queries, automatically sharding, partitioning, and distributing data based on use cases. Like, hey, you have a bunch of users in Hong Kong, let me replicate this data over to a Hong Kong data center so it's super fast, right? And right now we all have to manage that as human beings.
We're now seeing more alerts for doing that, like, hey, we're seeing this, here are these recommendations. Because we're a data company, right? We can start making models to massively analyze and make really smart recommendations for people, so it makes it even easier, right? We already have query performance monitors and index suggestions, like, hey, we see that this query is being made a ton, and you could improve your average query time by blah, blah, blah milliseconds by implementing this index or whatever. Yeah, I don't know. I'm fascinated to see how much automation is gonna help us develop databases in the future.
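The index suggestion idea can be sketched as a very simple heuristic: count which fields show up in query filters and suggest the frequent ones that are not indexed yet. This is a toy illustration with hypothetical names and a made-up query log, far simpler than any real performance advisor, and it involves no machine learning.

```python
from collections import Counter


def suggest_indexes(query_log, existing_indexes, min_count=2):
    """Suggest fields that appear in at least min_count filters and lack an index."""
    counts = Counter(field for query in query_log for field in query)
    return sorted(
        field
        for field, seen in counts.items()
        if seen >= min_count and field not in existing_indexes
    )


# Hypothetical log of query filters observed by the database.
query_log = [
    {"sensor_id": "litter-box-1", "ts": {"$gte": "2020-10-01"}},
    {"sensor_id": "litter-box-2"},
    {"owner": "joe"},
    {"_id": 42},
    {"_id": 43},
]

print(suggest_indexes(query_log, existing_indexes={"_id"}))
```

A real advisor would also weigh query cost, selectivity, and write overhead before recommending anything; frequency alone is just the simplest signal to demonstrate.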
David Brown
Kevin, put it in your diary: 2026.
Kevin Montalbo
We'll do it and we'll revisit this podcast.
Joe Karlsson
You can roast me on Twitter in 2026 if these don't come true. I don't know when this is gonna happen. I just think that, I don't know, the cat's out of the bag for data modeling and...
David Brown
It makes a lot of sense. I did remember my train of thought. You were talking about how we can now query MongoDB directly by RESTful requests or GraphQL. Is there any use case for having some sort of proxy in front of MongoDB where you might want to transform data, manipulate data?
Joe Karlsson
Yeah, totally. Maybe you wanna get data from a separate data source, like a relational database, and you wanna massage those together before sending it off to whoever's requesting it. Or, yeah, maybe you set up your own. And the great thing about GraphQL, right, is it's database agnostic. So you could set up a bunch of different databases, and it's just querying a bunch of different things and consolidating it all together. Go for it. Cool.
David Brown
That was my natural segue to Martini obviously facilitates that, our product.
Joe Karlsson
Wicked smart, wicked smart. Yeah, totally. No, absolutely. Again, there's no silver bullet, right? Go for it. Mix it up. What's the term, polyglot? Just use a bunch of stuff, whatever you want. Most of my applications end up using at least a key-value store, you know, and a document-based database.
David Brown
Good stuff. Thank you so much, Joe. We've run out of time. It's been a pleasure having you on the show.
Joe Karlsson
Oh my gosh. I had so much fun. This has been a blast. Thanks for having me. Hopefully we can do this again sometime.
David Brown
Awesome. I'd love to
Joe Karlsson
I was just in Kansas City last summer, but hopefully hopefully soon for Kansas City defcon. Let's be back.
David Brown
Good. And where can our listeners follow you and find out more about you?
Joe Karlsson
Y'all can roast me on Twitter at @JoeKarlsson1. I make TikToks and funny videos on there. I also post programming tips, too often, some would say. But I have lots of great stuff. Everyone should come chat about whatever. I would love to hang out there.
Kevin Montalbo
Thank you very much, Joe Karlsson, for being with us. To our listeners: are you working with databases? Do you have any stories that you'd like to share? Let us know in the comments on whatever podcast platform you're listening to. Also, please visit our website at www.torocloud.com for our blogs and our products. We're also on social media: Facebook, LinkedIn, YouTube, Twitter, and Instagram. Talk to us there, because we listen. Just look for Toro Cloud. Again, thank you very much for listening to us today. This has been Joe Karlsson, David Brown, and Kevin Montalbo, at your service for Coding Over Cocktails.