Transcript
David Brown
Welcome to Episode 79 of the Coding Over Cocktails podcast, my name is David Brown.
Our guest for today is the author of “APIsecurity in Action” and an experienced Security Architect. He has more than 20 years experience as a software developer and security professional, and a PhD in Computer Science.
An active member of the OAuth working group at the IETF, Neil has a deep knowledge of internet security protocols and applied cryptography. Until recently, Neil was the Security Architect for ForgeRock, a leading IAM vendor,
and now runs his own training and consultancy business.
In his spare time, Neil plays guitar and cycles and hangs out with his wife and daughter in the UK Cotswolds.
Joining us for a round of cocktails is Neil Madden! Hi Neil, how are you doing?
Neil Madden
I'm doing great, thanks! It's good to be on. Looking forward to it.
David Brown
Yeah, thanks for joining us. Just before you joined, you said you just moved into a new premises and you're surrounded by boxes, still waiting to be emptied. Nothing worse than moving house. Yeah.
Neil Madden
Yeah it was pretty stressful. The buying and selling process in the UK is kind of crazy if you don't know it. You don't really know you're moving house until like a week before it happens and then chaos ensues. So yeah, it was kind of a fun time.
David Brown
Well look, let's jump into the real topic of conversation which is security in APIs, in particular. Let's just start off by discussing the threats and vulnerabilities that publishers should be aware of with API.
Neil Madden
Right, so “API” is very broad. You know, “application programming interface.” And you know, there's obviously lots of APIs; used locally on your machine to talk to the operating system and to talk to libraries and things like that. But the book is really about web APIs. APIs that are made available remotely over the internet using web technologies. So specifically, mostly RESTful APIs; So HTTP APIs using JSON.
And because they're using web technologies, most of the vulnerabilities you have to worry about for a traditional web application are still relevant, lots of them. Some of them are less relevant. And then you've also got things that are more specific to an API. So APIs, by their very nature, are designed to be automated and used by machines, which also makes it easy for attackers to kind of automate them and automate attacks and things like that. For example, the OWASP, the Open Web Application Security Project – a great resource by the way, if people are looking for security stuff, they published traditionally the very famous “Web Security Top 10,” which is like the top 10 threats you should worry about in security. And they've also recently started publishing an “APIsecurity Top 10.” So, that's a really great resource and they kind of list these 10 things.
So, you know, the top one is something called broken object-level authorization, which is a really classic API vulnerability. Sometimes also called insecure direct-object references, or “IDOR.”
David Brown
Sounds very technical. Can you give us the up-down version?
Neil Madden
Yeah. So what it basically is, imagine you're writing your own email server, like you're going to rival Gmail or whatever. And so you've written an API which allows users to log in and then check their messages. So you've got a slash messages endpoint. And so Alice logs in and she goes to slash messages slash Alice and she reads her emails and you know, the API is checking if she is allowed to access the messages API. And yes she is. So it returns her emails.
But if that's all it's checking then what Alice can do is she can just change that URL to like slash messages slash Bob or something like that, you know? And obviously, often in APIs, these are quite predictable URLs. And if the API is just checking, you know, can she access the messages API and not checking whose messages specifically she has on her access, then she might then get sent back Bob's email messages. And this sounds like a really simple thing and obvious thing but actually this is really prevalent in APIs. And so, those kinds of things you really need to worry about and they list a whole bunch of these things that you have to worry about.
So another classic one is overexposing data in your API, returning sensitive information in APIs because your UI is taking that API response and then maybe rendering a subset of it. And so nobody's actually testing what the underlying API is returning and it turns out, it's returning your social security numbers or credit card details or whatever and so there's lots of things like that to worry about.
David Brown
Yeah. All right, that sounds like a good resource as well. We'll publish that in the written version of this podcast as well. Run me through the security mechanisms for an API.
Neil Madden
Right, so in the book I kind of cover the standard kind of mechanisms that would be used to protect against like whole classes of threats because you don't want to be kind of going down the path of like reacting to individual security vulnerabilities. So you want to make sure you've got these kind of core security mechanisms in place which kind of stop broad classes of attacks straight away. And this kind of five if I can count five main security mechanisms. So there's kind of initially you have some kind of encryption on your connection so you're using HDs um and that's kind of protecting data in transit to and from your api that's kind of a basic thing that most people would would have set up now um you then potentially also have encryption on your back end of encrypting data at rest, you know, depending on the environment you're storing it in and your threat model of how worried you are about people accessing that.
So you're encrypting on the back end and then there's kind of like a bunch of stages of security controls that requests go through as they reach your APIso typically there's some kind of rate limiting which is which is just a mechanism that just prevents your service getting overwhelmed. And it's designed to stop denial of service attacks, particularly things like distributed denial of service attacks where people kind of recruit a huge botnets of compromised machines and have them more flooding your api with traffic. Um And so you have just some kind of like rate limiting which is gonna like realize when there's too many requests happening for your service to handle and it's going to start, you know, shedding load at that point or throttling requests, which is kind of delaying them until until later. Um and that is often performed, you know, at the network layer, so you kind of push that out as far to the edge of your infrastructure as possible.
So right out to, you know, and if you're really worried about ddos attacks then you might employ a commercial services company that provides ddos protection um but otherwise you're kind of doing some kind of rate limiting at your kind of edge load balances and then you know layers within your system so that you're kind of blocking these requests and then beyond that you're then looking at authentication. So working out who who is is sending this request to your API And that that can that can often be quite different in api as to how it is in traditional um web applications and maybe go into that a bit a bit later but but you've got this authentication process which kind of like works out who the user is basically that's making the request. Um And then and then through that you've then got logging, so you have some kind of security or audit log that's recording all request to your API so that you can then later work out who who did what on the system. It's very useful resource particularly after an attack to be able to go back and see, well you know if the attacker did get in, what did they access and things like that um and then you have your authorization where you actually make a decision of whether this user should be allowed to make this request. And this is where you would you would you would fix those broken object level authorization issues. We just talked about, you know, so you have some decision process there which is saying, you know, who's making this request, what are they trying to access? Should they have access to that? You know, potentially other things, you know, some APIs You might might have on call shift rotors and so certain users can only access the api at certain times a day and things like that and that's kind of like your main kind of stages. So you've got encryption of data to protect it, you've got the rate limiting authentication, logging and authorization. And those are the kind of main mechanisms to get right
David Brown
in your book. You also mentioned that security flaws often occur when an attacker can submit inputs that violate your assumptions about how your code should operate. Can you give me an example of that?
Neil Madden
Yeah. So well we've already discussed one with these insecure direct object references. So in that in that case the the API Is kind of assuming that um if you're accessing the messaging API and your you know you're allowed to access the message API That you must be accessing your own messages because you know maybe the Ui only generates links to your own messages right? So nobody's going to go and fiddle with the with the URL Or whatever. So that's the kind of very basic assumption but there's also you know other things that can occur. Um So there's things like um there's there's a class of like denial of service attacks which which caused memory exhaustion attacks. So some ap eyes which takes like binary format message is the message comes in and it has like a length field which tells you how long it is and then the actual message itself. And if you don't, you know validate that length field. You know, that might be like a 32 bit length field or whatever. And if you just blindly allocate a buffer that size to hold the message, you know, they can just send you the maximum possible value and you're you're a P. I is now allocating like two or four gigabytes of memory blocks for each message that comes in. And it means they can just send a few messages and totally take down your APIso again there's an assumption there that you're making that that the people will only the length will be correct and will match the message whereas it won't. And in fact that was that was that was a classic security vulnerability. Heartbleed years ago in in SsL which had kind of pretty much exactly this vulnerability where there was this little used part of the ssl spec which was an echo thing. So you could send it a message and it would send you back the same message back to you as a kind of heartbeat thing to check whether the server is up and hardly anyone used it. And it had exactly this vulnerability that you could just say, well yeah, this this tiny messages actually you know too big about its long and because this was written in C which is not a memory safe language. What actually happened then was it, was it it assumed the message it got in memory was two gigabytes long when it was only tiny. And so it would take a whole chunk of the server's memory and then send that back to the client. And that that memory was like the private memory of the server and contained like private keys and passwords and all kinds of stuff like that. So that was that was a terrible vulnerability. So again, it was just an assumption made by a developer at the time that, you know, the length is going to match, you know, why wouldn't the length match? Um And so it's kind of like a lot of attacks were, you know, go on. Very subtle differences in the assumptions that the developer made when they wrote the api compared to how an attacker actually abuses it.
David Brown
I mean your book has amazing resources when you're developing API's to look at ways of securing them and considerations taken into account in the design. Of course, a lot of companies are already published an api are there any frameworks or tool sets that can assist them with the identification of
Neil Madden
threats? Yes. So there are there are various things I'm Yeah, so there's things like um I struggle to think of names now but there's um first of all you need to you need to kind of understand what a P. I is you actually have, which is a common problem that companies have that, it's quite easy often to just deploy an API And so you end up particularly in the early stages of the company, you know, with lots of a APIs and you don't necessarily know what they are, who's responsible for them even where they are. So you you have tools which can kind of scan your your public kind of cloud offerings and your known IP Address ranges and things like that and they can discover APIs And inventory them so that you can then start tracking down who owns this thing, what security controls are already in place for it and things like that. Um There are then things like this, various kind of appliances you can kind of put in front of your Api which will kind of block various attacks. So there's things like web application, firewalls and things like this, which sometimes now built into things like api gateways um where they have kind of rules that will detect various kind of common kind of attacks and sometimes they can kind of like feed in kind of information about new attacks as well that there are ongoing, so you can kind of keep them up to date. Um
David Brown
Okay, and you you let's talk about the secure development, so when you're creating a ps from the outset, what coding practices can a developer used to avoid incorporating security vulnerabilities,
Neil Madden
right? I mean this is this is a huge area, there's many possible things and people have written a lot on kind of secure development, but there is some, some kind of basic things to kind of look over and I kind of go in in in the book over some basic kind of secure development principles. So things like, you know, if you're if you're coding your API in a memory unsafe language like C or C plus plus, there's a lot more things you have to worry about. Um so, you know, ideally if you can, you know, use a memory safe language like java or rust or, you know, go or something like that, but if you if you have to use your super first for whatever reason that then there's a there's a bunch of things you need to do to kind of make sure you're not having those kind of heartbleed style vulnerabilities where you're getting buffer overruns and things like that.
Um and there's a bunch of nice tools now that you can run, so clang, which is kind of the C compiler has a bunch of tools in it, these kind of sanitizers you can run, which will tell you they kind of instrument, your code when it's compiled and then they detect certain conditions that are risky and will kind of tell you, so they'll detect things like buffer overflows and undefined behavior. So that's things you're doing in your code that are sort of compile a specific how they're, how they're going to behave and might change on different platforms and things. Um so then beyond that kind of basic kind of getting the memory safety right there, there's then a bunch of things and it's kind of like categories.
So one of the most dangerous things you can do in in security is pass anything basically so many security vulnerabilities around passing, which kind of feels weird because passing is kind of one of those areas of computer science that feels like it's like the most studied and kind of like there's really good kind of theory behind it. Um but actually in practice, um it's it's really risky because often different passes will pass things slightly differently, which can lead to vulnerabilities. So example of that, those things which are request smuggling vulnerabilities where you have request comes in and it's not just going straight to your api server, it's going to some kind of gateway or reverse proxy, like an api gateway that's doing some kind of like access control checks first and then it's going to forward it on to your api if it passes those, but if those, your back end server and this gateway, if they pass that http request differently. Um this is this is quite a common vulnerability then an attacker can submit a request which to the gateway looks innocuous and passes the checks, but then on the back end it gets passed as a completely different type of request and causes something completely different to happen. So it's called request smuggling and is kind of serious vulnerability. Um So so the kind of the main solution to that is try and keep it simple and make your your data formats and things as simple as you possibly can. Even Jason which is quite simple is actually people have done some work looking at the differences between Jason Parsons and finding that like basically no to Jason passes in in the world powers Jason exactly the same way when you get down to like looking at all these edge cases. Um So this kind of input validation and you've got to make sure then that people aren't sending malicious things to your API
You’ve then got things like injection attacks and when you're saving your data into the database, making sure that you're not just concatenate strings together to form the SQL statements that you're running to insert things because then otherwise or even to query things because then otherwise um the the attacker then can put especially crafted strings you know you've probably seen the XKCD Little bobby tables cartoon if you haven't googled that where where you know somebody's got their using their name, they've named their child such that it's got like SQL syntax in it and you know it drops the whole database when you load there that child's name into the database. Um So there's things like that and so you have to kind of use these different ways you protect against that. So for database particularly use something called prepared statements where rather than just concatenate strings, you kind of have a SQL statement with just placeholders in it and then you supply the input separately and then the database knows well that's user input and that's the code and so it can't get confused about the two. Um There's also you know, injection into html which causes cross site scripting and stuff and there's different ways you do that. So normally there's some kind of escaping you have to do to kind of protect values. Um Yeah so those are the kind of things you have to look out for um Just think what else I cover in the book around that. But
David Brown
there's a lot I've looked at the book and you cover a lot in this space. It's incredibly comprehensive and and an easy read as well. I recommended let's move on to authentication. You did start talking about some of the security mechanisms and we touched on authentication and authorization we're gonna talk about as well. So you say that token based authentication is the dominant approach to protecting apis. What are some of the advantages and disadvantages of token based authentication?
Neil Madden
Right, so token based authentication is basically so people sometimes have different different definitions of what it is. Um But for the purposes of the book and now I interpret it is that you log in and you get some kind of token which is a string, a random looking string that you then send on subsequent requests to the API and that authenticates you. So that string is somehow connected to your account. Um So typically it might just be by via an entry in a database. So this random string maps to you know something which says you know who your account is, it might you know linked to billing information and things like that. If your account is commercial your API is commercial and some of the advantages of that is obviously then you're not sending use of passwords or other credentials on all your API calls which is particularly important. You know if you have a lot of APIs And a lot of people developing them. You know not all of your developers are going to be you know security people and so they might not know how the best ways to handle passwords and things like that are. And so this token which is typically relatively short lived compared to a password. Um There's less risk then of being leaked somewhere. You know if if the api is logging that information to its logs and it's actually logging these tokens and they're ending up in some centralized log store. It doesn't matter so much if that token is only valid for like 15 minutes or something or a couple of hours compared to if it's somebody's password which they probably reused on their bank account as well and things.
David Brown
Like that.
Neil Madden
So that's kind of one of the main main advantages. There's also kind of, other on the other extreme is kind of more secure approaches to authentication. So those things that you can use certificate authentication at the TLS layer but it's often really complicated to set up. So so token based authentication is kind of hitting this sweet spot where it's kind of quite easy to set up and get going and adds quite a lot of security benefits early on. It avoids a lot of this complexity. And then recently there's been kind of work on like improving the security of token based authentication. So you can later on you can add additional security, things like difficult authentication on top and kind of tie them together in a way um as a kind of optional thing that you add on when you want to harden your systems later.
David Brown
So what the certificate authentication, that's where the client and server authenticating each other based on their certificates is All right.
Neil Madden
Right. Exactly. So certificate is a a bunch of information about who you are and then also a public key in a kind of signed thing that you give to the to the server. You know, you do client authentication. So the server authenticates to you first with its certificate and then you authenticate back to the server with your certificate and you. The difference between a token and certificate is that when you present this certificate you also have to kind of sign something with a private key that only you have to kind of prove that you're the guy who has this certificate.
David Brown
So
Neil Madden
that's where tokens or what's, you know the original way use tokens of what's called a bear a credential. So it's like cash. Like if you've got it you can use it and there's no other check done. So you know if you present that token to the API Then it's assumed that you got it legitimately. So obviously there's a risk then that if somebody manages to steal that token that they can use it just like you can and so there's various things you can do to kind of harden that up but it's a kind of trade off them between adding more complexity to your systems
David Brown
If you just registering a token in the database. Is there any sort of scalability issues as the number of users grow and as you're issuing tokens?
Neil Madden
Yeah, definitely. So it depends on what technology you're using for your backend databases. You know, there's some great databases now that scale really well, particularly in the cloud and things like that. Um So you can actually get. So people are often quite quick to jump from from database back token to other forms. I'll talk about in a second. But actually I would say you can get quite far with the database. So don't assume that you're gonna in in many ways putting stuff in the databases is the most secure thing because it's really easy to revoke tokens. Then you just delete the entry from the database. So the alternative that people look at our self contained tokens. So these are so Jason web tokens is the main standard people use and this is a bit more like the certificate we were just talking about. So it's a bunch of Jason that says you know who the user is, maybe you know what roles they have when they logged in things like this and that then Jason is then signed with with a private key and encoded and that becomes the token and so on your API Then all you have to do is verify the signature on this thing and that it was signed by your trusted authentication service and then you can just unpack the Jason and you've got all the information there. So you're not having to make any kind of call out to to a database. So it's um it's a lot more scalable that approach and potentially has different trade offs. So you get like saying revocation is much harder now because this thing is just valid until it expires. You obviously have to, if you're using time based exploration, you have to have a reliable clock, which is something that's quite often forgotten you know that You've got to be able to tell the time accurately if this thing is supposed to expire in 20 minutes or whatever. You've got to be able to know when 20 minutes is up and if your servers have wildly different ideas of what the current time is that can cause problems. Um So so there's different trade offs there but yeah generally speaking most mature aPI S after a certain point will will move towards these kind of sometimes called stateless tokens or Yeah client side tokens, things like this.
David Brown
I know there's always a lot of confusion between authentication and authorization so maybe he can give us a rundown. How does oo to fit into all of this?
Neil Madden
Right so so the difference you know authentication and authorization is authentication is about as I said it started about who you are and and proving who you are and then authorization is about given that I now know who you are. Should you be allowed to do what you're trying to do. Um And it is kind of interesting. So art is kind of like a standard approach to token based authentication in that you can you stand up something called the north to authorization server and your users can go to that and log in and then get back a token and access token which they can then use to access various APIs. And so forth to is authentication in that regard. But it's not it's not in other ways. So it's designed as a delegated authorization protocol. So when you're logging in at the oath service, what you're actually saying is the bit of software that I'm using my client which might be like a mobile app or it might be a website I'm using that's different from the website I'm trying to access. Um I'm going to authorize that that app or whatever to access my stuff on this other API But I'm going to limit the scope of what it can access. And so or two has these these scopes which get attached to the token which say what they're what you're allowed to access. So so maybe if you go back to like an email example. Right? So Alice Alice has got a custom email app that she wants to use with her email server and they're made by different people. And so she uses oath to to kind of log into her email service and then say I want to give this app a token it can use to access my emails but I want it to be like read only access. Say so it can only only read, it can't write any emails so that's the scope of the token. So so that's the authorization aspect of both too. So Alice is authorizing this app to access her stuff. So she is authenticating to the authorization server. So the authorization servant knows it's her.
But then she's authorizing this app to access her stuff on her behalf. And then the app again gets this access token which it uses to access the APIs. Um And and it's kind of similar in a way because that access token will say who Alice was originally, that's that's known as the resource owner. So who originally authorized this And the backend API is going to probably look at that at some point and say, you know, well which messages is this client trying to access? Is it trying to access Alice's messages or bob's messages? I'm going to compare that to who originally authorized this token. So that's the authorization aspect. Yeah. And then it will also check these scopes and say well you know, if it's trying to write an email and she only authorized read access then I'm going to say no to that.
David Brown
And of course there are lots of different flows as well. Right?
Neil Madden
Yeah. New ones at it every day. So it's a framework rather than a specific way of doing things and that has pros and cons. It's very flexible but it does mean that everyone does things in a slightly different way. Um So you get these. Yeah, these grant flows which kind of how you how Alice kind of proves who she is, The authorization server and then also how the token gets communicated to the client. Um So one of the very simplest ones is the resource owner password credentials grant, which is now being deprecated. Id and shouldn't be used. Where Alice just literally just sends a message to the client, posts a message directly to the authorization server with Alice's username and password in it and gets back a token. So obviously the way that works in terms of like if you've got your mobile app then what that means is your mobile app is asking you for your user name and password and you type it directly into the mobile app and then it goes and gets a token. Now obviously that's like completely insecure really except for very trusted apps because obviously that app, you know this this idea of scopes and limiting what it can ask this to. If you've given it your user name password then it can ask for whatever scope it wants. Right? Because it's got your your whole credentials then. Um And it's it's kind of can take over your account. So so the more secure flow that people generally use in real applications is the authorization code flow where you you redirect the user to the authorization service and the authorization service has its own web page where it will log in Alice take a user name, password, two factor authentication, whatever. Um And then it generates a short lived Authorization code and that really short lived like a few seconds you know 10 or 20 seconds and it redirects back to the client and the client then takes that authorization code and posts it back to an endpoint on the authorization server. Um And then the authorization server returns its access token from that from that end point and that that's much more secure but there's also like um flows for specific circumstances. So there's there's a thing called like the old device flow which is useful for things like smart TVs and things like that Where you want to kind of link to your account but the TV doesn't have any kind of keyboard or anything so you're not gonna be able to type in your password directly on the TV or you're going to have some horrendous like I mean people do do this where you have this like little curse and you have this on the screen keyboard and you have to like which is horrendous for people like me who use password managers with like 30 character random passwords and you like spending forever. So there's a thing called the old device flow. And if anyone listens to this is writing a smart tv, please go and find out about the device flow where basically you render like a QR Code or something like that on the tv screen which user then scans on their mobile
then write the mobile then loads there the authorization server and they log in and authorize the smart tv to have access and in the background. The smart tv is kind of, it's been given a code by the authorization server in the background and it's polling to see when, when the user is finished and then eventually, you know, the user approves it and then it gets back its access token on that on that back channel. So it's kind of the security pros and cons of this, but it's it's usability wise, it's way way better than this kind of using a cursor on your on your screen. So
David Brown
We've covered a lot of ground, we're primarily talking about restful api I'm just wondering do these all these principles we've been talking about they for restful api's do they also apply to service to service based communication saying the microservices based architecture?
Neil Madden
Yeah, they do. The mechanisms you'd use might be the specific techniques you use might be different, but the general principles are the same. Um So typically in the back end, um it's kind of a different approach and there's there's things you have to worry about that you don't have to worry about on the front end. So typically microservices are talking to each other using service accounts so they're not using user accounts to connect to each other. You know, my micro service will have some kind of, well assuming you have any kind of authentication between your Mark services, which people often don't right at the start but you you should do um but so typically you'd have some kind of service account which, you know, even in the simple case of like a traditional web application, you've got your application, your web server and that's talking to a database and it's got some kind of database connection password.
So it's a similar idea, you know, if you're talking between microservices and that service account connection password then is highly privileged so it can access any users stuff. Um and so you do have to be careful that if somebody manages to find a way to make requests through through your api through to the database or through to the backend micro service that they can't then access stuff they shouldn't be allowed to access. And that's there's a whole class of vulnerabilities, they're called service side request forgery, which is again, a horrible acronym. And remember a better way of remembering it is accidental. Web proxy. So you've got some service in your thing. That's that's that's accidentally it it provides a proxy interface that attacker can send request to it and it will send them off to like back end services in your internal network that you should be able to access. So, um in the book, I actually in later chapters, I give an example of this, which is like a service that checks that generates link previews if you've got like a chat service like, you know slack or something like that, You paste a link into the into the chat and it goes off and fetches that link and render the little box with a little summary and maybe an image or something of what that web pages and to do that. It's making you know different ways you can do that but common way is that your back end services then just just taking these murals and connecting to that URL From your back end. But if you're if you're attacking then can then guess your L. S. Of like your internal services like their IP Addresses and things like that. Then it can just feed your service your L. S. Which are you know your billing service or whatever or your payroll service and then your server just blindly connects to it. And like yeah there's a bunch of stuff and I'm just gonna render that into this little box, you know. So there's there's kind of all kinds of subtle problems around that. You have to be really careful when you're particularly when you're back in services processing your L's. Um to make sure that you're validating those correctly.
David Brown
So many considerations that the book is called Api security and Action is published by manning where it's honestly essential reading for anyone building APIs Or already published APIs For that matter Neil how can our listeners follow you on social media? What channel should they be looking at is a linkedin twitter or something else.
Neil Madden
Yes twitter is where I'm most active. So I have the handle Neil Mad Dog. Don't ask me why there was joined twitter late and there was a lot of Neil Madden's on already. So so yeah
David Brown
there are, yeah
Neil Madden
Yeah, so so that that's where I primary and I kind of tweet, I also have a blog which I kind of sort of infrequently update with with articles on things that tends to be a lot more in depth. I'm also at the moment launching a business. I haven't quite launched it yet where I'm going to be providing kind of security training courses. So they'll be online training courses and some in person once. So keep an eye on twitter. I'll you'll be able to see that when that comes out
David Brown
Neil Madden thank you very much for your time today. It's been a pleasure to have you on the show,
Neil Madden
Brilliant, thank you very much.