Transcript
Kevin Montalbo
Welcome to episode 69 of the Coding Over Cocktails podcast. My name is Kevin Montalbo. Joining us is Toro Cloud founder David Brown. Good day, David.
David Brown
Hi, Kevin.
Kevin Montalbo
Our guest for today has spent over 25 years in the software industry, with a breadth of experience in different businesses and environments, from multinationals to software start-ups, and end-user businesses to consultancies, working with UK and internationally known brands and organizations. He is the author of Logging in Action: With Fluentd, Kubernetes and more, which teaches readers how to record and analyze application and infrastructure data using Fluentd. We'll talk about that book today, and if you stick around until the end of the show, you'll learn how you can get a copy for yourself. Ladies and gentlemen, joining us for a round of cocktails is Phil Wilkins. Hey, Phil, welcome to the show.
Phil Wilkins
Thank you. Good day to everybody.
Kevin Montalbo
Alright. Good day. So, let's begin. Systems have been producing logs for decades. What is unified logging, and what has caused its emergence?
Phil Wilkins
So, logging has always been around, but what we've seen over probably the last 20 years is systems becoming more and more distributed in their construction. It used to be, we'd run a couple of servers in parallel, maybe in active-active or active-standby combinations, and we'd need to bring the logs together. But as virtualization was introduced, we saw that extend even further. And then in the last 5 to 10 years, containerization has driven that even further, particularly microservices. So now one application might be spread across multiple environments, and to understand what's going on, you need to bring the logs from each of those environments together to be able to get a holistic picture. And that's the key to log unification: gathering that information together into a position where you can see what is going on.
David Brown
And when you talk about servers running in parallel, or a distributed system, or a cluster of servers, and unifying the logs across those, is it unifying the logs across the same application servers? Or is it also across tiers of the application, unifying the logs across the network tier, the database tier, the application tier? Is that what unified logging is about as well?
Phil Wilkins
Yes. So, many people are on a journey of varying levels of maturity, and what you've described is almost like the path of maturity for log unification. To start with, just get your app servers talking; that's probably the easiest. But your infra guys are going to want to also look at what you're doing and what's happening on the server infrastructure, and your DBAs are going to want to know how hard you're pushing their database. So if they see database performance issues, it's like, well, is that something that's happening in my database, or is a consumer saturating me with requests? And the more you can bring that information together, the better your understanding is of your landscape. An application or server does not operate in isolation.
David Brown
Yes, of course. So what is the difference between log analytics and unified logging?
Phil Wilkins
So, log analytics, like most analytical processes, is about processing a volume of data and analyzing it, quite often looking for trends and patterns in the data, whether that's log entries or transactional records. Log unification is about getting the logs together. So all these independent components that are contributing to the sum total of your environment and solution, you need to get those together and bring that data into a single place to be analyzed. Now, one of the tricks with more contemporary unification tools is that rather than just grabbing the data and putting it into a big pot to be analyzed later, you can start to do some event-based processing, and it enables you to be a lot more reactive. So rather than waiting an hour for the next analytical run to happen on your environment, the unification tool can go: that's an exception, and I've been given some routing rules that say this exception is particularly important, so I'm going to ping Joe Bloggs and tell him that that exception has happened now.
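As a rough illustration of the event-based alerting Phil describes, a Fluentd routing sketch might look like the following. The tag names, webhook URL, and file path are illustrative only, and `@type slack` assumes the third-party fluent-plugin-slack output is installed; any notification output could sit in its place.

```
# Illustrative only: assumes exceptions arrive tagged app.exception.
<match app.exception>
  @type slack                   # third-party fluent-plugin-slack
  webhook_url https://hooks.slack.com/services/example   # your webhook
  channel ops-alerts
  message "Exception needs attention: %s"
  message_keys message
</match>

# Everything else goes to the aggregate store for later analysis.
<match app.**>
  @type file
  path /var/log/unified/app
</match>
```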
David Brown
I'm glad you brought up the tooling, because your book talks extensively about Fluentd, which is an extensible open source framework for data collection that filters and routes logs for their consumption. When you look at a diagram of Fluentd and how it routes logs from their source to their destinations, it looks kind of like a middleware ESB-type tool, but it's specifically designed for log analytics. And you mentioned it can do more than just routing of logs; it can actually do some event-based processing as well. So can you run us through how Fluentd can help with this unified logging?
Phil Wilkins
Sure. So, the first thing in being able to unify your logs is to be able to gather the log content up from a vast pool of resources. That could be your system logs and SNMP traps in your infrastructure, through to many, many different types of application logging formats. You know, we're in a polyglot world now, so an enterprise might be running a combination of .NET solutions and Java and Node, and the list goes on, and they don't all work in the same log format, so being able to cope with that is important. And then, once you've started to ingest those, you need to do a number of things. First, you've got to decide whether the log event is of use. Sometimes, particularly with more brittle solutions, you may be deployed and everything's running smoothly, but there are debug logs being put into your logging, and people become nervous of any change. So, rather than going and changing that running system's log configuration, or even the code, to change the log thresholds, it's easier to say, OK, we'll put rules into the unification process to filter that out, so we don't take it any further than where we've grabbed it from, and you're not polluting your aggregated views of all the logs with any undue noise. So you can do that, and you can route it to different systems as well. In large organizations, you'll get specialist teams that are dealing with monitoring of your solution. Traditionally, your sysadmins will work with tools like Nagios that are focused on the infrastructure. Other tools will be more focused on application logs and things like that, where you're more oriented towards supporting the app dev and apps teams. And they want a different set; they're less interested in the minutiae of what's happening on the server unless it's significant.
They want to know more about what's happening at the application and the database layer, in terms of how their SQL is performing and what SQL is being executed. So you can start to route the right events out to the different tools, rather than an enterprise-wide edict that says everybody's got to use this one tool and shall use only this. You can start to think about best of breed if you want. So that's one of the key use cases. The one that I like to show to get people thinking about it is more the social alerting, if you like, or the collaborative mechanisms, where you can tease out specific events which are forewarnings of something significant. You can then filter those out when they occur and send a signal to someone and say, look, I've had an event I've recognized as being a warning of a bigger problem. But if you get in there quick, because you know you've got, say, five minutes before things go belly up, you can get in there and prevent it rather than cure it, which is a lot more useful.
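The two ideas above, dropping debug noise at the unification layer rather than touching a brittle application, and routing events by tag to different teams' tools, can be sketched in Fluentd configuration like this. The tags, hostnames, and the field name "level" are assumptions about the environment, and `@type elasticsearch` assumes the fluent-plugin-elasticsearch output is installed.

```
# Drop debug-level noise at the unification layer instead of
# changing the brittle application's own log configuration.
<filter legacy.**>
  @type grep
  <exclude>
    key level
    pattern /^DEBUG$/
  </exclude>
</filter>

# Route by tag: infrastructure events to the sysadmins' tooling,
# application events to the app teams' search store.
<match infra.**>
  @type forward
  <server>
    host infra-monitoring.internal   # hypothetical relay host
    port 24224
  </server>
</match>
<match app.**>
  @type elasticsearch                # fluent-plugin-elasticsearch
  host es.internal
  port 9200
</match>
```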
David Brown
Interesting. And I'm just thinking about how we get the data into Fluentd in the first place. I imagine with popular frameworks there are prebuilt connectors for Fluentd to ingest the logs from popular systems. What if my system doesn't have an out-of-the-box connector? What do I need to do to prepare my data so it can be consumed by Fluentd?
Phil Wilkins
So the simplest and most common approach to that sort of thing is to just let your application work as it does. You know, typically that's writing to a file or a rotating file; sometimes people will write to a database. And what you can do, rather than connecting your application directly to Fluentd through, you know, an appender in your logging framework, which is the more optimal route, is set Fluentd up to say: right, I'm going to tail that file, or I'm going to go every minute and grab the latest entries in that database. You know, if you're working in a Linux environment, you'll be familiar with the idea of tail -f, where you're just literally watching the end of a log file as events go through. Well, Fluentd's got some fairly sophisticated connectors that are able to do that for you, so it's hoovering up the events as they go. And that way you make no invasive change to the application, which is ideal for those sensitive, brittle use cases: those legacy systems that everybody is terrified of touching, that are so critical to your business that the sooner you know things are going awry, the better.
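The file-tailing approach Phil describes maps onto Fluentd's built-in in_tail input. A minimal sketch, with an illustrative path and tag, might look like:

```
# Follow a legacy app's log file, much like `tail -f`, without
# touching the application itself.
<source>
  @type tail
  path /var/log/legacy/app.log
  pos_file /var/lib/fluentd/app.log.pos   # remembers the read position
  tag legacy.app
  <parse>
    @type none      # no known structure: keep each line as plain text
  </parse>
</source>
```

The pos_file means Fluentd can restart without re-reading or losing lines, which matters for exactly those critical legacy systems.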
David Brown
And is it client-server? Is there an agent running on my server, a Fluentd agent, which is collecting and sending those logs to a Fluentd server? Or am I streaming the logs to a server?
Phil Wilkins
So you can set it up either way, and this is the beauty of Fluentd, because it's had to deal with the world of microservices and IoT in particular. You can deploy it as a central solution, and you can either stream to it, and it will receive the streams, or, if your network will allow it, it can reach out and connect. But the more common model is to use Fluentd in its agent model and deploy it close to the application. In the world of microservices, you can see this happen in a number of ways: you can deploy Fluentd as a sidecar, using that kind of deployment pattern, and if you're using a service mesh, then there's an element of Fluentd engaged with through Istio, for example. But even in your legacy environments, you could put a small-footprint Fluentd agent node in with your server, right next to it, as a parallel process, because it's such a small footprint. And there is a version of Fluentd, I call it the little brother, because it uses the exact same principles but it's stripped back, and its kernel is written in C: Fluent Bit. Fluent Bit has got such a small footprint that it is very easy to deploy into Internet of Things devices. And rather than doing any of that processing and filtering, it's designed to just grab and forward. So it is a true agent in that sense, whereas Fluentd can act as the server as well as an agent.
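On the agent side, the forwarding Phil describes uses Fluentd's built-in forward output. A sketch of an agent (Fluentd or Fluent Bit) sitting next to an application and shipping everything to a central aggregator, with an assumed hostname, might be:

```
# Agent-side: forward all locally collected events to the central
# Fluentd aggregator node.
<match **>
  @type forward
  <server>
    host aggregator.internal   # hypothetical central Fluentd host
    port 24224                 # the default forward protocol port
  </server>
  <buffer>
    flush_interval 5s          # batch events rather than per-line sends
  </buffer>
</match>
```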
David Brown
Mhm. You also wrote that log processing is only as good as the logs that are generated. That's a quote from your book. Sounds like the old adage of garbage in, garbage out. So how do we generate better log data?
Phil Wilkins
Yes, you couldn't summarize it better than garbage in, garbage out. If you're writing logs and just treating them as quick hacks to help you debug and do local testing, then your logs are going to be difficult to understand, if not meaningless, a year, two years, five years down the line, when you're no longer involved with it and the messages you've put in there are a bit unique to your understanding. So the best thing you can do, whether you write to a file or are using Fluentd or something else, is to think about the content that you're putting into the message and make sure that it's meaningful. But be data-aware: if you're dealing with a financial system, just dumping the entire transaction could create some real headaches, because you could be writing sensitive data into the log. So, you know, you have to start thinking about logging as almost as important as the transaction itself that you're processing. And the more semantically meaningful the log, and the more insight that you offer in it, the better. So pumping out the key variables that affect how your application is behaving into a log entry is always going to make your logs more useful, and trying to provide it in a structured manner will mean that it's going to be an awful lot easier to start expressing rules over it, whether that's in the unification layer, in Fluentd, or even downstream when you start to do log analytics. If you understand the structure of the data being logged, it's an awful lot easier to tease out meaningful insights and activity.
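As a minimal sketch of the structured, meaningful logging Phil advocates, here is one way to emit JSON log lines using only the Python standard library. The logger name and the extra field names (`order_id`, `error_code`, and so on) are hypothetical, not from the book.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        entry = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Attach structured context passed via `extra=` (hypothetical fields).
        for key in ("order_id", "customer_region", "error_code"):
            if hasattr(record, key):
                entry[key] = getattr(record, key)
        return json.dumps(entry)


logger = logging.getLogger("payments")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Key variables go in as fields, not buried in free text.
logger.warning("payment retry limit approaching",
               extra={"order_id": "ORD-1234", "error_code": "PAY-017"})
```

Because each line is a self-describing JSON object, downstream rules (in Fluentd or an analytics platform) can match on fields rather than parse prose.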
David Brown
makes sense. What, what's the difference between an audit event and a log event?
Phil Wilkins
They're very much the same thing. An audit event differentiates itself, normally, by the fact that it's going to be used beyond just understanding operational state and application behavior. You can record audit events through your logging framework as well, but they're there not only to perhaps help you understand your application and what it's up to; you will use them to support evidence of compliance and things like that, or for dealing with security matters. So who signed in and out, and when, for example, you can characterize as an audit event. Just that someone's logged in, or that the login logic is running, is more of a traditional log event, because you can't tease out the meaning from it. So the audit is very much there to be a trail and show what people are doing and what's happening in your system. So if someone comes and says they think there's a data leak, you can go to your audit trails and go: OK, this is what actually went on, and I can account to governance bodies that my system is running true and correct.
David Brown
Why is it important to distinguish between audit events and log events? Are we treating them differently in terms of unified logging?
Phil Wilkins
In the short term, you're probably not going to treat them that differently. But the key difference is that because audit is supporting compliance, you're likely to have rules about how long you retain that information, and you might need to store it slightly separately, so it's easier to pull it out and present it if you have to show evidence of compliance activities.
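That "store it slightly separately" point can be sketched with Fluentd's built-in file output: give audit-tagged events their own destination so retention and access rules can differ. The tag and path are illustrative.

```
# Keep audit events in their own store, apart from operational logs,
# so compliance retention rules can be applied to them separately.
<match audit.**>
  @type file
  path /var/log/unified/audit   # separate location, longer retention
  <buffer time>
    timekey 1d                  # one chunk per day, easy to archive
    timekey_wait 10m
  </buffer>
</match>
```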
David Brown
Mhm. You've devoted a section of your book to achieving clear language, which is followed by one on being human- and machine-readable. Can you expound on the importance of these factors: clear language, and being human- and machine-readable?
Phil Wilkins
Yeah. So, all developers have probably done it at some time or another: got really bored of writing log entries and put something funny in there. You know, it's going to throw an exception, so I put "Geronimo" in there, or something like that, just to lighten the process, because it can get tedious if you're having to write lots of very dry log entries. But if you do that and leave it there, it really doesn't tell anybody anything meaningful when it's not you. You know, I will know what it means, because I wrote it and know where to go looking for it, and perhaps it might reflect a particular exception that was annoying me during testing. But for an ops team, and even in a DevOps environment, where the developer is involved in the operations, sooner or later you're going to move on to a new project or a new product, and someone's got to keep your solution alive. So the more meaningful the statement, and the easier it is to understand in the eyes of another person, the better it's going to be. We need to think about that, and therefore we do need to be aware of our semantics and our technical language. If you're going to use specific terms, that's great, because it helps understanding of the meaning: if, in the language of accounting, there's a particular type of transaction, call it that transaction, but make sure that there is a dictionary of terms, if you like. And you can add more meaning, particularly in some scenarios, by using things like unique error codes. That allows you to attach far more comprehensive information explaining what the causes of this error can be and what the remediation is. So you're giving more meaning again, rather than just saying: I've caught an exception, here's the stack trace, move along.
So adding that detail is really helpful. Then, in terms of making it machine-readable as well, this comes back to the workload involved in processing it. If you make the log events easier to process, so give them structure, then the easier it is to make them actionable in the event stream. Saying "if I get an event with this attribute, which has a particular value" is a lot easier than trying to run a regex across a stream-of-consciousness text to tease out and say: actually, I need to tell someone about this now, rather than just sending it to the log analytics platform, or route a count of this type of event in the last 30 seconds into Prometheus.
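The contrast between regex-scraping free text and matching a structured field can be shown in a few lines of Python. Both the log line and the field names are invented for illustration.

```python
import re

# The same event, once as free text and once as a structured record.
raw_line = "2024-05-01T02:14:07Z WARN disk usage at 91% on host web-03"
structured = {"level": "warn", "metric": "disk_usage_pct",
              "value": 91, "host": "web-03"}

# Free text: the regex has to guess at wording, units and ordering,
# and silently breaks if anyone rephrases the message.
match = re.search(r"disk usage at (\d+)%", raw_line)
free_text_alert = match is not None and int(match.group(1)) > 90

# Structured: a plain field comparison, stable across rewording.
structured_alert = (structured.get("metric") == "disk_usage_pct"
                    and structured.get("value", 0) > 90)
```

Both checks fire here, but only the structured one survives a developer rewording the message or switching the units.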
David Brown
In that regard, what are you suggesting? A JSON format, or XML, or simple CSV? Is anything fine, as long as it's machine-readable?
Phil Wilkins
Yeah, it's all down to the culture of the organization. Some formats are better than others. JSON's better than XML because it's less verbose but still carries the readable meaning. With CSVs, at least you can see each of the values, but you put more cognitive workload on the consumer that's looking at it, because you've got to know what each column is in a CSV. You really don't want to repeat the header every time you write a CSV row; you can do, but it just makes it harder to read.
David Brown
And Fluentd deals natively in JSON format, right?
Phil Wilkins
Internally, it processes everything as JSON. So every log event will get a very basic JSON event structure applied to it, even if you're only sending it text, because what it does is take the log event and treat that as an element called the message, and it will attach a timestamp to it. And you can link other metadata to that as well. And of course, in Fluentd you can then start doing things like examining the payload in your configuration, because it's able to interpret JSON very easily.
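Conceptually, what Phil describes looks something like the sketch below. This is a Python illustration of the idea, not Fluentd's actual implementation: a plain text line becomes a record keyed by "message", carried alongside a tag and a timestamp.

```python
import time


def wrap_as_event(line, tag="legacy.app"):
    """Conceptual model of how a plain text line becomes a structured
    event: a tag, a timestamp, and a record whose only key is the
    original text as 'message'. Further metadata can then be merged
    into the record by later pipeline stages."""
    return {
        "tag": tag,                       # where the event came from
        "time": time.time(),              # attached ingestion timestamp
        "record": {"message": line},      # the raw text, now addressable
    }


event = wrap_as_event("connection pool exhausted")
```

Once the text is addressable as `record["message"]`, routing and filtering rules can inspect it like any other JSON payload.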
David Brown
Right. We've talked about achieving clear language. What about context? Can you explain what context means in logging?
Phil Wilkins
So the context is all about helping the person looking at the log understand the conditions under which it occurred. When we have errors, to help you diagnose things, you need to know what's going on around them. You know, imagine yourself in a forest, but you can't see. You hear a loud thump. Now, is that a tree falling, or is that a wild animal running around in the forest that's about to steamroller you? The more context you're given, the better you're able to understand the problem, or the situation you're in, and therefore what to do. You hear that tree go down; well, if you can also feel the wind blowing against your face really strongly, you know you've probably got a storm, and that might just be a tree that's been blown down. Whilst not great, unless it happens right next to you, it's not a problem. But if that thump sounds more like something hitting flush against something hard, you're probably going to want to run, because that could be a bear coming for you in that forest. What I'm trying to say is: the more information you can associate with the event when you record it, the easier it is to diagnose and determine the path of action. So if we take that to a database connection issue being thrown: you know, what's the URL, or URI rather, of that database? Is it a particular database that's causing the problems? That helps not only in the short term, but also in the log analytics phase: OK, is it the same database that seems to periodically throw a wobbly and cause me a connection issue? Or is it the same server that's causing me a problem, because actually it's developing a fault, but it's intermittent?
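The database-connection example above can be made concrete with a small Python sketch. The event and field names here are invented for illustration; the point is that the failing endpoint and attempt count travel with the error, so later analysis can group failures by server.

```python
import json
import logging


def db_failure_entry(exc, db_uri, attempt):
    """Build a context-rich log entry for a database connection failure.
    Field names are illustrative, not a standard schema."""
    return {
        "event": "db_connection_failed",
        "db_uri": db_uri,       # which database, exactly
        "attempt": attempt,     # helps separate intermittent vs persistent
        "error": str(exc),
    }


# Usage: log the structured entry so analytics can later ask
# "is it always the same database that throws a wobbly?"
entry = db_failure_entry(ConnectionError("timed out"),
                         "postgres://db-3.internal:5432/orders", 2)
logging.getLogger("orders").error(json.dumps(entry))
```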
David Brown
It's interesting; there's so much value in what you're saying. Do you find that logging, and this attention to the language, the machine-readable formats, the context, is getting as much love as it should in the developer community?
Phil Wilkins
To be honest, I think we can always do better. We're always under pressure to deliver and get things out the door, so we tend to write logging when we're thinking about and working on testing our own application. So we tend to think about it from the viewpoint of what do I need now, rather than looking at the application and going: well, this code could be alive in 10 years. You know, I don't want to have someone asking me about code I wrote years and years ago, if I'm still in the same organization. Or worse, something's gone wrong in the middle of the night, and the first tier of support has just got me out of bed at two in the morning, after I've been out and had my cocktails, and they're asking me what does this log message mean, because they're trying to figure out how to get an application back on its feet, and they've got management screaming at them that a major system is down and the business is losing revenue. So, yeah, the more you do with your logs to make it easier for people to deal with those situations, and most importantly the unexpected, the better. Because, you know, we write code to deal with the expected conditions; it's the unexpected ones that are always the issue. The more we do to help ourselves and think about those, the better our lives are going to be.
David Brown
Your book talks extensively about Fluentd, or references Fluentd, as a tool set for unified logging. But the book itself obviously goes into the principles of logging in great detail as well. Is Fluentd the only game in town, or are there other solutions?
Phil Wilkins
No, there are quite a lot of solutions out there. Log unification is a newer idea, so there's a smaller set of products for it, but probably the biggest one that lines up with Fluentd that people have heard of is Logstash, from the Elastic organization, which is part of the well-known ELK stack. And you can swap Logstash for Fluentd and it becomes the EFK stack. So there are options out there in that direction. You can also go to the more classic aggregation model and log analytics; there are plenty of well-known products that do that. Splunk is probably one of the best-known commercial ones, which again has this agent model with the ability to interpret and grab a lot of different data sources. But the key difference is, it tends to work on the basis of pumping everything back to the Splunk core data storage and processing things there.
David Brown
What about the public cloud providers? What are they doing in this space?
Phil Wilkins
So, they're interesting. Of the hyperscalers, Google, AWS, Oracle, I'm not so familiar with Azure, they all have actually built support for, or leverage, big chunks of the Fluentd tool set under the hood. Oracle, for example, uses a lot of it and can ingest Fluentd events: they give you an endpoint in your account, and you can fire your events straight at it as if you're talking to a Fluentd node, which makes life really easy. GCP was actually one of the first to start adopting the Fluentd framework and building it into part of their logging mechanisms on their cloud-native platform, so it's quite conversant with Fluentd as well. A lot of these providers are offering, you know, means to consume Fluentd events, and some of them also allow you to actually pull log events out of their environment using Fluentd connectors, which makes life really easy. So if you're using a PaaS or SaaS service where you can't get in at the monitoring that's going on, you might be able to pick up some of the behavioral information by actually going and examining its log collection using Fluentd and pumping it into whatever system you want to use. And that's really useful when you're getting into a multi-cloud or hybrid use case.
David Brown
Really interesting stuff. The book is called Logging in Action, published by Manning. Phil, how can our listeners stay in touch with what you're writing about? Do you use particular social media channels or a blogging platform?
Phil Wilkins
So, I'm on WordPress, and I blog across a number of subjects, including adding extra titbits of information that support and work with the book. And I can be found at two addresses: either phil@mp3monster.org, that's my email, or blog.mp3monster.org, which is pretty easy to remember. The other one is cloud-native.info, which is perhaps a more meaningful one for most, and a lot less esoteric given the content that I write about a lot of the time.
David Brown
Fantastic, Phil Wilkins. Thank you very much for your time today.
Phil Wilkins
Thank you. Pleasure.
Kevin Montalbo
All right. That's a wrap. Thank you so much, Phil.
Phil Wilkins
My pleasure. Thank you for having us on.
David Brown
Great. So we'll also append some details about the book to this podcast, and I think you were saying, Kevin, that we have some sort of giveaway as well. Yeah.
Kevin Montalbo
Phil gave us some copies of his book.
David Brown
OK, so we will promote that as well and tag it at the end. The production of the podcast and video takes a few days, so when that's ready, we'll send you the details and some assets and things like that. If you wanted to post that to your own social media channels, you're welcome to
Phil Wilkins
and promote it.
David Brown
Yes, and we'll get it out and start promoting it across our channels as well. All right.
Phil Wilkins
Excellent.
David Brown
Yes. Well, thanks very much and enjoy the rest of your day.
Phil Wilkins
Yeah. No, thank you. The one thing is, if you're looking for more people, I'm reasonably well connected with some very good guests who've been on other podcasts. So, yeah, people that work for Azul, for the DevOps side; one of the guys was the author of a developer advocacy book.
David Brown
Kevin's always interested in getting guests, so maybe, Kevin, you can reach out and get some names there.
Phil Wilkins
Yeah, I'll ping you some names of people that I think you might find interesting to talk to. Yeah, brilliant.
David Brown
No, we're always interested in that. Yeah. Thanks very much
Kevin Montalbo
Yes, that will make that job a lot easier.
Phil Wilkins
All right. Awesome. Cheers. Have a good day. Cheers. Bye bye.