Ditch Your Logs for Better Monitoring Metrics
Logs are essential; we’d be blind without them. How else would we keep track of the behavior of our application, service, or platform?
This is how we have thought for years. But do we really need logs? I’ve heard teams give many excuses for not handling logs properly:
- We lack the time, money, or willingness to integrate with the central logging and monitoring system
- We don’t need logs in component B, since we have them in component A
- Our logs are too big—let’s log only errors
Every approach presented above is wrong. Incomplete logs cause a lack of information, visibility, trackability, and, most importantly, context.
I want to encourage you to look at your logs differently. Do you see patterns there? Records you could count, or strings you could turn into numbers? We already do this for other data, such as failed requests on Nginx web servers. After all, we love using tools to display information in graphs, and all graphs are based on numbers!
Here are some ways you can change your logs to information that’s more useful and easier to understand.
The best thing we can do with numbers is turn them into metrics. Numbers are generally faster to process than strings, and they take less storage space.
However, there is a problem: We still have strings in logs, and generally, those strings hold the crucial information. But we can and should change that.
The answer we seek is a structured logs approach. Let’s look at the main aspects.
First, structured logs are prepared for processing by monitoring tools and are aimed more at machines than at humans. That is why structured logs are often formatted in XML or JSON rather than written as free-form one-line messages. If you design your structured logs to use common patterns across all your services, you can ship those logs to Elasticsearch or another tool more efficiently. No more heavy filters!
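To make the idea concrete, here is a minimal sketch of a structured log record built with Python's standard logging module. The field names (timestamp, level, service, event, status) and the helper build_logger are assumptions chosen for illustration, not a schema the article prescribes; the point is only that every service emits the same keys.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a JSON object with a fixed set of keys."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            # "service" and "status" are hypothetical fields passed via `extra`.
            "service": getattr(record, "service", "unknown"),
            "event": record.getMessage(),
            "status": getattr(record, "status", None),
        }
        return json.dumps(payload)


def build_logger(name: str) -> logging.Logger:
    """Return a logger whose only handler writes JSON to stderr."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A call such as build_logger("checkout").info("request_finished", extra={"service": "checkout", "status": 200}) would then produce a record whose status is already a number, ready to be counted.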
Second, your logs are ready to be segregated, counted, and turned into metrics, so it makes sense to take advantage of this information.
So, how do you turn a thousand text log lines into metrics? Use a time series database. You can parse the fields that matter for monitoring and then count everything in specific timeframes, such as how many 200 status codes you returned, or how many 500-level errors.
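The counting step might look roughly like the sketch below. It uses an in-memory Counter as a stand-in for a real time series database, and it assumes each log line is a JSON object with an ISO 8601 "timestamp" and a numeric "status" field; both field names and the one-minute window are illustrative choices.

```python
import json
from collections import Counter
from datetime import datetime


def count_status_per_minute(lines):
    """Count log records per (minute, status class), e.g. ('2024-01-01T10:00', '5xx')."""
    counts = Counter()
    for line in lines:
        record = json.loads(line)
        ts = datetime.fromisoformat(record["timestamp"])
        # Truncate the timestamp to the minute to form the time bucket.
        minute = ts.strftime("%Y-%m-%dT%H:%M")
        # Group 200, 201, ... as "2xx" and 500, 503, ... as "5xx".
        status_class = f"{record['status'] // 100}xx"
        counts[(minute, status_class)] += 1
    return counts
```

Each (minute, status class) pair and its count is one data point; in a real setup those points would be written to the time series database instead of being kept in a dictionary.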
You can turn strings from the body of the logs into a key, and the value will be the number of occurrences. This way you can create your alerting system very efficiently, and it will be way faster and more lightweight than a system based on the classic logging approach.
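As a rough illustration of the key-and-count idea, the sketch below turns an "event" string from each parsed record into a metric key and checks the counts against alert thresholds. The function names, the "event" field, and the threshold values are all hypothetical; a real alerting system would evaluate such rules inside the monitoring tool itself.

```python
from collections import Counter


def event_counts(records, field="event"):
    """Map each distinct string in `field` to its number of occurrences."""
    return Counter(r[field] for r in records if field in r)


def breached(counts, thresholds):
    """Return the sorted metric keys whose count exceeds the configured limit."""
    return sorted(
        key for key, limit in thresholds.items() if counts.get(key, 0) > limit
    )
```

For example, with two "payment_failed" records in a window and a limit of 1, breached(event_counts(records), {"payment_failed": 1}) reports that key, and the alert compares two integers instead of scanning log text.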