Posts tagged ‘Analytics’

Share on TwitterDigg This

Introduction

Creating your own log analytics
I find loggly, one really cool project. But at it’s current price is kinda prohibitive for many developer teams. So I decided to get something similar done in house for my projects.
After browsing the web, I found some interesting solutions, that when combined, could really get me a great analytic tool.

From batch load to online analysis

Logging analysis have been pretty much a parse and load job for me over the last years. The task was pretty much load a huge Apache/Tomcat/App log, do some aggregation using a plethora of tools (from simple java to hadoop for really huge logs). But the task was pretty much the same, get a log file at the end of the day, run a batch job for a few hours, load into a SQL DB, and rely on DB aggregation and some Data Warehousing for checking the results.
I knew we could do better than that. The first thing that hit me was this great post [ log everything as json make your life easier ] that I found on Hacker News.
That post made me feel really stupid, how something so obvious had skip us for such a long time. So that was the first thing we’ve done. We moved from plain text logging into a json logging.

Changing your log layout into JSON

This could be easy as candy, but it really depends on your environment. Tomcat is a piece of crap when it comes to logging. I can’t count how many times we ran into trouble with Tomcat’s logging system. For application logging you are pretty much safe, as you would have control of it, and by using slf4j you can pretty much control everything. But when it comes to tomcat’s own access log, things are not that simple :(
So I went ahead and changed the AccessLogValve located at ${catalina.home}/conf/server.xml to look something like this:

<Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"
               prefix="localhost_access_log." suffix=".txt"
               pattern="{&quot;host&quot;:&quot;%h&quot;, &quot;user&quot;:&quot;%u&quot;, &quot;method&quot;:&quot;%m&quot;}" />
		 <!-- %h %l %u %t &quot;%r&quot; %s %b -->

This would produce a log on this format:

{
"host" : "localhost",
"user" : "-" ,
"method" : "GET"
}

Well, that’s even easier to read than the regular tomcat log format.

Here come the new tools

But so far we have not changed much, we are still saving the log into a file, and after that I had to bulk load and all the crap. But then I found cube an amazing node.js udp/tcp analytic tool that uses mongodb as its backend. AMAZING! So now we just need to change our log format to follow it’s convention and then send it via UDP to the server, that could be online, and this is the most exciting part. I just needed to change the AccessLog to output the correct format and send it over UDP. Should be easy. It was NOT!

Problems with Access Log Valve

Ok, now I just had to put my log into cube’s format:

{
  "type": "request",
  "time": "2012-04-28T15:28:20.212Z",
  "data": {
    "host": "localhost",
    "user": "-",
    "method": "GET"
  }
}

And this is where I bumped into the first issue with the AccessLogValve. According to the docs, you do have a datetime parameter the %t, but that is not the timestamp format expected by cube. On the javadoc shows an option to customize the date format by using %t{format}, well it did not work. Tomcat ignores that and just put the same format and to make it worst it adds the string format to the generated string. I’m probably doing something wrong here, but did not had the time to investigate, so let’s move on.

Fixing Tomcat access log with a custom log valve

I found a blog with a nice example on how to delegate AccessLog to a log4j, I’m only changing a bit to overcome the problem with the dateformat of tomcat valve. To overcome that I built the JSON string inside the Valve, and added only the data:{} part of the object as the data from the logging, it kinda makes sense right?

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
package com.fb.tomcat;
 
import java.text.DateFormat;
import java.text.MessageFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
 
import org.apache.catalina.valves.AccessLogValve;
import org.apache.log4j.Logger;
 
public class Log4JAccessLogValve extends AccessLogValve{
 
	private final Logger logger = Logger.getLogger("accessLog");
 
	protected static final String info1 ="com.fbt.tomcat.Log4JAccessLogValve";
	final DateFormat df = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.S'Z'");
 
	@Override
	public void log(String message) {
		String json = "'{'\"type\":\"request\", \"time\":\"{0}\", \"data\": '{' {1} '}' '}'";
		String text = MessageFormat.format(json, new Object[]{df.format(new Date()),message});
		logger.info(text);
	}
 
	@Override
    public String getInfo()
    {
        return info1;
    }
 
    @Override
    protected void open()
    {
    }
}

Well, even without the timestamp bug on the valve, we had the issue to send the logs over UDP instead of saving them into disk. At this point we have a few alternatives in hand:

  1. Modify the AccessLogValve to send the log message over UDP.
  2. Use the syslog Appender and make syslog forward the message to cube (this is what loggly does)
  3. Use an UDPAppender (part of log4j companions)

There’s no free cake folks. All of them come with a cost, We just need to pick one.

  1. In this option you’ll need to deal with possible connection problems, synchronization and low level stuff
  2. The easier one, but you will need to have a syslogd running and configured
  3. The UDPAppender is not part of ANY log4j official package. And the build is broken, you have to face some nasty Maven SHIT to get it working

So let’s go ahead and choose the second implementation.

Log4j configuration

og4j.logger.accessLog=INFO, SYSLOG, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=${catalina.home}/logs/access_log4j.log
log4j.appender.file.Append=true
log4j.appender.file.MaxFileSize=20MB
log4j.appender.file.MaxBackupIndex=2
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%m
 
log4j.appender.SYSLOG=org.apache.log4j.net.SyslogAppender
log4j.appender.SYSLOG.SyslogHost=localhost
log4j.appender.SYSLOG.Facility=local2
log4j.appender.SYSLOG.FacilityPrinting=true

XSyslog config

Now you need to configure your syslog to forward to cube own server. I’m putting here the config for my linux rsyslog daemon, please refer to your own syslog tool to get the job done. Just append the following line to your /etc/rsyslog.conf

local2.* @localhost:1180

Summing up

  1. Install mongoDB
  2. Install cube
  3. Copy log4j.jar to your tomcat lib folder, along with the log4j.properties shown here
  4. Copy your compiled custom AccessLog jar into bin folder of tomcat
  5. Modify server.xml to use the new AccessLogValve
  6. Modify rsyslog.conf to forward the logs
  7. Access localhost:1081 and enjoy some real time logging monitoring

Happy coding (and monitoring)