Parsing java log files

Parsing Apache log files using Java

In this post, we look at how to parse an Apache log file in Java, and we walk through each part of the regular expression that does the parsing.

The file format was designed for human inspection, not for easy parsing. The problem is that the log file mixes several delimiters: square brackets for the date, quotes for the request line, and spaces sprinkled all through. You might get a StringTokenizer to work, but you would spend a lot of time fiddling with it. A regex saves a lot of lengthy code, so let's see how.

A sample Apache log line, written as a Java string, looks like this:

String ApacheLogSample = "123.45.67.89 - - [27/Oct/2000:09:27:09 -0400] \"GET /java/javaResources.html "+ "HTTP/1.0\" 200 10450 \"-\" \"Mozilla/4.6 [en] (X11; U; OpenBSD 2.8 i386; Nav)\"";

And here is the regex for parsing the line above:

String regex = "^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+-]\\d{4})\\] \"(.+?)\" (\\d{3}) (\\d+) \"([^\"]+)\" \"(.+?)\"";
  • ([\d.]+)
    A character class of digits and dots, repeated one or more times. It captures the IP address, e.g. 123.45.67.89.
  • +
    The quantifier "one or more". Applied to the class [\d.], it lets the group take in the whole run of digits and dots that makes up the IP.
  • (\S+)
    One or more characters that are not whitespace. The pattern uses it twice, for the two fields after the IP, which are both - in the sample.
  • \[([\w:/]+\s[+-]\d{4})\]
    [\w:/]+ matches a run of word characters, colons (:), and slashes (/), which covers 27/Oct/2000:09:27:09 in the ApacheLogSample string. \s[+-] means a whitespace character followed by either plus (+) or minus (-), and \d{4} means exactly four digits, so together they match the time zone offset -0400. The escaped \[ and \] match the literal square brackets around the date.
  • (.+?)
    It captures any characters up to the next quote. We can't use (.+) here, because that would match too much (up to the quote at the end of the line).
  • (\d{3})
    It matches exactly three digits, e.g. 200 or 404. Because the pattern requires a space on either side, neither 12 nor 1234 would match here.
  • (\d+)
    It matches one or more digits: the number of bytes sent.
  • ([^"]+)
    It matches one or more characters other than a double quote ( " ): the referer.
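The difference between (.+?) and (.+) is easy to see on a shortened copy of the sample line. The sketch below is only an illustration: the class name QuantifierDemo, the helper firstGroup, and the shortened user agent are made up for this example and are not part of the original program.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns the first capture group of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Tail of the sample log line: request, status, bytes, referer, user agent (shortened).
        String tail = "\"GET /java/javaResources.html HTTP/1.0\" 200 10450 \"-\" \"Mozilla/4.6\"";

        // Reluctant quantifier: stops at the first closing quote.
        System.out.println(firstGroup("\"(.+?)\"", tail));

        // Greedy quantifier: runs on to the last quote on the line.
        System.out.println(firstGroup("\"(.+)\"", tail));
    }
}
```

The first call prints only the request, while the second swallows everything up to the final quote, which is why the log pattern uses the reluctant form.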

With the regex understood, let's look at the program that parses the line in Java. Inside a Java string literal, every backslash in the regex has to be doubled ( \\ ), and every double quote has to be escaped as \".

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ApacheLogParser {
    public static void main(String[] argv) {
        // \\d{4} is the time zone offset, \\d{3} is the HTTP status code.
        String regex = "^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+-]\\d{4})\\] \"(.+?)\" (\\d{3}) (\\d+) \"([^\"]+)\" \"(.+?)\"";
        String ApacheLogSample = "123.45.67.89 - - [27/Oct/2000:09:27:09 -0400] \"GET /java/javaResources.html "
                + "HTTP/1.0\" 200 10450 \"-\" \"Mozilla/4.6 [en] (X11; U; OpenBSD 2.8 i386; Nav)\"";

        Pattern p = Pattern.compile(regex);
        System.out.println("Apache log input line: " + ApacheLogSample);
        Matcher matcher = p.matcher(ApacheLogSample);
        if (matcher.find()) {
            System.out.println("IP Address: " + matcher.group(1));
            System.out.println("UserName: " + matcher.group(3));
            System.out.println("Date/Time: " + matcher.group(4));
            System.out.println("Request: " + matcher.group(5));
            System.out.println("Response: " + matcher.group(6));
            System.out.println("Bytes Sent: " + matcher.group(7));
            // The referer is printed only when the log recorded one ("-" means none).
            if (!matcher.group(8).equals("-"))
                System.out.println("Referer: " + matcher.group(8));
            System.out.println("User-Agent: " + matcher.group(9));
        }
    }
}

The output of the program :

Apache log input line: 123.45.67.89 - - [27/Oct/2000:09:27:09 -0400] "GET /java/javaResources.html HTTP/1.0" 200 10450 "-" "Mozilla/4.6 [en] (X11; U; OpenBSD 2.8 i386; Nav)"
IP Address: 123.45.67.89
UserName: -
Date/Time: 27/Oct/2000:09:27:09 -0400
Request: GET /java/javaResources.html HTTP/1.0
Response: 200
Bytes Sent: 10450
User-Agent: Mozilla/4.6 [en] (X11; U; OpenBSD 2.8 i386; Nav)
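The Date/Time value is still a plain string at this point. As a possible next step, it can be turned into a date object with java.time. This is a sketch under assumptions, not part of the original program: the class name LogTimestamp is invented here, and Locale.ENGLISH is assumed so that month abbreviations such as Oct are recognized regardless of the system locale.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class LogTimestamp {
    // Pattern for values such as "27/Oct/2000:09:27:09 -0400".
    // Locale.ENGLISH is an assumption: it pins the month names to English abbreviations.
    private static final DateTimeFormatter APACHE_TIME =
            DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);

    // Parses the text captured by group 4 of the log regex.
    static OffsetDateTime parse(String dateTimeField) {
        return OffsetDateTime.parse(dateTimeField, APACHE_TIME);
    }

    public static void main(String[] args) {
        OffsetDateTime t = parse("27/Oct/2000:09:27:09 -0400");
        System.out.println(t); // 2000-10-27T09:27:09-04:00
    }
}
```

An OffsetDateTime keeps the -0400 offset from the log, so entries written in different time zones can still be compared correctly.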

So, that's it. This is all you have to do to parse a line of an Apache log file using Java and regex.
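The program above handles a single hard-coded line. To go from one line to a whole file, the same pattern can be applied line by line. The sketch below is a hedged illustration: the class name StatusCounter, the method countStatuses, and the idea of counting status codes are all assumptions made for this example. It reads from any Reader, so a FileReader could be passed in for a real file. Lines that do not match are skipped; real logs may contain such lines, for example when the byte count is recorded as - instead of a number.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StatusCounter {
    // Same pattern as in the article; compiled once and reused for every line.
    private static final Pattern LINE = Pattern.compile(
            "^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+-]\\d{4})\\] \"(.+?)\" (\\d{3}) (\\d+) \"([^\"]+)\" \"(.+?)\"");

    // Counts lines per HTTP status code (group 6); lines that do not match are skipped.
    static Map<String, Integer> countStatuses(Reader source) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                Matcher m = LINE.matcher(line);
                if (m.find()) {
                    counts.merge(m.group(6), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample data: two valid lines and one line that is not a log entry.
        String log =
                "123.45.67.89 - - [27/Oct/2000:09:27:09 -0400] \"GET /a.html HTTP/1.0\" 200 10450 \"-\" \"Mozilla/4.6\"\n"
              + "not a log line\n"
              + "123.45.67.90 - - [27/Oct/2000:09:27:10 -0400] \"GET /b.html HTTP/1.0\" 404 512 \"-\" \"Mozilla/4.6\"\n";
        System.out.println(countStatuses(new StringReader(log))); // {200=1, 404=1}
    }
}
```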

Source

The first part about the incorrect code could have been shortened. Overall, though, I liked the task: it clearly shows how the right architecture can make your life much simpler.

I don't quite understand why everyone writes about using streams. Here is what I did. 1. Wrote a method that returns a List of records from the log files. 2. Wrote a method that returns a trimmed List of the records that fall within the given time interval. That's it. From there you can iterate over the list and slice it by any condition you like: pull out the IPs if you want, or user + status.

A really cool task =) A hint: the Date class has the methods boolean after(Date date) and boolean before(Date date).

Part 7 of this task is a brain-crusher. No idea what the validator wants. They could at least show the tests it supposedly fails on during checking. I'll set it aside. Maybe I'll come back to it someday. The upside of this task overall: I understand streams better now. But part 7... validator, you heartless beast!

In task 7, don't forget that in the new query the values in place of "after" and "before" can also be "null", and that the boundaries are not included in the result 🙂
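The list-based approach and the two hints from the comments above (Date.after/Date.before, null bounds, excluded boundaries) can be sketched roughly as follows. The names IntervalFilter, LogEntry, and between are hypothetical and are not taken from the task itself. Treating a null bound as "no limit" is an assumption, and the task statement may define inclusive bounds for some parts.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

class IntervalFilter {
    // Hypothetical record type: one parsed log line.
    static final class LogEntry {
        final String ip;
        final Date date;

        LogEntry(String ip, Date date) {
            this.ip = ip;
            this.date = date;
        }
    }

    // Keeps entries strictly between 'after' and 'before'.
    // Assumption: a null bound means that side of the interval is unlimited.
    static List<LogEntry> between(List<LogEntry> entries, Date after, Date before) {
        List<LogEntry> result = new ArrayList<>();
        for (LogEntry e : entries) {
            boolean lateEnough = after == null || e.date.after(after);
            boolean earlyEnough = before == null || e.date.before(before);
            if (lateEnough && earlyEnough) {
                result.add(e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<LogEntry> entries = new ArrayList<>();
        entries.add(new LogEntry("127.0.0.1", new Date(1_000L)));
        entries.add(new LogEntry("127.0.0.2", new Date(2_000L)));
        entries.add(new LogEntry("127.0.0.3", new Date(3_000L)));

        // Exclusive bounds: only the middle entry remains.
        System.out.println(between(entries, new Date(1_000L), new Date(3_000L)).size()); // 1
        // No bounds at all: every entry is kept.
        System.out.println(between(entries, null, null).size()); // 3
    }
}
```

Once the trimmed list exists, extracting IPs, users, or statuses is a plain loop over it, which is the point the commenter makes about not needing streams.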

Task 1. "1.2.1. ... then only the data for that period must be returned (including the after and before dates)." "Including the after and before dates" means that data for those dates must also be added to the set! I spent a lot of time finding out that the condition "including the after and before dates" is not a condition. Task 5. There is a brute-force solution and there is an elegant one, and JR's is the elegant one. The JavaRush code has some interesting solutions.

This task is simply 10 out of 10 for usefulness. What I learned: reinforced my knowledge of the Stream API; learned to do things properly RIGHT AWAY, because otherwise you wear yourself out redoing them later; learned to extract shared pieces of code into separate methods; and picked up a little regex.

Source

Java log parser. The goal was to write a parser in Java that parses web server access log file, loads the log to MySQL and checks if a given IP makes more than a certain number of requests for the given duration.

jakubpas/java-log-parser

README.md

The goal is to write a parser in Java that parses web server access log file, loads the log to MySQL and checks if a given IP makes more than a certain number of requests for the given duration.

(1) Create a Java tool that can parse and load the given log file to MySQL. The delimiter of the log file is pipe (|).

(2) The tool takes "startDate", "duration" and "threshold" as command line arguments. "startDate" is of "yyyy-MM-dd.HH:mm:ss" format, "duration" can take only "hourly", "daily" as inputs and "threshold" can be an integer.

(3) This is how the tool works:

java -cp "parser.jar" com.ef.Parser --startDate=2017-01-01.13:00:00 --duration=hourly --threshold=100

The tool will find any IPs that made more than 100 requests starting from 2017-01-01.13:00:00 to 2017-01-01.14:00:00 (one hour) and print them to console AND also load them to another MySQL table with comments on why it's blocked.

java -cp "parser.jar" com.ef.Parser --startDate=2017-01-01.13:00:00 --duration=daily --threshold=250

The tool will find any IPs that made more than 250 requests starting from 2017-01-01.13:00:00 to 2017-01-02.13:00:00 (24 hours) and print them to console AND also load them to another MySQL table with comments on why it's blocked.
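The three command line arguments described above could be read along the following lines. This is a minimal sketch and not the repository's actual implementation, which uses Spring Boot: the class ParserArgs and the methods toMap, parseStart, and windowEnd are names made up for this illustration.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.Map;

class ParserArgs {
    // Format of the --startDate value, as given in the task description.
    private static final DateTimeFormatter START_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd.HH:mm:ss");

    // Splits "--key=value" arguments into a map; anything else is ignored.
    static Map<String, String> toMap(String[] args) {
        Map<String, String> options = new HashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (arg.startsWith("--") && eq > 2) {
                options.put(arg.substring(2, eq), arg.substring(eq + 1));
            }
        }
        return options;
    }

    static LocalDateTime parseStart(String startDate) {
        return LocalDateTime.parse(startDate, START_FORMAT);
    }

    // End of the checked window: one hour for "hourly", 24 hours for "daily".
    static LocalDateTime windowEnd(LocalDateTime start, String duration) {
        switch (duration) {
            case "hourly": return start.plusHours(1);
            case "daily":  return start.plusDays(1);
            default: throw new IllegalArgumentException("duration must be hourly or daily: " + duration);
        }
    }

    public static void main(String[] args) {
        Map<String, String> options = toMap(new String[] {
                "--startDate=2017-01-01.13:00:00", "--duration=hourly", "--threshold=100"});
        LocalDateTime start = parseStart(options.get("startDate"));
        int threshold = Integer.parseInt(options.get("threshold"));
        System.out.println(start + " to " + windowEnd(start, options.get("duration"))
                + ", threshold " + threshold);
    }
}
```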

(1) Write MySQL query to find IPs that made more than a certain number of requests for a given time period.

Ex: Write SQL to find IPs that made more than 100 requests starting from 2017-01-01.13:00:00 to 2017-01-01.14:00:00. 

(2) Write MySQL query to find requests made by a given IP.

Date, IP, Request, Status, User Agent (pipe delimited, open the example file in text editor)

Date Format: "yyyy-MM-dd HH:mm:ss.SSS"

Also, please find attached a log file for your reference.

The log file assumes 200 as hourly limit and 500 as daily limit, meaning:

(1) When you run your parser against this file with the following parameters

java -cp "parser.jar" com.ef.Parser --startDate=2017-01-01.15:00:00 --duration=hourly --threshold=200

The output will have 192.168.11.231. If you open the log file, 192.168.11.231 has 200 or more requests between 2017-01-01.15:00:00 and 2017-01-01.15:59:59

(2) When you run your parser against this file with the following parameters

java -cp "parser.jar" com.ef.Parser --startDate=2017-01-01.00:00:00 --duration=daily --threshold=500

The output will have 192.168.102.136. If you open the log file, 192.168.102.136 has 500 or more requests between 2017-01-01.00:00:00 and 2017-01-01.23:59:59
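The threshold check itself can be illustrated in memory, without MySQL. The sketch below is a simplified stand-in for the real tool and rests on several assumptions: the class ThresholdCheck and the method overThreshold are hypothetical, the sample lines are made up, the first two pipe-delimited fields are taken to be the date and the IP as in the log format above, and "threshold or more" is used as the cut-off, following the "200 or more requests" wording (the task text elsewhere says "more than").

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ThresholdCheck {
    // Date format of the log file, as given in the task description.
    private static final DateTimeFormatter LOG_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    // Returns the IPs with at least 'threshold' requests in the window [start, end).
    static List<String> overThreshold(List<String> lines, LocalDateTime start,
                                      LocalDateTime end, int threshold) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.split("\\|"); // the pipe must be escaped in a regex
            if (fields.length < 2) {
                continue; // not a log line
            }
            LocalDateTime when = LocalDateTime.parse(fields[0], LOG_FORMAT);
            if (!when.isBefore(start) && when.isBefore(end)) {
                counts.merge(fields[1], 1, Integer::sum);
            }
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= threshold) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample lines in the Date|IP|Request|Status|User Agent layout.
        List<String> lines = Arrays.asList(
                "2017-01-01 15:00:00.123|192.168.11.231|\"GET / HTTP/1.1\"|200|\"Mozilla\"",
                "2017-01-01 15:30:00.000|192.168.11.231|\"GET / HTTP/1.1\"|200|\"Mozilla\"",
                "2017-01-01 16:00:00.000|10.0.0.1|\"GET / HTTP/1.1\"|200|\"Mozilla\"");
        LocalDateTime start = LocalDateTime.of(2017, 1, 1, 15, 0, 0);
        System.out.println(overThreshold(lines, start, start.plusHours(1), 2)); // [192.168.11.231]
    }
}
```

The half-open window [start, end) matches the 15:00:00 to 15:59:59 range in the example, since a request at exactly 16:00:00 falls into the next hour.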

(1) Java program that can be run from command line

java -cp "parser.jar" com.ef.Parser --accesslog=/path/to/file --startDate=2017-01-01.13:00:00 --duration=hourly --threshold=200 

(2) Source Code for the Java program

(3) MySQL schema used for the log data

(4) SQL queries for SQL test

(1) The application was created using Java 8, Spring Boot and Maven. To generate the schema, run:

If you use a different username or password, remember to change it in the application.yml config file.

The program can be compiled to a jar by:

(2) The command line parameters were implemented

(3) The application can be run by default by:

java -jar target/log_parser-0.0.1-SNAPSHOT.jar --accesslog=access.log --startDate=2017-01-01.13:00:00 --duration=hourly --threshold=100 

To show the help message, you can simply run:

java -jar target/log_parser-0.0.1-SNAPSHOT.jar 

(1) Write MySQL query to find IPs that made more than a certain number of requests for a given time period.

 SELECT l.ip FROM log l WHERE l.date BETWEEN '2017-01-01.13:00:00' AND '2017-01-01.13:00:00' + INTERVAL 1 HOUR GROUP BY l.ip HAVING count(l.ip) >= 100; 

(2) Write MySQL query to find requests made by a given IP.

 SELECT l.* from log l WHERE ip = '192.168.228.188'; 

(1) Java program that can be run from command line

(2) Source Code for the Java program

(3) MySQL schema used for the log data

In schema.sql file in root directory 

(4) SQL queries for SQL test

I did not provide validation or unit tests, but I am happy to add them if you like. The solution was tested on Ubuntu 14.04 and macOS Sierra.

Source
