Logs

Making sense of Logs

What is a Log

In a Linux server environment, logs play a crucial role in system administration, offering detailed records of system events, application activities, and security incidents. These logs are typically stored in the /var/log directory and encompass various types of logs such as system logs (/var/log/syslog), authentication logs (/var/log/auth.log), and application-specific logs. By examining these logs, administrators can monitor the health and performance of the server, identify and troubleshoot errors, and gain insights into user activities and system behaviors.

Logs are invaluable for diagnosing issues and ensuring system stability. For example, if a server experiences unexpected crashes or performance degradation, administrators can analyze relevant logs to pinpoint the root cause, whether it be a hardware failure, a misconfigured service, or an application bug. System logs can reveal patterns of recurring errors, while security logs help detect unauthorized access attempts or suspicious activities, aiding in proactive measures to fortify the server against potential threats.
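
As a rough illustration of the security use case, the sketch below counts failed SSH password attempts per source IP. It assumes the usual sshd message wording ("Failed password for ... from <ip> port ..."). The function name count_failed_logins, the host name web01, and the sample entries are made up for the example, so check a few lines of your own authentication log before relying on the pattern.

```shell
# Count failed SSH password attempts per source IP.
# Reads an authentication log on stdin and prints "count ip", highest first.
count_failed_logins() {
    grep 'Failed password' |
        awk '{ for (i = 1; i < NF; i++) if ($i == "from") print $(i + 1) }' |
        sort | uniq -c | sort -nr
}

# Hypothetical sample entries (documentation IP ranges, invented host name).
sample_auth_log='May 20 06:31:01 web01 sshd[1201]: Failed password for invalid user admin from 203.0.113.5 port 40022 ssh2
May 20 06:31:04 web01 sshd[1201]: Failed password for root from 203.0.113.5 port 40022 ssh2
May 20 06:32:10 web01 sshd[1207]: Accepted password for deploy from 198.51.100.7 port 51514 ssh2
May 20 06:33:45 web01 sshd[1211]: Failed password for root from 198.51.100.9 port 33210 ssh2'

printf '%s\n' "$sample_auth_log" | count_failed_logins
```

On a Debian or Ubuntu server the same function could be fed the real file, for example with count_failed_logins < /var/log/auth.log, which usually requires elevated privileges.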

In addition to their diagnostic utility, logs support compliance and auditing requirements. Many industries are subject to regulations mandating detailed record-keeping and reporting on system activities. By maintaining comprehensive log files, organizations can demonstrate adherence to these regulations and provide necessary documentation during audits. Furthermore, automated log management tools can assist in aggregating, filtering, and analyzing log data, enabling efficient monitoring and alerting systems that enhance the overall security posture and operational efficiency of Linux servers.

How to install logging

On most systems a logging service is installed by default. If it is not, it should be available in the standard repository of your Linux distribution, including Debian and Ubuntu.

Application logs are typically enabled when the server application itself is installed.

Terminology and Concepts

  1. Syslog: A standard protocol used for logging system messages. It allows the separation of the software that generates messages from the system that stores them and the software that reports and analyzes them.
  2. Log Rotation: The process of managing log files, which involves archiving older logs and creating new ones to prevent log files from consuming too much disk space. This is typically managed by tools like logrotate.
  3. Daemon: A background process that runs on the system, often responsible for logging various events. Common logging daemons include rsyslog, syslog-ng, and journald.
  4. /var/log: The standard directory in Linux systems where log files are stored. Subdirectories and files within /var/log contain logs from the system, applications, and various services.
  5. Authentication Logs: Logs that track login attempts, user authentications, and related activities. Common files include /var/log/auth.log and /var/log/secure.
  6. System Logs: General logs about the system’s operation, typically found in /var/log/syslog or /var/log/messages, depending on the distribution.
  7. Application Logs: Logs generated by specific applications, stored in files usually named after the application within /var/log.
  8. Kernel Logs: Logs that contain messages from the kernel, often found in /var/log/kern.log or accessed via the dmesg command.
  9. Error Logs: Logs that specifically capture error messages, aiding in troubleshooting and debugging issues. Common examples include /var/log/httpd/error_log for Apache or /var/log/mysql/error.log for MySQL.
  10. Event Logs: Logs that record significant events within the system or applications, useful for auditing and monitoring purposes.
  11. Log Levels: Categories that define the severity or importance of log messages. Common levels include DEBUG, INFO, NOTICE, WARNING, ERROR, CRITICAL, ALERT, and EMERGENCY.
  12. Log Parsing: The process of analyzing log file content to extract meaningful information. This often involves using tools like awk, sed, grep, or specialized log analysis software.
  13. Log Aggregation: The practice of collecting logs from multiple sources into a centralized location for easier analysis and management. Tools like ELK Stack (Elasticsearch, Logstash, Kibana) are often used for this purpose.
  14. Log Monitoring: Continuously checking log files for specific patterns or thresholds to detect issues in real-time. This can be automated using tools like Nagios, Splunk, or Prometheus.
  15. Audit Logs: Logs that provide a record of security-relevant events, often used for compliance and forensic analysis. The auditd service is commonly used to create and manage audit logs in Linux.
  16. Syslog Facility: A code that represents the type of program or subsystem generating a log message (e.g., AUTH, CRON, KERN, MAIL).
  17. Log Forwarding: The process of sending log messages from one system to another, often used in distributed environments to centralize logs for better analysis and management.
  18. Structured Logging: A logging practice where log entries are formatted in a consistent, machine-readable way (e.g., JSON format) to facilitate automated parsing and analysis.
  19. Log Retention Policy: Guidelines that define how long different types of log data should be kept before being archived or deleted, balancing storage costs with compliance and operational needs.
  20. Security Information and Event Management (SIEM): A solution that provides real-time analysis of security alerts generated by network hardware and applications, often integrating log collection and analysis capabilities.
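
Log levels (item 11) and facilities (item 16) combine into one number in the syslog protocol: the priority value is the facility code multiplied by 8, plus the severity code. The sketch below computes it for a few of the standard codes. The function name syslog_pri is hypothetical, only the first ten facilities are covered, and the keywords are the short syslog forms (err, crit, emerg) rather than the longer names used in the list above.

```shell
# Map a facility and severity keyword to the numeric syslog priority.
# priority = facility_code * 8 + severity_code
syslog_pri() {
    case $1 in
        kern) f=0 ;; user) f=1 ;; mail) f=2 ;; daemon) f=3 ;;
        auth) f=4 ;; syslog) f=5 ;; lpr) f=6 ;; news) f=7 ;;
        uucp) f=8 ;; cron) f=9 ;;
        *) echo "unknown facility: $1" >&2; return 1 ;;
    esac
    case $2 in
        emerg) s=0 ;; alert) s=1 ;; crit) s=2 ;; err) s=3 ;;
        warning) s=4 ;; notice) s=5 ;; info) s=6 ;; debug) s=7 ;;
        *) echo "unknown severity: $2" >&2; return 1 ;;
    esac
    echo $(( f * 8 + s ))
}

syslog_pri auth err     # auth = 4, err = 3, prints 35
syslog_pri cron info    # cron = 9, info = 6, prints 78
```

A lower severity code means a more urgent message, so emerg (0) outranks debug (7).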

Nginx Logs

The Access Log
The format of an access log entry is:
52.167.144.18 - - [20/May/2024:06:31:01 +0200] "GET /wp-content/uploads/2022/08/064-26_1-300x300.png HTTP/1.1" 200 134970 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36" "52.167.144.18" 535 0.004 "134970" "0.003" "ljusexperten.se"

Client IP Address
52.167.144.18
(This is the IP address of the client making the request.)

Identity of the Client
-
(This field is typically used for the client identity but is not used here, hence represented by a hyphen.)

User ID
-
(This field would contain the user ID if user authentication is used. It is a hyphen here, indicating no user ID.)

Timestamp
[20/May/2024:06:31:01 +0200]
This shows the date and time when the request was received, formatted as [day/month/year:hour:minute:second timezone].

Request Line
"GET /wp-content/uploads/2022/08/064-26_1-300x300.png HTTP/1.1"
This contains the HTTP method (GET), the requested resource (/wp-content/uploads/2022/08/064-26_1-300x300.png), and the HTTP protocol version (HTTP/1.1).

Status Code
200
(This is the HTTP status code returned by the server. 200 means the request was successful.)

Response Size
134970
(This is the size of the response body in bytes.)

Referrer
"-"
(This field typically contains the URL of the referring page. It is a hyphen here, indicating no referrer.)

User-Agent
"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36"
(This string identifies the client software making the request. Here, it indicates the request was made by the Bing search engine bot.)

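Client IP Address (Again)
"52.167.144.18"
(This is the client IP address again. This field and the ones after it are not part of Nginx's default combined log format, so their exact meaning depends on the log_format directive in the server configuration. The descriptions below are best guesses.)

Request Processing Time
535
(Read here as the time taken to process the request on the server, in milliseconds. That sits oddly with the 0.004 second request time in the next field, so the value may instead be a size, such as the request length in bytes. Check log_format to confirm.)

Request Time
0.004
(This is the total time taken to handle the request, typically from the moment the request is received to the final response, measured in seconds.)

Response Size (Again)
"134970"
(This matches the response body size of 134970 bytes logged earlier in the line.)

Request Time (Again)
"0.003"
(This is a second timing value in seconds. It is close to, but not identical to, the 0.004 request time, so it is probably a separate timer, for example the upstream response time, rather than a redundant copy.)

Host
"ljusexperten.se"
(This is the host header from the request, indicating the domain name requested.)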
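
The field breakdown above can be reproduced on the command line. The following is a minimal sketch, not a general parser: it assumes the quoted fields never contain an embedded double quote, the function name parse_access_line is hypothetical, and the sample line is a shortened version of the entry above with a made-up path and User-Agent.

```shell
# Split one access log line into labelled fields.
# Quoted fields are located by splitting on the double quote; the unquoted
# pieces ($1 and $3) are then split again on whitespace.
parse_access_line() {
    awk -F'"' '{
        split($1, pre, " ")     # ip, -, -, [date:time, zone]
        split($3, resp, " ")    # status, bytes
        print "ip=" pre[1]
        print "time=" pre[4] " " pre[5]
        print "request=" $2
        print "status=" resp[1]
        print "bytes=" resp[2]
        print "referrer=" $4
        print "user_agent=" $6
    }'
}

# Shortened sample line; /index.html and ExampleBot/1.0 are placeholders.
sample_line='52.167.144.18 - - [20/May/2024:06:31:01 +0200] "GET /index.html HTTP/1.1" 200 134970 "-" "ExampleBot/1.0" "52.167.144.18" 535 0.004 "134970" "0.003" "ljusexperten.se"'

printf '%s\n' "$sample_line" | parse_access_line
```

Splitting on the double quote first keeps the request line and the User-Agent intact even though both contain spaces.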

To extract values from this file, awk is your friend. Splitting each line on the double quote (-F\") makes the User-Agent the sixth field.

All User-Agent strings
awk -F\" '{print $6}' filtered_access_nginx.log

Get that list sorted
awk -F\" '{print $6}' filtered_access_nginx.log | sort

Pipe that to uniq -c to count how often each line occurs
awk -F\" '{print $6}' filtered_access_nginx.log | sort | uniq -c

Sort by count in descending order (you can combine this with head and tail)
awk -F\" '{print $6}' filtered_access_nginx.log | sort | uniq -c | sort -nr
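
Combining the pipeline with head, as suggested, gives a "top N" report. The wrapper below is only a sketch: the function name top_user_agents, the default of 10, and the three-line demo log are assumptions made for the example.

```shell
# Print the N most frequent User-Agent strings in an access log.
# Usage: top_user_agents FILE [N]    (N defaults to 10)
top_user_agents() {
    awk -F'"' '{ print $6 }' "$1" | sort | uniq -c | sort -nr | head -n "${2:-10}"
}

# Hypothetical three-line log written to a temporary file for the demo.
demo_log=$(mktemp)
cat > "$demo_log" <<'EOF'
198.51.100.7 - - [20/May/2024:06:31:01 +0200] "GET / HTTP/1.1" 200 512 "-" "ExampleBot/1.0"
203.0.113.5 - - [20/May/2024:06:31:02 +0200] "GET /a HTTP/1.1" 200 128 "-" "Mozilla/5.0 (demo)"
198.51.100.7 - - [20/May/2024:06:31:03 +0200] "GET /b HTTP/1.1" 404 0 "-" "ExampleBot/1.0"
EOF

top_user_agents "$demo_log" 1    # ExampleBot/1.0 appears twice
rm -f "$demo_log"
```

Replacing head with tail in the function would show the least frequent User-Agent strings instead.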

The same awk commands also work on the SSL access log (Access SSL). Taken together, they count the total number of requests per User-Agent.

Nginx Access Log (awk also works for Access SSL) - Counts the total number of requests per IP address

To extract values from this file, awk is your friend.

IP address, timestamp, and User-Agent
awk -F\" '{print $1, $6}' filtered_access_nginx.log

Filter so that only the IP address and the User-Agent are left. After the first awk, the IP address is field 1 and the User-Agent starts at field 6 (fields 2 to 5 are the two hyphens and the timestamp).
awk -F'"' '{print $1, $6}' filtered_access_nginx.log | awk '{printf "%s ", $1; for (i=6; i<=NF; i++) printf "%s ", $i; print ""}'

Sort the result, count how often each IP address and User-Agent combination occurs, and sort by that count
awk -F'"' '{print $1, $6}' filtered_access_nginx.log | awk '{printf "%s ", $1; for (i=6; i<=NF; i++) printf "%s ", $i; print ""}' | sort | uniq -c | sort -nr
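
When only the per-IP totals are wanted, regardless of User-Agent, the quote splitting is not needed because the IP address is the first space-separated field. The sketch below counts with an awk array instead of sort and uniq; the function name requests_per_ip and the demo log are hypothetical.

```shell
# Count requests per client IP and print "count ip", highest count first.
requests_per_ip() {
    awk '{ count[$1]++ } END { for (ip in count) print count[ip], ip }' "$1" | sort -nr
}

# Hypothetical demo log using documentation IP ranges.
demo_log=$(mktemp)
cat > "$demo_log" <<'EOF'
198.51.100.7 - - [20/May/2024:06:31:01 +0200] "GET / HTTP/1.1" 200 512 "-" "ExampleBot/1.0"
203.0.113.5 - - [20/May/2024:06:31:02 +0200] "GET /a HTTP/1.1" 200 128 "-" "Mozilla/5.0 (demo)"
198.51.100.7 - - [20/May/2024:06:31:03 +0200] "GET /b HTTP/1.1" 404 0 "-" "OtherBot/2.0"
EOF

requests_per_ip "$demo_log"
rm -f "$demo_log"
```

Here 198.51.100.7 is counted twice even though its two requests carry different User-Agent strings, which is the difference from the combined IP and User-Agent count.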

Apache Access Log

The Access Log
The format of an Apache access log entry is:
52.167.144.18 - - [20/May/2024:06:30:40 +0200] "GET /varumarken/?query_type_farg=or&filter_farg=brunbets,elephant,galvaliseradaniseret-stal,gr%E2%88%9A,st%E2%88%9Al,st%E2%88%9Al,st%E2%88%9Al,st%E2%88%9Al,st%E2%88%9Al,st%E2%88%9Al,st%E2%88%9Al,st%E2%88%9Al,st%E2%88%9Al,st%E2%88%9Al HTTP/1.0" 200 31817

Client IP Address
52.167.144.18
(This is the IP address of the client making the request.)

Identity of the Client
-
(This field is typically used for the client identity but is not used here, hence represented by a hyphen.)

User ID
-
(This field would contain the user ID if user authentication is used. It is a hyphen here, indicating no user ID.)

Timestamp
[20/May/2024:06:30:40 +0200]
(This shows the date and time when the request was received, formatted as [day/month/year:hour:minute:second timezone].)

Request Line
"GET /varumarken/?query_type_farg=or&filter_farg=brunbets,elephant,galvaliseradaniseret-stal,gr%E2%88%9A,st%E2%88%9Al,st%E2%88%9Al,st%E2%88%9Al,st%E2%88%9Al,st%E2%88%9Al,st%E2%88%9Al,st%E2%88%9Al,st%E2%88%9Al,st%E2%88%9Al,st%E2%88%9Al,st%E2%88%9Al HTTP/1.0"
(This contains the HTTP method (GET), the requested resource (/varumarken/?query_type_farg=or&filter_farg=…), and the HTTP protocol version (HTTP/1.0).)

Status Code
200
(This is the HTTP status code returned by the server. 200 means the request was successful.)

Response Size
31817
(This is the size of the response body in bytes.)

To extract values from this file, awk is your friend.

IP address and timestamp
awk -F\" '{print $1}' filtered_access_apache.log

Request line (HTTP method, resource, and protocol version)
awk -F\" '{print $2}' filtered_access_apache.log

Status code and response size
awk -F\" '{print $3}' filtered_access_apache.log
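
The third quote-separated field holds both the status code and the response size, so one more split is needed to work with the status code alone. As a sketch under the same assumptions as above (no embedded double quotes in the request line), the hypothetical function status_counts tallies requests per status code; the demo entries are invented.

```shell
# Count requests per HTTP status code and print "count status", highest first.
status_counts() {
    awk -F'"' '{ split($3, resp, " "); count[resp[1]]++ }
               END { for (code in count) print count[code], code }' "$1" | sort -nr
}

# Hypothetical demo log in the same format as the Apache example above.
demo_log=$(mktemp)
cat > "$demo_log" <<'EOF'
198.51.100.7 - - [20/May/2024:06:30:40 +0200] "GET /varumarken/ HTTP/1.0" 200 31817
203.0.113.5 - - [20/May/2024:06:30:41 +0200] "GET /missing HTTP/1.0" 404 512
198.51.100.7 - - [20/May/2024:06:30:42 +0200] "GET / HTTP/1.0" 200 1024
EOF

status_counts "$demo_log"
rm -f "$demo_log"
```

A sudden rise in 4xx or 5xx counts is often the first hint that something needs a closer look in the error log.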

Commands

Some commonly used log commands
DESCRIPTION                                COMMAND
Get the first 5 lines of filename.txt      head -n 5 filename.txt
Get the last 5 lines of filename.txt       tail -n 5 filename.txt
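
The two commands can also be chained to cut a range out of the middle of a file. The function name line_range and the demo file are hypothetical, and the sketch assumes the file has at least as many lines as the requested end of the range.

```shell
# Print lines FROM..TO of a file by chaining head and tail.
# Usage: line_range FILE FROM TO
line_range() {
    file=$1
    from=$2
    to=$3
    # head keeps lines 1..TO, tail then keeps the last (TO - FROM + 1) of those.
    head -n "$to" "$file" | tail -n "$(( to - from + 1 ))"
}

# Demo file with six numbered lines.
demo_file=$(mktemp)
printf 'line %s\n' 1 2 3 4 5 6 > "$demo_file"

line_range "$demo_file" 3 5    # prints line 3, line 4 and line 5
rm -f "$demo_file"
```

For a log that is still being written, tail -f filename.txt keeps printing new lines as they arrive.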