
Paving the way to successful Log Management metamorphosis – Deploying Graylog in AWS (Ubuntu 20.04 LTS)

Today’s technological and methodological shift towards Cloud Computing, Continuous Integration/Continuous Delivery (CI/CD) and DevOps – together with the move from monolithic to lightweight micro-service architecture patterns – is enabling organisations to speed up the development and deployment of production applications.

This paradigm shift also comes with shortcomings. Distributed logs, along with the proliferation of instances and containers, make log management and monitoring much more of a challenge. Not only must the sheer volume of interconnected data points across modularised, distributed systems be considered; the structured and semi-structured log data must also be parsed, normalised and analysed in real-time. As micro-services run on multiple hosts, the log messages they generate are spread across multiple servers – making it all but impossible for a human to find valuable information amidst the many log files, or to trace errors back to their source for correction (without even mentioning auto-scaled environments). For organisations embarking on this journey, successful development and operations in many ways comes down to successful Log Management: granting full visibility into the health of micro-service environments and fulfilling logging and monitoring requirements for compliance.

What is Log Data?

Logs or log files can be described as the lingua franca of computing: messages that computer systems, software and other network apparatus emit in response to events occurring within a system or network.

In general, a log entry consists of three attributes, illustrated by the short example after this list:

  1. Timestamp – the time and date the message was generated.

  2. Source system – the apparatus creating the log file.

  3. Log message – the actual log data.
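
For illustration, the logger utility below writes a test message, and the resulting syslog line carries all three attributes (the tag myapp and the message text are hypothetical placeholders):

logger -t myapp "user login failed for account admin"
tail -n 1 /var/log/syslog
# -> Jul 22 09:41:07 ip-172-31-0-10 myapp: user login failed for account admin
#    (timestamp)     (source system) (log message)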

NIST[1] categorises log events into three types: security software, operating system and application logs. Yet there is no standardisation of log file extensions or of the log data schema, i.e. content, format or severity – leading to each system, application or network generating different log files in different formats.

Getting Insights from Log Data

Central Log Management becomes critical and essential when organisations commit to Cloud Computing and lightweight micro-service architectures. First and foremost, a holistic view of the log data generated across the enterprise infrastructure eliminates complexity and is far more powerful than analysing log data in isolation. Conversely, ingesting log data from different source points raises the implications of non-standardisation described above, which makes it exceptionally hard, or even infeasible, to analyse log events side-by-side without a tool. A Log Management solution is required to centralise, correlate and analyse all log files, ensuring that the data hidden in logs is turned into meaningful, actionable insights. Alongside a centralised log management solution, a systematic and comprehensive approach is also required to analyse log data from the entire infrastructure stack. Typical use cases facilitated by centralised log management solutions are:

Real-time Monitoring and Troubleshooting: Accumulating all performance and error log data in one central location and making it accessible to authorised users plays a crucial role in reducing MTTR (mean-time-to-recovery) through time-efficient, proactive monitoring, and breaks down the barriers between IT Ops and developers. Automated monitoring and issue troubleshooting help assure application and infrastructure health by tailing logs in real-time to pinpoint and alert on operational problems, and then drilling down to find the root cause of the issue.
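
As a minimal command-line sketch of this idea (the file path and search pattern are illustrative, not specific to any tool):

tail -f /var/log/syslog | grep --line-buffered -iE "error|fail"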

Event Correlation: A powerful analysis technique that draws complex relations between various log events into identifiable patterns. If the identified patterns indicate anomalies, automated actions (i.e. alerts or alarms) based on defined conditions and rules can be triggered to achieve streamlined, in-depth control. Event correlation is typically used to identify indicators of an attack, enhancing security and enabling security professionals to detect and alert on threats.
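
As a toy illustration of event correlation at the command line, the pipeline below counts failed SSH logins per source IP (it assumes the standard OpenSSH "Failed password ... from <ip> port ..." message format in /var/log/auth.log); a single IP with hundreds of failures is a classic brute-force indicator:

grep "Failed password" /var/log/auth.log | awk '{print $(NF-3)}' | sort | uniq -c | sort -rn | head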

Compliance and Regulations: After all, it is not only the rapidly evolving technology landscape that has reinforced the need for a log management solution. Security and compliance regulations such as PCI DSS, ISO 27002 or the GDPR mandate that organisations collect, retain and protect log data and keep it available for auditing.

Despite COVID-19 hitting enterprise wallets, demand for Log Management solutions is still anticipated to grow. Data Insights is a holistic solution integrator with extensive expertise in Log Management. In the following, we demonstrate how to install Graylog in AWS on an Ubuntu 20.04 LTS machine.

Graylog as Central Log Management Solution

Graylog is a powerful, open-source, enterprise-grade log management solution providing an integrated platform for the collection, storage, normalisation, search, analysis and visualisation of log data from across the entire IT infrastructure and application stack on a centralised server. The software operates on a three-tier architecture with scalable storage, built around Elasticsearch and MongoDB.
The minimum system setup consists of the Graylog web interface, the Graylog server, Elasticsearch nodes to store log data and provide search capabilities to Graylog, and MongoDB to store configuration data.

[Figure: Graylog minimum setup architecture – Source: Graylog]

Guide: How to Deploy Graylog in AWS (Ubuntu 20.04 LTS)

So, here we go. This brief tutorial takes you step-by-step through installing a Graylog server in AWS on a clean Ubuntu 20.04 LTS machine and configuring a simple input that receives system logs.

Step 1: Deploy Ubuntu 20.04 LTS server in AWS

Step 2: Install OpenJDK, MongoDB, Elasticsearch

Step 3: Install Graylog

Step 4: Set up Syslog Input

Note: This tutorial does not cover security settings! Make sure the Graylog server is not publicly exposed, and (enterprise) security best practices and guidelines are followed.

Step 1: Deploy Ubuntu 20.04 LTS machine in AWS

  1. Launch an EC2 instance
    Log in to the AWS console and, in the top navigation bar, go to > Services > EC2

    Choose Amazon Machine Image (AMI)
    Select > Ubuntu Server 20.04 LTS (HVM), SSD Volume Type

  2. Choose an Instance Type
    Graylog requires at least 4 GB of memory; increase the RAM depending on the volume of data you intend to collect

  3. Finish configuration wizard and spin up the Virtual Machine (VM)
    Note: Add Storage – In this tutorial, we are only doing a basic setup with our VM’s syslog input.
    Optional: If you aim to configure more inputs, increase the disk storage to at least 40GB.
    Go to > Launch
    Choose > Create a new key pair, save the .pem file locally and select > Launch instances

  4. Configure Security Group
    Once the instance is launched, click on the newly launched instance on the instance overview page. In the description section, select the default security group created by the configuration wizard. Open ports 9000, 514 and 1514 (Select > Source > Anywhere). An AWS CLI equivalent is sketched below.
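
    If you prefer the AWS CLI, the same inbound rules can be added roughly as follows (sg-0123456789abcdef0 is a hypothetical placeholder for your security group id; 0.0.0.0/0 mirrors "Anywhere" and should be tightened for production):

    for port in 9000 514 1514; do
      # open each Graylog/syslog port to the chosen source range
      aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port "$port" --cidr 0.0.0.0/0
    done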

  5. Allocate Elastic IP address
    In the navigation pane – go to > Elastic IPs
    Select > Allocate new address > Amazon’s pool (IPv4 address pool) > Allocate
    Go to > Actions > Associate address
    Select > launched_instance > Associate

  6. SSH to launched AWS instance as Ubuntu user
    In the EC2 instance overview dashboard, select the launched_instance and go to > Connect, then follow the terminal instructions.
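
    Alternatively, a minimal sketch of the direct SSH connection (graylog-key.pem is a hypothetical placeholder for the .pem file saved in step 3, and <elastic-ip> stands for the address allocated in step 5):

    # restrict key permissions, then connect as the default ubuntu user
    chmod 400 graylog-key.pem
    ssh -i graylog-key.pem ubuntu@<elastic-ip>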

  7. Update Ubuntu machine
    To update the Ubuntu machine, run the commands below:

    sudo apt-get update
    sudo apt-get upgrade

When prompted, enter y

Step 2: Install OpenJDK, MongoDB and Elasticsearch

Since Elasticsearch is Java-based software, installing Java is a prerequisite to running Elasticsearch.

OpenJDK Installation

To install the open-source version of Java, run the commands below:

sudo apt-get update && sudo apt-get upgrade
sudo apt-get install apt-transport-https openjdk-8-jre-headless uuid-runtime pwgen

When prompted, enter y

To verify the Java installation, run the command below:

java -version

The output should be similar to the example below (the version and build strings are illustrative and will differ on your machine):
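
openjdk version "1.8.0_252"
OpenJDK Runtime Environment (build 1.8.0_252-8u252-b09-1ubuntu1-b09)
OpenJDK 64-Bit Server VM (build 25.252-b09, mixed mode)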

MongoDB Installation

To install MongoDB, run the commands below (the repository line points at the MongoDB 4.0 packages built for Ubuntu bionic):

sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 9DA31620334BD75D9DCB49F368818C72E52529D4
echo "deb [ arch=amd64 ] https://repo.mongodb.org/apt/ubuntu bionic/mongodb-org/4.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-4.0.list
sudo apt-get update
sudo apt-get install -y mongodb-org

To start MongoDB automatically at boot and verify that it is running, run the commands below:

sudo systemctl daemon-reload
sudo systemctl enable mongod.service
sudo systemctl restart mongod.service
sudo systemctl --type=service --state=active | grep mongod
sudo systemctl status mongod

The output should be similar to the example below (timestamps and process IDs are illustrative):
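
● mongod.service - MongoDB Database Server
   Loaded: loaded (/lib/systemd/system/mongod.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2020-07-22 10:15:32 UTC; 30s ago
     Docs: https://docs.mongodb.org/manual
 Main PID: 2101 (mongod)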