You are a consultant asked to create a script that analyzes the log files on a server. Your client wants to know about remote connection attempts made over the SSH protocol. This task requires you to process a set of files, taking into account only events related to connection attempts.
Keep only the main part
The logs are available in the auth.zip file.
In the provided files, several lines are not connection attempts.
Find a way, as automatically as possible, to keep only what will be useful to you later. This will allow you to group all useful data in a single file, rather than having it scattered over several.
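One possible way to do this filtering is sketched below. It assumes the usual OpenSSH syslog wording ("Accepted ... for ..." and "Failed ... for ..."), which the handout does not confirm, and the function name keep_ssh_attempts is hypothetical. If the provided files record blocking events or other kinds of attempts with a different wording, the pattern would need to be extended.

```bash
#!/usr/bin/env bash
# Keep only the sshd lines that record a connection attempt.
# Assumption: OpenSSH wording, e.g.
#   "sshd[123]: Accepted password for alice from 10.0.0.1 port 22 ssh2"
#   "sshd[124]: Failed password for invalid user admin from 10.0.0.2 port 22 ssh2"
keep_ssh_attempts() {
    # -h: do not prefix matches with the file name when several files are given
    # -E: extended regular expressions, so (a|b) and + need no backslashes
    # With no file argument, grep reads standard input.
    grep -hE 'sshd\[[0-9]+\]: (Accepted|Failed) [^ ]+ for ' "$@"
}
```

For instance, `unzip -p auth.zip | keep_ssh_attempts > ssh_attempts.log` would stream every file of the archive through the filter into a single file; the output name ssh_attempts.log is only an illustration.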
What is the size of the resulting file and how can you reduce it to save storage space?
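A minimal sketch of how the size could be measured and reduced, assuming gzip is available on the machine (the handout does not name a compression tool, and the function name report_and_compress is hypothetical):

```bash
#!/usr/bin/env bash
# Print the size of a file in bytes, compress a copy of it, then print the
# size of the compressed copy. Assumption: gzip is installed.
report_and_compress() {
    local file=$1
    wc -c < "$file"                    # size in bytes before compression
    gzip -9 -c "$file" > "$file.gz"    # -9: best compression, -c: write to stdout, original kept
    wc -c < "$file.gz"                 # size in bytes after compression
}
```

Log files tend to compress well because the same host names, process names and message templates repeat on almost every line.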
Analysis
The work you have been asked to do has a long-term objective, so you decide to write a script that can be run on demand without having to know all the commands required to operate it.
Create a script which, from a file like the one above, displays the following information according to the options passed to it:
-u | (†) user IDs of successfully logged-in users and, at the end, their total number
-U | (†) identification of rejected users and, at the end, their total number
-i | (†) list of IP addresses of users who have successfully logged on
-I | (†) list of IP addresses of rejected users
-b | (†) list of IP addresses blocked and, at the end, their total number
-B | (†) list of IP addresses blocked, each followed by its total blocking time (note that the same IP address may be blocked several times)
-n | (†) list of IP addresses whose users have been rejected but not blocked, and their total number
-d | average duration of IP address blocks
-D [IP] | (†) start and end dates of attacks emanating from the IP address
-f | average weekly frequency of successful connections
-F | average daily frequency of unsuccessful connections
-c | provides a list of successful connections in CSV[1] with the following columns:
-C | same as
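As an illustration of how one of these options might be implemented, here is a sketch for -u. It assumes the OpenSSH message format "Accepted <method> for <user> from <ip> port ...", which is not shown in the handout; the function name list_accepted_users is hypothetical.

```bash
#!/usr/bin/env bash
# -u: print each successfully logged-in user once, then the total number.
# Assumption: the user name is the word that follows "for" on "Accepted" lines.
list_accepted_users() {
    awk '/sshd\[[0-9]+\]: Accepted / {
             user = ""
             # Locate the keyword instead of using a fixed column, because the
             # date and host fields may not always have the same width.
             for (i = 1; i < NF; i++)
                 if ($i == "for") { user = $(i + 1); break }
             # The "seen" array removes duplicates while keeping first-seen order.
             if (user != "" && !(user in seen)) { seen[user] = 1; print user; total++ }
         }
         END { printf "Total: %d\n", total }' "$1"
}
```

A single awk process does the filtering, the de-duplication and the count, which avoids a pipeline of grep, cut, sort and wc and the extra processes it would start.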
This script always takes the name of the log file as a parameter, and can accept only one option.
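The calling convention described above (exactly one option, followed by the log file name) could be enforced with the getopts builtin roughly as follows. This is a sketch under assumptions: the names usage, parse_args, action, ip and logfile are hypothetical, -D is taken to require its IP argument, and -C is taken to have no argument because its description is incomplete in the handout.

```bash
#!/usr/bin/env bash
# Hypothetical skeleton: accept exactly one option, then the log file name.
usage() {
    echo "Usage: ${0##*/} (-u|-U|-i|-I|-b|-B|-n|-d|-D IP|-f|-F|-c|-C) LOGFILE" >&2
}

parse_args() {
    local opt OPTIND=1
    action="" ip="" logfile=""
    # Leading ':' selects silent error reporting; 'D:' means -D takes an argument.
    while getopts ':uUiIbBndD:fFcC' opt; do
        case $opt in
            \?|:) usage; return 2 ;;                      # unknown option or missing argument
            *)
                [ -n "$action" ] && { usage; return 2; }  # a second option was given
                action=$opt
                [ "$opt" = D ] && ip=$OPTARG
                ;;
        esac
    done
    shift $((OPTIND - 1))
    # Exactly one option and exactly one file name are required.
    [ -n "$action" ] && [ $# -eq 1 ] || { usage; return 2; }
    logfile=$1
}
```

After a successful call such as `parse_args "$@"`, a case statement on $action can dispatch to the function that implements the chosen option.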
Methodology
- You should only use the Bash scripting language, the usual Linux commands and a few advanced utilities like sed and awk. Do not use Python, R or any other command (unless explicitly requested). Common commands are basename, cat, cp, cut, date, echo, grep, head, ls, mv, seq, sort, tail, touch, tr, uniq, wc.
- Read and reread the documentation for the commands you want to use (manual pages, help obtained with --help or -h when available).
- The questions are not sorted by difficulty. Don’t get stuck, go and answer others and come back to the problems later with your experience.
- Design your commands to be as efficient as possible and to use as few sub-shells as possible.
- Don’t try to make a script from the start; proceed step by step, solving each problem one after the other.
- Test every step of the way.
- When you get to the script development, imagine that it won’t be used by you, or even by someone with Shell knowledge.
- Imagine that you’re part of a team and that your script will be modified by people other than yourself.
- An IP address has a very specific format, and you could take advantage of a function that checks the validity of this format to prevent usage errors in arguments that request an IP address.
- To react effectively to script options, take a look at the internal command getopts.
- In your answers, indicate only the commands used, never give the result of these commands unless prompted.
- In your answers for options marked with (†), briefly explain why you use such and such a command with such and such an option.
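The format check suggested in the methodology for arguments that expect an IP address could look like the sketch below. It assumes that only IPv4 addresses in dotted-decimal notation are expected (the handout does not say whether IPv6 appears in the logs), and the function name is_valid_ipv4 is hypothetical.

```bash
#!/usr/bin/env bash
# Return 0 if $1 is a dotted-decimal IPv4 address with each part in 0..255.
# Assumption: only IPv4 is expected; anything else, including IPv6, is rejected.
is_valid_ipv4() {
    local ip=$1 part
    local -a parts
    # Shape check first: four groups of 1 to 3 digits separated by dots.
    [[ $ip =~ ^[0-9]{1,3}(\.[0-9]{1,3}){3}$ ]] || return 1
    # Split on dots with the read builtin, so no sub-shell is started.
    IFS=. read -r -a parts <<< "$ip"
    for part in "${parts[@]}"; do
        # 10# forces base 10, so a part such as "08" is not treated as invalid octal.
        (( 10#$part <= 255 )) || return 1
    done
    return 0
}
```

It relies only on Bash builtins, which keeps it in line with the advice to use as few sub-shells as possible. A typical call would be `is_valid_ipv4 "$ip" || { echo "invalid IP address: $ip" >&2; exit 2; }`.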