Wednesday, June 2, 2021

Use Python to plot graphs from a CSV file - simple example

Exercise: Try plotting the data from a CSV file using Python and saving the plot to a JPEG or PNG file, which we can later make available to other apps.

csvfile name: test.csv

Location: Current working directory

Python version: 2.7.5

Modules used: matplotlib, numpy

Code:

import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
data = np.genfromtxt("test.csv", delimiter=",", names=["x", "y"])
plt.plot(data['x'], data['y'])
plt.savefig('/tmp/test1.png')

Code explanation:

numpy is used to read the CSV file in this example (via genfromtxt)
matplotlib is used for plotting the chart
matplotlib.use('Agg') selects a non-interactive backend, so a working DISPLAY variable is not required


csv content:

1,2
2,5
3,7
4,7
5,9
6,10
7,20
8,21
9,22
10,30

Output:




Thank you


Wednesday, April 14, 2021

Run Jobs in parallel using Linux Shell Script (multiple loaders)

Objective: 

We have job descriptions in the form of strings contained in a file called job.queue.
Each string corresponds to a job.
We have a lot of jobs in job.queue.
If we run them sequentially, it will take a lot of time.
We would like to parallelise them using a Linux shell script to reduce the overall run time.
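Before the full multi-loader script below, the core idea - run the queued jobs a few at a time instead of one by one - can be sketched with GNU xargs. This is only a minimal sketch (the echo/sleep pair stands in for whatever actually executes one job string); the script later in this post builds its own framework with per-loader queues, lock files and a tunable slot count instead.

#!/bin/bash
#Minimal sketch: run up to 5 entries from job.queue concurrently (assumes GNU xargs).
cd ~/ptest
xargs -P 5 -I '{}' sh -c 'echo "running {}"; sleep 1' < job.queue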

Lab setup:

OS: Centos 7.9.2009
Directory needed: ~/ptest
Files: job.queue, the actual script file, job.parallel

Code to create job.queue:

cd ~/ptest;
cat /dev/null > job.queue;
for i in {1..50000}
do
echo job${i} >> job.queue;
done
echo 10 >job.parallel;


Code to launch the jobs in parallel (save it as the script file under ~/ptest and run it from there):

#!/bin/bash
#code for parallel processing - model 2
#Note: bash is used because the script relies on export -f to export functions.
cd ~/ptest;
cat /dev/null > job.processed;
cat /dev/null > job.tracker;
cat /dev/null > job.lock;
ldcnt=5;
export ldcnt;
#Module to reset all operational files to the count of loaders.
for i in $(eval echo "{1..${ldcnt}}")
do
cat /dev/null > job.processed.th${i};
cat /dev/null > job.tracker.th${i};
cat /dev/null > job.lock.th${i};
rm job.parallel.th${i} >/dev/null 2>&1;
rm job.queue.th${i} >/dev/null 2>&1;
done
if ! [ -f job.parallel ];
then
    echo 1 > job.parallel;
fi
#fsq: record one processed job; an exclusive flock on job.lock.<thn> serialises the writers.
fsq()
{
exec 200>job.lock.${thn};
while true
do
    flock -xn 200 && break || sleep .01;
done
echo $key1,processed >> job.processed.${thn};
exec 200>&-;
}
#loader: feed jobs from this loader's queue, keeping at most job.parallel.<thn> concurrent fsq workers running.
loader()
{
typeset -i i;
i=0;
while read line
do
    pth=$(cat job.parallel.${thn});
    if [ $i -ge ${pth} ];
    then
        while true
        do
            if [ $i -lt ${pth} ];
            then
                break;
            else
                sleep .01;
                i=$(ps -ef|grep -i fsq|grep ${thn}|grep -v grep|wc -l);
            fi
        done
    fi
    ((i++));
    key1=$line;
    export -f fsq;
    export key1;
    { nohup bash -c fsq ${thn} & } >/dev/null 2>&1;
    jid=$(echo $!);
    echo $line - submitted under $jid|tee -a job.tracker.${thn};
done < job.queue.${thn}
}
#Module to launch the loaders
typeset -i j;
j=0;
#Module to break the work unit: split job.queue into ${ldcnt} chunk files (job.queue.th1, job.queue.th2, ...)
lcnt=$(cat job.queue|wc -l);
#Jobs per loader, rounded up: (lcnt + ldcnt - 1) / ldcnt
spltcnt=$(echo "scale=0; $(bc -l <<< ${lcnt}+${ldcnt}-1)/${ldcnt}"|bc);
awk -v s=${spltcnt} 'NR%s==1 { file = FILENAME  ".th" sprintf("%d", (NR/s)+1) } { print > file }' job.queue;
#parallel thread distributor: spread the total slot count from job.parallel across the per-loader job.parallel.th* files
pth=$(cat job.parallel);
typeset -i m;
m=0;
if [ ${pth} -ge ${ldcnt} ];
then
    for l in $(eval echo "{1..${pth}}")
    do
        ((m++));
        if [ -f job.parallel.th${m} ];
        then
            cupval=$(cat job.parallel.th${m});
            newval=$(bc -l <<< ${cupval}+1);
            echo ${newval} > job.parallel.th${m};
        else
            echo 1 > job.parallel.th${m};
        fi
        if [ ${m} -ge ${ldcnt} ];
        then
            m=0;
        fi
    done
else
    for l in $(eval echo "{1..${ldcnt}}")
    do
        ((m++));
        if [ ${pth} -gt 0 ];
        then
            echo 1 > job.parallel.th${m};
        else
            echo 0 > job.parallel.th${m};
        fi
        ((pth--));
    done
fi
#loader launcher
for k in $(ls -tr job.queue.th*)
do
    thn=$(echo $k|cut -d '.' -f 3);
    export thn;
    export -f loader;
    { nohup bash -c loader ${thn} & } >/dev/null 2>&1;
done
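For reference, the locking pattern used inside fsq - open a file descriptor on a lock file, spin on a non-blocking exclusive flock, write while holding the lock, then close the descriptor to release it - can be tried on its own. A minimal standalone sketch, assuming the same ~/ptest directory:

#!/bin/bash
cd ~/ptest
exec 200>demo.lock
#Spin until we win the non-blocking exclusive lock on fd 200.
while true
do
    flock -xn 200 && break || sleep .01
done
#Protected write: only one holder of the lock appends at a time.
echo "$(date) - protected write" >> demo.out
#Close the descriptor to release the lock.
exec 200>&-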

Throughput chart for the multiple-loader script above:

# of jobs/sec is shown below...


Thank you for your visit.

We will meet in the next blog.

Monday, March 1, 2021

Install & Setup Hadoop in Standalone Mode


Environment:
OS: Centos 7.9
Kernel: 3.10.0-1160.6.1.el7.x86_64
Java version:
openjdk version "1.8.0_282"
OpenJDK Runtime Environment (build 1.8.0_282-b08)
OpenJDK 64-Bit Server VM (build 25.282-b08, mixed mode)

1) Set up the hadoop user on the CentOS machine (follow the blog link below)

https://oracledbaplanner.blogspot.com/2021/02/adding-linux-usergroup-and-modifying.html

2) Set up a mount point for the Apache Hadoop install

[root@localhost ~]# mkdir /opt/hadoop

[root@localhost ~]# chown hadoop:bigdata /opt/hadoop

[root@localhost ~]# ls -altr /opt/hadoop
total 0
drwxr-xr-x. 4 root   root    53 Mar  1 02:14 ..
drwxr-xr-x. 2 hadoop bigdata  6 Mar  1 02:14 .
[root@localhost ~]#

3) Download the Hadoop binary from https://downloads.apache.org/hadoop/common/
Specifically, as below:


wget https://downloads.apache.org/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz

4) Once the download is finished, gunzip and untar the binary

[hadoop@localhost hadoop]$ ls -altr
total 489016
-rw-r--r--. 1 hadoop bigdata 500749234 Jul 15  2020 hadoop-3.3.0.tar.gz
drwxr-xr-x. 4 root   root           53 Mar  1 02:14 ..
drwxr-xr-x. 2 hadoop bigdata        33 Mar  1 02:32 .
[hadoop@localhost hadoop]$ gzip -d hadoop-3.3.0.tar.gz

[hadoop@localhost hadoop]$ ls -altr
total 1034752
-rw-r--r--. 1 hadoop bigdata 1059584000 Jul 15  2020 hadoop-3.3.0.tar
drwxr-xr-x. 4 root   root            53 Mar  1 02:14 ..
drwxr-xr-x. 2 hadoop bigdata         30 Mar  1 23:13 .

[hadoop@localhost hadoop]$ pwd
/opt/hadoop

[hadoop@localhost hadoop]$ ls -altr
total 1034752
-rw-r--r--. 1 hadoop bigdata 1059584000 Jul 15  2020 hadoop-3.3.0.tar
drwxr-xr-x. 4 root   root            53 Mar  1 02:14 ..
drwxr-xr-x. 2 hadoop bigdata         30 Mar  1 23:13 .
[hadoop@localhost hadoop]$ tar -tvf hadoop-3.3.0.tar|head
drwxr-xr-x brahma/brahma     0 2020-07-06 15:50 hadoop-3.3.0/
-rw-rw-r-- brahma/brahma   175 2020-03-24 13:23 hadoop-3.3.0/README.txt
..

I ran the untar in the background and evaluated nohup.out to check whether any errors were reported during the untar operation. The tar file is nearly 1 GB in size; after the untar, the total usage of the directory is about 2 GB.
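The untar command itself is not captured above; since nohup.out was checked afterwards, it was presumably run in the background along these lines (a sketch, not the exact command used):

cd /opt/hadoop
nohup tar -xvf hadoop-3.3.0.tar &
#once it finishes, inspect nohup.out for any tar errors
tail nohup.out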

[hadoop@localhost hadoop]$ ls -altr
total 1038160
drwxr-xr-x. 10 hadoop bigdata        215 Jul  6  2020 hadoop-3.3.0
-rw-r--r--.  1 hadoop bigdata 1059584000 Jul 15  2020 hadoop-3.3.0.tar
drwxr-xr-x.  4 root   root            53 Mar  1 02:14 ..
drwxr-xr-x.  3 hadoop bigdata         67 Mar  1 23:13 .
-rw-------.  1 hadoop bigdata    3486147 Mar  1 23:14 nohup.out

[hadoop@localhost hadoop]$ du -sh .
2.0G    .

[hadoop@localhost hadoop]$ du -sk .
2092064 .
[hadoop@localhost hadoop]$

5) Detect and set the Java home. Run the command below on the CentOS machine, look for java.home, and use that value to set JAVA_HOME in the hadoop user's .bash_profile file.

java -XshowSettings:properties -version

JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.282.b08-1.el7_9.x86_64/jre
export JAVA_HOME
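To pull the value out non-interactively, something like the line below works; note that -XshowSettings writes its output to stderr, hence the redirect:

java -XshowSettings:properties -version 2>&1 | grep 'java.home'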

6) Now launch a new session as the hadoop user and follow the steps below [https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html]

    a) Go to the untarred hadoop-3.3.0 directory

[hadoop@localhost hadoop-3.3.0]$ ls -altr
total 84
-rw-r--r--.  1 hadoop bigdata   175 Mar 24  2020 README.txt
-rw-r--r--.  1 hadoop bigdata  1541 Mar 24  2020 NOTICE.txt
-rw-r--r--.  1 hadoop bigdata 27570 Mar 24  2020 NOTICE-binary
-rw-r--r--.  1 hadoop bigdata 15697 Mar 24  2020 LICENSE.txt
-rw-r--r--.  1 hadoop bigdata 22976 Jul  4  2020 LICENSE-binary
drwxr-xr-x.  3 hadoop bigdata  4096 Jul  6  2020 sbin
drwxr-xr-x.  3 hadoop bigdata    20 Jul  6  2020 etc
drwxr-xr-x.  2 hadoop bigdata  4096 Jul  6  2020 licenses-binary
drwxr-xr-x.  3 hadoop bigdata    20 Jul  6  2020 lib
drwxr-xr-x. 10 hadoop bigdata   215 Jul  6  2020 .
drwxr-xr-x.  2 hadoop bigdata   203 Jul  6  2020 bin
drwxr-xr-x.  2 hadoop bigdata   106 Jul  6  2020 include
drwxr-xr-x.  4 hadoop bigdata   288 Jul  6  2020 libexec
drwxr-xr-x.  4 hadoop bigdata    31 Jul  6  2020 share
drwxr-xr-x.  3 hadoop bigdata    67 Mar  1 23:13 ..
[hadoop@localhost hadoop-3.3.0]$

[hadoop@localhost hadoop-3.3.0]$ pwd
/opt/hadoop/hadoop-3.3.0
[hadoop@localhost hadoop-3.3.0]$

    b) Create input directory and copy *.xml files from etc/hadoop directory

[hadoop@localhost hadoop-3.3.0]$ mkdir input

[hadoop@localhost hadoop-3.3.0]$ ls -ld input
drwxr-xr-x. 2 hadoop bigdata 6 Mar  2 01:13 input
[hadoop@localhost hadoop-3.3.0]$ cp etc/hadoop/*.xml input/
[hadoop@localhost hadoop-3.3.0]$

    c) Launch the command below, which uses the copied conf files as input, then finds and displays every match of the given regular expression; output is written to the given output directory.

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0.jar grep input output 'dfs[a-z.]+'

Output: 

2021-03-02 01:24:00,805 INFO impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties
2021-03-02 01:24:00,932 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2021-03-02 01:24:00,932 INFO impl.MetricsSystemImpl: JobTracker metrics system started
2021-03-02 01:24:01,129 INFO input.FileInputFormat: Total input files to process : 10
2021-03-02 01:24:01,172 INFO mapreduce.JobSubmitter: number of splits:10
2021-03-02 01:24:01,460 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local440722701_0001
2021-03-02 01:24:01,460 INFO mapreduce.JobSubmitter: Executing with tokens: []
2021-03-02 01:24:01,680 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
2021-03-02 01:24:01,680 INFO mapreduce.Job: Running job: job_local440722701_0001
2021-03-02 01:24:01,686 INFO mapred.LocalJobRunner: OutputCommitter set in config null
...
2021-03-02 01:24:04,925 INFO mapreduce.Job:  map 100% reduce 100%
2021-03-02 01:24:04,926 INFO mapreduce.Job: Job job_local2096128567_0002 completed successfully
2021-03-02 01:24:04,931 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=1203532
                FILE: Number of bytes written=3576646
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
        Map-Reduce Framework
                Map input records=1
                Map output records=1
                Map output bytes=17
                Map output materialized bytes=25
                Input split bytes=127
                Combine input records=0
                Combine output records=0
                Reduce input groups=1
                Reduce shuffle bytes=25
                Reduce input records=1
                Reduce output records=1
                Spilled Records=2
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=41
                Total committed heap usage (bytes)=273997824
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=123
        File Output Format Counters
                Bytes Written=23
[hadoop@localhost hadoop-3.3.0]$

    d) Verify the output

[hadoop@localhost hadoop-3.3.0]$ cat output/*
1       dfsadmin
[hadoop@localhost hadoop-3.3.0]$

[hadoop@localhost hadoop-3.3.0]$ ls -altr output/*
-rw-r--r--. 1 hadoop bigdata 11 Mar  2 01:24 output/part-r-00000
-rw-r--r--. 1 hadoop bigdata  0 Mar  2 01:24 output/_SUCCESS
[hadoop@localhost hadoop-3.3.0]$ cat output/part-r-00000
1       dfsadmin
[hadoop@localhost hadoop-3.3.0]$

# We are done with the Standalone mode of operation. We will see Pseudo-Distributed Operation in a separate blog.

Sunday, February 28, 2021

Adding a linux user/group and modifying user's primary group


Steps:

[root@localhost ~]# groupadd bigdata

[root@localhost ~]# grep -i bigdata /etc/group
bigdata:x:1001:
[root@localhost ~]#

[root@localhost ~]# useradd -m hadoop

[root@localhost ~]# id hadoop
uid=1001(hadoop) gid=1002(hadoop) groups=1002(hadoop)

[root@localhost ~]# ls -altr /home/hadoop
total 12
-rw-r--r--. 1 hadoop hadoop 231 Mar 31  2020 .bashrc
-rw-r--r--. 1 hadoop hadoop 193 Mar 31  2020 .bash_profile
-rw-r--r--. 1 hadoop hadoop  18 Mar 31  2020 .bash_logout
drwxr-xr-x. 4 root   root    35 Mar  1 02:00 ..
drwx------. 2 hadoop hadoop  62 Mar  1 02:00 .

[root@localhost ~]# usermod -g bigdata hadoop

[root@localhost ~]# ls -altr /home/hadoop
total 12
-rw-r--r--. 1 hadoop bigdata 231 Mar 31  2020 .bashrc
-rw-r--r--. 1 hadoop bigdata 193 Mar 31  2020 .bash_profile
-rw-r--r--. 1 hadoop bigdata  18 Mar 31  2020 .bash_logout
drwxr-xr-x. 4 root   root     35 Mar  1 02:00 ..
drwx------. 2 hadoop bigdata  62 Mar  1 02:00 .

[root@localhost ~]#

[root@localhost ~]# passwd hadoop
Changing password for user hadoop.
New password:
BAD PASSWORD: The password is shorter than 8 characters
Retype new password:
passwd: all authentication tokens updated successfully.
[root@localhost ~]#

<<<<<<<<<<< set a strong password

Now logged in as hadoop.

Last failed login: Mon Mar  1 02:06:41 EST 2021 from gateway on ssh:notty
Last login: Mon Mar  1 02:06:20 2021
[hadoop@localhost ~]$

[hadoop@localhost ~]$ ls -altr
total 16
-rw-r--r--. 1 hadoop bigdata 231 Mar 31  2020 .bashrc
-rw-r--r--. 1 hadoop bigdata 193 Mar 31  2020 .bash_profile
-rw-r--r--. 1 hadoop bigdata  18 Mar 31  2020 .bash_logout
drwxr-xr-x. 4 root   root     35 Mar  1 02:00 ..
-rw-------. 1 hadoop bigdata   5 Mar  1 02:06 .bash_history
drwx------. 2 hadoop bigdata  83 Mar  1 02:06 .
[hadoop@localhost ~]$

This closes this blog. Thanks.

Vagrant box - authentication issues


Box Os: Centos 7.9
Kernel: 3.10.0-1160.6.1.el7.x86_64

Symptoms:
1) Authentication failure after setting up the box with the insecure key, seen on the first reboot (vagrant halt/up/ssh). This is expected, since Vagrant will replace the insecure key on first boot.

    default: SSH auth method: private key
    default: Warning: Authentication failure. Retrying...
    default: Warning: Authentication failure. Retrying...

2) Even after generating a new key using the method below and copying the public key to the vagrant box, the authorized_keys file becomes zero bytes on restart (vagrant halt/up). This leads to authentication failure again.

$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/<useracnt>/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/<useracnt>/.ssh/id_rsa
Your public key has been saved in /home/<useracnt>/.ssh/id_rsa.pub
The key fingerprint is:
SHA256:7KVti+u55MszInXV0py5gkjyAucdhfWoE3R5NtgqUcs <useracnt>@<machineid>
The key's randomart image is:
+---[RSA 3072]----+
|      .+++       |
|     .oo++=      |
|      .oE+.= o   |
|  . o ++. o *    |
|   + =o+So.. .   |
|    o =oo+. .    |
|     o .+ o.     |
|    . .+o+ .     |
|     . oX*.      |
+----[SHA256]-----+

3) We then set config.ssh.insert_key = false in the Vagrantfile; this ensures authorized_keys isn't replaced. But we still get the encryption-related error below.

The private key you're attempting to use with this Vagrant box uses
an unsupported encryption type. The SSH library Vagrant uses does not support
this key type. Please use `ssh-rsa` or `ssh-dss` instead. Note that
sometimes keys in your ssh-agent can interfere with this as well,
so verify the keys are valid there in addition to standard
file paths.

4) We then generate an RSA key with the additional option below, which produced an RSA 4096 key instead of RSA 3072. We copy the public key using ssh-copy-id, but the encryption message is still thrown.

$ ssh-keygen -t rsa -b 4096
Generating public/private rsa key pair.
Enter file in which to save the key (/home/<useracnt>/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/<useracnt>/.ssh/id_rsa
Your public key has been saved in /home/<useracnt>/.ssh/id_rsa.pub
The key fingerprint is:
SHA256:hrSfT0367PoPlEMojHJZGJIbKdS9JadCDkdWiR2ayVs <useracnt>@<machineid>
The key's randomart image is:
+---[RSA 4096]----+
| ..o+B+=.        |
|  ooO=*=o  .     |
|   **oE*o . .    |
|    +*oo . . .   |
|    ..o S   =    |
|       o . = .   |
|        o o o    |
|         o o .   |
|          o+=..  |
+----[SHA256]-----+

Error:

The private key you're attempting to use with this Vagrant box uses
an unsupported encryption type. The SSH library Vagrant uses does not support
this key type. Please use `ssh-rsa` or `ssh-dss` instead. Note that
sometimes keys in your ssh-agent can interfere with this as well,
so verify the keys are valid there in addition to standard
file paths.

Now, the solution:


5) After some searching, we went through several solutions; none worked, and I didn't want to work around the issue. Finally I hit https://stackoverflow.com/questions/22922891/vagrant-ssh-authentication-failure and followed the steps in the solution offered by Sankari. It worked up to the point of resolving the encryption issue; one additional step, as detailed below, was needed for the fix.

    a. Generate a fresh ssh key pair on the client CentOS machine...

[vagrant@localhost ~]$ ssh-keygen -t rsa -b 4096
Generating public/private rsa key pair.
Enter file in which to save the key (/home/vagrant/.ssh/id_rsa):
/home/vagrant/.ssh/id_rsa already exists.
Overwrite (y/n)? y
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/vagrant/.ssh/id_rsa.
Your public key has been saved in /home/vagrant/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:9FVGrkwfmAn7XD5jMBJ+tkG6C3zWLv4BcX/vlSkuYzI vagrant@localhost.localdomain
The key's randomart image is:
+---[RSA 4096]----+
|          o ..+  |
|         . * B   |
|        . * & +  |
|       o . # @ . |
|        S * B B .|
|         + + . ++|
|          o o. oo|
|         E =.....|
|          =.+.  .|
+----[SHA256]-----+

    b. Copy the public key to authorized_keys with 0600 permission.
    c. Copy/paste the private key to the local cygwin64 .ssh/ directory.
    d. Try vagrant ssh now; it still attempts to use the private key below [this is where the stackoverflow solution's steps stop].

debug1: Will attempt key: <windowsdrive>:/VBoxVms/vagrant_home/vcentos79_hivetest/.vagrant/machines/default/virtualbox/private_key  explicit
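A quick way to confirm which identity file vagrant intends to use is vagrant ssh-config, which prints the SSH settings (HostName, Port, IdentityFile, and so on) that vagrant ssh will apply:

vagrant ssh-config | grep -i identityfile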

    e. Add the config setting config.ssh.private_key_path = ['~/.ssh/id_rsa'] to the Vagrantfile; this seems to have done the trick.

debug1: Trying private key: <windowsdrive>:/cygwin64/home/<useracnt>/.ssh/id_rsa

    f. Now the vagrant ssh works fine.

debug1: client_input_global_request: rtype hostkeys-00@openssh.com want_reply 0
Last login: Mon Mar  1 01:01:08 2021 from gateway
[vagrant@localhost ~]$

    g. Try a vagrant halt/up. Should be fine.

$ vagrant halt
==> default: Attempting graceful shutdown of VM...
<useracnt>@<machineid> ~/dba/vcentos79_hivetest

$ vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Clearing any previously set forwarded ports...
==> default: Clearing any previously set network interfaces...
==> default: Preparing network interfaces based on configuration...
    default: Adapter 1: nat
==> default: Forwarding ports...
    default: 22 (guest) => 2222 (host) (adapter 1)
==> default: Booting VM...
==> default: Waiting for machine to boot. This may take a few minutes...
    default: SSH address: 127.0.0.1:2222
    default: SSH username: vagrant
    default: SSH auth method: private key
==> default: Machine booted and ready!
==> default: Checking for guest additions in VM...
==> default: Mounting shared folders...
    default: /vagrant => <windowsdrive>:/VBoxVms/vagrant_home/vcentos79_hivetest
==> default: Machine already provisioned. Run `vagrant provision` or use the `--provision`
==> default: flag to force provisioning. Provisioners marked to run always will still run.
<useracnt>@<machineid> ~/dba/vcentos79_hivetest


This closes this blog. Thank you for your visit.

Thursday, January 14, 2021

How to stop/start a postgres db

Lab: How to stop/start a postgres db

Run these as the postgres user (the recommendation is to never allow direct connections as postgres; instead, allow other users to su to the postgres user).


Start:

[postgres@localhost ~]$ pg_ctl -D /pgData/data -l logfile start
waiting for server to start.... done
server started

[postgres@localhost ~]$
where /pgData/data is the data directory in which the postgres database cluster is stored.

Examine the postgres db status:

[postgres@localhost ~]$ ps -ef|grep -i postgres
root      9284  9269  0 02:40 pts/0    00:00:00 su - postgres
postgres  9285  9284  0 02:40 pts/0    00:00:00 -bash
postgres  9319     1  0 02:42 ?        00:00:00 /pgBin/12.5/bin/postgres -D /pgData/data
postgres  9321  9319  0 02:42 ?        00:00:00 postgres: checkpointer
postgres  9322  9319  0 02:42 ?        00:00:00 postgres: background writer
postgres  9323  9319  0 02:42 ?        00:00:00 postgres: walwriter
postgres  9324  9319  0 02:42 ?        00:00:00 postgres: autovacuum launcher
postgres  9325  9319  0 02:42 ?        00:00:00 postgres: stats collector
postgres  9326  9319  0 02:42 ?        00:00:00 postgres: logical replication launcher
postgres  9327  9285  0 02:42 pts/0    00:00:00 ps -ef
postgres  9328  9285  0 02:42 pts/0    00:00:00 grep --color=auto -i postgres
[postgres@localhost ~]$

9319 is the parent process for the postgres process tree

[postgres@localhost ~]$ pg_ctl status -D /pgData/data
pg_ctl: server is running (PID: 9319)
/pgBin/12.5/bin/postgres "-D" "/pgData/data"
[postgres@localhost ~]$

Stop:

[postgres@localhost ~]$ pg_ctl -D /pgData/data stop
waiting for server to shut down.... done
server stopped
[postgres@localhost ~]$
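pg_ctl stop also accepts a shutdown mode via -m (smart, fast or immediate). For example, a fast shutdown aborts open transactions, disconnects clients and then shuts down cleanly:

pg_ctl -D /pgData/data stop -m fast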

Status:

[postgres@localhost ~]$ pg_ctl status -D /pgData/data
pg_ctl: no server running
[postgres@localhost ~]$


Steps to enable postgresql database to accept remote connections

Lab exercise: Try to connect to our test database with the test role/user through the pgAdmin 4.29 console, with postgres configured in a VirtualBox VM on a NAT network (port forwarded).

We need to do the following if we use VirtualBox with the NAT network alone.

OS details: Centos 7.9

Steps:

1) The NAT network allows internet access but does not allow incoming connections, hence we set up port forwarding for this to work. Add a port-forwarding rule in the VirtualBox console for the VM we are testing:

127.0.0.1:54320 => VM's 5432 (example: the postgres port)
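The same forwarding rule can also be added from the command line with VBoxManage while the VM is powered off (a sketch; the VM name is a placeholder):

VBoxManage modifyvm "<vm-name>" --natpf1 "pgsql,tcp,127.0.0.1,54320,,5432"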

2) Back up and edit the postgresql.conf file, then set listen_addresses = '*' and confirm the port.
3) Back up and edit the pg_hba.conf file, then add the entry below:

host    all    all    10.0.0.0/16    md5
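A sketch of the edits for steps 2 and 3, assuming the data directory layout used in the earlier postgres posts on this blog (/pgData/data); back up both files first:

cp /pgData/data/postgresql.conf /pgData/data/postgresql.conf.bak
cp /pgData/data/pg_hba.conf /pgData/data/pg_hba.conf.bak
#in postgresql.conf set: listen_addresses = '*'  (and confirm the port setting)
echo "host    all    all    10.0.0.0/16    md5" >> /pgData/data/pg_hba.conf
#listen_addresses requires a restart to take effect
pg_ctl -D /pgData/data restart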

4) Stop the firewall or add an exception
a) Stop the firewall - pgAdmin connects perfectly fine.
b) Add an exception via:
i) the service [see below] - this didn't work as expected; the pgAdmin console was always timing out. The reason could be that netstat -plantu / ss -plantu didn't show the postgresql service against the port postgres is running on; instead it showed postgres.

But firewall-cmd --get-services was showing postgresql as an available service.
[root@localhost ~]# firewall-cmd --add-service=postgresql --permanent
success
[root@localhost ~]# firewall-cmd --list-all
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: enp0s3
  sources:
  services: dhcpv6-client ssh
  ports:
  protocols:
  masquerade: no
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:
[root@localhost ~]# firewall-cmd --reload
success
[root@localhost ~]# firewall-cmd --list-all
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: enp0s3
  sources:
  services: dhcpv6-client postgresql ssh
  ports:
  protocols:
  masquerade: no
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:
[root@localhost ~]#
[root@localhost ~]# firewall-cmd --get-services|tr " " "\n"
..
plex
pmcd
pmproxy
pmwebapi
pmwebapis
pop3
pop3s
postgresql
privoxy
..
zabbix-server
[root@localhost ~]#

ii) Try adding the port to the exceptions - pgAdmin connects fine.

[root@localhost ~]# firewall-cmd --add-port=<port>/tcp --permanent
success
[root@localhost ~]# firewall-cmd --reload
success
[root@localhost ~]# firewall-cmd --list-all
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: enp0s3
  sources:
  services: dhcpv6-client ssh
  ports: <port>/tcp
  protocols:
  masquerade: no
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:
[root@localhost ~]#

Thanks for your visit to the blog.

Saturday, January 9, 2021

Postgres Custom Installation to a directory we choose instead of going with the default install location [/usr/local/pgsql]


Environment: 

Centos 7.9 - x86_64

Vagrant box: 

Custom build box

Postgres version: 

12.5 (the latest in release 12 at the time of this documentation)

Install doc: 

https://www.postgresql.org/docs/12/install-procedure.html

List of options we decided to customize:

1) Path of the postgres binary [--prefix=PREFIX]

2) Other options to include in configure [--with-pgport=1512 for version 12; --with-perl, --with-python for pl/perl & pl/python support]

Options understood but deferred for later testing (see the hypothetical example after this list):

1) --with-segsize=SEGSIZE
2) --with-blocksize=BLOCKSIZE
3) --with-wal-blocksize=BLOCKSIZE
4) --enable-dtrace

Leaving them at their default values.
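For reference only, if those options were to be customized they would be passed to configure alongside the chosen flags, along these lines (hypothetical values, not used in this install):

./configure --prefix=/pgBin/12.5 --with-pgport=1512 --with-perl --with-python \
            --with-segsize=2 --with-blocksize=16 --with-wal-blocksize=16 --enable-dtrace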

Postgres recommended options not considered for now:

This install is mainly focused on administration-side learning rather than development.

1) --enable-debug
2) --enable-cassert

Prerequisites:

https://www.postgresql.org/docs/12/install-requirements.html

make version 3.80 or newer
GCC
tar, gzip, bzip2
Perl 5.8.3
Python 2.4 min.
openssl

Not an exhaustive list, but these are the ones I am interested in.

Verification of the precheck:

[root@localhost ~]# make --version
GNU Make 3.82
Built for x86_64-redhat-linux-gnu
Copyright (C) 2010  Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

[root@localhost ~]# which openssl
/usr/bin/openssl

[root@localhost ~]# perl --version

This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi
(with 41 registered patches, see perl -V for more detail)

[root@localhost ~]# python --version
Python 2.7.5
[root@localhost ~]#

So I have all the packages I need.

Directory structure and memory footprint:


[root@localhost ~]# df -h
Filesystem               Size  Used Avail Use% Mounted on
/dev/sdb1                2.1G   33M  2.1G   2% /tmp
/dev/mapper/dbvg-db00    2.0G   33M  2.0G   2% /pgBin
/dev/mapper/dbvg-db01    2.0G   33M  2.0G   2% /pgData
/dev/mapper/dbvg-db03    2.0G   33M  2.0G   2% /pgBackup
/dev/mapper/dbvg-db02    2.0G   33M  2.0G   2% /pgWal

[root@localhost ~]# free -k
              total        used        free      shared  buff/cache   available
Mem:        1882080      201560     1476888        8792      203632     1528440
Swap:       3774456           0     3774456
[root@localhost ~]#

Steps to install the postgres binary:

Step 1) Source download

Download the source from https://www.postgresql.org/docs/12/install-getsource.html => https://www.postgresql.org/ftp/source/v12.5/

Step 2) Decompress and untar

[root@localhost pgbin]# ls -altr
total 133628
drwxrwxrwx. 1 vagrant vagrant      4096 Nov  9 17:39 postgresql-12.5
-rwxrwxrwx. 1 vagrant vagrant 136826880 Jan  9 03:46 postgresql-12.5.tar
drwxrwxrwx. 1 vagrant vagrant      4096 Jan  9 03:50 ..
drwxrwxrwx. 1 vagrant vagrant         0 Jan  9 03:51 .
[root@localhost pgbin]# find . -name 'configure'
./postgresql-12.5/configure
[root@localhost pgbin]#

Step 3) Create necessary directory

Create the directory 12.5 under the /pgBin mount point.

Step 4) Launch configure command with necessary flags

cd /pgBin/src12.5/postgresql-12.5/
./configure --prefix=/pgBin/12.5 --with-pgport=1512 --with-perl --with-python

You may need to install the packages listed below, depending on the errors configure reports.

Extra packages installed:
yum install perl-ExtUtils-MakeMaker perl-ExtUtils-Embed
yum install readline-devel
yum install zlib-devel
yum install python-devel

You can examine config.log or the on-screen messages; once configure finishes successfully, move on to the next step.

We have the below list of files freshly created:

-rw-r--r--.  1 root root  83906 Jan  9 04:19 config.log_readline
-rw-r--r--.  1 root root  71852 Jan  9 06:53 config.log_zlibdevel
-rwxr-xr-x.  1 root root  40367 Jan  9 07:01 config.status
-rw-r--r--.  1 root root   3998 Jan  9 07:01 GNUmakefile
drwxrwxrwx. 16 1107 1107   4096 Jan  9 07:01 src
-rw-r--r--.  1 root root 442216 Jan  9 07:01 config.log
drwxrwxrwx.  6 1107 1107   4096 Jan  9 07:01 .
[root@localhost postgresql-12.5]#

Complete list:

[root@localhost postgresql-12.5]# find . -mtime -1 -ls
2117696    4 drwxrwxrwx   6 1107     1107         4096 Jan  9 07:01 .
1316371    4 drwxrwxrwx  16 1107     1107         4096 Jan  9 07:01 ./src
3146075    4 drwxrwxrwx  31 1107     1107         4096 Jan  9 07:01 ./src/include
3148247   36 -rw-r--r--   1 root     root        32788 Jan  9 07:01 ./src/include/pg_config.h
3148248    4 -rw-r--r--   1 root     root            1 Jan  9 07:01 ./src/include/stamp-h
3148249    4 -rw-r--r--   1 root     root          323 Jan  9 07:01 ./src/include/pg_config_ext.h
3148250    4 -rw-r--r--   1 root     root            1 Jan  9 07:01 ./src/include/stamp-ext-h
3148252    0 lrwxrwxrwx   1 root     root           30 Jan  9 07:01 ./src/include/pg_config_os.h -> ../../src/include/port/linux.h
1317040    4 drwxrwxrwx   2 1107     1107         4096 Jan  9 07:01 ./src/interfaces/ecpg/include
3148251    4 -rw-r--r--   1 root     root          572 Jan  9 07:01 ./src/interfaces/ecpg/include/ecpg_config.h
1319139    4 -rw-r--r--   1 root     root            1 Jan  9 07:01 ./src/interfaces/ecpg/include/stamp-h
2119058    0 drwxrwxrwx   6 1107     1107          252 Jan  9 07:01 ./src/backend/port
2119591    0 lrwxrwxrwx   1 root     root           37 Jan  9 07:01 ./src/backend/port/tas.s -> ../../../src/backend/port/tas/dummy.s
2119592    0 lrwxrwxrwx   1 root     root           38 Jan  9 07:01 ./src/backend/port/pg_sema.c -> ../../../src/backend/port/posix_sema.c
2119593    0 lrwxrwxrwx   1 root     root           38 Jan  9 07:01 ./src/backend/port/pg_shmem.c -> ../../../src/backend/port/sysv_shmem.c
3148246   36 -rw-r--r--   1 root     root        34059 Jan  9 07:01 ./src/Makefile.global
1319140    0 lrwxrwxrwx   1 root     root           31 Jan  9 07:01 ./src/Makefile.port -> ../src/makefiles/Makefile.linux
2119586   84 -rw-r--r--   1 root     root        83906 Jan  9 04:19 ./config.log_readline
2119587   72 -rw-r--r--   1 root     root        71852 Jan  9 06:53 ./config.log_zlibdevel
2119588  432 -rw-r--r--   1 root     root       442216 Jan  9 07:01 ./config.log
2119590   40 -rwxr-xr-x   1 root     root        40367 Jan  9 07:01 ./config.status
3148245    4 -rw-r--r--   1 root     root         3998 Jan  9 07:01 ./GNUmakefile
[root@localhost postgresql-12.5]#

Examining a file to see what it has...

./src/Makefile.global

Snipped content: this shows that we chose /pgBin/12.5 as the base path for the binary install, and that configure created a makefile which will later be used in the make process.
==========================
prefix := /pgBin/12.5
exec_prefix := ${prefix}
datarootdir := ${prefix}/share

bindir := ${exec_prefix}/bin
==========================

Step 5) Launch make command

make world

Other options are make or make all. Since make world gives better coverage, I chose make world over make all or plain make.

Output: 
PostgreSQL, contrib, and documentation successfully made. Ready to install.

[root@localhost ~]# ls -altr /pgBin/12.5/
total 0
drwxr-xr-x. 2 root root  6 Jan  9 03:55 .
drwxr-xr-x. 4 root root 33 Jan  9 04:08 ..
[root@localhost ~]#

<<< so this directory is still empty

Step 6) make check

This step has to be done as a non-root user, so I chose vagrant, the only other user I have.

=======================
 All 194 tests passed.
=======================

This step created the below directory under the source tree:

drwxrwxr-x.  4 vagrant vagrant     30 Jan  9 07:43 tmp_install

Step 7) make install-world

make install-world

Output:
...
PostgreSQL, contrib, and documentation installation complete.

<<<<<<<<<<<<<<<<< This completes installation.

Post Install

Step 1) Environment Variable config:

The optional step below registers the new library directory so that the run-time linker can find the shared libraries faster:
Optional: /sbin/ldconfig /pgBin/12.5/lib

Environment variables to go into ~/.bash_profile:

LD_LIBRARY_PATH=/pgBin/12.5/lib
export LD_LIBRARY_PATH
PATH=/pgBin/12.5/bin:$PATH
export PATH
MANPATH=/pgBin/12.5/share/man:$MANPATH
export MANPATH
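After sourcing the profile, a quick sanity check (a sketch) confirms the new binaries under /pgBin/12.5 are the ones picked up:

source ~/.bash_profile
which pg_ctl initdb psql
psql --version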

Step 2) Adding postgres user

[root@localhost pgData]# adduser postgres
[root@localhost pgData]# id postgres
uid=1001(postgres) gid=1001(postgres) groups=1001(postgres)
[root@localhost pgData]#

Step 3) changing ownership of the data directory

mkdir /pgData/data
chown postgres:postgres /pgData/data

Step 4) Initialize the first postgres database

su - postgres

.bash_profile is edited with the variables we formed in Step 1 of the post-install section.

initdb -D /pgData/data
pg_ctl -D /pgData/data -l logfile start
createdb test
psql test

Output below:

[postgres@localhost ~]$ initdb -D /pgData/data
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /pgData/data ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... America/New_York
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    pg_ctl -D /pgData/data -l logfile start

[postgres@localhost ~]$ pg_ctl -D /pgData/data -l logfile start
waiting for server to start.... done
server started
[postgres@localhost ~]$ createdb test
[postgres@localhost ~]$ psql test
psql (12.5)
Type "help" for help.

test=# \c
You are now connected to database "test" as user "postgres".
test=# \l
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 test      | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |

test=#
