Bigdata Greenplum DBA Interview Questions and Answers

Q1 : What is the segment in Greenplum?
A : Database instances in the Greenplum system are called segments. Segment stores the data and carries out the query processing. In Greenplum distributed systems, each segment contains a distinct portion of the data.

Q2 : How to delete a standby?
A : To remove the currently configured standby master host from your Greenplum Database system, run the following command in the master only:
# gpinitstandby -r

Q3 :  How many user schemas are there in the database?
A : Use”dn” at psql prompt.

Q4 : How to check the size of a table?
A : Table Level:
psql> select pg_size_pretty(pg_relation_size(‘schema.tablename’));
Replace schema.tablename with your search table.
Table and Index:
psql> select pg_size_pretty(pg_total_relation_size(‘schema.tablename’));
Replace schema.tablename with your search table.

Q5 : How do I get help on syntax to alter table?
A : In psql session type h alter table which will display the syntax:
gpdb=# h alter table

Q6 : How Greenplum updates records?
A : Greenplum does not update the records, it marks the delete flag.
Each record contains two slots, xmin and xmax.
Updating a row is treated as a delete and an insert, so the XID (transaction ID) is saved to the xmax of the current row and the xmin of the newly inserted row.
Now let us see how update works with help of an example;
Sample data:
[ROW id][xmin][xmax]
[R1][T1][0] —————-> First time record is inserted, R1
Updating same records after sometime
[R1][T1][T22] ————-> Record R1 is updated; T22 is the new row transaction id
[R22][T22][0] ————-> New updated record R33; Note 0 in xmax indicates new record.
If the record is deleted simply xmax will have the XID.

Q7 : How to check whether Greenplum server is up and running?
A : The gpstate is the utility to check gpdb status.
Use gpstate -Q to show a quick status. Refer to gpstate –help for more options.
Sample output:
[gpadmin@stinger2]/export/home/gpadmin>gpstate -Q
gpadmin-[INFO]:-Obtaining GPDB array type, [Brief], please wait…
gpadmin-[INFO]:-Obtaining GPDB array type, [Brief], please wait…
gpadmin-[INFO]:-Quick Greenplum database status from Master instance only
gpadmin-[INFO]:———————————————————-
gpadmin-[INFO]:-GPDB fault action value = readonly
gpadmin-[INFO]:-Valid count in status view = 4
gpadmin-[INFO]:-Invalid count in status view = 0
gpadmin-[INFO]:———————————————————-

Q8 : Why do we need gpstop -m and gpstart -m?
A : The gpstart -m command allows you to start the master only and none of the data segments and is used primarily by support to get system level information/configuration. An end user would not regularly or even normally use it.

Q9 : How to add mirrors to the array?
A : The gpaddmirrors utility configures mirror segment instances for an existing Greenplum Database system that was initially configured with primary segment instances only.
For more details check help.
# gpaddmirrors –help

Q10 : How to add new user to the database?
A : Use createuser utility to create users. See createuser –help for more details.
You can also use SQL commands in psql prompt to create users.
For example: CREATE USER or ROLE ….

Q11 : How to see the value of GUC?
A : By connecting GPDB database using psql query catalog or do show parameter.
Example:
gpdb# select name,setting from pg_settings where name=’GUC’;
or
gpdb# show ;

Q12 : What is resource queues?
A : Resource queues are used to manager Greenplum database workload management. All user / queries can be prioritized using Resource queues. Refer Admin guide for more details.

Q13 : How do I clone my production databaes to PreProd / QA environment?
A : If Prod and QA on same GPDB cluster, use CREATE database template .
If Prod and QA are on different clusters, use backup and restore utilities.

Q14 : What is Greenplum performance monitor and how to install ?
A : It’s a monitoring tool that collects statistics on system and query performance and builds historical data.

Q15 : What are major differences between Oracle and Greenplum?
A : Oracle is relational database. Greenplum is MPP nature. Greenplum is shared nothing architecture. There are many other differences in terms of functinality and behaviour.

Q16 : Can you do partitioning in Greenplum tables?
A : Yes

Q17 : What are the tools available in Greenplum to take backup and restores?
A : For non-parallel backups:
Use postgres utililities (pg_dump, pg_dumpall for backup, and pg_restore for restore).
Another useful command for getting data out of database is the COPY to .
For parallel backups:
gp_dump and gpcrondump for backups and gp_restore for restore process.

Q18 : How do I monitor user activity history in Greenplum database?
A :Use Greenplum performance monitor (gpperfmon), which has GUI to monitor and query performance history.

Q19 : What is gpdetective and how do I run it in Greenplum?
A : The gpdetective utility collects information from a running Greenplum Database system and creates a bzip2-compressed tar output file. This output file helps with the diagnosis of Greenplum Database errors or system failures. for more details check help.
gpdetective –help

Q20 : Which parameters can you use to manage workload in a Greenplum database?
A : workload management is done by creating resource queues and assigning various limits.

Q21 : How to re-sync a standby?
A : Use this option if you already have a standby master configured, and just want to resynchronize the data between the primary and backup master host. The Greenplum system catalog tables will not be updated.
# gpinitstandby -n (resynchronize)

Q22 : How do I clone my production databaes to PreProd / QA environment?
A : If Prod and QA on same GPDB cluster, use CREATE database template .
If Prod and QA are on different clusters, use backup and restore utilities.

Q23 : What is good and bad about the Greenplum, compared to Oracle and Greenplum?
A : Greenplum is built on top of Postgresql. It is shared nothing, MPP architecture best for data warehousing env. Good for big data analytics purpose.
Oracle is an all-purpose database.

Q24 : Where can I get help on postgres psql commands?
A : In psql session
“ ?” – for all psql session help
“h ” For any SQL syntax help.

Q25 : What is Analyze and how frequen should I run this?
A : ANALYZE collects statistics about the contents of tables in the database, and stores the results in the system table pg_statistic. Subsequently, the query planner uses these statistics to help determine the most efficient execution plans for queries.
It is a good idea to run ANALYZE periodically, or just after making major changes in the contents of a table. Accurate statistics will help the query planner to choose the most appropriate query plan, and thereby improve the speed of query processing. A common strategy is to run VACUUM and ANALYZE once a day during a low-usage time of day.