Data Analysis with Python and Pandas Interview Questions and Answers

Question 1 : What is Series in Pandas?
Answer : A Series is a one-dimensional array that is capable of storing various data types. The row labels of series are called the index. By using a ‘series’ method, we can easily convert the list, tuple, and dictionary into series. A Series cannot contain multiple columns.

Learn Data Analysis with Python and Pandas to Unleash a Modern Career

Join Data Analysis with Python and Pandas Training

Question 2 : What are the different types of Data Structures in Pandas?
Answer : Pandas provide two data structures, which are supported by the pandas library, Series, and DataFrames. Both of these data structures are built on top of the NumPy.
A Series is a one-dimensional data structure in pandas, whereas the DataFrame is the two-dimensional data structure in pandas.

Question 3 : How to calculate the standard deviation from the Series?
Answer : The Pandas std() is a function for calculating the standard deviation of the given set of numbers, DataFrame, column, and rows.
Series.std(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)

Question 4 : What are the significant features of the pandas Library?
Answer : The important features of the panda’s library are:

  • Memory Efficient
  • Data Alignment
  • Reshaping
  • Merge and join
  • Time Series

Question 5 : Explain what is Reindexing in pandas?
Answer : Reindexing is used to conform DataFrame to a new index with optional filling logic. It places NA/NaN in that location where the values are not present in the previous index. It returns a new object unless the new index is produced as equivalent to the current one, and the value of copy becomes False. It is used to change the index of the rows and columns of the DataFrame.

Question 6 : Explain Categorical data in Pandas?
Answer : A Categorical data is defined as a Pandas data type that corresponds to a categorical variable in statistics. A categorical variable is generally used to take a limited and usually fixed number of possible values. Examples: gender, country affiliation, blood type, social class, observation time, or rating via Likert scales. All values of categorical data are either in categories or np.nan.

This data type is useful in the following cases:

  • It is useful for a string variable that consists of only a few different values. If we want to save some memory, we can convert a string variable to a categorical variable.
  • It is useful for the lexical order of a variable that is not the same as the logical order (?one?, ?two?, ?three?) By converting into a categorical and specify an order on the categories, sorting and min/max is responsible for using the logical order instead of the lexical order.
  • It is useful as a signal to other Python libraries because this column should be treated as a categorical variable.

Question 7 : How do we create a copy of the series in Pandas?
Answer : We create the copy of series by using the this syntax:

pandas.Series.copy
Series.copy(deep=True)

The above statements make a deep copy that includes a copy of the data and the indices. If we set the value of deep to False, it will neither copy the indices nor the data.

Question 8 : How to add an Index, row, or column to a Pandas DataFrame?
Answer :

Adding an Index to a DataFrame:
Pandas allow adding the inputs to the index argument if you create a DataFrame. It will make sure that you have the desired index. If you don?t specify inputs, the DataFrame contains, by default, a numerically valued index that starts with 0 and ends on the last row of the DataFrame.

Adding Rows to a DataFrame:
We can use .loc, iloc, and ix to insert the rows in the DataFrame.

  • The loc basically works for the labels of our index. It can be understood as if we insert in loc[4], which means we are looking for that values of DataFrame that have an index labeled 4.
  • The iloc basically works for the positions in the index. It can be understood as if we insert in iloc[4], which means we are looking for the values of DataFrame that are present at index ‘4`.
  • The ix is a complex case because if the index is integer-based, we pass a label to ix. The ix[4] means that we are looking in the DataFrame for those values that have an index labeled 4. However, if the index is not only integer-based, ix will deal with the positions as iloc.

Adding Columns to a DataFrame:
If we want to add the column to the DataFrame, we can easily follow the same procedure as adding an index to the DataFrame by using loc or iloc.

Question 9 : How to iterate over a Pandas DataFrame?
Answer : You can iterate over the rows of the DataFrame by using for loop in combination with an iterrows() call on the DataFrame.

Question 10 : How to Rename the Index or Columns of a Pandas DataFrame?
Answer : You can use the .rename method to give different values to the columns or the index values of DataFrame.

Learn Data Analysis with Python and Pandas to Unleash a Modern Career

Join Data Analysis with Python and Pandas Training

Question 11 : What is Python pandas used for?
Answer : Pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. Pandas is free software released under the three-clause BSD license.

Question 12 : What is Pandas/Python Pandas?
Answer : Pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python.

Question 13 : What is map function in Python?
Answer : map function executes the function given as the first argument on all the elements of the iterable given as the second argument. If the function given takes in more than 1 arguments, then many iterables are given. #Follow the link to know more similar functions.

Question 14 : What is the process of compilation and linking in python?
Answer : The compiling and linking allows the new extensions to be compiled properly without any error and the linking can be done only when it passes the compiled procedure. If the dynamic loading is used then it depends on the style that is being provided with the system. The python interpreter can be used to provide the dynamic loading of the configuration setup files and will rebuild the interpreter.
The steps that are required in this as:

  1. Create a file with any name and in any language that is supported by the compiler of your system. For example file.c or file.cpp
  2. Place this file in the Modules/ directory of the distribution which is getting used.
  3. Add a line in the file Setup.local that is present in the Modules/ directory.
  4. Run the file using spam file.o
  5. After successful run of this rebuild the interpreter by using the make command on the top-level directory.
  6. If the file is changed then run rebuildMakefile by using the command as ‘make Makefile’.

Question 15 : What is Matplotlib?
Answer : matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like wxPython, Qt, or GTK+.

Question 16 : Whenever Python exits, why isn’t all the memory de-allocated?
Answer :

  1. Whenever Python exits, especially those Python modules which are having circular references to other objects or the objects that are referenced from the global namespaces are not always de-allocated or freed.
  2. It is impossible to de-allocate those portions of memory that are reserved by the C library.
  3. On exit, because of having its own efficient cleanup mechanism, Python would try to de-allocate/destroy every other object.

Question 17 : What is NP Python?
Answer : NumPy (pronounced /ˈnʌmpaɪ/ (NUM-py) or sometimes /ˈnʌmpi/ (NUM-pee)) is an extension to the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large library of high-level mathematical functions to operate on these arrays.

Question 18 : How do you calculate percentiles with Python/ NumPy?
Answer : We can calculate percentiles with the following code

1
2
3
4
import numpy as np
a = np.array([1,2,3,4,5])
p = np.percentile(a, 50)  #Returns 50th percentile, e.g. median
print(p)

Output

3

Question 19 : What is a pandas DataFrame?
Answer : DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects. It is generally the most commonly used Pandas object.

Question 20 : Explain the use of decorators.
Answer : Decorators in Python are used to modify or inject code in functions or classes. Using decorators, you can wrap a class or function method call so that a piece of code can be executed before or after the execution of the original code. Decorators can be used to check for permissions, modify or track the arguments passed to a method, logging the calls to a specific method, etc.

Learn Data Analysis with Python and Pandas to Unleash a Modern Career

Join Data Analysis with Python and Pandas Training

Question 21 : What are negative indexes and why are they used?
Answer : The sequences in Python are indexed and it consists of the positive as well as negative numbers. The numbers that are positive uses ‘0’ that is used as the first index and ‘1’ as the second index and the process goes on like that.
The index for the negative number starts from ‘-1’ that represents the last index in the sequence and ‘-2’ as the penultimate index and the sequence carries forward like the positive number.
The negative index is used to remove any new-line spaces from the string and allow the string to except the last character that is given as S[:-1]. The negative index is also used to show the index to represent the string in the correct order.

Question 22 : What is Scipy?
Answer : SciPy (pronounced “Sigh Pie”) is open-source software for mathematics, science, and engineering. It is also the name of a very popular conference on scientific programming with Python. The SciPy library depends on NumPy, which provides convenient and fast N-dimensional array manipulation.

Question 23 : How do you make 3D plots/visualizations using NumPy/SciPy?
Answer : Like 2D plotting, 3D graphics is beyond the scope of NumPy and SciPy, but just as in the 2D case, packages exist that integrate with NumPy. Matplotlib provides basic 3D plotting in the mplot3d subpackage, whereas Mayavi provides a wide range of high-quality 3D visualization features, utilizing the powerful VTK engine.

Question 24 : What is the difference between NumPy and SciPy?
Answer :

  1. In an ideal world, NumPy would contain nothing but the array data type and the most basic operations: indexing, sorting, reshaping, basic elementwise functions, et cetera.
  2. All numerical code would reside in SciPy. However, one of NumPy’s important goals is compatibility, so NumPy tries to retain all features supported by either of its predecessors.
  3. Thus NumPy contains some linear algebra functions, even though these more properly belong in SciPy. In any case, SciPy contains more fully-featured versions of the linear algebra modules, as well as many other numerical algorithms.
  4. If you are doing scientific computing with python, you should probably install both NumPy and SciPy. Most new features belong in SciPy rather than NumPy.

Question 25 : What is PIP for Python?
Answer : PIP is a package management system used to install and manage software packages written in Python. Many packages can be found in the Python Package Index (PyPI). Python 2.7.9 and later (on the python2 series), and Python 3.4 and later include pip (pip3 for Python 3) by default.

Learn Data Analysis with Python and Pandas to Unleash a Modern Career

Join Data Analysis with Python and Pandas Training

Question 26 : Explain split(), sub(), subn() methods of “re” module in Python.
Answer : To modify the strings, Python’s “re” module is providing 3 methods. They are:

  • split() – uses a regex pattern to “split” a given string into a list.
  • sub() – finds all substrings where the regex pattern matches and then replace them with a different string
  • subn() – it is similar to sub() and also returns the new string along with the no. of replacements.

Question 27 : How can you generate random numbers in Python?
Answer : Random module is the standard module that is used to generate the random number. The method is defined as:

1
2
import random
random.random

The statement random.random() method return the floating point number that is in the range of [0, 1). The function generates the random float numbers. The methods that are used with the random class are the bound methods of the hidden instances. The instances of the Random can be done to show the multi-threading programs that create different instance of individual threads. The other random generators that are used in this are:

  1. randrange(a, b): it chooses an integer and define the range in-between [a, b). It returns the elements by selecting it randomly from the range that is specified. It doesn’t build a range object.
  2. uniform(a, b): it chooses a floating point number that is defined in the range of [a,b).Iyt returns the floating point number
  3. normalvariate(mean, sdev): it is used for the normal distribution where the mu is mean and the sdev is a sigma that is used for standard deviation.
  4. The Random class that is used and instantiated creates an independent multiple random number generators.

Question 28 : How to get indices of N maximum values in a NumPy array?
Answer : We can get the indices of N maximum values in a NumPy array using the below code:

1
2
3
import numpy as np
arr = np.array([1, 3, 2, 4, 5])
print(arr.argsort()[-3:][::-1])

Output

[ 4 3 1 ]

Question 29 : What is Sympy?
Answer : SymPy is a Python library for symbolic mathematics. It aims to become a full-featured computer algebra system (CAS) while keeping the code as simple as possible in order to be comprehensible and easily extensible.

Question 30 : Explain Inheritance in Python with an example.
Answer : Inheritance allows One class to gain all the members(say attributes and methods) of another class. Inheritance provides code reusability, makes it easier to create and maintain an application. The class from which we are inheriting is called super-class and the class that is inherited is called a derived / child class.

They are different types of inheritance supported by Python:

  1. Single Inheritance – where a derived class acquires the members of a single superclass.
  2. Multi-level inheritance – a derived class d1 in inherited from base class base1, and d2 are inherited from base2.
  3. Hierarchical inheritance – from one base class you can inherit any number of child classes
  4. Multiple inheritance – a derived class is inherited from more than one base class