10 Data Science Interview Questions You Must Be Acquainted With


1. How would you create a taxonomy to identify key customer trends in unstructured data?

The best way to approach this question is to say that you would first check with the business owner and understand their objectives before categorizing the data. After that, follow an iterative approach: pull new data samples, validate the model for accuracy by soliciting feedback from the business stakeholders, and improve it accordingly. This helps ensure that your model produces actionable results and improves over time.

2. Python or R – Which one would you prefer for text analytics?
The best possible answer would be Python, because its pandas library provides easy-to-use data structures and high-performance data analysis tools.

3. What are feature vectors?

A feature vector is an n-dimensional vector of numerical features that represents some object. In machine learning, feature vectors are used to represent the numeric or symbolic characteristics of an object, called features, in a mathematical form that is easy to analyze.
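As a rough illustration, here is a minimal sketch that turns a short text into a feature vector of word counts. The function name `bag_of_words_vector` and the three-word vocabulary are made up for this example; real projects would normally use a library vectorizer instead.

```python
from collections import Counter

def bag_of_words_vector(text, vocabulary):
    """Map a text to a vector of word counts, one dimension per vocabulary word."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

# Hypothetical 3-word vocabulary, so every text becomes a 3-dimensional vector.
vocabulary = ["data", "model", "python"]
vector = bag_of_words_vector("Python model beats model", vocabulary)
print(vector)  # [0, 2, 1]
```

Each position in the vector corresponds to one feature (here, how often a given word appears), which is what lets an algorithm compare objects numerically.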

4. What is root cause analysis?
Root cause analysis was initially developed to analyze industrial accidents, but it is now widely used in other areas. It is a problem-solving technique for isolating the root causes of faults or problems. A factor is called a root cause if removing it from the problem-fault sequence prevents the final undesirable event from recurring.

5. What is logistic regression?

Logistic regression, often referred to as the logit model, is a technique for predicting a binary outcome from a linear combination of predictor variables. For example, suppose you want to predict whether a particular political leader will win an election. In this case the outcome of the prediction is binary, i.e. 1 or 0 (win/lose). The predictor variables here would be the amount of money spent on the candidate's election campaign, the amount of time spent campaigning, and so on.
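To make the "linear combination" part concrete, here is a minimal sketch of how a logistic model turns predictors into a win probability. The weights and bias are hand-picked, hypothetical values (not fitted to any real data), and `predict_win_probability` is a name chosen just for this example.

```python
import math

def predict_win_probability(features, weights, bias):
    """Logistic model: sigmoid of a linear combination of the predictors."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for two predictors:
# money spent (in millions) and time spent campaigning (in months).
weights = [0.8, 0.5]
bias = -3.0

p = predict_win_probability([2.0, 4.0], weights, bias)
label = 1 if p >= 0.5 else 0  # 1 = win, 0 = lose
print(round(p, 3), label)
```

In practice the weights are learned from historical data; the 0.5 cut-off used here is a common default, not a fixed rule.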

6. What is Interpolation and Extrapolation?
Interpolation is estimating a value that lies between two known values in a list of values. Extrapolation is approximating a value by extending a known set of values or facts beyond the range that has been observed.
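A small sketch, assuming the simplest case of a straight line through two known points, shows the difference. The helper `linear_estimate` and the sample points are invented for illustration.

```python
def linear_estimate(x, x0, y0, x1, y1):
    """Value at x on the straight line through (x0, y0) and (x1, y1)."""
    slope = (y1 - y0) / (x1 - x0)
    return y0 + slope * (x - x0)

# Two known points: (1, 10) and (3, 30).
inside = linear_estimate(2, 1, 10, 3, 30)   # interpolation: x lies between 1 and 3
outside = linear_estimate(5, 1, 10, 3, 30)  # extrapolation: x lies beyond 3
print(inside, outside)  # 20.0 50.0
```

The arithmetic is identical in both cases; the extrapolated value is usually less trustworthy because nothing was observed beyond the known range.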

7. What is power analysis?
An experimental design technique for determining the sample size required to detect an effect of a given size with a given degree of confidence.
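As a rough sketch, the sample size can be approximated with the normal distribution. This assumes a two-sided comparison of two group means and a standardized effect size (Cohen's d); the function name and the default values for `alpha` and `power` are choices made for this example.

```python
import math
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * ((z_alpha + z_power) / d) ** 2,
    where d is the standardized effect size.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

n = sample_size_per_group(0.5)
print(n)  # 63
```

Smaller effects need larger samples: halving the effect size roughly quadruples the required sample size.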

8. Explain cross-validation.
It is a model validation technique for evaluating how the outcomes of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the objective is prediction and one wants to estimate how accurately a model will perform in practice. The goal of cross-validation is to set aside a data set for testing the model during the training phase (i.e. a validation data set) in order to limit problems like overfitting and to gain insight into how the model will generalize to an independent data set.
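One common form is k-fold cross-validation, where each fold takes a turn as the validation set. The sketch below only produces the index splits; `k_fold_splits` is a hypothetical helper, and it does not shuffle the data, which you would usually do first.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

for train, validation in k_fold_splits(6, 3):
    print(train, validation)
# [2, 3, 4, 5] [0, 1]
# [0, 1, 4, 5] [2, 3]
# [0, 1, 2, 3] [4, 5]
```

The model is trained on each `train` split and scored on the matching `validation` split, and the scores are then averaged.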

9. What is the goal of A/B Testing?
A/B testing is statistical hypothesis testing for a randomized experiment with two variants, A and B. The objective of A/B testing is to detect whether a change to a web page improves the outcome of interest, so that you can keep the variant that maximizes it.
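One common way to run the comparison is a two-proportion z-test on conversion rates. The sketch below assumes that setup; the function name and the visitor counts are made up for illustration.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: 100 of 1000 visitors convert on A, 130 of 1000 on B.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(round(z, 2), round(p_value, 3))
```

With these invented numbers the p-value falls below the usual 0.05 threshold, so the difference between A and B would be treated as statistically significant.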

10. Can you use machine learning for time series analysis?
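Yes, it can be used, but how well it works depends on the application. Forecasting, for example, can be framed as a supervised learning problem in which past values of the series are used to predict the next one.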
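As one possible illustration, a series can be reshaped into input/target pairs so that a standard learner can be trained on it. The helper `make_lag_features` and the sample numbers are invented for this sketch.

```python
def make_lag_features(series, n_lags):
    """Turn a series into (inputs, targets): each input is the previous n_lags values."""
    inputs, targets = [], []
    for t in range(n_lags, len(series)):
        inputs.append(series[t - n_lags:t])
        targets.append(series[t])
    return inputs, targets

X, y = make_lag_features([10, 12, 13, 15, 18], n_lags=2)
print(X)  # [[10, 12], [12, 13], [13, 15]]
print(y)  # [13, 15, 18]
```

Any regression model could then be fitted on `X` and `y`, as long as the validation data comes from later in time than the training data.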
