AI/ML Interview Questions

Foundations

Supervised vs Unsupervised vs Reinforcement

Supervised: learn a mapping from inputs to known labels (classification, regression). Unsupervised: find structure in unlabeled data (clustering, dimensionality reduction). Reinforcement: an agent learns a policy by acting in an environment and maximizing cumulative reward.

Modeling

Overfitting and regularization

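Overfitting: the model fits noise in the training set, so training error is low but validation error is high. Combat it with simpler models, more data, regularization (L2 weight penalty, dropout), and early stopping; use cross-validation to detect it.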
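
To make the effect of an L2 penalty concrete, here is a minimal sketch for a one-feature linear model without intercept. The data and the function name ridge_slope are hypothetical, chosen only for illustration.

```python
def ridge_slope(xs, ys, lam):
    """Closed-form slope for y ≈ w * x with an L2 penalty lam * w**2 (no intercept)."""
    # Minimizing sum((y - w*x)**2) + lam * w**2 gives w = sum(x*y) / (sum(x*x) + lam).
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

# Hypothetical toy data, roughly y = 2x plus noise.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

unregularized = ridge_slope(xs, ys, lam=0.0)   # plain least squares
regularized = ridge_slope(xs, ys, lam=10.0)    # penalty pulls the slope toward 0
```

On this data the slope drops from about 1.99 to about 1.49: the penalty trades a little training fit for a smaller, more stable weight.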

Optimization

Gradient descent

Iteratively update parameters to reduce the loss: θ ← θ − α ∇L(θ), where α is the learning rate. Too large an α diverges, too small an α converges slowly; learning rate schedules and momentum/Adam help.
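
The update rule can be sketched on a one-dimensional loss. The loss L(θ) = (θ − 3)², the starting point, and the step count are assumptions picked so the minimum (θ = 3) is known in advance.

```python
def gradient_descent(grad, theta, learning_rate, steps):
    """Apply theta <- theta - learning_rate * grad(theta) a fixed number of times."""
    for _ in range(steps):
        theta = theta - learning_rate * grad(theta)
    return theta

def grad_loss(theta):
    # Gradient of the assumed loss L(theta) = (theta - 3)**2.
    return 2.0 * (theta - 3.0)

theta_final = gradient_descent(grad_loss, theta=0.0, learning_rate=0.1, steps=100)
```

With learning_rate=0.1 the distance to the minimum shrinks by a factor of 0.8 per step, so theta_final ends up very close to 3. With learning_rate=1.1 the same loop overshoots further on every step and diverges.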

Preprocessing

Feature scaling

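Normalize (rescale to a fixed range such as [0, 1]) or standardize (zero mean, unit variance) features so that no feature dominates because of its units. This speeds convergence of gradient-based training and matters most for distance-based algorithms (KNN, K-means, SVM); tree-based models are largely insensitive to it.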
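
A minimal sketch of both transforms for a single feature column, using only the standard library. The function names and the choice to map a constant column to zeros are assumptions of this example.

```python
import statistics

def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # Constant feature: nothing to scale, so return zeros (a choice made here).
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

def min_max(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

For example, min_max([2.0, 4.0, 6.0]) returns [0.0, 0.5, 1.0].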

Evaluation

Evaluation metrics

Classification: accuracy, precision/recall/F1, ROC-AUC; prefer precision/recall/F1 over accuracy when classes are imbalanced. Regression: MSE, MAE, R².
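
The metrics above follow directly from their definitions. This is a from-scratch sketch for binary labels and numeric targets; in practice a library such as scikit-learn would normally be used instead.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def regression_metrics(y_true, y_pred):
    """MSE, MAE and R² (R² is undefined when all targets are equal)."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot
    return mse, mae, r2
```

For y_true = [1, 0, 1, 1, 0, 1] and y_pred = [1, 0, 0, 1, 1, 1] there are 3 true positives, 1 false positive and 1 false negative, so precision, recall and F1 are all 0.75.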

Foundations

Common algorithms

Supervised: logistic regression, decision trees/random forests, SVM, KNN, Naive Bayes. Unsupervised: K-means (clustering), PCA (dimensionality reduction).
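
As one worked example from this list, KNN is short enough to sketch in full. The training points and labels below are hypothetical, and ties in distance or votes are resolved by whatever order sorting and counting happen to produce.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train is a list of (point, label) pairs; returns the majority label of the k nearest points."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical 2D training set with two well-separated groups.
train = [
    ((0.0, 0.0), "a"), ((0.0, 1.0), "a"),
    ((5.0, 5.0), "b"), ((6.0, 5.0), "b"), ((5.0, 6.0), "b"),
]
label = knn_predict(train, (0.5, 0.5), k=3)
```

The two nearest neighbors of (0.5, 0.5) are both labeled "a", so the vote is 2 to 1 and label is "a".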

Neural Networks

Deep learning basics

Neural networks stack layers of linear transformations followed by nonlinear activations; train with backpropagation plus SGD/Adam. Use CNNs for images, RNNs/Transformers for sequences.
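
The forward pass, backward pass and SGD update can be seen in the smallest possible case: a single sigmoid unit with cross-entropy loss. This is a sketch of the training loop, not a full multi-layer network; the AND-gate data, learning rate and epoch count are assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(data, learning_rate=0.5, epochs=2000):
    """Train one sigmoid unit with two inputs using per-sample SGD."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            # Forward pass: linear combination followed by the activation.
            p = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
            # Backward pass: with cross-entropy loss and a sigmoid output,
            # the gradient of the loss with respect to the pre-activation is p - y.
            dz = p - y
            # SGD update via the chain rule: dL/dw_i = dz * x_i, dL/db = dz.
            w[0] -= learning_rate * dz * x[0]
            w[1] -= learning_rate * dz * x[1]
            b -= learning_rate * dz
    return w, b

and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_neuron(and_gate)
predictions = [round(sigmoid(w[0] * x[0] + w[1] * x[1] + b)) for x, _ in and_gate]
```

A deeper network repeats the same pattern layer by layer, passing each layer's gradient back to the one before it.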

Production

ML pipelines

Data ingestion → preprocessing → training → evaluation → deployment → monitoring. Use experiment tracking (MLflow, Weights & Biases).

Optimization

Hyperparameter tuning

Use grid/random search or Bayesian optimization; tune learning rate, regularization, tree depth, etc. Validate via cross-validation.
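
Random search can be sketched in a few lines. Here validation_score is a hypothetical stand-in for "train a model with these settings and return its cross-validated score"; its formula and the search ranges are invented for illustration.

```python
import math
import random

def validation_score(learning_rate, max_depth):
    # Hypothetical stand-in for a cross-validated score (higher is better).
    # It peaks at learning_rate = 0.01 and max_depth = 5 by construction.
    return -(math.log10(learning_rate) + 2) ** 2 - 0.1 * (max_depth - 5) ** 2

def random_search(n_trials, seed=0):
    """Sample hyperparameters at random and keep the best-scoring setting."""
    rng = random.Random(seed)
    best_params, best_score = None, -math.inf
    for _ in range(n_trials):
        params = {
            # Learning rates are sampled log-uniformly because they span orders of magnitude.
            "learning_rate": 10 ** rng.uniform(-4, 0),
            "max_depth": rng.randint(2, 10),
        }
        score = validation_score(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

best_params, best_score = random_search(n_trials=50)
```

Grid search replaces the random sampling with nested loops over fixed value lists; Bayesian optimization replaces it with a model that proposes the next setting to try.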

Preprocessing

Data leakage

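Leakage occurs when information from the test set influences training (e.g., scaling with statistics computed on the full dataset). Split data first; fit transforms on the training set only, then apply them unchanged to validation and test data.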
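
The split-then-fit order can be sketched with a standardizer. The data and function names are hypothetical, and the positional split is only for brevity; real data would normally be shuffled or split by time first.

```python
import statistics

def fit_standardizer(train_values):
    """Compute scaling statistics from the training data only."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values) or 1.0  # avoid dividing by zero
    return mean, std

def apply_standardizer(values, mean, std):
    """Apply previously fitted statistics without recomputing them."""
    return [(v - mean) / std for v in values]

data = [1.0, 2.0, 3.0, 4.0, 100.0]
train, test = data[:4], data[4:]          # 1. split first

mean, std = fit_standardizer(train)       # 2. fit on the training set only
train_scaled = apply_standardizer(train, mean, std)
test_scaled = apply_standardizer(test, mean, std)  # 3. reuse the same statistics
```

Here the training mean is 2.5, so the test value 100.0 shows up as an extreme outlier after scaling. Had the scaler been fitted on all five values, the test point would have shifted the mean and variance used during training.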

Evaluation

Model interpretability

Explain predictions using feature importance, SHAP/LIME; prefer simpler models in regulated contexts; balance accuracy vs transparency.
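
One model-agnostic form of feature importance is permutation importance: shuffle one feature column and measure how much the error grows. The sketch below is a simplified version of that idea (not SHAP or LIME), and toy_model and the data are hypothetical.

```python
import random

def mse(model, rows, targets):
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(model, rows, targets, n_features, n_repeats=5, seed=0):
    """Average increase in MSE when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    importances = []
    for j in range(n_features):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild each row with feature j replaced by its shuffled value.
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mse(model, shuffled, targets) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

def toy_model(row):
    # Hypothetical model that uses feature 0 and ignores feature 1.
    return 2.0 * row[0]

rows = [[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0], [5.0, 9.0], [6.0, 2.0]]
targets = [2.0 * r[0] for r in rows]
importances = permutation_importance(toy_model, rows, targets, n_features=2)
```

Shuffling feature 1 cannot change the predictions, so its importance is exactly 0, while shuffling feature 0 raises the error.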

Neural Networks

Embeddings and vector databases

Convert items to dense vectors (word2vec/BERT) so that similar items lie close together. Use a vector index or database (FAISS, Milvus) for similarity search and retrieval-augmented applications.
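
Similarity search reduces to ranking stored vectors by closeness to a query vector. This brute-force sketch uses cosine similarity over a small dictionary; the three-dimensional "embeddings" are made up, and it does not use the FAISS or Milvus APIs, which add approximate indexes to make the same operation fast at scale.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, index, k=2):
    """Return the k (name, similarity) pairs most similar to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Hypothetical embeddings; real ones have hundreds of dimensions.
index = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.3, 0.1],
    "car": [0.0, 0.2, 0.9],
}
results = top_k([1.0, 0.0, 0.0], index, k=2)
```

For this query the ranking is "cat" first and "kitten" second; "car" is orthogonal to the query and is left out.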

Production

MLOps: deployment and monitoring

Serve via REST/streaming; monitor drift and performance; implement rollback and A/B testing; track lineage and reproducibility.
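
Drift monitoring can be as simple as comparing live feature statistics against a reference window. The sketch below flags a shift in the mean with a z-score; the function name, the sample values and the threshold of 3.0 are assumptions, and production systems often use other measures such as the population stability index or a Kolmogorov-Smirnov test.

```python
import statistics

def mean_shift_drift(reference, live, threshold=3.0):
    """Flag drift when the live mean is far from the reference mean, in standard errors."""
    ref_mean = statistics.fmean(reference)
    ref_std = statistics.pstdev(reference)
    standard_error = ref_std / len(live) ** 0.5
    live_mean = statistics.fmean(live)
    if standard_error == 0:
        # Constant reference feature: any change in the mean counts as drift.
        return live_mean != ref_mean
    return abs(live_mean - ref_mean) / standard_error > threshold

# Hypothetical values of one feature at training time and in production.
reference = [10.0, 11.0, 9.0, 10.0, 12.0, 8.0, 10.0, 11.0]
live_ok = [10.0, 9.0, 11.0, 10.0]
live_shifted = [15.0, 16.0, 14.0, 15.0]

drift_ok = mean_shift_drift(reference, live_ok)
drift_shifted = mean_shift_drift(reference, live_shifted)
```

Here drift_ok is False and drift_shifted is True; a check like this would typically feed an alert or trigger a rollback.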