10 NumPy Functions Every Data Scientist Should Know

Before Pandas. Before Scikit-learn. There is NumPy.
NumPy underpins most of Python's data science stack, including Pandas and Scikit-learn. Once you understand NumPy, the rest makes more sense.
Here are the 10 most important NumPy functions — with examples and real use cases.
1. np.array() — Create Your First Array
Everything in NumPy starts with an array.
Example:
import numpy as np
data = np.array([10, 20, 30, 40, 50])
print(data) # → [10 20 30 40 50]
print(data.shape) # → (5,)
print(data.dtype) # → int64 (platform default; int32 on Windows with NumPy older than 2.0)
Use case: Converting a Python list into a NumPy array for fast mathematical operations.
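To make the "fast mathematical operations" point concrete, here is a minimal sketch comparing the same arithmetic on a list and on an array. The names prices_list and prices are made up for illustration.

```python
import numpy as np

# Hypothetical prices, used only for illustration
prices_list = [10, 20, 30, 40, 50]
prices = np.array(prices_list)

# With a plain list you need a loop or a comprehension
doubled_list = [p * 2 for p in prices_list]

# With an array, arithmetic applies to every element at once
doubled = prices * 2
print(doubled)  # → [ 20  40  60  80 100]

# Passing dtype explicitly avoids relying on the platform default
as_float = np.array(prices_list, dtype=np.float64)
print(as_float.dtype)  # → float64
```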
2. np.zeros() and np.ones() — Create Pre-filled Arrays
Useful for initializing arrays of a known shape before filling them with data. (These arrays are not empty: every element is set to 0 or 1.)
Example:
zeros = np.zeros((3, 4)) # 3 rows, 4 columns of 0s
ones = np.ones((2, 3)) # 2 rows, 3 columns of 1s
print(zeros.shape) # → (3, 4)
print(ones.shape) # → (2, 3)
Use case: Preallocating an array you will fill in later, or initializing model parameters such as bias vectors before training.
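One detail worth knowing: both functions return floats unless told otherwise. A short sketch, where the counts array is a hypothetical example:

```python
import numpy as np

zeros = np.zeros((3, 4))
print(zeros.dtype)  # → float64 (the default)

# Ask for integers explicitly when you need them, e.g. for counters
counts = np.zeros(5, dtype=int)
counts[2] = 7
print(counts)  # → [0 0 7 0 0]

ones = np.ones((2, 3))
print(ones.sum())  # → 6.0
```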
3. np.arange() — Create a Range of Numbers
Like Python's range(), but it returns a NumPy array and also accepts non-integer steps. The stop value is excluded.
Example:
arr = np.arange(0, 100, 10)
print(arr) # → [ 0 10 20 30 40 50 60 70 80 90]
Use case: Generating evenly spaced values for plotting graphs or creating test datasets.
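A quick sketch of the stop-value behavior. It also shows np.linspace, a related function not covered in this list, which takes a number of points instead of a step and is often the simpler choice for plotting:

```python
import numpy as np

# With one argument, arange counts from 0 up to (but not including) the stop
print(np.arange(5))  # → [0 1 2 3 4]

# The stop value itself is never included
tens = np.arange(0, 100, 10)
print(tens[-1])  # → 90

# np.linspace: 5 evenly spaced points from 0 to 1, endpoint included
x = np.linspace(0.0, 1.0, 5)
print(x.tolist())  # → [0.0, 0.25, 0.5, 0.75, 1.0]
```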
4. np.reshape() — Change Array Shape
Change the shape of an array without changing its data. The new shape must hold exactly the same number of elements.
Example:
arr = np.arange(12)
reshaped = arr.reshape(3, 4) # 3 rows, 4 columns
print(reshaped)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]
Use case: Reshaping image data or feature matrices before feeding into a machine learning model.
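Two things tend to come up right away when reshaping: letting NumPy work out one dimension with -1, and what happens when the sizes do not match. A minimal sketch:

```python
import numpy as np

arr = np.arange(12)

# -1 tells NumPy to infer that dimension from the array's size
auto = arr.reshape(3, -1)
print(auto.shape)  # → (3, 4)

# reshape(-1) flattens back to one dimension
flat = auto.reshape(-1)
print(flat.shape)  # → (12,)

# 12 elements cannot fill a 5 x 3 grid, so NumPy raises ValueError
try:
    arr.reshape(5, 3)
except ValueError as err:
    print("reshape failed:", err)
```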
5. np.mean(), np.median(), np.std() — Basic Statistics
The three most used statistical functions in data science.
Example:
scores = np.array([85, 90, 78, 92, 88, 76, 95])
print(np.mean(scores)) # → 86.2857... (≈ 86.29)
print(np.median(scores)) # → 88.0
print(np.std(scores)) # → 6.5621... (≈ 6.56)
Use case: Calculating average salary, median house price, or standard deviation of test scores.
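On real tables you usually want statistics per column or per row, not over the whole array. The sketch below uses the axis argument for that, and ddof=1 for the sample standard deviation. The data is invented for illustration.

```python
import numpy as np

# Hypothetical scores: 3 students (rows) x 2 exams (columns)
scores = np.array([[80, 90],
                   [70, 60],
                   [90, 90]])

# axis=0 collapses the rows: one mean per exam
print(np.mean(scores, axis=0))  # → [80. 80.]

# axis=1 collapses the columns: one mean per student
print(np.mean(scores, axis=1))  # → [85. 65. 90.]

# np.std divides by n by default (population standard deviation);
# ddof=1 divides by n - 1 (sample standard deviation)
x = np.array([2, 4, 4, 4, 5, 5, 7, 9])
print(np.std(x))          # → 2.0
print(np.std(x, ddof=1))  # about 2.14
```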
6. np.min(), np.max(), np.sum() — Aggregations
Find the smallest, largest, or total value in an array.
Example:
sales = np.array([1200, 1500, 980, 2100, 1750])
print(np.min(sales)) # → 980
print(np.max(sales)) # → 2100
print(np.sum(sales)) # → 7530
Use case: Finding the lowest temperature in weather data, highest revenue month, total annual sales.
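For a question like "which month had the highest revenue", you need the position of the maximum, not just its value. One way to sketch this is with np.argmax and np.argmin, which return indices. The month labels here are assumed for the example.

```python
import numpy as np

sales = np.array([1200, 1500, 980, 2100, 1750])
months = np.array(["Jan", "Feb", "Mar", "Apr", "May"])  # hypothetical labels

best = np.argmax(sales)   # index of the largest value
worst = np.argmin(sales)  # index of the smallest value

print(months[best], sales[best])    # → Apr 2100
print(months[worst], sales[worst])  # → Mar 980
```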
7. np.where() — Conditional Selection
Choose between two values element by element. np.where(condition, x, y) returns an array that takes x where the condition is true and y where it is false.
Example:
scores = np.array([45, 72, 88, 55, 91, 63])
result = np.where(scores >= 70, "Pass", "Fail")
print(result) # → ['Fail' 'Pass' 'Pass' 'Fail' 'Pass' 'Fail']
Use case: Labeling students as pass or fail, flagging transactions above a threshold, categorizing data.
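If you want the matching elements themselves, or their positions, two closely related idioms cover it. A minimal sketch using the same scores:

```python
import numpy as np

scores = np.array([45, 72, 88, 55, 91, 63])

# A boolean mask keeps only the elements that meet the condition
passed = scores[scores >= 70]
print(passed)  # → [72 88 91]

# np.where with only a condition returns a tuple of index arrays
idx = np.where(scores >= 70)[0]
print(idx)  # → [1 2 4]
```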
8. np.unique() — Find Unique Values
Returns the sorted unique values in an array. Pass return_counts=True to also get how often each value appears.
Example:
categories = np.array(["A", "B", "A", "C", "B", "A", "D"])
print(np.unique(categories)) # → ['A' 'B' 'C' 'D']
values, counts = np.unique(categories, return_counts=True)
print(values) # → ['A' 'B' 'C' 'D']
print(counts) # → [3 2 1 1]
Use case: Finding unique product categories, counting how many times each value appears in a column.
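The values and counts come back as two separate arrays. One possible way to pair them up is a plain dictionary, sketched here (the name freq is just for illustration):

```python
import numpy as np

categories = np.array(["A", "B", "A", "C", "B", "A", "D"])
values, counts = np.unique(categories, return_counts=True)

# Convert NumPy scalars to plain Python types for a clean dictionary
freq = {str(v): int(c) for v, c in zip(values, counts)}
print(freq)  # → {'A': 3, 'B': 2, 'C': 1, 'D': 1}
```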
9. np.sort() — Sort an Array
np.sort() returns a sorted copy in ascending order. Reverse the result to get descending order.
Example:
prices = np.array([450, 120, 890, 230, 670])
print(np.sort(prices)) # → [120 230 450 670 890]
print(np.sort(prices)[::-1]) # → [890 670 450 230 120]
Use case: Sorting product prices, ranking exam scores, ordering time series data.
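Ranking usually means reordering one array by the values of another. np.argsort returns the indices that would sort an array, which makes that possible. The product names below are invented for the example.

```python
import numpy as np

prices = np.array([450, 120, 890, 230, 670])
products = np.array(["desk", "lamp", "sofa", "chair", "shelf"])  # hypothetical

# Indices that would put prices in ascending order
order = np.argsort(prices)
print(order)  # → [1 3 0 4 2]

# Use those indices to reorder the matching labels
print(products[order])  # → ['lamp' 'chair' 'desk' 'shelf' 'sofa']

# np.sort returned a copy earlier, so the original array is unchanged
print(prices)  # → [450 120 890 230 670]
```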
10. np.dot() — Matrix Multiplication
Essential for machine learning. Used in neural networks, linear regression, and more. For 2-D arrays, np.dot(A, B) performs matrix multiplication and gives the same result as A @ B.
Example:
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
result = np.dot(A, B)
print(result)
# → [[19 22]
#    [43 50]]
Use case: Calculating predictions in linear regression, forward pass in neural networks, feature transformations.
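As a rough illustration of the linear regression use case, the sketch below computes predictions as X @ w + b. The feature matrix, weights, and bias are made-up numbers, not a trained model.

```python
import numpy as np

# Hypothetical data: 3 samples (rows) x 2 features (columns)
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
w = np.array([0.5, 1.0])  # assumed weights, one per feature
b = 2.0                   # assumed bias term

# One prediction per sample: the dot product of each row with w, plus b
pred = X @ w + b
print(pred.tolist())  # → [4.5, 7.5, 10.5]

# np.dot gives the same result for these shapes
same = np.dot(X, w) + b
print(np.array_equal(pred, same))  # → True
```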
Quick Reference — Save This
| Function | What It Does |
|---|---|
| np.array() | Create a NumPy array |
| np.zeros() / np.ones() | Create arrays of 0s or 1s |
| np.arange() | Create a range of numbers |
| np.reshape() | Change array shape |
| np.mean() / np.median() / np.std() | Basic statistics |
| np.min() / np.max() / np.sum() | Aggregations |
| np.where() | Conditional selection |
| np.unique() | Find unique values |
| np.sort() | Sort array values |
| np.dot() | Matrix multiplication |
One Important Rule
NumPy arrays are faster than Python lists — but only if you use NumPy operations on them.
# Slow ❌ — using Python loop on NumPy array
total = 0
for x in data:
    total += x
# Fast ✅ — using NumPy operation
total = np.sum(data)
Prefer NumPy functions on NumPy arrays. Avoid looping through them manually whenever a vectorized equivalent exists.
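The same idea extends to loops that contain a condition. Here is a small sketch that replaces a filter-and-sum loop with a boolean mask. The threshold of 25 is arbitrary.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])

# Loop version: add up only the values above the threshold
total_loop = 0
for x in data:
    if x > 25:
        total_loop += x

# Vectorized version: a boolean mask selects the values, np.sum adds them
total_vec = np.sum(data[data > 25])

print(total_loop, total_vec)  # → 120 120
```

Both versions give the same answer. The vectorized one does the work inside NumPy's compiled code, which is where the speed advantage comes from.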
Save this article and come back to it every time you work with numerical data.





