Working with Data
Python is widely used for working with data because it has simple syntax and powerful libraries for analysis, visualization, and data processing.
In this guide you will learn how to:
- Load data
- Store data in Python structures
- Work with tabular data
- Filter and transform data
- Save results
Reading Data From a CSV File
CSV (Comma Separated Values) files are one of the most common data formats.
Example CSV:
name,age,city
Alice,30,Denver
Bob,25,Boulder
This example shows a CSV (Comma-Separated Values) file, which is a simple format used to store tabular data.
The first line is the header row. It defines the column names for the data. In this case, the columns are:
- name
- age
- city
Commas separate each column value. Each comma marks the boundary between columns.
Each line after the header represents a row of data. Every row must follow the same column order and structure defined in the header.
Create a plain text file on your computer and save it as ‘people.csv’
Using Python’s built-in csv module:
import csv
with open("people.csv") as file:
reader = csv.DictReader(file)
for row in reader:
print(row["name"], row["age"])
Using Pandas for Data Analysis
The pandas library is the most popular tool for working with tabular data in Python.
Install pandas:
pip install pandas
Import pandas:
import pandas as pd
Loading Data With Pandas
Load a CSV file:
data = pd.read_csv("people.csv")
View the first rows:
data.head()
Example output:
name age city
0 Alice 30 Denver
1 Bob 25 Boulder
Exploring Data
Check column names:
data.columns
Basic summary statistics:
data.describe()
View data types:
data.dtypes
Selecting Columns
Select a single column:
data["age"]
Select multiple columns:
data[["name", "city"]]
Filtering Data
Select rows matching conditions.
Example: people older than 25
filtered = data[data["age"] > 25]
Example: select a specific city
denver_people = data[data["city"] == "Denver"]
Creating New Columns
You can calculate new data from existing columns.
Example:
data["age_next_year"] = data["age"] + 1
Sorting Data
Sort by a column:
data.sort_values("age")
Sort descending:
data.sort_values("age", ascending=False)
Grouping Data
Grouping summarizes data by categories.
Example dataset:
city,temperature
Denver,70
Denver,72
Boulder,68
Calculate average temperature by city:
data.groupby("city")["temperature"].mean()
Handling Missing Data
Check missing values:
data.isnull()
Remove missing rows:
data.dropna()
Fill missing values:
data.fillna(0)
Saving Data
Save a DataFrame to CSV:
data.to_csv("output.csv", index=False)
Save to Excel:
data.to_excel("output.xlsx")
Working With JSON Data
JSON is commonly used in APIs.
Example JSON:
{
"name": "Alice",
"age": 30
}
Load JSON in Python:
import json
with open("data.json") as f:
data = json.load(f)
Access values:
print(data["name"])
Example Data Workflow
Example complete workflow:
import pandas as pd
# load data
data = pd.read_csv("sales.csv")
# filter rows
recent_sales = data[data["year"] == 2024]
# calculate totals
total = recent_sales["revenue"].sum()
print("Total revenue:", total)
# save results
recent_sales.to_csv("filtered_sales.csv", index=False)
Summary
Python provides powerful tools for working with data:
- Lists and dictionaries store structured data
- CSV and JSON allow data exchange
- Pandas simplifies tabular analysis
- Filtering, grouping, and sorting reveal insights
These tools make Python one of the most widely used languages for data science, research, and analytics.