# Introduction to Pandas

Pandas is a Python library that provides high-performance, easy-to-use data structures and data analysis tools. It is built on top of NumPy and is especially useful for working with **tabular data** such as spreadsheets, CSV files, or SQL tables.

The two main data structures in pandas are:

- **Series**: A one-dimensional labeled array
- **DataFrame**: A two-dimensional labeled data structure (like a table)

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

## Pandas Series

A **Series** is like a one-dimensional array with labels (called an index). It can hold any data type (integers, strings, floats, Python objects, etc.).

In [None]:
s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])
print(s)
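
As a small sketch of what the labels are for, the cell below rebuilds the same Series and reads one value by label and by position, then applies arithmetic to the whole Series. The variable names `by_label`, `by_position`, and `doubled` are only illustrative.

```python
import pandas as pd

# Same Series as in the cell above
s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

by_label = s['b']          # look up a value by its index label
by_position = s.iloc[1]    # look up the same value by integer position
doubled = s * 2            # arithmetic applies element-wise and keeps the labels

print(by_label, by_position)
print(doubled)
```

Note that the result of `s * 2` is still a Series with the index `a` to `d`, so the labels travel with the data.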

## Pandas DataFrame

A **DataFrame** is a 2D labeled data structure with rows and columns. You can think of it as a dictionary of Series objects sharing the same index.

In [None]:
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['NY', 'LA', 'Chicago', 'Houston']
}
df = pd.DataFrame(data)
df
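
To make the "dictionary of Series" picture concrete, here is a minimal sketch that builds a DataFrame from two Series sharing one index. The data and the names `ages`, `cities`, and `people` are made up for illustration.

```python
import pandas as pd

# Two Series that share the same index labels (hypothetical example data)
ages = pd.Series([25, 30, 35], index=['alice', 'bob', 'charlie'])
cities = pd.Series(['NY', 'LA', 'Chicago'], index=['alice', 'bob', 'charlie'])

# Each dictionary key becomes a column name; the shared index becomes the row labels
people = pd.DataFrame({'Age': ages, 'City': cities})

# Selecting a column gives back a Series with that same index
age_column = people['Age']
print(type(age_column).__name__)
print(people)
```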

## Operations with DataFrames

### Inspecting data

In [None]:
df.head()

In [None]:
df.info()

In [None]:
df.describe()

### Selecting data

In [None]:
df['Name']  # Select one column

In [None]:
df[['Name', 'Age']]  # Select multiple columns

In [None]:
df.iloc[0]   # Select first row by position

In [None]:
df.loc[0]    # Select the row whose index label is 0 (the first row here)
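
With the default 0, 1, 2, ... index, `iloc[0]` and `loc[0]` return the same row, which hides the difference between them. The sketch below uses a hypothetical DataFrame, `df_labeled`, whose row labels are 10, 20, and 30, so position and label no longer coincide.

```python
import pandas as pd

df_labeled = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]},
    index=[10, 20, 30],   # hypothetical non-default row labels
)

first_by_position = df_labeled.iloc[0]   # row at position 0
row_by_label = df_labeled.loc[20]        # row whose label is 20 (position 1)

print(first_by_position['Name'])
print(row_by_label['Name'])
# df_labeled.loc[0] would raise a KeyError here, because no row is labeled 0
```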

### Filtering data

In [None]:
df[df['Age'] > 30]
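
The filter above works in two steps: the comparison produces a boolean Series, and indexing with it keeps the rows marked `True`. The sketch below spells this out on a fresh copy of the example data (named `df_people` here so that it does not overwrite `df`) and shows one way to combine two conditions.

```python
import pandas as pd

df_people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['NY', 'LA', 'Chicago', 'Houston'],
})

mask = df_people['Age'] > 30      # boolean Series, one True/False per row
older = df_people[mask]           # keeps only the rows where mask is True

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
older_not_houston = df_people[(df_people['Age'] > 30) & (df_people['City'] != 'Houston')]

print(mask)
print(older_not_houston)
```

Python's `and` and `or` keywords do not work element-wise on Series, which is why `&` and `|` are used instead.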

### Adding new columns

In [None]:
df['Age in 5 years'] = df['Age'] + 5
df

## Reading CSV and ASCII Data

Pandas makes it easy to read external data files. For example:

- `pd.read_csv('file.csv')` for CSV files
- `pd.read_table('file.txt')` for delimited text files (tab-separated by default; pass `sep=` for other delimiters)

Here, we'll create a sample CSV file and read it back:

In [None]:
sample_data = """Name,Age,City
Alice,25,NY
Bob,30,LA
Charlie,35,Chicago
David,40,Houston"""

with open('sample.csv', 'w') as f:
    f.write(sample_data)

# Read CSV
df_csv = pd.read_csv('sample.csv')
df_csv
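
Not every text file uses commas. As a rough sketch, the cell below parses semicolon-delimited text with the `sep` argument of `pd.read_csv`. The text is hypothetical and is read from an in-memory `io.StringIO` buffer rather than a file, purely to keep the example self-contained.

```python
import io
import pandas as pd

# Hypothetical semicolon-delimited text, held in memory instead of on disk
text = """Name;Age;City
Alice;25;NY
Bob;30;LA"""

# sep tells the parser which delimiter to split on (read_csv defaults to a comma)
df_semicolon = pd.read_csv(io.StringIO(text), sep=';')

print(df_semicolon)
print(df_semicolon.dtypes)
```

The `Age` column is parsed as integers without any extra arguments, since pandas infers column types from the values it reads.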

## Simple Visualization with Pandas

The `.plot()` method on a Series or DataFrame uses Matplotlib under the hood, so you can adjust the result with the usual `plt` functions.

In [None]:
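# Use the Name column for the x-axis so each bar is labeled with a person's name
df.plot(x='Name', y='Age', kind='bar', legend=False, title='Ages of People')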
plt.ylabel('Age')
plt.show()

## Practice Exercises

1. Create a DataFrame with columns: `Product`, `Price`, `Quantity`. Add a new column `Total` as `Price * Quantity`.

2. Read the CSV file `sample.csv` and select only the rows where `Age > 30`.

3. Plot a line graph of `Quantity` for each `Product`, with `Product` on the x-axis.

*(Try solving these before looking at solutions.)*