Introduction

We briefly introduced some of the most common classes and methods in the Introduction tutorial. In this tutorial we go through a few more. Pandas provides many attributes and methods for obtaining basic information about the data. The following are some of the most common ones:
# Load a sample dataset
import pandas as pd
dataPath = 'https://raw.githubusercontent.com/alineu/pyDataScintist-Notebooks/main/data/'
df = pd.read_csv(dataPath+'car_data.csv')
df.head()
symboling normalized-losses make fuel-type aspiration num-of-doors body-style drive-wheels engine-location wheel-base ... engine-size fuel-system bore stroke compression-ratio horsepower peak-rpm city-mpg highway-mpg price
0 3 ? alfa-romero gas std two convertible rwd front 88.6 ... 130 mpfi 3.47 2.68 9.0 111 5000 21 27.0 13495
1 3 ? alfa-romero gas std two convertible rwd front 88.6 ... 130 mpfi 3.47 2.68 9.0 111 5000 21 27.0 16500
2 1 ? alfa-romero gas std two hatchback rwd front 94.5 ... 152 mpfi 2.68 3.47 9.0 154 5000 19 26.0 16500
3 2 164 audi gas std four sedan fwd front 99.8 ... 109 mpfi 3.19 3.4 10.0 102 5500 24 30.0 13950
4 2 164 audi gas std four sedan 4wd front 99.4 ... 136 mpfi 3.19 3.4 8.0 115 5500 18 22.0 17450
5 rows × 26 columns

index: Stores the row labels (the Index) of a pandas object, whether a Series or a DataFrame

df.index
RangeIndex(start=0, stop=205, step=1)
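
The RangeIndex shown above is what pandas creates when no index is specified. As a minimal sketch, the snippet below uses a small hand-made DataFrame (the name cars and its three rows are illustrative, with values copied from the first rows of the dataset) to show the default index and a label-based index produced by set_index:

```python
import pandas as pd

# Small illustrative frame; the values are copied from the first rows shown above
cars = pd.DataFrame({
    'make': ['alfa-romero', 'audi', 'audi'],
    'body-style': ['convertible', 'sedan', 'sedan'],
    'city-mpg': [21, 24, 18],
})

# Without an explicit index, pandas builds a RangeIndex
print(cars.index)

# A Series carries the same kind of index
print(cars['city-mpg'].index)

# set_index returns a new frame whose index holds the chosen column's labels
by_style = cars.set_index('body-style')
print(by_style.index)
```

Index labels do not have to be unique, which is why 'sedan' can appear twice in by_style.index.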

columns: Stores the column labels for pandas DataFrames

df.columns
Index(['symboling', 'normalized-losses', 'make', 'fuel-type', 'aspiration',
       'num-of-doors', 'body-style', 'drive-wheels', 'engine-location',
       'wheel-base', 'length', 'width', 'height', 'curb-weight', 'engine-type',
       'num-of-cylinders', 'engine-size', 'fuel-system', 'bore', 'stroke',
       'compression-ratio', 'horsepower', 'peak-rpm', 'city-mpg',
       'highway-mpg', 'price'],
      dtype='object')

describe: Provides a summary of descriptive statistics, excluding NaN values. By default, only columns with numerical data types are summarized.

df.describe()
symboling wheel-base length width height curb-weight engine-size compression-ratio city-mpg highway-mpg
count 205.000000 202.000000 205.000000 205.000000 205.000000 203.000000 205.000000 205.000000 205.000000 200.000000
mean 0.834146 98.821287 174.049268 65.907805 53.724878 2558.211823 126.907317 10.142537 25.219512 30.695000
std 1.245307 5.982678 12.337289 2.145204 2.443522 520.943508 41.642693 3.972040 6.542142 6.937769
min -2.000000 86.600000 141.100000 60.300000 47.800000 1488.000000 61.000000 7.000000 13.000000 16.000000
25% 0.000000 94.500000 166.300000 64.100000 52.000000 2157.000000 97.000000 8.600000 19.000000 25.000000
50% 1.000000 97.000000 173.200000 65.500000 54.100000 2414.000000 120.000000 9.000000 24.000000 30.000000
75% 2.000000 102.300000 183.100000 66.900000 55.500000 2943.500000 141.000000 9.400000 30.000000 34.000000
max 3.000000 120.900000 208.100000 72.300000 59.800000 4066.000000 326.000000 23.000000 49.000000 54.000000
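
Columns such as normalized-losses and price are absent from this summary because pandas reads them as text: the file uses '?' as a placeholder for missing values. The sketch below is one possible way to handle this, shown on a small illustrative frame (sample is a made-up name; the values mimic the first rows of the dataset). It uses include='all' to summarize every column and pd.to_numeric with errors='coerce' to turn the placeholder into NaN:

```python
import pandas as pd

# Illustrative frame mimicking the '?' placeholder seen in normalized-losses
sample = pd.DataFrame({
    'symboling': [3, 3, 1, 2, 2],
    'normalized-losses': ['?', '?', '?', '164', '164'],
    'make': ['alfa-romero', 'alfa-romero', 'alfa-romero', 'audi', 'audi'],
})

# Default behaviour: only numeric columns are summarized
numeric_summary = sample.describe()
print(numeric_summary.columns.tolist())

# include='all' adds count, unique, top and freq for the text columns
full_summary = sample.describe(include='all')
print(full_summary)

# Coerce unparseable entries such as '?' to NaN so the column becomes numeric
sample['normalized-losses'] = pd.to_numeric(sample['normalized-losses'], errors='coerce')
converted_summary = sample.describe()
print(converted_summary)
```

After the conversion, the count row for normalized-losses reflects only the rows that held a real number, in the same way that wheel-base shows a count of 202 rather than 205 above.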

info: Shows information about a DataFrame, including the index range, column names, data types, non-null counts, and the memory used by the DataFrame

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 205 entries, 0 to 204
Data columns (total 26 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   symboling          205 non-null    int64  
 1   normalized-losses  205 non-null    object 
 2   make               205 non-null    object 
 3   fuel-type          205 non-null    object 
 4   aspiration         204 non-null    object 
 5   num-of-doors       205 non-null    object 
 6   body-style         204 non-null    object 
 7   drive-wheels       205 non-null    object 
 8   engine-location    205 non-null    object 
 9   wheel-base         202 non-null    float64
 10  length             205 non-null    float64
 11  width              205 non-null    float64
 12  height             205 non-null    float64
 13  curb-weight        203 non-null    float64
 14  engine-type        205 non-null    object 
 15  num-of-cylinders   205 non-null    object 
 16  engine-size        205 non-null    int64  
 17  fuel-system        205 non-null    object 
 18  bore               205 non-null    object 
 19  stroke             205 non-null    object 
 20  compression-ratio  205 non-null    float64
 21  horsepower         205 non-null    object 
 22  peak-rpm           203 non-null    object 
 23  city-mpg           205 non-null    int64  
 24  highway-mpg        200 non-null    float64
 25  price              205 non-null    object 
dtypes: float64(7), int64(3), object(16)
memory usage: 41.8+ KB
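
The + in "memory usage: 41.8+ KB" marks a lower bound: by default pandas does not inspect the contents of object columns. A minimal sketch, using a small illustrative frame (sample is a made-up name), of how a fuller estimate can be requested and how the report can be captured as text instead of being printed:

```python
import io
import pandas as pd

# Small illustrative frame with one text column and one integer column
sample = pd.DataFrame({
    'make': ['alfa-romero', 'audi', 'audi'],
    'city-mpg': [21, 24, 18],
})

# info() prints its report and returns None; buf redirects the report to a text buffer
buffer = io.StringIO()
sample.info(buf=buffer, memory_usage='deep')
report = buffer.getvalue()
print(report)

# memory_usage(deep=True) returns per-column byte counts, including string contents
shallow = sample.memory_usage()
deep = sample.memory_usage(deep=True)
print(deep)
```

The deep estimate takes longer on large frames because every string has to be measured, which is presumably why it is not the default.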

dtypes: Shows the data type of each column

df.dtypes
symboling              int64
normalized-losses     object
make                  object
fuel-type             object
aspiration            object
num-of-doors          object
body-style            object
drive-wheels          object
engine-location       object
wheel-base           float64
length               float64
width                float64
height               float64
curb-weight          float64
engine-type           object
num-of-cylinders      object
engine-size            int64
fuel-system           object
bore                  object
stroke                object
compression-ratio    float64
horsepower            object
peak-rpm              object
city-mpg               int64
highway-mpg          float64
price                 object
dtype: object
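
Data types can also be used to pick columns or be changed after loading. The following sketch, on a small illustrative frame (sample is a made-up name; the values are taken from the first rows of the dataset), selects the numeric columns with select_dtypes and converts a text column with astype:

```python
import pandas as pd

# Illustrative frame: price is stored as text, as in the dataset above
sample = pd.DataFrame({
    'symboling': [3, 1, 2],
    'wheel-base': [88.6, 94.5, 99.8],
    'price': ['13495', '16500', '13950'],
})

print(sample.dtypes)

# Keep only the columns with a numeric dtype
numeric_cols = sample.select_dtypes(include='number').columns.tolist()
print(numeric_cols)

# astype returns a converted copy; it raises an error if a value cannot be parsed
sample['price'] = sample['price'].astype('int64')
print(sample.dtypes['price'])
```

On the full dataset a plain astype would fail for columns that contain the '?' placeholder, so those entries need to be handled first.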

shape: Returns a tuple with the dimensions of a pandas object: (rows, columns) for a DataFrame and (rows,) for a Series

df.shape
(205, 26)

size: Returns the number of rows in a pandas Series or the number of rows times the number of columns of a DataFrame

df.size
5330
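
For the dataset above, 205 rows times 26 columns gives 5330 elements. The relationship between shape, size, and len can be checked with a short sketch on a small illustrative frame (sample is a made-up name):

```python
import pandas as pd

# Small illustrative frame: 3 rows, 2 columns
sample = pd.DataFrame({
    'make': ['alfa-romero', 'audi', 'audi'],
    'city-mpg': [21, 24, 18],
})

# shape is a tuple that can be unpacked into rows and columns
n_rows, n_cols = sample.shape
print(n_rows, n_cols)

# size is the total number of elements: rows times columns
print(sample.size)

# For a Series, shape has a single entry and size equals the number of rows
print(sample['make'].shape, sample['make'].size)

# len() counts rows only
print(len(sample))
```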