Introduction
We briefly introduced some of the most common classes and methods in the Introduction tutorial. In this tutorial we go through a few more!
Pandas provides plenty of methods and attributes for obtaining basic information about the data. The following are some of the most common ones:
# Load the sample car dataset
import pandas as pd
dataPath = 'https://raw.githubusercontent.com/alineu/pyDataScintist-Notebooks/main/data/'
df = pd.read_csv(dataPath+'car_data.csv')
df.head()
| | symboling | normalized-losses | make | fuel-type | aspiration | num-of-doors | body-style | drive-wheels | engine-location | wheel-base | ... | engine-size | fuel-system | bore | stroke | compression-ratio | horsepower | peak-rpm | city-mpg | highway-mpg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | ? | alfa-romero | gas | std | two | convertible | rwd | front | 88.6 | ... | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27.0 | 13495 |
| 1 | 3 | ? | alfa-romero | gas | std | two | convertible | rwd | front | 88.6 | ... | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27.0 | 16500 |
| 2 | 1 | ? | alfa-romero | gas | std | two | hatchback | rwd | front | 94.5 | ... | 152 | mpfi | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26.0 | 16500 |
| 3 | 2 | 164 | audi | gas | std | four | sedan | fwd | front | 99.8 | ... | 109 | mpfi | 3.19 | 3.4 | 10.0 | 102 | 5500 | 24 | 30.0 | 13950 |
| 4 | 2 | 164 | audi | gas | std | four | sedan | 4wd | front | 99.4 | ... | 136 | mpfi | 3.19 | 3.4 | 8.0 | 115 | 5500 | 18 | 22.0 | 17450 |
5 rows × 26 columns
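`head()` returns the first five rows by default, and its companion `tail()` returns the last ones. The sketch below uses a small hypothetical stand-in DataFrame (two columns copied from the output above) so that it runs without downloading the CSV:

```python
import pandas as pd

# Hypothetical stand-in for car_data.csv: two columns from the head() output
df = pd.DataFrame({
    "make": ["alfa-romero", "alfa-romero", "alfa-romero", "audi", "audi"],
    "price": [13495, 16500, 16500, 13950, 17450],
})

first_two = df.head(2)  # first n rows (n defaults to 5)
last_one = df.tail(1)   # last n rows (n defaults to 5)
print(first_two)
print(last_one)
```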
index
: Stores the row labels (the Index) of a DataFrame or Series
df.index
RangeIndex(start=0, stop=205, step=1)
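Row labels stay attached to their rows. After a filter, the index therefore shows which of the original rows survived. The sketch below assumes a small hypothetical stand-in frame rather than the full dataset:

```python
import pandas as pd

# Hypothetical stand-in: the make/price values of the first five rows above
df = pd.DataFrame({
    "make": ["alfa-romero", "alfa-romero", "alfa-romero", "audi", "audi"],
    "price": [13495, 16500, 16500, 13950, 17450],
})

print(df.index)  # RangeIndex(start=0, stop=5, step=1)

# Filtering keeps the original labels, so the surviving rows are 3 and 4
audi_labels = df[df["make"] == "audi"].index.tolist()
print(audi_labels)  # [3, 4]
```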
columns
: Stores the column labels of a pandas DataFrame
df.columns
Index(['symboling', 'normalized-losses', 'make', 'fuel-type', 'aspiration', 'num-of-doors', 'body-style', 'drive-wheels', 'engine-location', 'wheel-base', 'length', 'width', 'height', 'curb-weight', 'engine-type', 'num-of-cylinders', 'engine-size', 'fuel-system', 'bore', 'stroke', 'compression-ratio', 'horsepower', 'peak-rpm', 'city-mpg', 'highway-mpg', 'price'], dtype='object')
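`df.columns` is an `Index` object. It converts to a plain list and supports membership tests. Several labels in this dataset contain hyphens, so bracket indexing is needed to select them. Here is a minimal sketch on a hypothetical three-column stand-in:

```python
import pandas as pd

# Hypothetical stand-in with a few of the dataset's column labels
df = pd.DataFrame({
    "make": ["alfa-romero", "audi"],
    "fuel-type": ["gas", "gas"],
    "price": [13495, 13950],
})

col_names = df.columns.tolist()    # plain Python list of the labels
has_price = "price" in df.columns  # membership test on the column labels
print(col_names, has_price)

# df.fuel-type would be parsed as subtraction, so use brackets for hyphenated labels
fuel = df["fuel-type"]
```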
describe
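: Generates descriptive statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for each column. By default only columns with numerical data types are included, and NaN values are excluded from the calculations.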
df.describe()
| | symboling | wheel-base | length | width | height | curb-weight | engine-size | compression-ratio | city-mpg | highway-mpg |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 205.000000 | 202.000000 | 205.000000 | 205.000000 | 205.000000 | 203.000000 | 205.000000 | 205.000000 | 205.000000 | 200.000000 |
| mean | 0.834146 | 98.821287 | 174.049268 | 65.907805 | 53.724878 | 2558.211823 | 126.907317 | 10.142537 | 25.219512 | 30.695000 |
| std | 1.245307 | 5.982678 | 12.337289 | 2.145204 | 2.443522 | 520.943508 | 41.642693 | 3.972040 | 6.542142 | 6.937769 |
| min | -2.000000 | 86.600000 | 141.100000 | 60.300000 | 47.800000 | 1488.000000 | 61.000000 | 7.000000 | 13.000000 | 16.000000 |
| 25% | 0.000000 | 94.500000 | 166.300000 | 64.100000 | 52.000000 | 2157.000000 | 97.000000 | 8.600000 | 19.000000 | 25.000000 |
| 50% | 1.000000 | 97.000000 | 173.200000 | 65.500000 | 54.100000 | 2414.000000 | 120.000000 | 9.000000 | 24.000000 | 30.000000 |
| 75% | 2.000000 | 102.300000 | 183.100000 | 66.900000 | 55.500000 | 2943.500000 | 141.000000 | 9.400000 | 30.000000 | 34.000000 |
| max | 3.000000 | 120.900000 | 208.100000 | 72.300000 | 59.800000 | 4066.000000 | 326.000000 | 23.000000 | 49.000000 | 54.000000 |
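Two behaviours are visible in the table above. First, the `count` row drops below 205 wherever values are missing. Second, object columns such as `make` are left out. The sketch below reproduces both on a hypothetical four-row frame and uses `include="all"` to pull the object columns in as well:

```python
import pandas as pd

# Hypothetical four-row frame; wheel-base has one missing value
df = pd.DataFrame({
    "make": ["alfa-romero", "alfa-romero", "audi", "audi"],
    "wheel-base": [88.6, 94.5, 99.8, None],
    "engine-size": [130, 152, 109, 136],
})

num_summary = df.describe()               # numeric columns only; NaN is skipped
all_summary = df.describe(include="all")  # also adds count/unique/top/freq for 'make'
print(num_summary)
print(all_summary)
```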
info
: Prints a concise summary of a DataFrame, including the index type and range, the column names, each column's data type and non-null value count, and the memory used by the DataFrame
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 205 entries, 0 to 204
Data columns (total 26 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   symboling          205 non-null    int64
 1   normalized-losses  205 non-null    object
 2   make               205 non-null    object
 3   fuel-type          205 non-null    object
 4   aspiration         204 non-null    object
 5   num-of-doors       205 non-null    object
 6   body-style         204 non-null    object
 7   drive-wheels       205 non-null    object
 8   engine-location    205 non-null    object
 9   wheel-base         202 non-null    float64
 10  length             205 non-null    float64
 11  width              205 non-null    float64
 12  height             205 non-null    float64
 13  curb-weight        203 non-null    float64
 14  engine-type        205 non-null    object
 15  num-of-cylinders   205 non-null    object
 16  engine-size        205 non-null    int64
 17  fuel-system        205 non-null    object
 18  bore               205 non-null    object
 19  stroke             205 non-null    object
 20  compression-ratio  205 non-null    float64
 21  horsepower         205 non-null    object
 22  peak-rpm           203 non-null    object
 23  city-mpg           205 non-null    int64
 24  highway-mpg        200 non-null    float64
 25  price              205 non-null    object
dtypes: float64(7), int64(3), object(16)
memory usage: 41.8+ KB
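`info()` prints its report and returns `None`. Passing a text buffer through its `buf` parameter captures the report as a string instead. The per-column non-null counts are also available directly from `count()`. Here is a minimal sketch on a hypothetical three-row frame with one missing value:

```python
import io

import pandas as pd

# Hypothetical three-row frame; wheel-base has one missing value
df = pd.DataFrame({
    "make": ["alfa-romero", "audi", "audi"],
    "wheel-base": [88.6, None, 99.8],
})

# Capture the info() report in a string buffer instead of printing it
buf = io.StringIO()
df.info(buf=buf)
report = buf.getvalue()
print(report)

# Same non-null counts as the "Non-Null Count" column, as a Series
non_null = df.count()
```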
dtypes
: Returns a Series holding the data type of each column
df.dtypes
symboling              int64
normalized-losses     object
make                  object
fuel-type             object
aspiration            object
num-of-doors          object
body-style            object
drive-wheels          object
engine-location       object
wheel-base           float64
length               float64
width                float64
height               float64
curb-weight          float64
engine-type           object
num-of-cylinders      object
engine-size            int64
fuel-system           object
bore                  object
stroke                object
compression-ratio    float64
horsepower            object
peak-rpm              object
city-mpg               int64
highway-mpg          float64
price                 object
dtype: object
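Several columns that hold numbers, such as `normalized-losses` and `price`, are reported as `object`. One likely reason is that placeholders like the `?` seen in the `head()` output are strings, so pandas cannot store the column as numeric. The sketch below reproduces this on a hypothetical frame. It then looks up a single column's dtype and keeps only the numeric columns with `select_dtypes`:

```python
import pandas as pd

# Hypothetical frame: '?' placeholders make normalized-losses a non-numeric column
df = pd.DataFrame({
    "symboling": [3, 3, 1],
    "normalized-losses": ["?", "?", "164"],
    "wheel-base": [88.6, 88.6, 94.5],
})

print(df.dtypes)

nl_dtype = df.dtypes["normalized-losses"]  # dtype of a single column
numeric_cols = df.select_dtypes(include="number").columns.tolist()
print(nl_dtype, numeric_cols)
```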
shape
: Returns the dimensions of a pandas object as a tuple: (rows, columns) for a DataFrame, or (rows,) for a Series
df.shape
(205, 26)
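`shape` is an ordinary tuple, so it can be unpacked directly. A Series is one-dimensional, so its shape has a single entry. Here is a minimal sketch on a hypothetical three-row frame:

```python
import pandas as pd

# Hypothetical three-row frame
df = pd.DataFrame({
    "make": ["alfa-romero", "alfa-romero", "audi"],
    "price": [13495, 16500, 13950],
})

n_rows, n_cols = df.shape         # DataFrame: (rows, columns)
series_shape = df["price"].shape  # Series: (rows,)
print(n_rows, n_cols, series_shape)  # 3 2 (3,)
```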
size
: Returns the number of elements in a pandas object: the number of rows for a Series, or the number of rows times the number of columns for a DataFrame
df.size
5330
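The value above is 205 × 26 = 5330. `size` counts every cell, missing or not, so it differs from `count()`, which skips NaN. Here is a minimal sketch on a hypothetical frame with one missing price:

```python
import pandas as pd

# Hypothetical three-row frame with one missing price
df = pd.DataFrame({
    "make": ["alfa-romero", "alfa-romero", "audi"],
    "price": [13495, None, 13950],
})

total_cells = df.size              # rows * columns, NaN cells included
price_size = df["price"].size      # number of rows in the Series
price_count = df["price"].count()  # non-missing values only
print(total_cells, price_size, price_count)  # 6 3 2
```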