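A pandas DataFrame is a two-dimensional table with labels for both its rows and its columns, and each column can hold a different data type.
The commonly used attributes and methods of a DataFrame object are described below.
Commonly used attributes of a DataFrame object: shape (the number of rows and columns), index (the row labels), columns (the column labels), dtypes (the data type of each column), and values (the underlying data as a NumPy array).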
Example code:
import pandas as pd
# Create a DataFrame
data = {'name': ['Tom', 'Jack', 'Mary', 'Jane'],
        'age': [20, 21, 19, 20],
        'score': [80, 90, 85, 92]}
df = pd.DataFrame(data)
# Attributes
print("Shape of the DataFrame:", df.shape)
print("Row index of the DataFrame:", df.index)
print("Column index of the DataFrame:", df.columns)
# sep="\n" puts multi-line results on their own lines, below the label
print("Data type of each column:", df.dtypes, sep="\n")
print("Values of the DataFrame:", df.values, sep="\n")
Output:
Shape of the DataFrame: (4, 3)
Row index of the DataFrame: RangeIndex(start=0, stop=4, step=1)
Column index of the DataFrame: Index(['name', 'age', 'score'], dtype='object')
Data type of each column:
name     object
age       int64
score     int64
dtype: object
Values of the DataFrame:
[['Tom' 20 80]
 ['Jack' 21 90]
 ['Mary' 19 85]
 ['Jane' 20 92]]
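These attributes interact with each other: because name holds strings while age and score hold integers, values has to fall back to a NumPy array of dtype object. The minimal sketch below reuses the same sample data to show this; it also uses to_numpy(), which the pandas documentation recommends over values, on a numeric-only column selection. The variable names (n_rows, n_cols, mixed, numeric) are illustrative only.

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({
    "name": ["Tom", "Jack", "Mary", "Jane"],
    "age": [20, 21, 19, 20],
    "score": [80, 90, 85, 92],
})

# shape is a (rows, columns) tuple, so it can be unpacked directly
n_rows, n_cols = df.shape

# With a string column present, the only common dtype is object
mixed = df.values

# Selecting only the integer columns yields a plain int64 array
numeric = df[["age", "score"]].to_numpy()

print(n_rows, n_cols)   # 4 3
print(mixed.dtype)      # object
print(numeric.dtype)    # int64
```

Selecting numeric columns before converting keeps the array in a numeric dtype, which matters if it is passed on to NumPy routines that expect numbers.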
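Commonly used methods of a DataFrame object include head() and tail() (the first or last n rows), info() (a summary of the index, columns, non-null counts, and dtypes), describe() (summary statistics of the numeric columns), sort_values() (sorting by column values), dropna() and fillna() (dropping or filling missing values), groupby() (grouping and aggregating), and join() (combining with another DataFrame on the index). The top-level functions pd.pivot_table() and pd.merge() are often used alongside them.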
Example code:
import pandas as pd
# Create a DataFrame
data = {'name': ['Tom', 'Jack', 'Mary', 'Jane'],
        'age': [20, 21, 19, 20],
        'score': [80, 90, 85, 92]}
df = pd.DataFrame(data)
# Methods
print("First 2 rows of the DataFrame:", df.head(2), sep="\n")
print("Last 2 rows of the DataFrame:", df.tail(2), sep="\n")
# info() prints its summary itself and returns None, so call it on its own
print("Information about the DataFrame:")
df.info()
print("Summary statistics of the DataFrame:", df.describe(), sep="\n")
print("DataFrame sorted by score (descending):", df.sort_values(by='score', ascending=False), sep="\n")
# This sample has no missing values, so dropna() and fillna() return the data unchanged
print("After dropping missing values:", df.dropna(), sep="\n")
print("After filling missing values with 0:", df.fillna(value=0), sep="\n")
# Select the numeric column explicitly; in pandas 2.0 and later, sum() would otherwise
# also aggregate the string column 'name' (concatenating the names)
print("Grouped by age:", df.groupby('age')[['score']].sum(), sep="\n")
print("Pivot table:", pd.pivot_table(df, values='score', index=['name'], columns=['age']), sep="\n")
df1 = pd.DataFrame({'name': ['Tom', 'Jack', 'Mary', 'Jane'], 'gender': ['M', 'M', 'F', 'F']})
print("Merging two DataFrames on 'name':", pd.merge(df, df1, on='name'), sep="\n")
df2 = pd.DataFrame({'gender': ['M', 'M', 'F', 'F'], 'income': [2000, 3000, 2500, 2800]})
print("Joining two DataFrames on the index:", df.join(df2), sep="\n")
Output:
First 2 rows of the DataFrame:
   name  age  score
0   Tom   20     80
1  Jack   21     90
Last 2 rows of the DataFrame:
   name  age  score
2  Mary   19     85
3  Jane   20     92
Information about the DataFrame:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   name    4 non-null      object
 1   age     4 non-null      int64
 2   score   4 non-null      int64
dtypes: int64(2), object(1)
memory usage: 224.0+ bytes
Summary statistics of the DataFrame:
             age      score
count   4.000000   4.000000
mean   20.000000  86.750000
std     0.816497   5.377422
min    19.000000  80.000000
25%    19.750000  83.750000
50%    20.000000  87.500000
75%    20.250000  90.500000
max    21.000000  92.000000
DataFrame sorted by score (descending):
   name  age  score
3  Jane   20     92
1  Jack   21     90
2  Mary   19     85
0   Tom   20     80
After dropping missing values:
   name  age  score
0   Tom   20     80
1  Jack   21     90
2  Mary   19     85
3  Jane   20     92
After filling missing values with 0:
   name  age  score
0   Tom   20     80
1  Jack   21     90
2  Mary   19     85
3  Jane   20     92
Grouped by age:
     score
age
19      85
20     172
21      90
Pivot table:
age     19    20    21
name
Jack   NaN   NaN  90.0
Jane   NaN  92.0   NaN
Mary  85.0   NaN   NaN
Tom    NaN  80.0   NaN
Merging two DataFrames on 'name':
   name  age  score gender
0   Tom   20     80      M
1  Jack   21     90      M
2  Mary   19     85      F
3  Jane   20     92      F
Joining two DataFrames on the index:
   name  age  score gender  income
0   Tom   20     80      M    2000
1  Jack   21     90      M    3000
2  Mary   19     85      F    2500
3  Jane   20     92      F    2800
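The sample data above contains no missing values, so dropna() and fillna() leave it unchanged. The sketch below uses a hypothetical variant of the same table with two NaN entries to show what the two methods actually do; passing a per-column dict to fillna() is one common option, shown here as an illustration rather than the only approach. The names dropped, zero_filled, and per_column are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the sample data with one missing age and one missing score
df = pd.DataFrame({
    "name": ["Tom", "Jack", "Mary", "Jane"],
    "age": [20, np.nan, 19, 20],
    "score": [80, 90, np.nan, 92],
})

# dropna() drops every row containing at least one NaN (default how="any"),
# so only Tom (index 0) and Jane (index 3) remain
dropped = df.dropna()

# fillna(value=0) replaces every NaN in the DataFrame with 0
zero_filled = df.fillna(value=0)

# fillna() also accepts a dict mapping column names to per-column fill values:
# here the missing age gets the mean of the known ages, the missing score gets 0
per_column = df.fillna(value={"age": df["age"].mean(), "score": 0})

print(dropped)
print(zero_filled)
print(per_column)
```

Because NaN is a floating-point value, the age and score columns are stored as float64 in this variant, so their values print with decimals.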
Article link: http://task.lmcjl.com/news/4493.html