import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(5, 3), index=['a', 'c', 'e', 'f', 'h'])
df = df.reindex(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
print(df)

Output (the values are random, so they differ on every run):

          0         1         2
a  0.187208 -0.951407  0.316340
b       NaN       NaN       NaN
c -0.365741 -1.983977 -1.052170
d       NaN       NaN       NaN
e -1.024180  1.550515  0.317156
f -0.799921 -0.686590  1.383229
g       NaN       NaN       NaN
h -0.207958  0.426733 -0.325951

In the example above, reindex (rebuilding the index) produces a DataFrame with missing values: the labels b, d, and g are not in the original index, so their rows are filled with NaN.
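import pandas as pd
import numpy as np

# Name the columns so that df['one'] refers to an existing column.
df = pd.DataFrame(np.random.randn(5, 3), index=['a', 'c', 'e', 'f', 'h'],
                  columns=['one', 'two', 'three'])
df = df.reindex(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
# True marks the rows where column 'one' is missing.
print(df['one'].isnull())

Output:

a    False
b     True
c    False
d     True
e    False
f    False
g     True
h    False
Name: one, dtype: bool

Usage example for the notnull() function: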
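import pandas as pd
import numpy as np

# Name the columns so that df['one'] refers to an existing column.
df = pd.DataFrame(np.random.randn(5, 3), index=['a', 'c', 'e', 'f', 'h'],
                  columns=['one', 'two', 'three'])
df = df.reindex(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
# True marks the rows where column 'one' holds a value.
print(df['one'].notnull())

Output:

a     True
b    False
c     True
d    False
e     True
f     True
g    False
h     True
Name: one, dtype: bool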
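The relationship between isnull() and notnull() can be checked on a small fixed Series. This is a minimal sketch; the Series values and the variable names are illustrative assumptions, not part of the original example.

```python
import numpy as np
import pandas as pd

# Small fixed Series so the result is reproducible (values are illustrative).
s = pd.Series([1.0, np.nan, 3.0], index=['a', 'b', 'c'])

missing = s.isnull()    # True where the value is NaN
present = s.notnull()   # True where the value is not NaN

# The two masks are exact complements of each other.
masks_are_complementary = bool((missing == ~present).all())
print(masks_are_complementary)

# A boolean mask can be used to keep only the non-missing entries.
non_missing = s[present]
print(non_missing)
```

Using the notnull() mask as a filter is a common way to select only the rows that hold a value.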
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(5, 3), index=['a', 'c', 'e', 'f', 'h'],
                  columns=['one', 'two', 'three'])
df = df.reindex(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
# The NaN entries in column 'one' are skipped when summing.
print(df['one'].sum())

Output (the value is random, so it differs on every run):

3.4516595395128
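By default, sum() skips NaN entries, so the total covers only the values that are present. The sketch below uses fixed illustrative values (an assumption, chosen so the result is reproducible) to show this behavior and the skipna parameter that controls it.

```python
import numpy as np
import pandas as pd

# Fixed illustrative values so the totals are reproducible.
s = pd.Series([1.0, np.nan, 2.5])

total_skipping_nan = s.sum()            # NaN is skipped by default (skipna=True)
total_with_nan = s.sum(skipna=False)    # NaN propagates into the result
all_nan_total = pd.Series([np.nan, np.nan]).sum()  # nothing left to add up

print(total_skipping_nan)  # 3.5
print(total_with_nan)      # nan
print(all_nan_total)       # 0.0
```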
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(3, 3), index=['a', 'c', 'e'],
                  columns=['one', 'two', 'three'])
df = df.reindex(['a', 'b', 'c'])
print(df)
# Fill NaN with 0
print(df.fillna(0))

Output:

        one       two     three
a  1.497185 -0.703897 -0.050513
b       NaN       NaN       NaN
c  2.008315  1.342690 -0.255855
        one       two     three
a  1.497185 -0.703897 -0.050513
b  0.000000  0.000000  0.000000
c  2.008315  1.342690 -0.255855

Of course, you can also fill with other values, depending on your needs.
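Two common choices for "other values" are a separate constant per column and the mean of each column. The following is a minimal sketch; the data, the fill constants, and the variable names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Fixed illustrative data with one NaN in each column.
df = pd.DataFrame({'one': [1.0, np.nan, 3.0],
                   'two': [np.nan, 4.0, 6.0]},
                  index=['a', 'b', 'c'])

# A dict gives each column its own fill value.
filled_per_column = df.fillna({'one': -1.0, 'two': 0.0})

# A Series (here the column means) is matched to columns by label.
filled_with_mean = df.fillna(df.mean())

print(filled_per_column)
print(filled_with_mean)
```

Filling with the mean keeps each column's average unchanged, which is sometimes preferable to a constant such as 0.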
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(5, 3), index=['a', 'c', 'e', 'f', 'h'],
                  columns=['one', 'two', 'three'])
df = df.reindex(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
# Forward fill: each NaN takes the last valid value above it.
# Equivalent to fillna(method='ffill'), which is deprecated in recent pandas versions.
print(df.ffill())

Output:

        one       two     three
a  0.871741  0.311057  0.091005
b  0.871741  0.311057  0.091005
c  0.107345 -0.662864  0.826716
d  0.107345 -0.662864  0.826716
e  1.630221  0.482504 -0.728767
f  1.283206 -0.145178  0.109155
g  1.283206 -0.145178  0.109155
h  0.222176  0.886768  0.347820

Alternatively, you can use backward filling.
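Backward filling might look like the sketch below, which uses bfill() on a small fixed frame. The data is an illustrative assumption; note that a trailing NaN stays missing because there is no later value to copy from.

```python
import numpy as np
import pandas as pd

# Fixed illustrative data: NaN in the middle and at the end.
df = pd.DataFrame({'one': [1.0, np.nan, 3.0, np.nan]},
                  index=['a', 'b', 'c', 'd'])

# Backward fill: each NaN takes the next valid value below it.
back_filled = df.bfill()
print(back_filled)
```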
import pandas as pd

df = pd.DataFrame({'one': [10, 20, 30, 40, 50, 666],
                   'two': [99, 0, 30, 40, 50, 60]})
# Use the replace() method: map each old value to its new value
print(df.replace({99: 10, 666: 60, 0: 20}))

Output:

   one  two
0   10   10
1   20   20
2   30   30
3   40   40
4   50   50
5   60   60
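replace() can also turn a placeholder value into NaN, so that the missing-data functions shown in this article apply to it. In this sketch the sentinel -999 and the data are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

# -999 is a hypothetical sentinel standing in for "no value".
df = pd.DataFrame({'one': [10, -999, 30],
                   'two': [-999, 20, 30]})

# Convert the sentinel to NaN so isnull(), fillna(), and dropna() can see it.
cleaned = df.replace(-999, np.nan)
n_missing = int(cleaned.isnull().sum().sum())

print(cleaned)
print(n_missing)  # 2
```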
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(5, 3), index=['a', 'c', 'e', 'f', 'h'],
                  columns=['one', 'two', 'three'])
df = df.reindex(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
print(df)
# Drop the rows that contain missing values
print(df.dropna())

Output:

        one       two     three
a -2.025435  0.617616  0.862096
b       NaN       NaN       NaN
c -1.710705  1.780539 -2.313227
d       NaN       NaN       NaN
e -2.347188 -0.498857 -1.070605
f -0.159588  1.205773 -0.046752
g       NaN       NaN       NaN
h -0.549372 -1.740350  0.444356
        one       two     three
a -2.025435  0.617616  0.862096
c -1.710705  1.780539 -2.313227
e -2.347188 -0.498857 -1.070605
f -0.159588  1.205773 -0.046752
h -0.549372 -1.740350  0.444356

Passing axis=1 processes the data by column instead. Every column here contains at least one NaN, so all columns are dropped and the result is an empty DataFrame object.
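The axis=1 case can be sketched on a small fixed frame. The data below is an illustrative assumption; the how='all' call is included only to contrast with the default behavior.

```python
import pandas as pd

# Fixed illustrative data; reindexing adds row 'b', which is entirely NaN.
df = pd.DataFrame({'one': [1.0, 2.0], 'two': [3.0, 4.0]}, index=['a', 'c'])
df = df.reindex(['a', 'b', 'c'])

# Default (how='any'): drop every column that contains at least one NaN.
by_column = df.dropna(axis=1)
print(by_column.shape)   # (3, 0)
print(by_column.empty)   # True

# how='all' drops a column only if all of its values are NaN.
only_all_nan_columns = df.dropna(axis=1, how='all')
print(only_all_nan_columns.shape)  # (3, 2)
```

Because a single all-NaN row puts a NaN into every column, dropping by column removes everything here, while dropping by row (the default axis=0) removes only row b.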