Differences between revisions of “Pandas”

Revision as of 06:56, 9 July 2020 (Thursday)

Python Data Analysis Library

Data structures

https://www.cnblogs.com/songxiaohua/p/9445087.html

DataFrame

See https://blog.csdn.net/u014281392/article/details/75331570

df = pd.DataFrame([[1, 2, 3],[4, 5, 6]], columns=['col1','col2','col3'], index=['a','b'])

A DataFrame is defined as a table: the rows are labelled by index and the columns by columns.
Select a row with df.loc['a'] and a column with df['col1'].
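As a minimal sketch of the row and column access described above, the following reuses the same df. The names row_a, rows_a, col1 and cell are only illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]],
                  columns=['col1', 'col2', 'col3'],
                  index=['a', 'b'])

row_a = df.loc['a']          # Series holding row 'a', indexed by column name
rows_a = df.loc[['a']]       # a list of labels returns a DataFrame instead
col1 = df['col1']            # Series holding column 'col1', indexed by row label
cell = df.loc['b', 'col3']   # single value at row 'b', column 'col3'

print(row_a.tolist())        # [1, 2, 3]
print(col1.tolist())         # [1, 4]
print(cell)                  # 6
```

Note that .loc takes square brackets, since it is an indexer rather than a function call.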

Series

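Each entry is called an item. A Series resembles a dictionary and is made up of an index and values.
The default index is a range (RangeIndex), so a Series can be built directly from an ndarray.
It can also be built from a dictionary, in which case the keys become the index.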

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
ser3 = pd.Series(sdata)
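To complement the dictionary example, here is a small sketch of the ndarray case and of the index/values split. The array contents and the names arr, ser and ser_d are made up for illustration.

```python
import numpy as np
import pandas as pd

arr = np.array([10.0, 20.0, 30.0])
ser = pd.Series(arr)                  # no index given, so a RangeIndex is used

print(list(ser.index))                # [0, 1, 2]
print(ser.values.tolist())            # [10.0, 20.0, 30.0]

ser_d = pd.Series({'x': 1, 'y': 2})   # dictionary keys become the index
for label, value in ser_d.items():    # iterate over (index, value) pairs
    print(label, value)
```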

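Different Series objects can be combined in arithmetic, with values matched by index.
Look up a value by index label: ser3.loc['Ohio']
Series.describe()  # basic summary statistics of the values
Missing-value handling: Series.isnull(), Series.notnull(), Series.fillna()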
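The notes above can be sketched as follows. ser4 is a hypothetical second Series, invented here only to show what happens when the two indexes do not fully overlap.

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
ser3 = pd.Series(sdata)
# Hypothetical second Series: lacks 'Utah', adds 'California'
ser4 = pd.Series({'California': 1000, 'Ohio': 1, 'Oregon': 2, 'Texas': 3})

total = ser3 + ser4           # aligned by label; unmatched labels become NaN
missing = total.isnull()      # True where a label was absent on either side
filled = total.fillna(0)      # replace NaN with 0

print(ser3.loc['Ohio'])       # 35000
print(total['Ohio'])          # 35001.0
print(missing['Utah'])        # True
print(ser3.describe()['mean'])  # 31750.0
```

The sum becomes floating point because NaN cannot be stored in an integer Series.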

Index

Similar to a set, but elements may repeat.
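A short sketch of that set-like behaviour, assuming a small hand-made Index (the labels and the names idx, ser and other are illustrative only):

```python
import pandas as pd

idx = pd.Index(['a', 'b', 'b', 'c'])     # duplicate labels are allowed
print(idx.is_unique)                     # False
print('b' in idx)                        # True, membership test as with a set

ser = pd.Series([1, 2, 3, 4], index=idx)
print(ser.loc['b'].tolist())             # [2, 3], every entry labelled 'b'

# Set-style operations are available as methods
other = pd.Index(['c', 'd'])
print(idx.intersection(other).tolist())  # ['c']
```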

io

read_csv

import pandas as pd
data=pd.read_csv('cGs_for_LAMOST.csv',comment='#')
data.columns
ra=data['ra']
dec=data['dec']

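read_csv is now the recommended way to read ordinary tables. By default it always treats the first line as the header; if the file has no header, pass header=None.
For details see https://blog.csdn.net/brucewong0516/article/details/79092579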

comment='#' (everything after a '#' on a line is treated as a comment and skipped)

sep=' ' (or '\s'); sep='\t' (the separator is a Tab character)
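The options above can be tried without a file on disk. This sketch reads from in-memory strings through io.StringIO; the contents of text and raw are invented sample data, not the LAMOST catalogue.

```python
import io
import pandas as pd

# Hypothetical file contents with a comment line and a header row
text = (
    "# catalogue header comment\n"
    "ra,dec\n"
    "10.5,-3.2\n"
    "11.0,4.1\n"
)
data = pd.read_csv(io.StringIO(text), comment='#')
print(list(data.columns))         # ['ra', 'dec']

# Hypothetical tab-separated data without a header row
raw = "1\t2\n3\t4\n"
no_header = pd.read_csv(io.StringIO(raw), sep='\t', header=None)
print(list(no_header.columns))    # [0, 1]
print(no_header.shape)            # (2, 2)
```

With header=None the columns are numbered from 0, so no data row is consumed as a header.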

read_table

Reads plain ASCII text files.

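import numpy as np
# sep=r'\s+' splits on runs of whitespace; it replaces delim_whitespace=True, deprecated since pandas 2.2
file=pd.read_table(path+'test1.spectrum', skiprows=range(0,6),
                   sep=r'\s+', names=('A', 'B', 'C'),
                   dtype={'A': np.int64, 'B': np.float64, 'C': np.float64})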
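A self-contained variant of the call above, as a sketch only: the spectrum text is made up and has two header lines rather than six, and whitespace splitting is requested with sep=r'\s+'.

```python
import io
import numpy as np
import pandas as pd

# Hypothetical spectrum file: two header lines, then whitespace-separated columns
text = "title line\nunits line\n1  0.5  2.0\n2  0.7  2.5\n"

spec = pd.read_table(io.StringIO(text),
                     skiprows=range(0, 2),     # skip the two header lines
                     sep=r'\s+',               # split on any run of whitespace
                     names=('A', 'B', 'C'),    # supply column names ourselves
                     dtype={'A': np.int64, 'B': np.float64, 'C': np.float64})

print(spec.dtypes['A'])       # int64
print(spec['B'].tolist())     # [0.5, 0.7]
```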

pickle

Call the DataFrame's to_pickle method to write a pickle file that stores the data persistently.

df = pd.DataFrame(np.arange(20).reshape(4,5))
df.to_pickle('foo.pkl')
pd.read_pickle('foo.pkl')

