HDFStore.append(string, DataFrame) fails when string column contents are longer than those already there

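python pandas dataframe hdf5 pytables
I am storing data in pandas via HDFStore, basically summary rows about test runs I am doing. Several fields in each row contain descriptive strings of variable length. When I do a test run, I create a new DataFrame with a single row in it: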

def export_as_df(self):
    return pd.DataFrame(data=[self._to_dict()], index=[datetime.datetime.now()])

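I then call HDFStore.append(string, DataFrame) to add the new row to the existing DataFrame. This works fine, except when the contents of one of the string columns are longer than the longest instance already stored, in which case I get the following error: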

  File "<ipython-input-302-a33c7955df4a>", line 516, in save_pytables
    store.append('tests', test.export_as_df())
  File "/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/pandas/io/pytables.py", line 532, in append
    self._write_to_group(key, value, table=True, append=True, **kwargs)
  File "/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/pandas/io/pytables.py", line 788, in _write_to_group
    s.write(obj = value, append=append, complib=complib, **kwargs)
  File "/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/pandas/io/pytables.py", line 2491, in write
    min_itemsize=min_itemsize, **kwargs)
  File "/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/pandas/io/pytables.py", line 2254, in create_axes
    raise Exception("cannot find the correct atom type -> [dtype->%s,items->%s] %s" % (b.dtype.name, b.items, str(detail)))
Exception: cannot find the correct atom type -> [dtype->object,items->Index([bp, id, inst, per, sp, st, title], dtype=object)] [values_block_3] column has a min_itemsize of [51] but itemsize [46] is required!

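I can't find any documentation on how to specify the string length when creating the store. Is that the solution here? Update: the code that is failing: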

store = pd.HDFStore(pytables_store)
for test in self.backtests:
    try:
        min_itemsizes = { 'buy_pattern' : 60, 'sell_pattern': 60, 'strategy': 60, 'title': 60 }
        store.append('tests', test.export_as_df(), min_itemsize = min_itemsizes)

Here is the error under 0.11rc1:

  File "<ipython-input-110-492b7b6603d7>", line 522, in save_pytables
    store.append('tests', test.export_as_df(), min_itemsize = min_itemsizes)
  File "/Users/admin/dev/pandas/pandas-0.11.0rc1/pandas/io/pytables.py", line 610, in append
    self._write_to_group(key, value, table=True, append=True, **kwargs)
  File "/Users/admin/dev/pandas/pandas-0.11.0rc1/pandas/io/pytables.py", line 871, in _write_to_group
    s.write(obj = value, append=append, complib=complib, **kwargs)
  File "/Users/admin/dev/pandas/pandas-0.11.0rc1/pandas/io/pytables.py", line 2707, in write
    min_itemsize=min_itemsize, **kwargs)
  File "/Users/admin/dev/pandas/pandas-0.11.0rc1/pandas/io/pytables.py", line 2447, in create_axes
    self.validate_min_itemsize(min_itemsize)
  File "/Users/admin/dev/pandas/pandas-0.11.0rc1/pandas/io/pytables.py", line 2184, in validate_min_itemsize
    raise ValueError("min_itemsize has [%s] which is not an axis or data_column" % k)
ValueError: min_itemsize has [buy_pattern] which is not an axis or data_column

Sample data:

                            all_day              buy_pattern
2013-04-14 12:11:44.377695    False  Hammer() and LowerLow()

                                                           id instrument
2013-04-14 12:11:44.377695  tafdcc96ba4eb11e2a86d14109fcecd49     EURUSD

                            open_margin periodicity sell_pattern strategy
2013-04-14 12:11:44.377695       0.0001     1:00:00        Tsl()

                            title  top_bottom  wick_body
2013-04-14 12:11:44.377695    tsl         0.5          2

dtypes:

print prob_test.export_as_df().get_dtype_counts()

bool       1
float64    2
int64      1
object     7
dtype: int64

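I am deleting the H5 file each time, because I want clean results in it. I wonder whether it is something as silly as this: is it failing because the df does not exist in the H5 file (and so neither do any of its columns) at the time of the first append?
Article URL: CodeGo.net/9115474/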
-------------------------------------------------------------------------------------------------------------------------
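1. Here is the link to the new docs section on this: [link missing]. The issue is that you are specifying in min_itemsize a column that is not a data_column. The simple workaround is to add data_columns=True to your append call. I have also updated the code so that data_columns are created automatically if you pass a valid column name in min_itemsize; I think this makes sense: you want a minimum column size, so let it happen. I also created a new docs section on string columns that shows this with an explanatory example (the docs will be updated soon).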
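The workaround described above could be sketched roughly as follows. This is only an illustration, assuming PyTables is installed alongside pandas: the column names are borrowed from the question's data sample, while `make_row`, `append_rows`, and the width of 60 are hypothetical choices for the example. The sizes are passed from the very first append, since the width of a string column is fixed when the table is created.

```python
import pandas as pd

# Hypothetical widths for the string columns, modelled on the question's data.
STRING_SIZES = {"buy_pattern": 60, "title": 60}


def make_row(buy_pattern, title, stamp):
    """Build a one-row frame, similar in spirit to the question's export_as_df()."""
    frame = pd.DataFrame(
        {"buy_pattern": [buy_pattern], "title": [title], "wick_body": [2]},
        index=[pd.Timestamp(stamp)],
    )
    # Keep the string columns as object dtype, as in the question's frames.
    return frame.astype({"buy_pattern": object, "title": object})


def append_rows(path, frames):
    """Append each frame to the 'tests' table, then return everything stored."""
    with pd.HDFStore(path) as store:
        for frame in frames:
            # data_columns=True turns every column into a data column, so the
            # keys of min_itemsize refer to columns the store can size.
            store.append(
                "tests", frame, data_columns=True, min_itemsize=STRING_SIZES
            )
        return store.select("tests")
```

Calling `append_rows` with a short row first and a longer one second should then succeed, as long as no string exceeds the reserved width.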

# this is the new behavior (after code updates)
In [340]: dfs = DataFrame(dict(A = 'foo', B = 'bar'),index=range(5))

In [341]: dfs
Out[341]:
     A    B
0  foo  bar
1  foo  bar
2  foo  bar
3  foo  bar
4  foo  bar

# A and B have a size of 30
In [342]: store.append('dfs', dfs, min_itemsize = 30)

In [343]: store.get_storer('dfs').table
Out[343]:
/dfs/table (Table(5,)) ''
  description := {
  "index": Int64Col(shape=(), dflt=0, pos=0),
  "values_block_0": StringCol(itemsize=30, shape=(2,), dflt='', pos=1)}
  byteorder := 'little'
  chunkshape := (963,)
  autoIndex := True
  colindexes := {
    "index": Index(6, medium, shuffle, zlib(1)).is_CSI=False}

# A is created as a data_column with a size of 30
# B's size is calculated
In [344]: store.append('dfs2', dfs, min_itemsize = { 'A' : 30 })

In [345]: store.get_storer('dfs2').table
Out[345]:
/dfs2/table (Table(5,)) ''
  description := {
  "index": Int64Col(shape=(), dflt=0, pos=0),
  "values_block_0": StringCol(itemsize=3, shape=(1,), dflt='', pos=1),
  "A": StringCol(itemsize=30, shape=(), dflt='', pos=2)}
  byteorder := 'little'
  chunkshape := (1598,)
  autoIndex := True
  colindexes := {
    "A": Index(6, medium, shuffle, zlib(1)).is_CSI=False,
    "index": Index(6, medium, shuffle, zlib(1)).is_CSI=False}
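Picking the numbers to pass as min_itemsize is left to the caller. One possible approach, sketched here under assumptions, is to measure the longest string seen so far in each field and add some headroom. The helper `suggest_min_itemsize` and its `headroom` and `floor` parameters are hypothetical and not part of pandas; it works on plain dicts such as those the question's `_to_dict()` presumably returns.

```python
import math


def suggest_min_itemsize(rows, headroom=1.5, floor=16):
    """Suggest a byte width per string field from a list of row dicts.

    Hypothetical helper: takes the longest UTF-8 encoded value per field,
    multiplies it by `headroom`, and never goes below `floor`.
    """
    longest = {}
    for row in rows:
        for name, value in row.items():
            if isinstance(value, str):
                # Measure encoded bytes rather than characters, since
                # non-ASCII text can need more than one byte per character.
                size = len(value.encode("utf-8"))
                longest[name] = max(longest.get(name, 0), size)
    return {
        name: max(floor, math.ceil(size * headroom))
        for name, size in longest.items()
    }
```

The resulting dict could then be passed as min_itemsize on the first append. Any later string longer than the reserved width would still fail, so the headroom is a guess rather than a guarantee.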
