DSFF (DataSet File Format) is a tiny library relying on
openpyxl that stores a dataset, together with its features, for use with machine learning in an XLSX file whose structure is enforced. It is intended to make it easy to store, edit and exchange a dataset.
It is used with the Packing Box to export datasets in a convenient format.
This library is available on PyPI and can be installed using pip:
pip install --user dsff
DSFF is straightforward and contains only the minimum for storing a dataset.
The following document properties of the XLSX format are used:
title: this holds the name of the dataset
description: this holds a serialized dictionary of the metadata from the dataset
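
To illustrate how these two properties behave, here is a minimal sketch that sets and reads them back with plain openpyxl rather than with DSFF's own API. It assumes JSON as the serialization format for the metadata dictionary, which the text above does not specify; the dataset name and the metadata keys are hypothetical.

```python
import json
from io import BytesIO

from openpyxl import Workbook, load_workbook

# Hypothetical metadata; the keys are illustrative only.
metadata = {"categories": ["benign", "malicious"], "samples": 2}

wb = Workbook()
# "title" holds the name of the dataset.
wb.properties.title = "my-dataset"
# "description" holds the serialized metadata.
# Assumption: JSON is used here; DSFF's actual serialization may differ.
wb.properties.description = json.dumps(metadata)

# Save to an in-memory buffer instead of a file on disk.
buffer = BytesIO()
wb.save(buffer)
buffer.seek(0)

# Reload the workbook and recover both properties.
loaded = load_workbook(buffer)
name = loaded.properties.title
restored = json.loads(loaded.properties.description)
print(name, restored)
```

Because both values live in the workbook's document properties, they travel with the file and stay visible in a spreadsheet editor's document information dialog.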
An XLSX workbook formatted as a DSFF file has exactly two worksheets:
data: the matrix of the whole dataset (including headers), possibly containing samples' metadata but mostly the feature values
features: the name-description pairs of each feature used in
data (including two headers: