Usage

This library contains the DSFF class that can:

  • Behave as a context manager
  • Support item access: getting data or features returns the related worksheet, while setting targets, in order of precedence, the standard XLSX properties and then the metadata contained in the description property
  • Write data, features and metadata
  • Convert to the ARFF (for use with the Weka framework) or CSV formats or to a FilelessDataset structure (from the Packing Box)

Modes

The DSFF class can be instantiated with a file operation mode. It works like Python's built-in open function, but with a reduced set of modes. The following table indicates which operations each mode supports (* = available, - = not available):

Modes      r    r+   w    w+
Read       *    *    -    *
Write      -    *    *    *
Create     -    -    *    *
Truncate   -    -    *    *
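
The modes can also be read as a simple lookup from mode string to permitted operations. The following is a minimal sketch, not the library's own code: it assumes the four modes behave like the same mode strings of Python's built-in open, and MODE_OPERATIONS and supports are hypothetical names introduced only for illustration.

```python
# Hypothetical lookup table (not part of dsff): operations allowed per mode,
# assuming the modes follow the semantics of Python's built-in open().
MODE_OPERATIONS = {
    "r": {"read"},
    "r+": {"read", "write"},
    "w": {"write", "create", "truncate"},
    "w+": {"read", "write", "create", "truncate"},
}


def supports(mode, operation):
    """Return True if the given mode allows the given operation."""
    if mode not in MODE_OPERATIONS:
        raise ValueError(f"unsupported mode: {mode!r}")
    return operation in MODE_OPERATIONS[mode]


print(supports("r", "write"))      # False
print(supports("w+", "truncate"))  # True
```

Under this reading, only the "w" modes create or truncate a file, and only plain "w" lacks read access.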

Bound methods for conversions

When Read is available, the to_* methods (e.g. to_arff) are bound to the DSFF instance. Conversely, when Write is available, the from_* methods (e.g. from_arff) are bound. As a consequence, the modes with "+" have both the to_* and from_* methods attached.
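
One way to picture this mode-dependent binding is a toy class that attaches its conversion methods at construction time. This is a sketch under stated assumptions, not how dsff is implemented: ModeBoundConverter and its private helpers are hypothetical, and the methods only return a string instead of converting anything.

```python
class ModeBoundConverter:
    """Toy stand-in for a mode-aware class; NOT the dsff implementation."""

    MODES = ("r", "r+", "w", "w+")

    def __init__(self, mode="r"):
        if mode not in self.MODES:
            raise ValueError(f"unsupported mode: {mode!r}")
        self.mode = mode
        # Read is available for "r" and for every mode with "+"
        if mode.startswith("r") or mode.endswith("+"):
            self.to_csv = self._to_csv
        # Write is available for "w" and for every mode with "+"
        if mode.startswith("w") or mode.endswith("+"):
            self.from_csv = self._from_csv

    def _to_csv(self):
        return "exporting to CSV"

    def _from_csv(self):
        return "importing from CSV"


for mode in ModeBoundConverter.MODES:
    obj = ModeBoundConverter(mode)
    # Show which conversion directions exist for each mode
    print(mode, hasattr(obj, "to_csv"), hasattr(obj, "from_csv"))
```

With this design, calling a conversion that the mode does not allow fails early with an AttributeError rather than partway through a file operation.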

The following pictures illustrate the available alternative formats and their applicable modes:

[Figure: Converting from alternative formats to DSFF]
[Figure: Converting from DSFF to alternative formats]

Lossy conversions

The following conversions only preserve the data (not the dictionary of features or metadata):


Usage

Creating a DSFF from a FilelessDataset

>>> import dsff
>>> with dsff.DSFF() as f:
...     f.write("/path/to/my-dataset")  # folder of a FilelessDataset (containing data.csv, features.json and metadata.json)
...
# on leaving the context, ./my-dataset.dsff is created

Creating an ARFF file from a DSFF

>>> import dsff
>>> with dsff.DSFF("my-dataset.dsff") as f:
...     f.to_arff()  # creates ./my-dataset.arff

Creating a CSV file from a DSFF

>>> import dsff
>>> with dsff.DSFF("my-dataset.dsff") as f:
...     f.to_csv()  # creates ./my-dataset.csv

Creating a FilelessDataset from a DSFF

>>> import dsff
>>> with dsff.DSFF("/path/to/my-dataset.dsff") as f:
...     f.to_dataset()  # creates ./[dsff-title] with data.csv, features.json and metadata.json