Command Line Interface
A simple command line interface is installed when you install the client library.
The CLI is available via the hoss command.
% hoss -h
Usage: hoss [OPTIONS] COMMAND [ARGS]...
A Command Line Interface to interact with a Hoss server.
Options:
-h, --help Show this message and exit.
Commands:
download Download files to a local directory from a prefix in a Dataset
upload Upload files in a directory to an existing dataset
version Print client library and server version info
Upload Tool
The tool will upload a directory into a dataset with some options as shown below.
% hoss upload -h
Usage: hoss upload [OPTIONS] DATASET_NAME DIRECTORY
Upload files in a directory to an existing dataset
Options:
-n, --namespace TEXT Namespace that contains the dataset
[default: default]
-e, --endpoint TEXT Hoss server root endpoint [default:
http://localhost]
-p, --prefix TEXT Optional prefix to where the files should be
uploaded. If this is not provided, the files
will be uploaded to a 'directory' in the
root of the dataset. The the 'directory'
name will be the same as the source
directory name.
-s, --skip TEXT Optional regular expression used to filter
out files to skip (e.g. myprefix.*\.txt)
-j, --num_processes INTEGER Number of processes to use when uploading
files. If you have too many processes you'll
run out of bandwidth and uploads will
timeout/fail. If you don't have enough, your
upload could take more time. In general, if
you have lots of small files you benefit
from more processes, and if you have large
files, you likely don't need that many
because boto will use concurrent uploads
[default: 1]
-c, --max_concurrency INTEGER Maximum number of concurrent s3 API transfer
operations. [default: 10]
--multipart_threshold INTEGER Threshold in megabytes for which transfers
will be split into multiple parts, defaults
to 5MB [default: 5]
--multipart_chunk_size INTEGER Size in megabytes for each multipart chunk,
if used. Defaults to 5MB [default: 5]
-m, --metadata TEXT Object metadata key-value pair(s) applied to
every object uploaded. You may specify
multiple values by repeating the option
(e.g. -m foo=bar -m fizz=buzz
-h, --help Show this message and exit.
Before running the tool, make sure you set the HOSS_PAT env var to a valid PAT, e.g.
export HOSS_PAT=hp_abcdef1234567890
Then you can simply call hoss upload -e <server url> <dataset name> <directory> . The image below shows the
user interface for a test upload using a temporary directory of test data writing to
the Hoss running on localhost.

If you have many small files or large network bandwidth you’ll likely benefit from increasing num_processes. Some
experimentation may be needed to find an optimal value depending on your data, processing power, and network
bandwidth.
The tool is also available via the hoss.tools.upload.upload_directory function
and can be used directly in a Jupyter Notebook.
Download Tool
The tool will download a prefix within a Dataset (notionally a “folder”) to a local directory.
% hoss download -h
Usage: hoss download [OPTIONS] DATASET NAMESPACE PREFIX DESTINATION
Download files to a local directory from a prefix in a Dataset
DATASET is the name of the dataset from which to download data
NAMESPACE is the name of the namespace that contains the Dataset
PREFIX is the prefix inside the dataset to download. Use "/" to indicate the
root of the dataset.
DESTINATION is the local directory to write files to
Options:
-e, --endpoint TEXT Hoss server root endpoint [default:
http://localhost]
-r, --recursive If set, download all files with the prefix.
Otherwise, only download files at the same
level as the prefix, assuming a `/` delimiter
in the keys to represent 'directories'
[default: False]
-c, --max_concurrency INTEGER max concurrency used when analyzing the
prefix via requests to the object store
[default: 10]
-j, --num_processes INTEGER Number of processes to use when downloading
files. If you have too many processes you'll
run out of bandwidth and downloads will
timeout/fail. If you don't have enough, your
download could take more time. In general, if
you have lots of small files you benefit from
more processes, and if you have large files,
you likely don't need that many [default: 1]
-h, --help Show this message and exit.
Before running the tool, make sure you set the HOSS_PAT env var to a valid PAT, e.g.
export HOSS_PAT=hp_abcdef1234567890
Then you can call hoss download -e <server url> <dataset name> <namespace name> <prefix> <directory>.
To download from the root of a dataset, use / for the prefix. To download all files in a dataset, you would use
/ for the prefix and the -r flag to recursively fetch all data.
If you have many small files or large network bandwidth you’ll likely benefit from increasing num_processes. Some
experimentation may be needed to find an optimal value depending on your data, processing power, and network
bandwidth.
The tool is also available via the hoss.tools.download.download_prefix function
and can be used directly in a Jupyter Notebook.