@tableofcontents

/** @page intro Introduction

Performing I/O is straightforward when a small number of processors
are being used.

@image html I_O_on_Few.png "I/O on One or a Few Processors"

Parallel I/O does not scale to thousands of processors, because the
parallel disk systems do not support the bandwitdh to allow thousands
of processors to access the disk at the same time. As a result, most
of the processors will have to wait to do I/O.

An obvious solution is to designate a small number of processors to do
I/O, and use the rest for computation.

@image html I_O_on_Many_Intracomm.png "I/O on Many Processors Intracomm"

PIO provides a netCDF-like API which provides this service. User code
is written as if parallel I/O is being used from every processor, but,
under the hood, PIO uses the I/O processors to do all data access.

With Intracomm Mode, the I/O processors are a subset of the
computational processors, and only one computational unit is
supported. In Async Mode, the I/O processors are dedicated to I/O, and
do not perform computation. Also, more than one computational unit may
be designated.

@image html I_O_on_Many_Async.png "I/O on Many Processors Async"

The user initializes the PIO IO System, designating some processors
for I/O, others for computation.

PIO decompositions and distributed arrays allow the code to be written
in terms of the local, distributed sub-array (see @ref decomp). PIO
handles the stitching of all data into the correct global space in a
netCDF variable.

PIO also allows for the creation of multiple computational units. Each
computational unit consists of many processors. I/O for all
computational units is accomplished through one set of dedicated I/O
processors (see @ref iosystem).

PIO uses <a
href=http://www.unidata.ucar.edu/software/netcdf/docs/html_guide/index.html#user_guide>
netcdf </a> and <a
href=https://parallel-netcdf.github.io/> pnetcdf</a> to
read and write the netCDF files (see @ref install).

## Basic description of how to optimize IO in a parallel environment:

PIO calls are collective.  A MPI communicator is set in a call to @ref
PIO_init and all tasks associated with that communicator must
participate in all subsequent calls to PIO.  An application can make
multiple calls to @ref PIO_init in order to support multiple MPI
communicators.

Begin by getting and unpacking the most recent release of PIO from
[gitHub](https://github.com/PARALLELIO/ParallelIO/releases) and
installing on your system as per the instructions in the
[Installation](@ref install) document. Take a look at examples of PIO
usage in both complex and simple test programs in the [Examples](@ref
examp) document. Finally, read through the [FAQ](@ref faq) to see if
any remaining questions can be answered.

### Using PIO has three basic steps. ###

1. Your program should call the @ref PIO_init function, and provide
the MPI communicator (and the rank within that communicator) of the
calling task. This call initializes an IO system type structure that
will be used in subsequent file and decomposition functions.

2. You can open a file for reading or writing with a call to @ref
PIO_createfile or @ref PIO_openfile. In this call you will specify the
file type: pio_iotype_netcdf, pio_iotype_pnetcdf, pio_iotype_netcdf4c
or pio_iotype_netcdf4p; along with the file name and optionally the
netcdf mode.

3. Finally, you can read or write decomposed data to the output
file. You must describe the mapping between the organization of data
in the file and that same data in the application space.  This is done
in a call to @ref PIO_initdecomp. In the simplest call to this
function, a one dimensional integer array is passed from each task,
the values in the array represent the offset from the beginning of the
array on file.


*/
