Introduction

About this documentation

This documentation is governed by the GNU Free Documentation License, version 1.3. It assumes that you are using a relatively recent version of SPPAS. There’s no reason not to download the latest version as soon as it is released: it’s easy and fast!

Any and all constructive comments are welcome.

What is SPPAS?

Overview

SPPAS, the automatic annotation and analysis of speech, is a scientific software package. It is under continuous development, with the aim of providing robust and reliable software. Available for free, with open source code, there is simply no other package that linguists can use as simply for both the automatic annotation of speech and the analysis of any kind of annotated data. Imagine the annotations or analyses you need, and SPPAS does the rest! Is something missing? Send your suggestion to the author!

Annotating recordings is very labor-intensive and costly, since it has to be performed manually by experienced researchers over many hours of work. As its primary functionality, SPPAS offers a set of automatic or semi-automatic annotations of recordings. In the present context, annotation is defined as the practice of adding interpretative, linguistic information to an electronic corpus of spoken and/or written language data. Annotation can also refer to the end-product of this process (Leech, 1997). SPPAS automates the annotation process and saves users time. To be used efficiently, SPPAS requires a rigorous methodology for collecting and preparing the data.

Linguistic annotation, especially when dealing with multiple domains, makes use of different tools within a given project. This calls for a rigorous annotation framework to ensure compatibility between annotations and to save time. SPPAS annotation files are in a specific XML format with the extension xra. Annotations can be imported from and exported to a variety of other formats, including Praat (TextGrid, PitchTier, IntensityTier), Elan (eaf), Transcriber (trs), Annotation Pro (antx), Phonedit (mrk), Sclite (ctm, stm), HTK (lab, mlf), subtitle formats (srt, sub), and CSV files.
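As a quick reminder of which file extension belongs to which tool, the formats named above can be collected in a small lookup table. This is only an illustrative sketch: the table and the `guess_tool` helper are hypothetical names, and this is not how SPPAS itself detects or reads formats.

```python
import os

# Extensions named in the paragraph above, mapped to the tool they belong to.
# Illustration only: this is not the format detection used by SPPAS.
TOOL_BY_EXTENSION = {
    ".xra": "SPPAS",
    ".textgrid": "Praat",
    ".pitchtier": "Praat",
    ".intensitytier": "Praat",
    ".eaf": "Elan",
    ".trs": "Transcriber",
    ".antx": "Annotation Pro",
    ".mrk": "Phonedit",
    ".ctm": "Sclite",
    ".stm": "Sclite",
    ".lab": "HTK",
    ".mlf": "HTK",
    ".srt": "subtitles",
    ".sub": "subtitles",
    ".csv": "CSV",
}


def guess_tool(filename):
    """Return the tool associated with a file extension, or None if unknown."""
    extension = os.path.splitext(filename)[1].lower()
    return TOOL_BY_EXTENSION.get(extension)
```

The comparison is case-insensitive because Praat files are commonly written with mixed case, as in `rec.TextGrid`.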

[…] when multiple annotations are integrated into a single data set, inter-relationships between the annotations can be explored both qualitatively (by using database queries that combine levels) and quantitatively (by running statistical analyses or machine learning algorithms) (Chiarcos 2008). As a consequence, annotations must be time-aligned in order to be useful for purposes such as analysis. SPPAS offers some special features for managing annotated files and analyzing data. Among others, it includes a tool to filter multi-level annotations (Bigi and Saubesty, 2015). Other included tools estimate descriptive statistics, provide a version of the Time Group Analyzer (Gibbon 2013), and manage annotated files and audio files. These data analysis tools are mainly offered in the Graphical User Interface. However, advanced users can also directly access the Application Programming Interface, for example to estimate statistics or to manipulate annotated data.
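To make the idea of time-aligned, multi-level annotations more concrete, here is a minimal sketch in plain Python. It does not use the SPPAS API: the `Interval` type, the helper functions, and the sample tiers are all assumptions invented for illustration. It shows why a shared time line matters: once two levels are anchored in time, one level can be filtered by the other, and simple statistics can be computed on the result.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Interval:
    """A time-aligned annotation: a label between two time points (in seconds)."""
    begin: float
    end: float
    label: str

    @property
    def duration(self) -> float:
        return self.end - self.begin


def overlapping(a: Interval, b: Interval) -> bool:
    """Return True if the two intervals share a stretch of time."""
    return a.begin < b.end and b.begin < a.end


def filter_by_overlap(tier, other_tier, other_label):
    """Keep intervals of `tier` overlapping an interval of `other_tier` labeled `other_label`."""
    selected = [o for o in other_tier if o.label == other_label]
    return [i for i in tier if any(overlapping(i, o) for o in selected)]


# Hypothetical data: a token level and a gesture level of the same recording.
tokens = [
    Interval(0.0, 0.4, "hello"),
    Interval(0.4, 0.9, "world"),
    Interval(1.5, 2.0, "again"),
]
gestures = [
    Interval(0.3, 1.0, "nod"),
    Interval(1.4, 2.1, "point"),
]

# Tokens uttered during a nod, and their mean duration.
during_nod = filter_by_overlap(tokens, gestures, "nod")
labels = [i.label for i in during_nod]
mean_duration = mean(i.duration for i in during_nod)
```

Without time alignment, the question "which tokens co-occur with a nod?" could not be answered at all, which is the practical reason for the requirement stated above.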

User engagement

By using SPPAS, you agree to cite a reference in your publications. The full list of references is available in Chapter 7.

Important: about SPPAS 3.+

Python 2.7 reached the end of its life.

For full support of SPPAS 3.+, upgrade your Python to 3.x: Python 2.7 is no longer maintained, so bug reports will no longer be handled, and no fixes or changes will be made, for SPPAS running on Python 2.
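A quick way to find out which Python runs on a machine is sketched below. The helper name `is_python3` is hypothetical, and the snippet is deliberately written so that it also parses under Python 2.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given version is Python 3 or later."""
    return version_info[0] >= 3


if not is_python3():
    print("Python 2 detected: upgrade to Python 3.x for full support of SPPAS 3.+")
```

Run it with the same interpreter that is used to launch SPPAS; a machine may have several Python versions installed.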

Need help

  1. Many problems can be solved by updating the version of SPPAS.

  2. When looking for more detail about a subject, search this documentation. It is available online on the SPPAS website and is also included in the package in PDF format.

  3. There is a F.A.Q. on the SPPAS website.

  4. There are tutorials on the SPPAS website.

  5. If none of the above helps, you may contact the author by e-mail. Do not expect an answer unless you clearly indicate:

    1. your operating system and its version,
    2. the version of SPPAS (which should be the latest one),
    3. the log file, and
    4. for automatic annotations, the report file and a sample of the data on which the problem occurs.
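Part of the information requested above can be collected with the Python standard library, as in the following sketch. The function name `environment_summary` is hypothetical, the SPPAS version has to be typed in by hand because this sketch does not query SPPAS, and the Python version is an extra item that the list above does not ask for.

```python
import platform


def environment_summary(sppas_version="unknown"):
    """Gather the basic facts a problem report should state.

    `sppas_version` is supplied by the caller: nothing here reads it from SPPAS.
    """
    return {
        "os": platform.system(),
        "os_version": platform.release(),
        "python": platform.python_version(),
        "sppas": sppas_version,
    }


for key, value in environment_summary().items():
    print(f"{key}: {value}")
```

The log file, the report file, and the data sample still have to be attached to the e-mail separately.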

About the author

Brigitte Bigi has been the main author of SPPAS since January 2011. She holds a tenured researcher position at the French CNRS (Centre National de la Recherche Scientifique). She has been working at Laboratoire Parole et Langage in Aix-en-Provence, France, since 2009.

More about the author:

Contact the author by e-mail:

Do not contact the author if:

Possible e-mails are:

Contributors

Here is the list of other contributors in programming:

Licenses

The SPPAS software, except for its documentation and resources, is distributed under the terms of the GNU General Public License v3.

Linguistic resources of SPPAS are either distributed:

See Chapter 4 of this documentation for details about the individual licenses of the provided resources.

To summarize, SPPAS users are:

Supports

2011-2012:

Partly supported by ANR OTIM project (Ref. Nr. ANR-08-BLAN-0239), Tools for Multimodal Information Processing. Read more at: http://www.lpl-aix.fr/~otim/

2013-2015:

Partly supported by ORTOLANG (Ref. Nr. ANR-11-EQPX-0032) funded by the « Investissements d’Avenir » French Government program managed by the French National Research Agency (ANR). Read more at: http://www.ortolang.fr/

2014-2016:

SPPAS was also partly carried out thanks to the support of the following projects or groups:

2017-2020:

The introduction of the Naija language is supported by the ANR NaijaSynCor project.

2019-2020:

The introduction of workspaces to manage files and the SPEAKER annotation type were both supported by the Vapvisio ANR project (ANR-18-CE28-0011-01).

Getting and installing

A tutorial is dedicated to the download and installation of SPPAS.

Websites

The main website of SPPAS is located at the following URL:

http://www.sppas.org

Click on Get it -> Download to access the latest release and some of the recent ones.

The source code is hosted on SourceForge at:

https://sourceforge.net/projects/sppas/

Download and install SPPAS

The main website contains the Download page for recent versions, and an installation guide is also available: http://www.sppas.org/installation.html. Moreover, a tutorial describes each of the download and installation steps (series 2). The tutorial is in French, but English subtitles are available.

To summarize:

Note that administrator rights are required to perform steps 2 and 3.

If you run into difficulties during this setup, search the web first: it will probably provide the solution. If the problem persists, contact the author by e-mail.

The package

Unlike many other software tools, SPPAS is not distributed as an executable program only. Instead, everything is done so that users can check and change how it operates. This is particularly suitable for automatic annotations: anyone can adapt them to their own needs. The SPPAS package is therefore a directory containing files and folders.

The SPPAS package contains:

Update

SPPAS is constantly being improved and new packages are published frequently (about 10 versions a year). It is important to update regularly in order to get the latest features and corrections.

Updating SPPAS is very easy and fast: apply steps 2 and 3 of the installation process.

Features

How to use SPPAS?

There are three main ways to use SPPAS:

  1. The Graphical User Interface (GUI) is as user-friendly as possible:

    • double-click on the sppas.bat file, under Windows;
    • double-click on the sppas.command file, under MacOS or Linux.
  2. The Command-line User Interface (CLI), a set of programs, each essentially independent of the others, that can be run on their own from the shell.

  3. Scripting with Python and SPPAS, which is the most powerful way.

The features of SPPAS can thus be used with either the Command-line User Interface (CLI) or the Graphical User Interface (GUI). The latter requires wxPython to be installed; the former does not. Either way, using the software presents no particular difficulty.

Advanced users can also directly access the Application Programming Interface (API).
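Whether wxPython is installed can be checked without launching anything, as sketched below. The helper name `gui_available` is hypothetical; the only fact relied upon is that wxPython is imported under the module name `wx`. The check says nothing about whether the installed wxPython version suits a given SPPAS release.

```python
import importlib.util


def gui_available():
    """Return True if the wxPython package (module name `wx`) can be found."""
    return importlib.util.find_spec("wx") is not None


if gui_available():
    print("wxPython found: the GUI can be used.")
else:
    print("wxPython not found: only the CLI can be used.")
```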

What can SPPAS do?

Features of SPPAS can be divided into 3 main categories:

  1. Annotate
  2. Analyze
  3. Convert

The next three figures list the features of each category and the interface that gives access to them.

Figure: SPPAS automatic annotations
Figure: SPPAS automatic analysis
Figure: SPPAS automatic file conversion

Main and important recommendations

About files

Keep the following important points in mind while annotating with SPPAS. They are summarized here and detailed in the chapters of this documentation:

  1. Speech audio files for automatic annotations:

    • only wav and au files are supported
    • only mono (= one channel) files are supported
    • the frame rate is preferably 16000 Hz
    • the sample width (bit depth) is preferably 16 bits
    • good recording quality is expected. Never convert from a compressed file such as mp3 or aac.
  2. Annotated data files:

    • UTF-8 encoding only
  3. It is recommended to use only US-ASCII characters in file names (including their paths)
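Most of the recommendations above can be verified before running any annotation. The following sketch uses only the Python standard library, not SPPAS; the function names are hypothetical, and only WAV files are handled (au files are left out). It checks the number of channels, the frame rate, and the 16-bit samples of an audio file, the characters of a path, and the encoding of a text file.

```python
import wave


def check_wav(path):
    """Return a list of warnings for a WAV file; empty if it matches the recommendations."""
    warnings = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            warnings.append(f"{wav.getnchannels()} channels: only mono is supported")
        if wav.getframerate() != 16000:
            warnings.append(f"frame rate is {wav.getframerate()} Hz: 16000 Hz is preferred")
        if wav.getsampwidth() != 2:  # 2 bytes per sample = 16 bits
            warnings.append(f"samples are {8 * wav.getsampwidth()} bits: 16 bits is preferred")
    return warnings


def has_ascii_name(path):
    """Return True if the path contains only US-ASCII characters."""
    return path.isascii()  # str.isascii() requires Python 3.7 or later


def is_utf8(path):
    """Return True if the content of the file decodes as UTF-8."""
    try:
        with open(path, encoding="utf-8") as handle:
            handle.read()
        return True
    except UnicodeDecodeError:
        return False
```

Recording quality cannot be checked this way: a file converted from mp3 to wav passes all of these tests while still carrying the compression artifacts.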

About automatic annotations

The quality of the results of most automatic annotations is highly dependent on the quality of the data the annotation takes as input. This is a politically correct way to say: Garbage in, garbage out!

Annotations are based on the use of linguistic resources. Resources for several languages are kindly shared and freely available in the SPPAS package. The quality of the automatic annotations is largely influenced by the quality of these resources. Any help to improve or add resources is welcome.

About linguistic resources

Users are of crucial importance for resource development.

The users of SPPAS are invited to contribute to improving these resources. They can release their improvements to the public, so that the whole community benefits.