User Guide

Command Line Interface

python -m mobisurvstd SOURCE OUTPUT_DIRECTORY --survey-type TYPE [--bulk]
  • SOURCE is a path to either a directory or a zipfile where the survey data is stored.
  • OUTPUT_DIRECTORY is a path to the directory where the standardized survey should be stored.
  • --survey-type TYPE is the type of the original survey (“emc2”, “emp2019”, “egt2010”, “egt2020”, “edgt”, “edvm”, or “emd”). If omitted, MobiSurvStd will guess the survey type.
  • --bulk must be used when you want to import all surveys located within the SOURCE directory.

Examples

From directory

Read the EGT2020 survey from the original_egt2020 directory and store the standardized version in the standardized_egt2020 directory.

python -m mobisurvstd original_egt2020 standardized_egt2020 --survey-type egt2020

From zipfile

Read the EGT2020 survey from the original_egt2020.zip file and store the standardized version in the standardized_egt2020 directory.

python -m mobisurvstd original_egt2020.zip standardized_egt2020 --survey-type egt2020

Bulk import

Read all surveys in the my_surveys directory and store their standardized version in the standardized_surveys directory.

python -m mobisurvstd my_surveys standardized_surveys --bulk

Usage from Python

standardize

Converts a mobility survey to a clean standardized format.

mobisurvstd.standardize(
    source: str,
    output_directory: str,
    survey_type: str | None = None,
    add_name_subdir: bool = False,
) -> mobisurvstd.classes.SurveyData | None
Parameters
----------
source
    Path to a directory or zipfile.
    When a directory is given, it must be the top-level directory of the survey to be converted.
    When a zipfile is given, the directories within the zipfile are read recursively so that the
    survey's files can be found no matter how deeply nested the zipfile is.
output_directory
    Path to the directory where the standardized survey should be stored.
    If the directory does not exist, MobiSurvStd will create it (recursively).
survey_type
    String indicating the type of the survey to be converted.
    If the value is omitted, MobiSurvStd will do its best to guess the survey type.
    Possible values: "emc2", "emp2019", "egt2020", "egt2010", "edgt", "edvm".
add_name_subdir
    Whether the standardized survey is stored directly in `output_directory` or within a
    subdirectory of `output_directory`.
    If True, the standardized survey is stored in a subdirectory within `output_directory`. The
    subdirectory name is the survey name.
    If False (default), the standardized survey is stored directly in `output_directory`.

Returns
-------
SurveyData

Example: Read the EGT2020 survey from the original_egt2020.zip file and store the standardized version in the standardized_egt2020 directory.

import mobisurvstd
mobisurvstd.standardize(
    "original_egt2020.zip",
    "standardized_egt2020",
    survey_type="egt2020",
)

bulk_standardize

Standardizes mobility surveys in bulk from a given directory.

MobiSurvStd will explore all directories and zipfiles within directory, try to standardize them and store the standardized data in output_directory.

mobisurvstd.bulk_standardize(
    directory: str,
    output_directory: str,
    survey_type: str | None = None,
)
Parameters
----------
directory
    Path to a directory.
    The directory must contain survey data, stored within directories or zipfiles.
output_directory
    Path to the directory where the standardized surveys should be stored.
    If the directory does not exist, MobiSurvStd will create it (recursively).
    Each survey read is stored in a subdirectory whose name is the survey's name.
survey_type
    String indicating the type of the surveys to be converted.
    If the directory contains surveys of different types, leave this value to None and
    MobiSurvStd will try to guess the type of each survey.
    Possible values: "emc2", "emp2019", "egt2020", "egt2010", "edgt", "edvm".

Example: Read all surveys in the my_surveys directory and store their standardized version in the standardized_surveys directory.

import mobisurvstd
mobisurvstd.bulk_standardize("my_surveys", "standardized_surveys")

SurveyDataReader

Data structure representing a MobiSurvStd survey.

Create a SurveyDataReader from a directory.

import mobisurvstd
>>> data = mobisurvstd.SurveyDataReader("output/emp2019/")

Access the survey’s metadata as a dictionary:

>>> data.metadata
{'type': 'EMP2019',
 'survey_method': 'face_to_face',
 'nb_households': 13825,
 'nb_cars': 18817,
 'nb_motorcycles': 1264,
 'nb_persons': 31694,
 'nb_trips': 45169,
 'nb_legs': 46507,
 'nb_special_locations': 0,
 'nb_detailed_zones': 0,
 'nb_draw_zones': 0,
 'nb_insee_zones': 0,
 'start_date': '2018-05-01',
 'end_date': '2019-04-30',
 'insee': None}

Access the survey’s households as a polars.DataFrame:

>>> data.households
┌─────────────┬─────────────┬────────────┬────────────┬───┬────────────┬────────────┬───────────┬───────────┐
│ household_i ┆ original_ho ┆ survey_met ┆ interview_ ┆ … ┆ nb_persons ┆ nb_persons ┆ nb_majors ┆ nb_minors │
│ d           ┆ usehold_id  ┆ hod        ┆ date       ┆   ┆ ---        ┆ _5plus     ┆ ---       ┆ ---       │
│ ---         ┆ ---         ┆ ---        ┆ ---        ┆   ┆ u8         ┆ ---        ┆ u8        ┆ u8        │
│ u32         ┆ struct[1]   ┆ enum       ┆ date       ┆   ┆            ┆ u8         ┆           ┆           │
╞═════════════╪═════════════╪════════════╪════════════╪═══╪════════════╪════════════╪═══════════╪═══════════╡
│ 1           ┆ {"110000011 ┆ face_to_fa ┆ null       ┆ … ┆ 1          ┆ 1          ┆ 1         ┆ 0         │
│             ┆ 4000"}      ┆ ce         ┆            ┆   ┆            ┆            ┆           ┆           │
│ 2           ┆ {"110000011 ┆ face_to_fa ┆ null       ┆ … ┆ 4          ┆ null       ┆ 3         ┆ 1         │
│             ┆ 5000"}      ┆ ce         ┆            ┆   ┆            ┆            ┆           ┆           │
│ 3           ┆ {"110000011 ┆ face_to_fa ┆ null       ┆ … ┆ 2          ┆ null       ┆ 2         ┆ 0         │
│             ┆ 6000"}      ┆ ce         ┆            ┆   ┆            ┆            ┆           ┆           │
│ 4           ┆ {"110000012 ┆ face_to_fa ┆ null       ┆ … ┆ 2          ┆ null       ┆ 2         ┆ 0         │
│             ┆ 4000"}      ┆ ce         ┆            ┆   ┆            ┆            ┆           ┆           │
│ 5           ┆ {"110000012 ┆ face_to_fa ┆ null       ┆ … ┆ 2          ┆ null       ┆ 2         ┆ 0         │
│             ┆ 5000"}      ┆ ce         ┆            ┆   ┆            ┆            ┆           ┆           │
│ …           ┆ …           ┆ …          ┆ …          ┆ … ┆ …          ┆ …          ┆ …         ┆ …         │
│ 13821       ┆ {"940000036 ┆ face_to_fa ┆ null       ┆ … ┆ 1          ┆ 1          ┆ 1         ┆ 0         │
│             ┆ 1000"}      ┆ ce         ┆            ┆   ┆            ┆            ┆           ┆           │
│ 13822       ┆ {"940000036 ┆ face_to_fa ┆ null       ┆ … ┆ 1          ┆ 1          ┆ 1         ┆ 0         │
│             ┆ 4000"}      ┆ ce         ┆            ┆   ┆            ┆            ┆           ┆           │
│ 13823       ┆ {"940000041 ┆ face_to_fa ┆ null       ┆ … ┆ 2          ┆ null       ┆ 2         ┆ 0         │
│             ┆ 5000"}      ┆ ce         ┆            ┆   ┆            ┆            ┆           ┆           │
│ 13824       ┆ {"940000044 ┆ face_to_fa ┆ null       ┆ … ┆ 1          ┆ 1          ┆ 1         ┆ 0         │
│             ┆ 1000"}      ┆ ce         ┆            ┆   ┆            ┆            ┆           ┆           │
│ 13825       ┆ {"940000052 ┆ face_to_fa ┆ null       ┆ … ┆ 1          ┆ 1          ┆ 1         ┆ 0         │
│             ┆ 1000"}      ┆ ce         ┆            ┆   ┆            ┆            ┆           ┆           │
└─────────────┴─────────────┴────────────┴────────────┴───┴────────────┴────────────┴───────────┴───────────┘

read_many

Runs a function on all MobiSurvStd surveys found in a directory and aggregates the results.

mobisurvstd.read_many(
    directory: str,
    read_fn: collections.abc.Callable,
    acc_fn: collections.abc.Callable,
)
Parameters
----------
directory
    Path to the directory where the MobiSurvStd surveys to read are stored.
    The directory will be read recursively so the surveys can be stored in subdirectories.
read_fn
    Function to be run on each survey.
    It takes a single argument whose type is `SurveyDataReader`.
acc_fn
    Function to aggregate the results from two surveys.
    This must be a function of two arguments, whose type is the same as the return type of
    `read_fn`.

Examples

Read the total number of households from all surveys in the “output” directory:

import mobisurvstd
mobisurvstd.read_many("output/", lambda d: len(d.households), lambda x, y: x + y)

Concatenate all trips in a single DataFrame:

import mobisurvstd
mobisurvstd.read_many("output/", lambda d: d.trips, lambda x, y: pl.concat((x, y)))

More complex examples on the use of read_many can be found in the analyses directory.