User Guide
Command Line Interface
Usage: python -m mobisurvstd [OPTIONS] SOURCE OUTPUT_DIRECTORY
Mobility Survey Standardizer: a Python command line tool to convert mobility surveys to a clean
standardized format.
╭─ Arguments ────────────────────────────────────────────────────────────────────────────────────────╮
│ * source TEXT Path to the directory or the zipfile where the survey data is │
│ located. │
│ [default: None] │
│ [required] │
│ * output_directory TEXT Path to the directory where the standardized survey should be │
│ stored. │
│ [default: None] │
│ [required] │
╰────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ──────────────────────────────────────────────────────────────────────────────────────────╮
│ --survey-type TEXT Format of the original survey. Possible values: `emc2`, │
│ `emp2019`, `egt2010`, `egt2020`, `edgt`, `edvm`, `emd`. │
│ [default: None] │
│ --bulk Import surveys in bulk from the given directory │
│ --skip-spatial Do not read spatial data │
│ --no-validation Do not validate the standardized data (some guarantees might not │
│ be satisfied) │
│ --clear-cache Clear the cache data and exit │
│ --install-completion Install completion for the current shell. │
│ --show-completion Show completion for the current shell, to copy it or customize │
│ the installation. │
│ --help Show this message and exit. │
╰────────────────────────────────────────────────────────────────────────────────────────────────────╯
Examples
From directory
Read the EGT2020 survey from the original_egt2020 directory and store the standardized version in the standardized_egt2020 directory.
python -m mobisurvstd original_egt2020 standardized_egt2020
From zipfile
Read the EGT2020 survey from the original_egt2020.zip file and store the standardized version in the standardized_egt2020 directory.
python -m mobisurvstd original_egt2020.zip standardized_egt2020
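A zipfile source does not need to have the survey files at its top level. To check where a given file sits inside an archive before converting it, a small standard-library sketch such as the one below may help. It is not part of MobiSurvStd; the helper name `find_members` and the file name used in the usage comment are hypothetical.

```python
import zipfile
from pathlib import PurePosixPath


def find_members(zip_path, filename):
    """Return every path inside the archive whose last component is `filename`.

    Hypothetical helper for inspecting an archive; it does not reproduce
    MobiSurvStd's own file discovery.
    """
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(
            name
            for name in archive.namelist()
            # Skip directory entries, which end with a slash.
            if not name.endswith("/") and PurePosixPath(name).name == filename
        )


# Usage (hypothetical file name):
# find_members("original_egt2020.zip", "households.csv")
```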
Bulk import
Read all surveys in the my_surveys directory and store their standardized version in the standardized_surveys directory.
python -m mobisurvstd --bulk my_surveys standardized_surveys
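When the command line tool is driven from a script, it can be convenient to assemble the argument list programmatically. The sketch below only uses the options listed in the help text above; the function name `build_command` is hypothetical, and the command is built but not executed.

```python
import sys


def build_command(source, output_directory, survey_type=None, bulk=False,
                  skip_spatial=False, no_validation=False):
    """Build the argument list for `python -m mobisurvstd` (hypothetical helper)."""
    command = [sys.executable, "-m", "mobisurvstd"]
    if survey_type is not None:
        command += ["--survey-type", survey_type]
    if bulk:
        command.append("--bulk")
    if skip_spatial:
        command.append("--skip-spatial")
    if no_validation:
        command.append("--no-validation")
    # Positional arguments go last: SOURCE, then OUTPUT_DIRECTORY.
    command += [source, output_directory]
    return command


# To actually run the conversion, pass the list to subprocess:
# import subprocess
# subprocess.run(build_command("my_surveys", "standardized_surveys", bulk=True), check=True)
```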
Usage from Python
standardize
Converts a mobility survey to a clean standardized format.
mobisurvstd.standardize(
source: str,
output_directory: str | None = None,
survey_type: str | None = None,
add_name_subdir: bool = False,
skip_spatial: bool = False,
no_validation: bool = False,
) -> mobisurvstd.classes.SurveyData | None
Parameters
----------
source
Path to a directory or zipfile.
When a directory is given, it must be the top-level directory of the survey to be converted.
When a zipfile is given, the directories within the zipfile are read recursively so that the
survey's files can be found no matter how deeply nested the zipfile is.
output_directory
Path to the directory where the standardized survey should be stored.
If the directory does not exist, MobiSurvStd will create it (recursively).
If None, the standardized survey will not be saved.
survey_type
String indicating the type of the survey to be converted.
If the value is omitted, MobiSurvStd will do its best to guess the survey type.
Possible values: "emc2", "emp2019", "egt2020", "egt2010", "edgt", "edvm", "emd".
add_name_subdir
Whether the standardized survey is stored directly in `output_directory` or within a
subdirectory of `output_directory`.
If True, the standardized survey is stored in a subdirectory within `output_directory`. The
subdirectory name is the survey name.
If False (default), the standardized survey is stored directly in `output_directory`.
skip_spatial
If True, MobiSurvStd will not try to read spatial data from the survey.
This means that special locations, detailed zones, and draw zones will not be read and
proposed as an output.
Some variables (e.g., home_lng, home_lat) might also be missing as a result.
no_validation
If True, MobiSurvStd will not validate the standardized data.
This means that guarantees for some variables might not be satisfied.
Returns
-------
SurveyData
Example: Read the EGT2020 survey from the original_egt2020.zip file and store the standardized version in the standardized_egt2020 directory.
import mobisurvstd
mobisurvstd.standardize(
"original_egt2020.zip",
"standardized_egt2020",
survey_type="egt2020",
)
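The effect of add_name_subdir on the storage location can be pictured with a short path computation. This is a sketch of the documented behaviour, not MobiSurvStd's code: the helper name `expected_output_path` is hypothetical, and it assumes the subdirectory is named exactly after the survey name, with no further normalization.

```python
from pathlib import Path


def expected_output_path(output_directory, survey_name, add_name_subdir=False):
    """Return where the standardized survey is expected to be stored.

    Assumption: the subdirectory name equals the survey name verbatim.
    """
    base = Path(output_directory)
    return base / survey_name if add_name_subdir else base


# With add_name_subdir=False (the default), the output directory is used directly.
print(expected_output_path("standardized", "EMP2019"))
# With add_name_subdir=True, a subdirectory named after the survey is used.
print(expected_output_path("standardized", "EMP2019", add_name_subdir=True))
```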
bulk_standardize
Standardizes mobility surveys in bulk from a given directory.
MobiSurvStd will explore all directories and zipfiles within directory, try to standardize them and store the standardized data in output_directory.
mobisurvstd.bulk_standardize(
directory: str,
output_directory: str,
survey_type: str | None = None,
skip_spatial: bool = False,
no_validation: bool = False,
)
Parameters
----------
directory
Path to a directory.
The directory must contain survey data, stored within directories or zipfiles.
output_directory
Path to the directory where the standardized surveys should be stored.
If the directory does not exist, MobiSurvStd will create it (recursively).
Each survey read is stored in a subdirectory whose name is the survey's name.
survey_type
String indicating the type of the surveys to be converted.
If the directory contains surveys of different types, leave this value as None and
MobiSurvStd will try to guess the type of each survey.
Possible values: "emc2", "emp2019", "egt2020", "egt2010", "edgt", "edvm", "emd".
skip_spatial
If True, MobiSurvStd will not try to read spatial data from the surveys.
This means that special locations, detailed zones, and draw zones will not be read and
proposed as an output.
Some variables (e.g., home_lng, home_lat) might also be missing as a result.
no_validation
If True, MobiSurvStd will not validate the standardized data.
This means that guarantees for some variables might not be satisfied.
Example: Read all surveys in the my_surveys directory and store their standardized version in the standardized_surveys directory.
import mobisurvstd
mobisurvstd.bulk_standardize("my_surveys", "standardized_surveys")
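Before launching a bulk import, it may be useful to preview which entries of the input directory look like survey sources. The sketch below lists the subdirectories and zipfiles found at the top level of a directory. It is only an approximation for inspection purposes: the helper name `list_candidates` is hypothetical, and MobiSurvStd's actual exploration logic may differ (for example, in how deep it looks).

```python
from pathlib import Path


def list_candidates(directory):
    """List top-level subdirectories and zipfiles of `directory`, sorted by name.

    Hypothetical preview helper; it does not call MobiSurvStd.
    """
    root = Path(directory)
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() or entry.suffix.lower() == ".zip"
    )


# Usage:
# list_candidates("my_surveys")
```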
SurveyDataReader
Data structure representing a MobiSurvStd survey.
Create a SurveyDataReader from a directory:
>>> import mobisurvstd
>>> data = mobisurvstd.SurveyDataReader("output/emp2019/")
Access the survey’s metadata as a dictionary:
>>> data.metadata
{'name': 'EMP2019',
'type': 'EMP2019',
'survey_method': 'face_to_face',
'nb_households': 13825,
'nb_cars': 18817,
'nb_motorcycles': 1264,
'nb_persons': 31694,
'nb_trips': 45169,
'nb_legs': 46507,
'nb_special_locations': 0,
'nb_detailed_zones': 0,
'nb_draw_zones': 0,
'start_date': '2018-05-01',
'end_date': '2019-04-30',
'insee': None}
Access the survey’s households as a polars.DataFrame:
>>> data.households
┌─────────────┬─────────────┬────────────┬────────────┬───┬────────────┬────────────┬───────────┬───────────┐
│ household_i ┆ original_ho ┆ survey_met ┆ interview_ ┆ … ┆ nb_persons ┆ nb_persons ┆ nb_majors ┆ nb_minors │
│ d ┆ usehold_id ┆ hod ┆ date ┆ ┆ --- ┆ _5plus ┆ --- ┆ --- │
│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ u8 ┆ --- ┆ u8 ┆ u8 │
│ u32 ┆ struct[1] ┆ enum ┆ date ┆ ┆ ┆ u8 ┆ ┆ │
╞═════════════╪═════════════╪════════════╪════════════╪═══╪════════════╪════════════╪═══════════╪═══════════╡
│ 1 ┆ {"110000011 ┆ face_to_fa ┆ null ┆ … ┆ 1 ┆ 1 ┆ 1 ┆ 0 │
│ ┆ 4000"} ┆ ce ┆ ┆ ┆ ┆ ┆ ┆ │
│ 2 ┆ {"110000011 ┆ face_to_fa ┆ null ┆ … ┆ 4 ┆ null ┆ 3 ┆ 1 │
│ ┆ 5000"} ┆ ce ┆ ┆ ┆ ┆ ┆ ┆ │
│ 3 ┆ {"110000011 ┆ face_to_fa ┆ null ┆ … ┆ 2 ┆ null ┆ 2 ┆ 0 │
│ ┆ 6000"} ┆ ce ┆ ┆ ┆ ┆ ┆ ┆ │
│ 4 ┆ {"110000012 ┆ face_to_fa ┆ null ┆ … ┆ 2 ┆ null ┆ 2 ┆ 0 │
│ ┆ 4000"} ┆ ce ┆ ┆ ┆ ┆ ┆ ┆ │
│ 5 ┆ {"110000012 ┆ face_to_fa ┆ null ┆ … ┆ 2 ┆ null ┆ 2 ┆ 0 │
│ ┆ 5000"} ┆ ce ┆ ┆ ┆ ┆ ┆ ┆ │
│ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … │
│ 13821 ┆ {"940000036 ┆ face_to_fa ┆ null ┆ … ┆ 1 ┆ 1 ┆ 1 ┆ 0 │
│ ┆ 1000"} ┆ ce ┆ ┆ ┆ ┆ ┆ ┆ │
│ 13822 ┆ {"940000036 ┆ face_to_fa ┆ null ┆ … ┆ 1 ┆ 1 ┆ 1 ┆ 0 │
│ ┆ 4000"} ┆ ce ┆ ┆ ┆ ┆ ┆ ┆ │
│ 13823 ┆ {"940000041 ┆ face_to_fa ┆ null ┆ … ┆ 2 ┆ null ┆ 2 ┆ 0 │
│ ┆ 5000"} ┆ ce ┆ ┆ ┆ ┆ ┆ ┆ │
│ 13824 ┆ {"940000044 ┆ face_to_fa ┆ null ┆ … ┆ 1 ┆ 1 ┆ 1 ┆ 0 │
│ ┆ 1000"} ┆ ce ┆ ┆ ┆ ┆ ┆ ┆ │
│ 13825 ┆ {"940000052 ┆ face_to_fa ┆ null ┆ … ┆ 1 ┆ 1 ┆ 1 ┆ 0 │
│ ┆ 1000"} ┆ ce ┆ ┆ ┆ ┆ ┆ ┆ │
└─────────────┴─────────────┴────────────┴────────────┴───┴────────────┴────────────┴───────────┴───────────┘
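Because the metadata is a plain dictionary, simple summary figures can be derived from it without loading any table. The sketch below copies a few of the counts shown in the metadata example above into a literal dictionary so that it runs on its own; with a real survey, `data.metadata` would be passed instead. The helper name `summary_ratios` is hypothetical, and whether each ratio is meaningful depends on how the survey sampled persons and trips.

```python
def summary_ratios(metadata):
    """Compute rough ratios from a MobiSurvStd metadata dictionary (hypothetical helper)."""
    return {
        "persons_per_household": metadata["nb_persons"] / metadata["nb_households"],
        "trips_per_person": metadata["nb_trips"] / metadata["nb_persons"],
        "legs_per_trip": metadata["nb_legs"] / metadata["nb_trips"],
    }


# Counts copied from the EMP2019 metadata example shown above.
metadata = {
    "nb_households": 13825,
    "nb_persons": 31694,
    "nb_trips": 45169,
    "nb_legs": 46507,
}

for name, value in summary_ratios(metadata).items():
    print(f"{name}: {value:.2f}")
```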
read_many
Runs a function on all MobiSurvStd surveys found in a directory and aggregates the results.
mobisurvstd.read_many(
directory: str,
read_fn: collections.abc.Callable,
acc_fn: collections.abc.Callable,
)
Parameters
----------
directory
Path to the directory where the MobiSurvStd surveys to read are stored.
The directory will be read recursively so the surveys can be stored in subdirectories.
read_fn
Function to be run on each survey.
It takes a single argument whose type is `SurveyDataReader`.
acc_fn
Function to aggregate the results from two surveys.
This must be a function of two arguments, whose type is the same as the return type of
`read_fn`.
Examples
Read the total number of households from all surveys in the “output” directory:
import mobisurvstd
mobisurvstd.read_many("output/", lambda d: len(d.households), lambda x, y: x + y)
Concatenate all trips in a single DataFrame:
import mobisurvstd
import polars as pl
mobisurvstd.read_many("output/", lambda d: d.trips, lambda x, y: pl.concat((x, y)))
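The roles of read_fn and acc_fn can be pictured as a map followed by a pairwise reduction. The sketch below reproduces that pattern with stand-in objects so that it runs without any survey data. It is an illustration under assumptions, not MobiSurvStd's implementation: `fold_surveys` and `FakeSurvey` are hypothetical names, and returning None when no survey is found is a choice made for this sketch only.

```python
from functools import reduce


class FakeSurvey:
    """Stand-in for a SurveyDataReader, exposing only a `households` attribute."""

    def __init__(self, households):
        self.households = households


def fold_surveys(surveys, read_fn, acc_fn):
    """Apply read_fn to each survey, then combine the results pairwise with acc_fn."""
    results = [read_fn(survey) for survey in surveys]
    if not results:
        # Assumption for this sketch: nothing to aggregate yields None.
        return None
    return reduce(acc_fn, results)


surveys = [FakeSurvey([1, 2, 3]), FakeSurvey([4, 5])]
# Same read_fn / acc_fn pair as in the household-count example above.
total = fold_surveys(surveys, lambda d: len(d.households), lambda x, y: x + y)
print(total)
```

Since acc_fn receives two values of the same type that read_fn returns, any result that can be merged pairwise (counts, dictionaries of counts, DataFrames) fits this pattern.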
More complex examples on the use of read_many can be found in the analyses directory.