dylanbahenda/Advanced-Data-Management-Mobility

Advanced Data Management: City Mobility Platform

This repository contains the implementation and scalability analysis of a data management system for a short-term electric vehicle (EV) rental platform in Italy. The project evaluates and compares Relational, Document, Graph, and Distributed database architectures.

Project Overview

The mobility platform tracks Users, Stations, Trips, and telemetry Events (GPS, Errors, Battery, Delays). To determine the optimal architecture for different query workloads, the system was implemented and benchmarked across four distinct paradigms:

  1. Relational Model (PostgreSQL): Tables normalized to 3NF, related through foreign keys and queried with optimized joins.
  2. Document Model (MongoDB): A hybrid schema utilizing referencing for static entities (Users/Stations) and embedding for telemetry (Events) to maximize data locality.
  3. Graph Model (Neo4j): A property graph that relies on Neo4j's index-free adjacency to speed up spatial connectivity queries and path traversals (for example, finding reachable stations).
  4. Distributed Processing (Apache Spark / GraphFrames): Used for batch-processing iterative algorithms like PageRank and Connected Components on the station sub-graph.
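To make the document and graph modeling choices above more concrete, here is a minimal plain-Python sketch. It is not taken from the project's notebooks: the field names, ids, sample values, and the helper functions `events_of_type` and `reachable_stations` are all assumptions chosen for illustration. The first half mimics the hybrid document shape (static entities referenced by id, telemetry embedded in the trip), and the second half mimics a reachability traversal over station connections using breadth-first search.

```python
from collections import deque

# Hypothetical sample data: field names and values are illustrative only,
# not the project's actual schema.
trip_doc = {
    "_id": "t1",
    "user_id": "u1",             # reference to a user stored elsewhere
    "start_station_id": "s1",    # reference to a station stored elsewhere
    "end_station_id": "s2",
    "events": [                  # embedded telemetry, read together with the trip
        {"type": "GPS", "lat": 44.41, "lon": 8.93},
        {"type": "Battery", "level": 0.82},
        {"type": "Error", "code": "E42"},
    ],
}


def events_of_type(trip, event_type):
    """Return the embedded events of one type; no second lookup is needed."""
    return [event for event in trip["events"] if event["type"] == event_type]


# Hypothetical station graph as an adjacency list: each station points
# directly at its neighbours, so a traversal follows links hop by hop.
connections = {"s1": ["s2"], "s2": ["s3"], "s3": [], "s4": ["s1"]}


def reachable_stations(graph, start):
    """Breadth-first search: every station reachable from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in graph.get(current, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    seen.discard(start)  # report only the other stations
    return seen


print(events_of_type(trip_doc, "Battery"))
print(sorted(reachable_stations(connections, "s4")))
```

In a real deployment the trip would live in a MongoDB collection and the traversal would be a Cypher path query; the sketch only shows the shape of the data and of the work each model is asked to do.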

Repository Structure

/
├── scripts/
│   ├── data_generator_1.py       # Generates synthetic CSV data using Faker
│   ├── benchmark_part1.py        # Scalability tests for Postgres & MongoDB
│   ├── benchmark_part2.py        # Scalability tests for Neo4j & GraphFrames
│   └── run_experiments.py        # Automated runner for the 36-experiment matrix
├── notebooks/
│   ├── 01_Relational_PostgreSQL.ipynb
│   ├── 02_Document_MongoDB.ipynb
│   ├── 03_Graph_Neo4j.ipynb
│   ├── 04_Spark_Graphframes.ipynb
│   └── plot_results.ipynb        # Visualization of the scalability benchmarks
├── report.pdf                    # Final project report
├── requirements.txt              # Python dependencies
└── README.md
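The benchmark scripts listed above are described as scalability tests. As a rough illustration of what such a test measures, the sketch below times a workload at several dataset sizes and keeps the median of repeated runs. It is a generic harness written for this README, not the contents of `benchmark_part1.py` or `benchmark_part2.py`: the function names `median_runtime`, `scalability_curve`, and `make_dummy_query`, the sizes, and the in-memory stand-in workload are all assumptions.

```python
import statistics
import time


def median_runtime(run_query, repeats=5):
    """Call `run_query` several times and return the median wall-clock seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def scalability_curve(make_query, sizes, repeats=5):
    """Map each dataset size to the median runtime of the query built for it."""
    return {size: median_runtime(make_query(size), repeats) for size in sizes}


def make_dummy_query(size):
    """Hypothetical stand-in for a database query: sums squares over `size` rows."""
    data = list(range(size))
    return lambda: sum(x * x for x in data)


curve = scalability_curve(make_dummy_query, sizes=[1_000, 10_000, 100_000], repeats=3)
for size, seconds in curve.items():
    print(f"{size:>7} rows: {seconds:.6f} s")
```

Using the median rather than the mean keeps a single slow run (for example, a cold cache) from dominating the reported figure. A real run would replace `make_dummy_query` with a function that issues the query against PostgreSQL, MongoDB, Neo4j, or Spark.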

Setup and Execution

Prerequisites

  • Python 3.9+
  • Java JDK 11+ (required for PySpark)
  • Running instances of PostgreSQL, MongoDB, and Neo4j

Installation

Clone the repository:

git clone https://github.com/dylanbahenda/Advanced-Data-Management-Mobility.git
cd Advanced-Data-Management-Mobility

Install the required Python packages:

pip install -r requirements.txt

Run the automated benchmarks (Warning: Depending on hardware, this can take 1-2 hours):

python scripts/run_experiments.py

Author

Bahenda Yvon Dylan Ntegano

Developed for the Advanced Data Management course (A.Y. 2025/2026).
