mazavlia/README.md

Hello, I'm Marina Zaporozhets!

👋 About Me

Data Engineer | DevOps Engineer | BI Systems Architect

With over 2.5 years of experience in designing and implementing end-to-end data solutions, I specialize in building scalable ETL/ELT pipelines, optimizing analytical databases (ClickHouse, PostgreSQL), and migrating BI ecosystems. My expertise lies at the intersection of data engineering, infrastructure automation, and business intelligence.

🎓 Currently a 2nd-year Master's student at Tyumen Industrial University, majoring in "Neural Network Technologies in Automated Control Systems".


💼 Professional Experience

Data Engineer / BI Systems Developer

РСтэйл ИВ | Oct 2025 – Present (4 months) | Yekaterinburg
Retail & Logistics

  • Architected end-to-end ETL/ELT pipelines that load data from Oracle DWH and Excel/CSV files into ClickHouse, centralizing data and eliminating manual exports
  • Reduced report preparation time from several hours to minutes by implementing automated data pipelines
  • Optimized ClickHouse database structure with storage engines (MergeTree, ReplacingMergeTree), partitioning, and projections, improving query performance 3-5x
  • Led BI migration from Qlik Sense to Apache Superset, maintaining business logic while reducing licensing costs and accelerating new report deployment by 40%
  • Built Airflow orchestration system from scratch in docker-compose, implementing DAGs with incremental loading, retry logic, and error handling, achieving 99.9% data delivery reliability
  • Developed complex analytical SQL queries for ClickHouse using window functions, CTEs, and self-JOINs, forming the foundation for real-time KPI dashboards

Data Engineer

ООО "1Π’" | Jun 2023 – Aug 2025 (2 years 3 months) | Moscow
EdTech & IT Services

  • Optimized DWH architecture (PostgreSQL + ClickHouse), reducing aggregate report execution time from 15 minutes to 90 seconds (10x improvement)
  • Developed and maintained 15+ Airflow DAGs achieving 99.9% success rate with automated retry, error logging, and Telegram alerts
  • Built a CI/CD pipeline on GitLab CI that automates Docker builds, Kubernetes deployments, and DB backups, eliminating 85% of manual operations
  • Implemented real-time CDC replication PostgreSQL → Kafka via Debezium, improving data freshness 5x
  • Created Python ETL parsers with Pandas, improving raw data processing speed 3x while reducing RAM usage by 40%
  • Integrated Hugging Face LLM into data pipeline via FastAPI, reducing NLP request latency from 8 to 1.2 seconds and saving 30% GPU resources
  • Automated infrastructure monitoring with Prometheus + Grafana, reducing incident response time from 30 to 5 minutes, achieving 99.97% uptime
  • Mentored 7 interns in Data Architecture; 2 hired full-time, 2 received offers from other companies

📞 Contacts

Telegram Email VK


📊 GitHub Activity


GitHub Trophies

Snake

πŸ› οΈ Tech Stack

☁️ Cloud & Platforms

AWS GCP Yandex Cloud Kubernetes Docker

πŸ› οΈ DevOps & CI/CD

Terraform Ansible GitLab CI/CD GitHub Actions Jenkins Prometheus Grafana NGINX

πŸ—„οΈ Data Engineering & BI

Apache Airflow Apache Superset Qlik Sense Apache Spark Apache Kafka Debezium PostgreSQL ClickHouse Oracle Redis

πŸ“ Languages & Tools

Python SQL Bash Pandas

🔌 Backend & ML Integration

FastAPI Hugging Face Jupyter


🎯 Key Achievements

Performance Optimization

  • 10x faster aggregate reports (15 min → 90 sec) through DWH optimization
  • 3-5x improvement in analytical query performance via ClickHouse optimization
  • 40% reduction in new report deployment time after BI migration

Reliability & Automation

  • 99.9% data pipeline success rate with robust Airflow DAGs
  • 85% reduction in manual operations through CI/CD automation
  • 99.97% infrastructure uptime with Prometheus/Grafana monitoring

Cost & Efficiency

  • Reduced licensing costs by migrating from Qlik Sense to Apache Superset
  • 30% GPU resource savings through LLM integration optimization
  • 40% RAM reduction in ETL processes via Python optimizations

📈 What I Bring to the Team

I build reliable, scalable, and observable data infrastructure that enables businesses to make faster decisions while reducing operational overhead. My solutions are measurable in speed, stability, and resource efficiency.

Open to interviews, technical challenges, and case discussions!

Popular repositories

  1. ITMO_hackathon Public

    Jupyter Notebook 1

  2. ProjectsInDotNet Public

    Smalltalk

  3. helmwave2 Public

    Forked from helmwave/helmwave

    🌊 Helmwave is the true release manager

    Go

  4. SCV_Git_2001 Public

    Forked from mischenkovn/SCV_Git_2001

  5. hw_2 Public

    Forked from it-kvantum-pk/hw_2

    Python

  6. mlcourse.ai Public

    Forked from Yorko/mlcourse.ai

    Open Machine Learning Course

    Python