Marina Zaporozhets mazavlia

Hello, I'm Marina Zaporozhets!

👋 About Me

Data Engineer | DevOps Engineer | BI Systems Architect

With over 2.5 years of experience in designing and implementing end-to-end data solutions, I specialize in building scalable ETL/ELT pipelines, optimizing analytical databases (ClickHouse, PostgreSQL), and migrating BI ecosystems. My expertise lies at the intersection of data engineering, infrastructure automation, and business intelligence.

🎓 Currently a second-year master's student at Tyumen Industrial University, majoring in "Neural Network Technologies in Automated Control Systems".

💼 Professional Experience

Data Engineer / BI Systems Developer

Ретэйл ИТ | Oct 2025 – Present (4 months) | Yekaterinburg
Retail & Logistics

Architected end-to-end ETL/ELT pipelines that load data from an Oracle DWH and Excel/CSV files into ClickHouse, centralizing data and eliminating manual exports
Reduced report preparation time from several hours to minutes by implementing automated data pipelines
Optimized ClickHouse table design (MergeTree and ReplacingMergeTree engines, partitioning, and projections), improving query performance 3-5x
Led BI migration from Qlik Sense to Apache Superset, maintaining business logic while reducing licensing costs and accelerating new report deployment by 40%
Built an Airflow orchestration system from scratch with Docker Compose, implementing DAGs with incremental loading, retry logic, and error handling, and achieving 99.9% data delivery reliability
Developed complex analytical SQL queries for ClickHouse using window functions, CTEs, and self-JOINs, forming the foundation for real-time KPI dashboards
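
The incremental loading and retry logic mentioned above can be illustrated with a minimal, library-free sketch. This is not the production DAG code: the row layout, `SOURCE_ROWS`, `extract_since`, `with_retries`, and `incremental_load` are hypothetical names chosen for illustration, and a real Airflow task would delegate retries to the scheduler rather than to a helper like this.

```python
import time

# Hypothetical source rows: (id, updated_at, value). Illustration only.
SOURCE_ROWS = [
    (1, 100, "a"),
    (2, 105, "b"),
    (3, 110, "c"),
]


def extract_since(rows, watermark):
    """Return only the rows changed after the last successful load."""
    return [row for row in rows if row[1] > watermark]


def with_retries(task, max_attempts=3, base_delay=0.0):
    """Run task(), retrying on any exception with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            time.sleep(base_delay * 2 ** (attempt - 1))


def incremental_load(rows, target, watermark):
    """Append new rows to target and return the advanced watermark."""
    batch = extract_since(rows, watermark)
    target.extend(batch)
    # Keep the old watermark when nothing new arrived.
    return max((row[1] for row in batch), default=watermark)


target = []
new_watermark = with_retries(
    lambda: incremental_load(SOURCE_ROWS, target, watermark=100)
)
print(new_watermark, target)
```

With a starting watermark of 100, only the rows with `updated_at` 105 and 110 are loaded, and the watermark advances to 110, so the next run skips them.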

Data Engineer

ООО "1Т" | Jun 2023 – Aug 2025 (2 years 3 months) | Moscow
EdTech & IT Services

Optimized DWH architecture (PostgreSQL + ClickHouse), reducing aggregate report execution time from 15 minutes to 90 seconds (10x improvement)
Developed and maintained 15+ Airflow DAGs achieving 99.9% success rate with automated retry, error logging, and Telegram alerts
Built a CI/CD pipeline on GitLab CI automating Docker builds, Kubernetes deployments, and DB backups, eliminating 85% of manual operations
Implemented real-time CDC replication PostgreSQL → Kafka via Debezium, improving data freshness 5x
Created Python ETL parsers with Pandas, improving raw data processing speed 3x while reducing RAM usage by 40%
Integrated a Hugging Face LLM into the data pipeline via FastAPI, reducing NLP request latency from 8 to 1.2 seconds and saving 30% of GPU resources
Automated infrastructure monitoring with Prometheus + Grafana, reducing incident response time from 30 to 5 minutes, achieving 99.97% uptime
Mentored 7 interns in Data Architecture; 2 hired full-time, 2 received offers from other companies
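
For readers who want to check the before/after figures in this list, the arithmetic can be sketched as below. The helper names `speedup` and `yearly_downtime_minutes` are hypothetical, and the downtime estimate assumes a 365-day year; only the input numbers come from the bullets above.

```python
def speedup(before, after):
    """How many times faster 'after' is than 'before' (same units)."""
    return before / after


def yearly_downtime_minutes(uptime_percent):
    """Downtime per 365-day year implied by an uptime percentage."""
    minutes_per_year = 365 * 24 * 60  # 525,600 minutes
    return (100.0 - uptime_percent) / 100.0 * minutes_per_year


report = speedup(15 * 60, 90)   # aggregate reports: 15 min -> 90 s
nlp = speedup(8, 1.2)           # NLP request latency: 8 s -> 1.2 s
incident = speedup(30, 5)       # incident response: 30 min -> 5 min
downtime = yearly_downtime_minutes(99.97)

print(f"aggregate reports: {report:.1f}x faster")
print(f"NLP latency: {nlp:.1f}x faster")
print(f"incident response: {incident:.1f}x faster")
print(f"99.97% uptime: about {downtime:.0f} minutes of downtime per year")
```

The report figure works out to exactly 10x, the latency reduction to roughly 6.7x, the incident response improvement to 6x, and 99.97% uptime to about 158 minutes of downtime per year.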

📞 Contacts

📊 GitHub Activity

🛠️ Tech Stack

☁️ Cloud & Platforms

🛠️ DevOps & CI/CD

🗄️ Data Engineering & BI

📝 Languages & Tools

🔌 Backend & ML Integration

🎯 Key Achievements

Performance Optimization

10x faster aggregate reports (15min → 90sec) through DWH optimization
3-5x improvement in analytical query performance via ClickHouse optimization
40% reduction in new report deployment time after BI migration

Reliability & Automation

99.9% data pipeline success rate with robust Airflow DAGs
85% reduction in manual operations through CI/CD automation
99.97% infrastructure uptime with Prometheus/Grafana monitoring

Cost & Efficiency

Reduced licensing costs by migrating from Qlik Sense to Apache Superset
30% GPU resource savings through LLM integration optimization
40% RAM reduction in ETL processes via Python optimizations

📈 What I Bring to the Team

I build reliable, scalable, and observable data infrastructure that enables businesses to make faster decisions while reducing operational overhead. My solutions are measurable in speed, stability, and resource efficiency.

Open to interviews, technical challenges, and case discussions!
