Case Studies

AWS Data Lake Project

Data Engineering

Industry: Consulting

Subject: AWS Cloud Data Lake Development; Cloud Big Data Engineering

Description:

Developing and maintaining data lakes on AWS. Migrating data from RDBMS and file sources and loading it into S3, Redshift, and RDS. Designing and developing big data batch solutions using AWS Data Pipeline, AWS Glue, and EMR. Developing a large-scale data warehouse using Redshift and Redshift Spectrum.
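
As a rough illustration of the S3-to-Redshift loading step, the sketch below builds a Redshift COPY statement for Parquet files staged in S3. The table name, S3 prefix, and IAM role ARN are hypothetical placeholders, not values from the project, and the statement would be run through whatever SQL client the pipeline uses.

```python
def build_copy_statement(table: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Return a Redshift COPY statement that loads Parquet files from S3.

    All argument values are supplied by the caller; none of them come
    from the original project.
    """
    if not s3_prefix.startswith("s3://"):
        raise ValueError("s3_prefix must start with 's3://'")
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS PARQUET;"
    )


# Hypothetical example values, for illustration only.
statement = build_copy_statement(
    table="analytics.orders",
    s3_prefix="s3://example-data-lake/curated/orders/",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
)
print(statement)
```

Generating the statement from parameters keeps the same load logic reusable across tables and environments.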

Project Task Summary

  • ETL workflows in Data Pipeline; monitoring and management of ETL pipelines.

  • Batch RDBMS data migration using AWS DMS.

  • Batch processing in EMR and Glue using Scala Spark.

  • Designing and developing data warehouse on Redshift.

  • DWH data model and table design.

  • Accessing and processing big data on S3 via SQL using Redshift Spectrum.

  • Python ML implementation with Pandas, scikit-learn using Jupyter on AWS.

  • CI/CD development using GitLab and Ansible.
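
To make the Redshift Spectrum task more concrete, here is a minimal sketch that generates the DDL for an external table over Parquet files in S3, which can then be queried with ordinary SQL. The schema, table, column names, and S3 location are hypothetical, and the sketch assumes an external schema has already been created in Redshift.

```python
def build_external_table_ddl(schema, table, columns, s3_location):
    """Return CREATE EXTERNAL TABLE DDL for a Parquet dataset in S3.

    `columns` is a list of (name, sql_type) pairs. The external schema
    is assumed to exist already.
    """
    column_sql = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n"
        f"    {column_sql}\n"
        ")\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{s3_location}';"
    )


# Hypothetical example values, for illustration only.
ddl = build_external_table_ddl(
    schema="spectrum",
    table="orders",
    columns=[
        ("order_id", "BIGINT"),
        ("order_date", "DATE"),
        ("amount", "DECIMAL(12,2)"),
    ],
    s3_location="s3://example-data-lake/curated/orders/",
)
print(ddl)

# Once the table exists, the S3 data can be queried like a regular table.
query = "SELECT order_date, SUM(amount) FROM spectrum.orders GROUP BY order_date;"
```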

Technologies

  • AWS CLI
  • AWS CloudWatch
  • AWS Data Pipeline
  • AWS DMS
  • Ansible
  • AWS Glue
  • Docker
  • GitLab
  • IAM
  • Jupyter
  • Pandas
  • Python
  • RDS PostgreSQL
  • EMR
  • Redshift
  • Redshift Spectrum
  • S3
  • SBT
  • Scala
  • Scikit-learn
  • Scrum
  • Spark
  • Spark SQL
  • SQL

AWS Data Lake Project

DevOps

Industry: Consulting

Subject: Cloud Data Lake DevOps; AWS DevOps

Description:

Provisioning and deployment of big data solutions on AWS. Operationalizing cloud data solutions by implementing infrastructure as code (IaC) with CloudFormation templates for resource management. Provisioning and deploying on-demand Redshift clusters and RDS instances using CloudFormation. Development, management, and deployment of Docker images and containers.
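
As an illustration of the infrastructure-as-code approach, the sketch below assembles a minimal CloudFormation template for a single-node Redshift cluster as a Python dictionary and serializes it to JSON. The logical resource name, database name, and node type are hypothetical, and a real template would normally add networking, encryption, and parameter-group settings.

```python
import json


def build_redshift_template(db_name: str, node_type: str) -> dict:
    """Return a minimal CloudFormation template for one Redshift cluster."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative single-node Redshift cluster",
        "Parameters": {
            "MasterUsername": {"Type": "String"},
            # NoEcho keeps the password out of console and API output.
            "MasterUserPassword": {"Type": "String", "NoEcho": "true"},
        },
        "Resources": {
            "ExampleCluster": {  # hypothetical logical name
                "Type": "AWS::Redshift::Cluster",
                "Properties": {
                    "ClusterType": "single-node",
                    "DBName": db_name,
                    "NodeType": node_type,
                    "MasterUsername": {"Ref": "MasterUsername"},
                    "MasterUserPassword": {"Ref": "MasterUserPassword"},
                },
            }
        },
        "Outputs": {
            "ClusterEndpoint": {
                "Value": {"Fn::GetAtt": ["ExampleCluster", "Endpoint.Address"]}
            }
        },
    }


# Hypothetical example values, for illustration only.
template = build_redshift_template(db_name="analytics", node_type="dc2.large")
template_json = json.dumps(template, indent=2)
print(template_json)
```

Keeping credentials as template parameters rather than literals lets the same template be deployed on demand to different environments.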

Project Task Summary

  • Provisioning resources using CloudFormation templates.

  • Provisioning of Redshift, Data Pipeline, and Glue ETL pipelines.

  • User account and access management in IAM.

  • Developing Docker images for batch processing applications and Python ML models, stored in Amazon Elastic Container Registry (AWS ECR).

  • Docker container deployment using AWS ECS.

  • CI/CD implementation using GitLab.
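
The ECR and ECS tasks above can be sketched as follows: one helper builds the image URI in the standard ECR format, and another builds a dictionary that roughly mirrors the request shape of the ECS RegisterTaskDefinition API. The account ID, region, repository, tag, and command are hypothetical, and a CI/CD job would pass the result to an AWS client rather than print it.

```python
def ecr_image_uri(account_id: str, region: str, repository: str, tag: str) -> str:
    """Return the image URI for a repository hosted in ECR."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def build_task_definition(family, image_uri, command, memory_mib=512):
    """Return a minimal single-container ECS task definition."""
    return {
        "family": family,
        "containerDefinitions": [
            {
                "name": family,
                "image": image_uri,
                "memory": memory_mib,  # hard memory limit in MiB
                "essential": True,
                "command": command,
            }
        ],
    }


# Hypothetical example values, for illustration only.
image = ecr_image_uri("123456789012", "eu-central-1", "batch-job", "1.0.0")
task_definition = build_task_definition(
    family="example-batch-job",
    image_uri=image,
    command=["python", "run_batch.py"],
)
print(image)
print(task_definition)
```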

Technologies

  • AWS CLI
  • AWS ECR
  • Dockerfile
  • AWS CloudWatch
  • AWS ECS
  • CI/CD
  • AWS CloudFormation
  • Redshift
  • Git
  • AWS Glue
  • Python
  • GitLab
  • IAM
  • SQL
  • Scrum
  • AWS Data Pipeline
  • Docker
  • Slack