Qimia specializes in custom use-case implementation

As an international, dynamic, and high-performing service company, we help our customers realize IT projects with next-generation technologies.

Our services range from small data science applications to enterprise-grade, scalable end-to-end solutions. We offer specialist and generalist data consulting, use-case analysis, tailor-made software products, conception, implementation, testing, production, and deployment of solutions. The standard information cycle includes Data Migration, Data Lake Formation, Data Warehousing and ETL, Analytics, and Machine Learning implementation for Predictive Analytics.

Products

Cost estimation and tuning with maximized ROI

Migration at scale from multiple data sources to your Cloud Data Lake

Dynamic Cloud Migration with standard use-case based architecture

ETL pipelines that ensure reliable data migration from existing data sources

Services

We effectively guide our customers through the risky and costly journey of enterprise data hub implementation.
Every phase of this journey requires different technologies, skills, and specialists.
Qimia is your reliable technology partner, with the well-founded expertise to lead your project to success.
Qimia’s big data service model is designed to provide the expertise and solutions needed to maximize your ROI at any phase of a project.

Qimia Enterprise Data Hub
Off-the-shelf Big Data Platform: Off-the-shelf Hadoop cluster and cloud data lake deployment.
Hadoop Data Hub: Apache Hadoop, Apache Spark, Apache Hive, HBase, Zeppelin.
Cloud Data Lake: Azure, AWS and GCP infrastructure-based data lake.

Big Data Consulting Services
Best Practices: We have in-depth knowledge of the details involved in deploying big data platforms. We help define your strategy and advise you on solution designs based on your business use-cases. Our experts design the architecture and help establish the processes needed for a successful implementation.

Qimia Data Hub Operational Service
Qimia’s Data Hub Operational Service is a subscription service in which our team of expert engineers manages the customer’s enterprise data hub. It covers monitoring and reporting, on-site architecture review, troubleshooting and incident management, capacity planning, and multi-tenant and access management.

Standard Data Solutions
Data processing solutions and off-the-shelf implementations for standard use-cases. Production-ready, tested, and verified streaming and ETL pipelines ensure reliable data migration from existing data sources (databases).

Company Data

Drawing on decades of experience, we deliver innovative, creative, and efficient execution.

5 Worldwide Offices

Over 15 years of project experience

Certified Scrum Masters

300+ Company Partners

Cloudera, Hadoop, AWS, and Azure partner

No outsourcing to freelancers or subcontractors

80+ permanent Data Engineers and Data Scientists

Internal training & certification of employees

Realization of a variety of customized software solutions

Machine Intelligence, Big Data & AI consulting company

Project References

AWS Data Lake Project

Data Engineering

Industry: Consulting

Subject: AWS Cloud Data Lake Development; Cloud Big Data Engineering

Description: Developing and maintaining data lakes on AWS. Data migration from RDBMS and file sources, loading data into S3, Redshift, and RDS.
Designing and developing big data batch solutions using AWS Data Pipeline, AWS Glue, and EMR. Developing a massive data warehouse using Redshift and Redshift Spectrum.

Project Task Summary
  • ETL workflows in Data Pipeline, monitoring and management of ETL pipelines
  • Batch RDBMS data migration using AWS DMS
  • Batch processing in EMR and Glue using Scala Spark
  • Designing and developing data warehouse on Redshift
  • DWH data model and table design
  • Accessing and processing big data on S3 via SQL using Redshift Spectrum
  • Python ML implementation with Pandas, scikit-learn using Jupyter on AWS
  • CI/CD development using GitLab and Ansible
Technologies
  • AWS CLI
  • AWS Console
  • IAM
  • S3
  • AWS ECS
  • AWS Batch

AWS Data Lake Project

DevOps

Industry: Consulting

Subject: Cloud Data Lake DevOps; AWS DevOps

Description: Provisioning and deployment of big data solutions on AWS. Operationalizing cloud data solutions and implementing infrastructure as code (IaC) using CloudFormation templates for resource management. Provisioning and deploying on-demand Redshift clusters and RDS instances using CloudFormation. Development, management, and deployment of Docker images and containers.

Project Task Summary
  • Provisioning resources using CloudFormation templates
  • Provisioning of Redshift, Data Pipeline, and Glue ETL pipelines
  • User account and access management in IAM
  • Developing Docker images for batch processing applications and Python ML models, using Amazon Elastic Container Registry (ECR)
  • Docker container deployment using AWS ECS
  • CI/CD implementation using GitLab
Technologies
  • AWS CLI
  • AWS CloudWatch
  • AWS CloudFormation
  • AWS Glue
  • IAM
  • AWS Data Pipeline
  • AWS ECS
  • Redshift
  • Python
  • SQL
  • Docker
  • Dockerfile
  • CI/CD
  • Git
  • GitLab
  • Scrum
  • Slack

Azure Data Lake Project

Data Engineering

Industry: Consulting

Subject: Azure Cloud Data Lake Development; Azure Big Data Engineering

Description: Data lake development on Microsoft Azure. Data migration from RDBMS and file sources, data loading into Azure Blob storage and Azure SQL. Design and development of big data batch solutions using Data Factory and Databricks. Massive data warehouse development using Azure SQL Data Warehouse.

Project Task Summary
  • ETL workflows in Data Factory, including monitoring and management of the ETL pipelines
  • Batch processing in Azure Databricks using Scala Spark
  • Data warehouse design and development using SQL Data Warehouse
  • DWH data model design, featuring index and partitioning table design
  • Accessing and processing big data in Blob storage via Transact-SQL using PolyBase
  • CI/CD development using SBT and GitLab
Technologies
  • Azure CLI
  • Azure Portal
  • Data Factory
  • PolyBase
  • Azure SQL DWH
  • Databricks
  • Spark
  • Spark SQL
  • Scala
  • Python
  • SQL
  • Transact-SQL
  • GitLab
  • Scrum
  • SBT

Azure Data Lake Project

DevOps

Industry: Consulting

Subject: Cloud Data Lake DevOps; Azure DevOps

Description: Data Factory and Databricks provisioning and deployment. Operationalization of cloud data solutions and infrastructure as code (IaC) implementation using ARM templates and Azure Python SDK for resource management. Azure SQL data warehouse provisioning and deployment. CI/CD implementation using Azure DevOps tools. Development, management, and deployment of Docker images and containers.

Project Task Summary
  • Azure resources (VM and storage account, SQL DB and network) provisioning using Azure Python SDK and ARM template
  • SQL data warehouse provisioning with Databricks and Data Factory integration, using Python scripts and ARM templates, with Azure Key Vault for deployment
  • User account and role-based (RBAC) access management in Azure Active Directory
  • Docker image development for batch processing applications and ML model APIs, using Azure Container Registry for build, storage, and management of images
  • Azure container deployment on ACI (Azure Container Instances)
  • CI/CD implementation via Azure Repos, Azure Artifacts, Azure Pipelines, and Azure Test Plans
Technologies
  • Azure CLI
  • Azure Python SDK
  • Azure Portal
  • Azure Active Directory
  • Data Factory
  • Databricks
  • Python
  • SQL
  • ARM
  • Docker
  • Dockerfile
  • Azure Container Registry
  • CI/CD
  • Azure Repos
  • Azure Artifacts
  • Azure Pipelines
  • Azure Test Plans
  • Git
  • SBT

ML/AI Projects

NLP Project

Industry: AI/Automation

Subject: Implementation of a supervised machine learning algorithm for automatic keyphrase extraction.

Description: Implementation of an automated Context Tagger for an AI-based B2B marketing automation solution. Text classification models are implemented in Python using text mining, NLP, and other ML and data analysis libraries (the Python data science and ML stack). Text mining, data processing, and feature engineering on a massive dataset are done in Spark.

Project Task Summary
  • Design and implementation of a very fast, multi-threaded, Akka-based stream (SAX/StAX) processor that transforms huge XML data sets to CSV format
  • Preprocessing of the data by filtering and normalizing text content and applying spaCy and NLTK
  • Text mining and data preprocessing in Spark SQL Scala on Hadoop and S3
  • Training of embedding and language models using fastText, Gensim, and GPT-2
  • Modeling term-to-tag relations in massive graph networks in TigerGraph
  • Keyphrase extraction (automatic tagging) using N-Grams, Word2vec scoring and PageRank algorithm on massive graphs of tag-to-tag relations
  • FP-Growth association rule learning
  • Distributed CNN training in Docker containers on AWS using GPU instances
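
The graph-based ranking step in the task list above can be illustrated with a small sketch. This is not the project's implementation, which ran on massive graphs in TigerGraph. It is a minimal, library-free power-iteration PageRank in Python over a tiny tag-to-tag graph. The tags, the edges, the damping factor of 0.85, and the function name `pagerank` are all assumptions made for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Rank the nodes of a directed graph with power-iteration PageRank.

    edges: iterable of (source, target) pairs. Hypothetical input format.
    """
    # Build the adjacency map; every node gets an entry, even without out-links.
    out_links = {}
    for src, dst in edges:
        out_links.setdefault(src, set()).add(dst)
        out_links.setdefault(dst, set())

    n = len(out_links)
    rank = {node: 1.0 / n for node in out_links}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread evenly over all nodes.
        dangling = sum(rank[node] for node, targets in out_links.items() if not targets)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in out_links}
        # Each node passes an equal share of its rank to every node it links to.
        for src, targets in out_links.items():
            for dst in targets:
                new_rank[dst] += damping * rank[src] / len(targets)
        rank = new_rank
    return rank


# Made-up tag-to-tag relations: an edge (a, b) means tag a points to tag b.
edges = [
    ("spark", "big data"),
    ("hadoop", "big data"),
    ("hive", "big data"),
    ("big data", "spark"),
    ("hive", "hadoop"),
]
ranks = pagerank(edges)
top_tags = sorted(ranks, key=ranks.get, reverse=True)[:2]
```

Tags that many other tags point to end up with the highest scores, so `top_tags` holds the candidates that such a scheme would propose first. In a real pipeline this score would be one signal among several, alongside the N-gram and Word2vec scoring listed above.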
Technologies
  • Python
  • Scala
  • Apache Spark
  • Spark SQL
  • TigerGraph
  • Apache Hadoop
  • spaCy
  • XML
  • Akka
  • Word2vec
  • GPT-2
  • Pandas
  • NumPy
  • SciPy
  • Scikit-learn
  • Keras
  • PyTorch
  • CNN
  • Docker
  • Kubernetes
  • FastText
  • Gensim
  • Classification
  • NLTK
  • Jupyter
  • Parquet
  • Apache Arrow
  • IntelliJ
  • SBT
  • Nvidia-docker
  • Linux
  • Git
  • GitLab

ML/AI Projects

Recommendation Prediction Model

Industry: Consulting

Subject: Cloud Data Lake DevOps; AWS DevOps

Description: User tracking data is used to train ML models for user profiling, recommendation, and prediction. RNN and CNN models are developed and trained to enrich user profiles. A gradient boosting machine (GBM) classifier is trained on the extracted and learned features. Workflows for data engineering and continuous model training are implemented in Airflow.

Project Task Summary
  • Feature engineering using Spark SQL by joining and aggregating user tracking data
  • Keras and TensorFlow implementation for training RNN and CNN models
  • Using Spark ML for training gradient boosting classifiers
  • Cross-validation, F1-score evaluation, hyperparameter optimization
  • Containerized Spark standalone cluster, using Docker Compose for local deployment and AWS container services
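
The evaluation step in the task list above (cross-validation with F1-score) can be sketched in a few lines. The project itself used Spark ML. The version below is a library-free Python illustration only, and the helper names `f1_score` and `k_fold_splits`, the labels, and the predictions are hypothetical.

```python
def f1_score(y_true, y_pred, positive=1):
    """Return the F1-score (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, near-equal folds."""
    start = 0
    for i in range(k):
        # The first n_samples % k folds get one extra sample.
        size = n_samples // k + (1 if i < n_samples % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Made-up labels and predictions for a binary classifier.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
score = f1_score(y_true, y_pred)

# Index splits for 3-fold cross-validation over the same six samples.
folds = list(k_fold_splits(len(y_true), 3))
```

In a cross-validation run, a model would be trained on each `train` index set and scored with `f1_score` on the matching `test` set, and the per-fold scores would then be averaged to compare hyperparameter settings.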
Technologies
  • AWS CLI
  • AWS Console
  • IAM
  • S3
  • AWS ECS
  • AWS Batch
  • Spark Standalone Cluster
  • Docker
  • Docker Compose
  • Keras
  • TensorFlow
  • Pandas
  • Python
  • PySpark
  • Scala
  • Supervised Learning
  • Classification
  • GBM
  • Neural Networks
  • RNN
  • CNN
  • IntelliJ
  • SBT
  • Linux
  • Git
  • GitLab

Customers

Ready to get started?

Germany

contact@qimia.de

+49 221 5796 7940

USA

contact@qimia.io

+1 (858) 286 61 10

Turkey

contact@qimia.io

+90 542 412 02 31