Internship: Comparative Analysis of Streaming Frameworks: Apache Flink vs. Apache Spark in Real-Time Data Processing

in Pully
Internship · Workload not specified · Student
  • Job Identification: 2295
  • Posting Date: 21.02.2025
  • Job Schedule: Full time
  • Company: ELCA Informatique SA

About Us

We are ELCA, one of the largest Swiss IT companies, with over 2,200 experts. We are multicultural, with offices in Switzerland, Spain, France, Vietnam and Mauritius. Since 1968, our team of engineers, business analysts, software architects, designers and consultants has provided tailor-made and standardized solutions to support the digital transformation of major public administrations and private companies in Switzerland. Our activity spans multiple fields of leading-edge technology, such as AI, machine and deep learning, BI/BD, RPA, blockchain, IoT and cybersecurity.

Job Description

The internship focuses on exploring and comparing Apache Flink and Apache Spark for near real-time data processing. The intern will gain hands-on experience building data pipelines that use Apache Kafka as an event broker and process the data with Flink and Spark. The project involves implementing use cases based on real-world scenarios, such as near real-time inference (which typically requires integrating statistics/features over time), complex event processing, or near real-time legacy-system offloading with stateful processing (e.g. consolidating aggregates from change data capture events).

The final deliverable will include benchmarks and recommendations for using Flink and Spark in various scenarios.

Objectives

  • Understand the fundamental concepts of Kafka, Flink, and Spark, including their architecture and use cases.
  • Implement a pipeline to process streaming data from a single source using Kafka and Flink/Spark, going as far as possible with insights and optimizations along dimensions such as memory, latency, and throughput.
  • Build a second pipeline with a more complex setup: Database → Debezium → Kafka → Flink/Spark → business-event consolidation.
  • Consider transactional integrity vs. latency; processing of larger tables under memory limits; and configuration changes (partitioning in particular) in a production environment.
  • Compare Flink and Spark based on performance, ease of use, and suitability for specific use cases.
  • Document findings and propose guidelines for choosing between the two frameworks.
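The stateful consolidation objective above can be sketched independently of either framework. A minimal Python sketch follows; the event shape (`op`, `key`, `before`, `after`) only loosely mirrors Debezium's change-event format and is a simplifying assumption here, as is the running-sum aggregate — in Flink or Spark the same logic would live in managed keyed state rather than a plain dict.

```python
# Sketch: consolidating a per-key running total from CDC-style events.
# "c" = create (insert), "u" = update, "d" = delete, as in Debezium's
# "op" field; the flat before/after amounts are a simplification.

def consolidate(events):
    """Fold insert/update/delete events into per-key running totals."""
    totals = {}
    for e in events:
        key = e["key"]
        if e["op"] == "c":    # insert: add the new row's amount
            totals[key] = totals.get(key, 0) + e["after"]
        elif e["op"] == "u":  # update: swap the old amount for the new one
            totals[key] = totals.get(key, 0) - e["before"] + e["after"]
        elif e["op"] == "d":  # delete: subtract the removed row's amount
            totals[key] = totals.get(key, 0) - e["before"]
    return totals

events = [
    {"op": "c", "key": "acct-1", "before": None, "after": 100},
    {"op": "u", "key": "acct-1", "before": 100, "after": 150},
    {"op": "c", "key": "acct-2", "before": None, "after": 40},
    {"op": "d", "key": "acct-2", "before": 40, "after": None},
]
print(consolidate(events))  # {'acct-1': 150, 'acct-2': 0}
```

The interesting production questions the objectives raise (transactional integrity vs. latency, state size under memory limits, repartitioning) all concern how a framework manages exactly this kind of keyed state at scale.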

Our offer

  • A dynamic, collaborative work environment with a highly motivated, multicultural team across international sites
  • Various internal coding events (hackathons, brown bags); see our technical blog
  • Monthly after-work events organized at each location

Skills required

Core Skills:

  • Basics of data engineering and distributed systems.
  • Knowledge of SQL and database concepts (e.g., relational databases, transactions).
  • Understanding of streaming concepts and data pipelines.

Technical Skills:

  • Familiarity with Docker and containerized environments.
  • Knowledge of Kafka and concepts like producers, consumers, topics, and partitions.
  • Programming skills in Python, Java or Scala.
  • Understanding of event-driven architectures.
  • Exposure to cloud platforms (e.g., AWS, Azure, or GCP) is an advantage.

Other Skills:

  • Analytical thinking and problem-solving skills.
  • Ability to learn new tools and technologies quickly.
  • Interest in benchmarking and performance evaluation.
Get in touch! Our recruiting team looks forward to meeting you!
Published on 22.02.2025.