Data Engineering with Scala and Spark

David Radford / Eric Tome / Rupam Bhattacharjee

51,71 €
VAT included
Available
Publisher:
Packt Publishing
Year of publication:
2024
Subject:
Database design and theory
ISBN:
9781804612583

Take your data engineering skills to the next level by learning how to utilize Scala and functional programming to create continuous and scheduled pipelines that ingest, transform, and aggregate data

Key Features
- Transform data into a clean and trusted source of information for your organization using Scala
- Build streaming and batch-processing pipelines with step-by-step explanations
- Implement and orchestrate your pipelines by following CI/CD best practices and test-driven development (TDD)
- Purchase of the print or Kindle book includes a free PDF eBook

Book Description
Most data engineers know that performance issues in a distributed computing environment can easily undermine the overall efficiency and effectiveness of data engineering tasks. While Python remains a popular choice for data engineering due to its ease of use, Scala shines in scenarios where the performance of distributed data processing is paramount.

This book will teach you how to leverage the Scala programming language on the Spark framework and use the latest cloud technologies to build continuous and triggered data pipelines. You’ll do this by setting up a data engineering environment for local development and scalable distributed cloud deployments using data engineering best practices, test-driven development, and CI/CD. You’ll also get to grips with the DataFrame API, Dataset API, and Spark SQL API and their use. Data profiling and quality in Scala will also be covered, alongside techniques for orchestrating and performance tuning your end-to-end pipelines to deliver data to your end users.

By the end of this book, you will be able to build streaming and batch data pipelines using Scala while following software engineering best practices.

What you will learn
- Set up your development environment to build pipelines in Scala
- Get to grips with polymorphic functions, type parameterization, and Scala implicits
- Use Spark DataFrames, Datasets, and Spark SQL with Scala
- Read and write data to object stores
- Profile and clean your data using Deequ
- Performance tune your data pipelines using Scala

Who this book is for
This book is for data engineers who have experience in working with data and want to understand how to transform raw data into a clean, trusted, and valuable source of information for their organization using Scala and the latest cloud technologies.

Table of Contents
- Scala Essentials for Data Engineers
- Environment Setup
- An Introduction to Apache Spark and Its APIs - DataFrame, Dataset, and Spark SQL
- Working with Databases
- Object Stores and Data Lakes
- Understanding Data Transformation
- Data Profiling and Data Quality
- Test-Driven Development, Code Health, and Maintainability
- CI/CD with GitHub
- Data Pipeline Orchestration
- Performance Tuning
- Building Batch Pipelines Using Spark and Scala
- Building Streaming Pipelines Using Spark and Scala

Related items

  • Hands-On Machine Learning on Google Cloud Platform
    Alexis Perrier / Giuseppe Ciaburro / Kishore Ayyadevara
    Enhance your understanding of Computer Vision and image processing by developing real-world projects in OpenCV 3. Key Features: Get to grips with the basics of Computer Vision and image processing. This is a step-by-step guide to developing several real-world Computer Vision projects using OpenCV 3. This book takes a special focus on working with Tesseract OCR, a free, open-source libr...
    Available

    67,00 €

  • MLOps with Red Hat OpenShift
    Faisal Masood / Ross Brigoli
    Build and manage MLOps pipelines with this practical guide to using Red Hat OpenShift Data Science, unleashing the power of machine learning workflows. Key Features: Grasp MLOps and machine learning project lifecycle through concept introductions. Get hands on with provisioning and configuring Red Hat OpenShift Data Science. Explore model training, deployment, and MLOps pipeline buildi...
    Available

    61,48 €

  • Data Labeling in Machine Learning with Python
    Vijaya Kumar Suda
    Take your data preparation, machine learning, and GenAI skills to the next level by learning a range of Python algorithms and tools for data labeling. Key Features: Generate labels for regression in scenarios with limited training data. Apply generative AI and large language models (LLMs) to explore and label text data. Leverage Python libraries for image, video, and audio data analysi...
    Available

    83,55 €

  • Machine Learning Infrastructure and Best Practices for Software Engineers
    Miroslaw Staron
    Efficiently transform your initial designs into big systems by learning the foundations of infrastructure, algorithms, and ethical considerations for modern software products. Key Features: Learn how to scale up your machine learning software to a professional level. Secure the quality of your machine learning pipeline at runtime. Apply your knowledge to natural languages, programming ...
    Available

    62,67 €

  • Database Design and Modeling with Google Cloud
    Abirami Sukumaran
    Build faster and efficient real-world applications on the cloud with a fitting database model that’s perfect for your needs. Key Features: Familiarize yourself with business and technical considerations involved in modeling the right database. Take your data to applications, analytics, and AI with real-world examples. Learn how to code, build, and deploy end-to-end solutions with exper...
    Available

    48,37 €

  • Data Stewardship in Action
    Pui Shing Lee
    Take your organization’s data maturity to the next level by operationalizing data governance. Key Features: Develop the mindset and skills essential for successful data stewardship. Apply practical advice and industry best practices, spanning data governance, quality management, and compliance, to enhance data stewardship. Follow a step-by-step program to develop a data operating model...
    Available

    68,38 €