ETL Wiki

In this Wiki

Data integration is one of the oldest disciplines in computer science, with a long history of technologies, tools and approaches. Since the early days of CORBA, RPC and monolithic enterprise data integration platforms, we’ve progressed to leaner, more flexible approaches such as SOA, ESB and JDBC, and on to today’s cloud-based iPaaS platforms, which promise to connect and migrate massive quantities of data in minutes.

We all need to learn from this rich history, and to keep up with new technologies that are in constant flux. At Alooma, we built a world-class platform that makes data integration easier; in a similar vein, we wanted to make the large body of knowledge around data integration more accessible.

So we created this website: a wiki that pulls together all the concepts, technologies and best practices of data integration. It’s a carefully curated directory of thousands of resources written by individuals around the world, which we plan to grow into the world’s biggest source of knowledge on data integration.


Integration Wiki

Key Topics

Data Integration Use Cases

Resources on the primary uses of data integration and the pain points it solves in modern enterprises, including big data integration, EAI, MDI, hybrid cloud data migration, and data virtualization.

Data Architecture and Infrastructure

Resources on all aspects of architecting and managing organizational data, including data modelling, schema mapping, data transformation, SOA, ESB, EDH, messaging, CEP, ETL, and data ingestion.

Data Integration Tools

Resources on technologies that help store, manage and integrate different data sources, including iPaaS platforms such as Ensighten, Alooma and Zapier, on-premise integration tools such as Talend, Pentaho and Mule, and stream processing tools such as Kafka.

Extract Transform Load (ETL)

Resources on Extract Transform Load (ETL) practices, including ETL architecture and key concepts, best practices, ETL testing, performance, training and certifications.

Data Warehouse

Resources about data warehouse technology, used to store large amounts of data to facilitate processing, analysis and visualization, including:

  • Data warehousing best practices, concepts and architecture
  • Data warehousing training and certifications
  • Enterprise data architecture - OLAP and OLTP
  • Data warehouse tools and services - Amazon Redshift, Google BigQuery, Snowflake, IBM PureData, SAP Business Warehouse, Oracle Exadata, and many more
  • Using databases for data warehousing - storing large volumes of data using MemSQL, Oracle, MongoDB, and MySQL

Data Roles and Responsibilities

Resources about key roles in managing and making use of organizational data, including skills and job descriptions, job listings, salaries, training and certifications:

  • Data Engineer
  • Data Scientist and Big Data Analyst
  • BI Developer
  • Chief Data Officer (CDO)

Further Reading

  • Extract Transform Load - ETL: Resources on Extract Transform Load (ETL), a practice for migrating data between data sources while making changes to the data, typically on a large scale.
    • The ETL Process: Resources about the Extract, Transform, Load (ETL) process, a data warehousing process that covers how data is loaded from the source system into the data warehouse.
    • ETL Architecture and Concepts: Resources including best practices, overviews and examples of basic ETL concepts and ETL architectures.
    • ETL Best Practices: Resources on best practices for successful planning, design and implementation of the ETL process.
    • ETL Testing: Resources about ETL testing, the process of verifying that data maintains its integrity and accuracy after being extracted, transformed, and loaded from the source into the data warehouse.
    • ETL Performance: Resources about how to optimize the performance of the ETL process on various platforms, including best practices, performance tuning tips, and more.
    • ETL Training and Certification: Resources about ETL training and certification. Formal training in extract, transform, load processes and testing helps ensure that people performing this critical operation have the necessary skills.
    • ETL Tools: Resources about ETL tools, applications designed to make the extract, transform, load process less labor-intensive and to improve data quality.
  • Data Integration Tools (coming soon): Resources on tools and technologies that help organizations store, manage and integrate different data sources.
  • Data Architecture and Infrastructure: Resources about data architecture and data infrastructure. Combining the data environment with data architecture and infrastructure concepts is key to creating systems where users can easily access the data they need.
    • Data Architecture, Data Modeling and Data Transformation: Resources about data architecture, data modeling and data transformation. Combining the data environment with data architecture, modeling and transformation concepts is key to creating systems where users can easily access the data they need.
    • Data Lake: Resources about data lakes, which hold large amounts of raw data in its native format, using a flat architecture to store the data.
    • Data Mart: Resources about the data mart, the read-only access layer of the data warehouse environment that provides data to users. Data marts are usually oriented to a specific business line or team.
    • Integration Platform as a Service - iPaaS: Resources about Integration Platform as a Service (iPaaS), a set of cloud-based tools that allow software engineers to deploy, manage and integrate applications and services.
    • Data Stream Processing: Resources about data stream processing, the continuous, real-time processing of data, concurrently and record by record. Data stream processing treats data as a continuous stream integrated from both live and historical sources.
    • Database Replication: Resources about database replication, the frequent electronic copying of data from a database in one physical location to a database in another location. This improves data availability, redundancy and disaster recovery.
    • SOA Integration: Resources about integrating Service-Oriented Architecture (SOA) applications and services. SOA is a style of software design in which application components provide services to other components through a communication protocol over a network.
    • API Data Integration: Resources about using APIs for data integration, including third-party APIs used in the data integration process.
    • Enterprise Service Bus - ESB: Resources about Enterprise Service Bus (ESB) systems. ESBs implement a communication system between mutually interacting software applications in a service-oriented architecture (SOA).
    • Enterprise Data Hub - EDH: Resources about Enterprise Data Hub (EDH), a big data management model that uses a Hadoop platform as the central data repository and aims to provide an organization with a centralized, unified data source.
    • Enterprise Messaging Patterns: Resources about enterprise messaging patterns, the integration patterns used for messaging. Messaging patterns form the basis of most integration patterns.
    • Enterprise Messaging Components: Resources about enterprise messaging components such as messaging adapters, messaging bridges, and more.
  • Data Integration Use Cases: Resources on the primary uses of data integration and the pain points it solves in modern enterprises.
    • Big Data Integration: Resources about big data integration. Managing and integrating terabytes or petabytes of data efficiently, cost-effectively and in a way that is accessible to users is a major data challenge today.
    • Enterprise Application Integration - EAI: Resources about enterprise application integration (EAI), which allows enterprise systems to communicate with each other for automation and data sharing. EAI provides the communication framework without requiring extensive changes to applications or data structures.
    • Customer Data Integration: Resources on customer data integration, the process of collecting, organizing and distributing all available customer data throughout an organization.
    • Marketing Data Integration - MDI: Resources about Marketing Data Integration (MDI), the collection and organization of marketing data from disparate sources to give users efficient data retrieval.
    • Data Migration: Resources about data migration, the process of transferring data between systems, computers or formats. System implementations and upgrades rely on data migration.
    • Edge Data Integration: Resources about edge data integration, also called tactical integration or point-to-point integration. Edge data integration involves integrating data on an ad hoc or tactical basis, for example in response to a specific problem.
    • Hybrid Cloud Integration: Resources on hybrid cloud integration, the integration of data between the cloud, on-premise systems and disparate clouds from vendors or customers.
    • IoT and Mobile Systems Integration: Resources about Internet of Things (IoT) and mobile systems integration, covering the integration of on-premise data with mobile systems or IoT data.
    • Data Virtualization: Resources about data virtualization, any approach to data management that allows an application to retrieve data without knowing the technical details of that data.
    • Master Data Management - MDM: Resources about master data management (MDM), the management of specific key data assets in an organization. MDM uses infrastructure and software to focus on higher-level data elements, such as broad classifications of customers or assets.
    • Legacy Systems Modernization: Resources about legacy systems modernization which, as technology continues to change rapidly, is now a recurring process rather than a single event. Keeping the data in legacy systems available is critical to data management.
    • CRM Integration: Resources about CRM integration, the process of integrating a CRM so that it works seamlessly with other systems, processes, and data.
    • ERP Integration: Resources about ERP integration, which combines data from an Enterprise Resource Planning (ERP) system with other business data to give a more holistic view of organizational data.
    • Machine Learning and AI: Resources on data integration using machine learning and artificial intelligence (AI), and how cognitive computing can improve and simplify data integration.
    • Data Ingestion: Resources about data ingestion, how data is brought into systems such as data warehouses and data lakes, including integrating databases and CRMs.
  • Data Warehouse: Resources about data warehouse technology, used to store large amounts of data to facilitate processing, analysis and visualization.
    • Database vs Data Warehouse: Resources about the differences between databases and data warehouses. Each serves a different function within an organization: databases are usually used for operational data, while data warehouses are used for analytics.
    • Data Warehousing Best Practices: Resources about data warehousing best practices, methods to optimize a data warehouse and improve stability and data access.
    • Data Warehousing Concepts: Resources about data warehousing concepts, the building blocks of a data warehouse and its operational fundamentals. Understanding data warehouse architecture and concepts is the first step to creating a data warehouse.
    • Data Warehouse Architecture: Resources about data warehouse architecture, the structure and components that form the data warehouse. A well-planned architecture creates a well-performing, high-value data warehouse, which makes architecture a key step in the process of data warehousing.
    • Amazon Redshift: Resources about Amazon Redshift, a hosted data warehouse service that is part of the Amazon Web Services (AWS) cloud computing platform.
    • Google BigQuery: Resources about Google BigQuery, a cloud-based big data analytics web service designed to analyze very large read-only datasets.
    • Snowflake Computing: Resources about Snowflake Computing, a US company that provides cloud-based data warehouses as a service, offering another option for small to medium companies.
    • Enterprise Data Architecture: Resources about enterprise data architecture, the protocols and methods used to analyze and process data across the enterprise. The large amounts of data involved require efficient and speedy processing, using tools designed for that purpose.
    • Other Database and Warehouse Tools and Services: Resources about other database and data warehouse tools and services, including additional products, tools and services to improve data management.
    • Oracle Databases as Data Warehouse: Resources about data warehousing features in Oracle databases.
  • Data Roles and Responsibilities: Resources about key roles in managing and making use of organizational data, including data engineer, data scientist and data analyst.
    • Data Engineer: The data engineer is responsible for the maintenance, improvement, cleaning, and manipulation of data in the business’s operational and analytics databases. This page gathers resources about the differences between a data engineer and a data scientist, and guidelines on how to become a data engineer.
    • Data Scientist: Most data scientists have advanced degrees and training in math, statistics and/or computer science, and most have experience in data mining, data visualization and/or information management. Put simply, data scientists apply powerful tools and advanced statistical modeling techniques to make discoveries about business problems, processes and platforms.
    • Big Data Analyst: A big data analyst is an individual who reviews, analyzes and reports on big data stored and maintained by an organization. Big data analysts have a job description and skill set similar to those of data analysts, but they specialize in big data analytics. This page gathers resources about how to become a big data analyst and the differences between data analysts and data scientists.
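
The ETL Process and ETL Testing entries above describe data being extracted from a source system, transformed, loaded into a data warehouse, and then verified. The sketch below illustrates that flow under simple assumptions. It uses Python's standard-library `sqlite3` module, with in-memory databases standing in for both the source system and the warehouse. The table names `orders_raw` and `fact_orders` and the cleaning rules are hypothetical, chosen only for illustration. The closing row-count comparison is one basic example of an ETL test, not a complete testing strategy.

```python
import sqlite3


def extract(source: sqlite3.Connection) -> list[tuple]:
    """Extract: read raw rows from the hypothetical source table `orders_raw`."""
    return source.execute(
        "SELECT id, customer, amount_cents FROM orders_raw ORDER BY id"
    ).fetchall()


def transform(rows: list[tuple]) -> list[tuple]:
    """Transform: normalize customer names and convert cents to dollars.

    These rules are illustrative assumptions, not a prescribed mapping.
    """
    cleaned = []
    for order_id, customer, amount_cents in rows:
        name = customer.strip().title()  # "  alice smith " -> "Alice Smith"
        cleaned.append((order_id, name, amount_cents / 100))
    return cleaned


def load(warehouse: sqlite3.Connection, rows: list[tuple]) -> None:
    """Load: write transformed rows into the hypothetical warehouse table."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(id INTEGER PRIMARY KEY, customer TEXT, amount_usd REAL)"
    )
    warehouse.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    warehouse.commit()


def run_etl(source: sqlite3.Connection, warehouse: sqlite3.Connection) -> int:
    """Run extract -> transform -> load, then apply a simple ETL test."""
    raw = extract(source)
    load(warehouse, transform(raw))
    # Basic ETL test: every extracted row should have been loaded.
    # This assumes the target table started empty.
    loaded = warehouse.execute("SELECT COUNT(*) FROM fact_orders").fetchone()[0]
    if loaded != len(raw):
        raise RuntimeError(f"row count mismatch: extracted {len(raw)}, loaded {loaded}")
    return loaded


if __name__ == "__main__":
    source = sqlite3.connect(":memory:")
    source.execute(
        "CREATE TABLE orders_raw (id INTEGER, customer TEXT, amount_cents INTEGER)"
    )
    source.executemany(
        "INSERT INTO orders_raw VALUES (?, ?, ?)",
        [(1, "  alice smith ", 1999), (2, "BOB JONES", 500)],
    )
    warehouse = sqlite3.connect(":memory:")
    print(run_etl(source, warehouse))
    print(warehouse.execute("SELECT * FROM fact_orders ORDER BY id").fetchall())
```

Running the script prints `2`, followed by `[(1, 'Alice Smith', 19.99), (2, 'Bob Jones', 5.0)]`. In a real pipeline the source and target would be separate systems, and the transform step would usually follow a defined schema mapping rather than hard-coded rules.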
