DWBI Architecture with Data Quality Implemented
Once the data is loaded into the stage tables, the ETL triggers a data quality (DQ) check process on the stage table data. For each stage table, the Pentaho Data Integration ETL picks up the DQ rules (queries) defined in the DQ metadata table and executes them to identify the records that violate those rules. The process then marks each bad-quality record by updating it with the ID of the DQ rule it violated. A single record can fail multiple rules, in which case all of the corresponding DQ rule IDs are recorded against it.
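The metadata-driven check described above can be sketched with Python's standard-library sqlite3 module. This is a minimal sketch under stated assumptions, not the PDI implementation: the table names (customer_stage, dq_rules), their columns, the sample rules DQ001 and DQ002, and the convention of storing several failed rule IDs in DQ_Rule_ID as a comma-separated list are all hypothetical. Each rule query is assumed to return the key of every violating record.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory stage table plus a DQ rule metadata table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer_stage (
            customer_id INTEGER PRIMARY KEY,
            email       TEXT,
            age         INTEGER,
            DQ_Rule_ID  TEXT            -- NULL means the record passed every rule
        );
        CREATE TABLE dq_rules (
            rule_id     TEXT PRIMARY KEY,
            stage_table TEXT,
            rule_query  TEXT            -- returns the keys of violating records
        );
        INSERT INTO customer_stage (customer_id, email, age) VALUES
            (1, 'a@example.com', 34),
            (2, NULL,            41),
            (3, 'c@example.com', -5),
            (4, NULL,            -1);
        INSERT INTO dq_rules VALUES
            ('DQ001', 'customer_stage',
             'SELECT customer_id FROM customer_stage WHERE email IS NULL'),
            ('DQ002', 'customer_stage',
             'SELECT customer_id FROM customer_stage WHERE age < 0 OR age > 120');
    """)
    return conn


def run_dq_checks(conn: sqlite3.Connection, stage_table: str, key_column: str) -> None:
    """Run every DQ rule registered for stage_table and tag the violating records."""
    rules = conn.execute(
        "SELECT rule_id, rule_query FROM dq_rules WHERE stage_table = ? ORDER BY rule_id",
        (stage_table,),
    ).fetchall()
    # Table and column names cannot be bound as parameters; they are assumed
    # to come from trusted metadata, not user input.
    tag_sql = (
        f"UPDATE {stage_table} SET DQ_Rule_ID = "
        "CASE WHEN DQ_Rule_ID IS NULL THEN ? ELSE DQ_Rule_ID || ',' || ? END "
        f"WHERE {key_column} = ?"
    )
    with conn:  # commit all tags together, or roll back on error
        for rule_id, rule_query in rules:
            bad_keys = [row[0] for row in conn.execute(rule_query)]
            for key in bad_keys:
                # Append rather than overwrite, so one record can carry several rule IDs.
                conn.execute(tag_sql, (rule_id, rule_id, key))


conn = build_demo_db()
run_dq_checks(conn, "customer_stage", "customer_id")
for row in conn.execute("SELECT customer_id, DQ_Rule_ID FROM customer_stage ORDER BY customer_id"):
    print(row)
# (1, None)
# (2, 'DQ001')
# (3, 'DQ002')
# (4, 'DQ001,DQ002')
```

Record 4 fails both rules, so it ends up tagged with both IDs, while record 1 passes and keeps a NULL DQ_Rule_ID.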
Once the DQ process is complete, a separate ETL process picks up all the bad-quality records from the stage table and moves them to the reject tables. These records can be identified by applying the filter “DQ_Rule_ID IS NOT NULL”. Similarly, another ETL process picks up the good-quality records (those whose DQ_Rule_ID is NULL) from the stage table, moves them to the Data Quality stage, and from there on to the Data Store and the DW/Data Marts.
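The routing step can be sketched the same way. Again this is a hypothetical sqlite3 illustration: it assumes a stage table that the DQ process has already tagged, plus made-up customer_reject and customer_dq_stage target tables. The two filters are complementary, so each stage record lands in exactly one target, and running both inserts in one transaction means a failure leaves neither target partially loaded.

```python
import sqlite3


def build_tagged_stage() -> sqlite3.Connection:
    """Hypothetical stage table as it looks after the DQ check has tagged it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer_stage (
            customer_id INTEGER PRIMARY KEY, email TEXT, age INTEGER, DQ_Rule_ID TEXT);
        CREATE TABLE customer_reject (
            customer_id INTEGER, email TEXT, age INTEGER, DQ_Rule_ID TEXT);
        CREATE TABLE customer_dq_stage (
            customer_id INTEGER, email TEXT, age INTEGER);
        INSERT INTO customer_stage VALUES
            (1, 'a@example.com', 34, NULL),
            (2, NULL,            41, 'DQ001'),
            (3, 'c@example.com', -5, 'DQ002'),
            (4, NULL,            -1, 'DQ001,DQ002');
    """)
    return conn


def route_records(conn: sqlite3.Connection) -> None:
    """Move failed records to the reject table and passed records onward."""
    with conn:  # both moves commit together or not at all
        # Bad-quality records: at least one rule ID is present; keep the IDs for analysis.
        conn.execute("""
            INSERT INTO customer_reject (customer_id, email, age, DQ_Rule_ID)
            SELECT customer_id, email, age, DQ_Rule_ID
            FROM customer_stage
            WHERE DQ_Rule_ID IS NOT NULL
        """)
        # Good-quality records: no rule failed, so they continue to the DQ stage.
        conn.execute("""
            INSERT INTO customer_dq_stage (customer_id, email, age)
            SELECT customer_id, email, age
            FROM customer_stage
            WHERE DQ_Rule_ID IS NULL
        """)


conn = build_tagged_stage()
route_records(conn)
print(conn.execute("SELECT customer_id FROM customer_reject ORDER BY customer_id").fetchall())
# [(2,), (3,), (4,)]
print(conn.execute("SELECT customer_id FROM customer_dq_stage ORDER BY customer_id").fetchall())
# [(1,)]
```

Keeping DQ_Rule_ID in the reject table preserves the reason each record was rejected; the onward load from the DQ stage to the Data Store and DW/Data Marts is not shown.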
As an illustration, the following section details how this can be implemented using Pentaho Data Integration (PDI).