An introduction to Data Quality Services
The use case for DQS is simple, a quick way to audit and correct data through an abstraction layer. This abstraction layer can be managed by someone other than a DBA or developer and help improve your database’s data quality and over all employee productivity.
First step is to install the DQS client, you can search for data quality server installer on your machine running your SQL instance.
Once you follow the prompts and install you should to able to access to DQS client.
Lets take a simple set of data as follows :
ID | Fname | Lname |
---|---|---|
1 | Ian | Fogelman |
2 | Bill | Gates |
3 | Jim | Brown |
4 | D | Smith |
5 | J | Green |
Suppose that we want to enforce a business rule that each first name must be 3 characters.
We will solve this using a DQS approach.
First step is to create a knowledge base
Create New Domain
Input a name and a description for your domain
Next click domain rules and enter Length3 for name and add a rule of length equal to 3
Click publish
Now go to data quality projects, New Data Quality Project.
Name the project and select the knowledge base created, for this project we will be cleansing.
On the next screen match each field name with a domain, in this case we only have first name, click next.
Click Start
After the task is complete you can see how many rows of data passed and how many were invalid.
On the following screen we can see which records failed, because the length of the first name field was not 3.
Which we originally stated in the criteria of the domain. You can approve and reject.
On the last screen we can Export our results to a table in the database by defining the table name and clicking export.
Once clicking finish , we can view the data in the Output table in the database we specificied to see the full results
DQS can be used in combination with other technology such as SSIS/PowerBi/SSRS and the process of reporting data issues or automatically fixing them becomes much more interesting!