# Soda Post-ingestion Checks
Use Soda to run a scan that checks the completeness and uniqueness of the data after ingestion, ensuring that values are not missing or duplicated.

## Configure connections to the data source and Soda Cloud
This example retrieves sensitive credential values from a linked Azure Key Vault rather than hard-coding them in the notebook.

In [10]:
from notebookutils import mssparkutils

config_str = f"""
data_source azure_sql_data:
  type: sqlserver
  driver: ODBC Driver 18 for SQL Server
  host: soda.sql.azuresynapse.net
  port: xxxx
  username: my_sql_user
  password: {mssparkutils.credentials.getSecret('soda-vault' , 'sql-pw')}
  database: soda_sqlserver
  schema: soda_demo_data_testing
soda_cloud:
  host: cloud.us.soda.io
  api_key_id: {mssparkutils.credentials.getSecret('soda-vault' , 'soda-api-key-id')}
  api_key_secret: {mssparkutils.credentials.getSecret('soda-vault' , 'soda-api-key-secret')}
"""

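
Because the configuration is an f-string, a secret that comes back blank silently produces a key with no value. The following is a minimal sketch, not part of the original notebook, of one way to catch that before running a scan. `build_config`, `keys_without_values`, `fake_vault`, and `fake_get_secret` are hypothetical names introduced for illustration, and the configuration is abbreviated. In Synapse, `mssparkutils.credentials.getSecret` could presumably be passed in place of the fake getter.

```python
def build_config(get_secret):
    """Render an abbreviated Soda configuration, fetching secrets via get_secret(vault, name)."""
    return f"""
data_source azure_sql_data:
  type: sqlserver
  username: my_sql_user
  password: {get_secret('soda-vault', 'sql-pw')}
soda_cloud:
  host: cloud.us.soda.io
  api_key_id: {get_secret('soda-vault', 'soda-api-key-id')}
  api_key_secret: {get_secret('soda-vault', 'soda-api-key-secret')}
"""


def keys_without_values(config):
    """Return indented keys whose value is empty, e.g. because a secret came back blank."""
    empty = []
    for line in config.splitlines():
        # Only indented lines carry values; top-level lines are section headers.
        if line.startswith("  ") and ":" in line:
            key, _, value = line.strip().partition(":")
            if not value.strip():
                empty.append(key)
    return empty


# Hypothetical stand-in for the Key Vault; the values are placeholders, not real secrets.
fake_vault = {
    "sql-pw": "placeholder-pw",
    "soda-api-key-id": "placeholder-id",
    "soda-api-key-secret": "",  # deliberately blank to show the check firing
}


def fake_get_secret(vault_name, secret_name):
    """Assumed to mirror the (vault, secret) argument order used in the notebook."""
    return fake_vault[secret_name]


config = build_config(fake_get_secret)
print(keys_without_values(config))  # lists api_key_secret, whose fake value is empty
```

Passing the getter as an argument keeps the template testable outside the Synapse runtime, where `notebookutils` is not available.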

## Define data quality checks using Soda Checks Language (SodaCL)
This section defines checks that test the completeness and uniqueness of the data after it has been ingested. Each check fails when more than 5% of the rows are affected.

In [11]:
check_str = """checks for retail_customers:
- missing_percent(customer_id):
    name: check completeness of customer_id
    fail: when > 5%
    attributes:
        data_quality_dimension: [Completeness]
        pipeline: ADF_pipeline_demo
        pipeline_stage: Ingest
        data_domain: Sales
- duplicate_percent(customer_id):
    name: check uniqueness of customer_id
    fail: when > 5%
    attributes:
        data_quality_dimension: [Uniqueness]
        pipeline: ADF_pipeline_demo
        pipeline_stage: Ingest
        data_domain: Sales
- missing_percent(country_code):
    name: check completeness of country_code
    fail: when > 5%
    attributes:
        data_quality_dimension: [Completeness]
        pipeline: ADF_pipeline_demo
        pipeline_stage: Ingest
        data_domain: Sales
"""

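
To make the `fail: when > 5%` thresholds concrete, here is a rough plain-Python illustration of the arithmetic behind the checks above. It is not Soda's implementation: Soda computes these metrics in SQL against the data source, and the duplicate formula below is one simplified reading (rows whose value occurs more than once, divided by all rows) that may differ from Soda's exact definition. The function names and sample data are hypothetical.

```python
from collections import Counter


def missing_percent(values):
    """Percentage of entries that are None (a stand-in for SQL NULL)."""
    if not values:
        return 0.0
    missing = sum(1 for v in values if v is None)
    return 100.0 * missing / len(values)


def duplicate_percent(values):
    """Percentage of rows whose non-missing value appears more than once.

    Assumption: a simplified interpretation, not Soda's actual query.
    """
    if not values:
        return 0.0
    counts = Counter(v for v in values if v is not None)
    duplicated_rows = sum(c for c in counts.values() if c > 1)
    return 100.0 * duplicated_rows / len(values)


def outcome(metric_value, fail_above=5.0):
    """Mirror 'fail: when > 5%': strictly greater than the threshold fails."""
    return "fail" if metric_value > fail_above else "pass"


# Hypothetical sample of ten customer_id values: one missing, one value repeated.
customer_ids = [1, 2, 2, 3, None, 4, 5, 6, 7, 8]

print(missing_percent(customer_ids), outcome(missing_percent(customer_ids)))
print(duplicate_percent(customer_ids), outcome(duplicate_percent(customer_ids)))
```

Note that a metric of exactly 5% passes under this reading, since the threshold uses a strict `>` comparison.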

## Run the Soda scan

If `scan.assert_no_checks_fail()` raises an `AssertionError` because one or more checks failed, the Azure Data Factory pipeline in which this notebook resides halts.

In [12]:
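from soda.scan import Scan

scan = Scan()
scan.set_verbose(True)  # Emit detailed logs in the notebook output
scan.set_data_source_name('azure_sql_data')  # Must match the data_source name in config_str
scan.add_configuration_yaml_str(config_str)
scan.set_scan_definition_name('retail_customers_scan')  # Identifies this scan in Soda Cloud
scan.add_sodacl_yaml_str(check_str)
scan.execute()
# Raise an AssertionError if any check failed, which fails the notebook run
scan.assert_no_checks_fail()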

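
If the scan logs should appear in the notebook output before the pipeline halts, one possible variation is sketched below. It assumes the `Scan` object exposes `get_logs_text()`, `has_check_fails()`, and `get_checks_fail_text()`, which is worth verifying against the installed Soda Library or Soda Core version. `run_scan_and_report` and `FakeScan` are hypothetical names; `FakeScan` exists only so the sketch runs without a data source.

```python
def run_scan_and_report(scan):
    """Execute the scan, print its logs, then raise if any check failed."""
    scan.execute()
    print(scan.get_logs_text())
    if scan.has_check_fails():
        # An uncaught AssertionError fails the notebook run, as in the cell above.
        raise AssertionError(scan.get_checks_fail_text())


class FakeScan:
    """Hypothetical stand-in that mimics the assumed subset of the Scan interface."""

    def __init__(self, failed):
        self.failed = failed      # names of checks that should be reported as failed
        self.executed = False

    def execute(self):
        self.executed = True
        return 0

    def get_logs_text(self):
        return f"scan finished, {len(self.failed)} check(s) failed"

    def has_check_fails(self):
        return bool(self.failed)

    def get_checks_fail_text(self):
        return "; ".join(self.failed)


try:
    run_scan_and_report(FakeScan(failed=["check completeness of customer_id"]))
except AssertionError as err:
    print(f"pipeline would halt: {err}")
```

In the real notebook the `try`/`except` would be omitted so that the exception propagates and stops the pipeline.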