Sunday, August 18, 2024

Different ways to delete duplicate records in SQL Server

 Deleting duplicate records in SQL Server can be done using several different approaches depending on the specific scenario and table structure. Here are a few methods:

Choosing the Right Method

  • ROW_NUMBER(): Preferred for larger datasets and complex deduplication logic.
  • GROUP BY: Simple but can be less efficient for very large datasets.
  • JOIN: Useful when working with smaller datasets or where performance isn’t critical.
  • Temporary Table: Best for smaller tables or where you need to reinsert the data.
  • EXISTS: Suitable when needing a more readable query but can be less performant.

These methods allow flexibility depending on the specific requirements of the deduplication task.

1. Using ROW_NUMBER() with Common Table Expressions (CTE)

The most common and efficient way is to use the ROW_NUMBER() function to identify duplicates and then delete them.

WITH CTE AS (
    SELECT 
        *, 
        ROW_NUMBER() OVER (PARTITION BY column1, column2, column3 ORDER BY (SELECT NULL)) AS rn
    FROM 
        your_table
)
DELETE FROM CTE WHERE rn > 1;
  • Explanation: This method assigns a unique row number to each record within a group of duplicates based on the specified columns (column1, column2, column3). All rows except the first in each group are then deleted.

2. Using GROUP BY with HAVING and a Subquery

This method uses GROUP BY to find duplicates and then deletes them using a DELETE statement with a subquery.

DELETE FROM your_table
WHERE ID NOT IN (
    SELECT MIN(ID)
    FROM your_table
    GROUP BY column1, column2, column3
);
  • Explanation: This method keeps the row with the minimum ID (or any unique column) for each group of duplicates and deletes the others.

3. Using INNER JOIN

You can delete duplicates by joining the table on itself.

DELETE T1
FROM your_table T1
INNER JOIN your_table T2
ON 
    T1.column1 = T2.column1 AND 
    T1.column2 = T2.column2 AND 
    T1.column3 = T2.column3 AND 
    T1.ID > T2.ID;
  • Explanation: This deletes rows from T1 that have duplicates in T2 based on the specified columns but with a higher ID.

4. Using Temporary Table

You can insert distinct records into a temporary table, truncate the original table, and reinsert the distinct records.

SELECT DISTINCT * INTO #temp_table FROM your_table;

TRUNCATE TABLE your_table;

INSERT INTO your_table SELECT * FROM #temp_table;

DROP TABLE #temp_table;
  • Explanation: This method creates a temporary table containing only distinct records, then clears the original table and repopulates it with the unique records.

5. Using DELETE with EXISTS

This method uses the EXISTS clause to find and delete duplicates.

DELETE FROM your_table
WHERE EXISTS (
    SELECT 1 
    FROM your_table T2
    WHERE 
        your_table.column1 = T2.column1 AND 
        your_table.column2 = T2.column2 AND 
        your_table.column3 = T2.column3 AND 
        your_table.ID > T2.ID
);
  • Explanation: This method deletes records that have duplicates with a lower ID value.

Saturday, August 17, 2024

Apply Operator in SQL Server

 In SQL Server, the APPLY operator is used to invoke a table-valued function for each row returned by an outer query. The APPLY operator comes in two forms: CROSS APPLY and OUTER APPLY.

[Click on the link below for detail Q & A]

1. CROSS APPLY

  • CROSS APPLY returns only those rows from the outer table that produce a result set from the table-valued function. If the function returns no result, the corresponding row from the outer table is not included in the final result set.

2. OUTER APPLY

  • OUTER APPLY returns all rows from the outer table, regardless of whether the table-valued function returns a result for that row. If the function returns no result for a row, NULL values are returned for the columns from the table-valued function.

Example Usage

Let's say you have two tables:

  1. Employees:

    • EmployeeID
    • EmployeeName
  2. Orders:

    • OrderID
    • EmployeeID
    • OrderDate

And a table-valued function GetRecentOrder which takes an EmployeeID as a parameter and returns the most recent order for that employee.

Using CROSS APPLY:

SELECT e.EmployeeID, e.EmployeeName, o.OrderID, o.OrderDate FROM Employees e CROSS APPLY dbo.GetRecentOrder(e.EmployeeID) o;
  • This query will return only employees who have placed at least one order, showing their most recent order.

Using OUTER APPLY:

SELECT e.EmployeeID, e.EmployeeName, o.OrderID, o.OrderDate FROM Employees e OUTER APPLY dbo.GetRecentOrder(e.EmployeeID) o;
  • This query will return all employees, including those who haven't placed any orders. For employees without orders, the OrderID and OrderDate fields will be NULL.

When to Use CROSS APPLY vs. OUTER APPLY

  • Use CROSS APPLY when you want to include only rows that have corresponding results from the applied function.
  • Use OUTER APPLY when you want to include all rows from the outer query, even if there are no corresponding results from the applied function.

Benefits of APPLY

  • It provides a way to apply complex logic or filtering on a per-row basis.
  • It can be more efficient than using a JOIN in scenarios where you need to correlate rows from the outer query with results from a table-valued function.

This functionality is particularly useful when dealing with subqueries, table-valued functions, or any scenario where you need to filter or calculate results based on individual rows from the outer query.