Find Duplicate Files in SharePoint Online – Quick Tips for Cleanup

Written By Mohit Kumar Jha
Approved By Anuraag Singh
Modified On September 19th, 2025
Reading Time 8 Min Read

Finding and deleting duplicate files to keep a SharePoint Online environment uncluttered is a crucial task for every SharePoint admin, including me. But how do you actually do it? Don't worry: if you're unsure how to find duplicate files in SharePoint Online, this guide walks you through it.

Today, I'll explain the complete step-by-step procedure for keeping your SharePoint environment clutter-free and staying within your site's storage limit. By the end of this guide, you should have answers to all your questions on the topic.

So, let’s explore:

Why Finding Duplicates in SharePoint Online Matters

When you're collaborating with multiple people, duplicate files pile up quickly, cluttering your libraries and eating into your site's storage limit.

To avoid these problems, I explored and tested some practical ways to find duplicates in a SharePoint site. I'll share them in detail next.

How to Find Duplicate Files in SharePoint Online

I found several ways to identify duplicates. Each one is described below, with details and step-by-step instructions:

Method #1. Leverage SharePoint’s Version History Feature

Before assuming you have duplicates, check whether multiple versions exist instead of duplicate files:

  • Open the document library.
  • Right-click a file > Version history.
  • If you see multiple versions, those are versions of a single file, not separate duplicate files.
  • Delete or restore older versions if needed.

Method #2. Use SharePoint View Settings

Customise views to surface duplicates based on metadata such as Title, Modified, or File Size. Here's how:

  • Go to Library settings > Create view > Standard view.
  • Sort by Name or Modified By.
  • Group by Title or File Size.
  • Manually check grouped items with identical names or sizes.

Method #3. Find Duplicate Files in SharePoint Online Manually

For a small set of data, you can search for duplicate files manually in SharePoint Online:

  • Open the Document Library to review the files.
  • Check files based on details like modified date, file name, and file size.
  • Open individual documents if needed to verify whether the content is duplicated.
  • Use the advanced filter options to sort and organise files.

This method is practical only for small libraries; on larger ones it quickly becomes time-consuming.
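If you have downloaded two suspect files, you can confirm whether their content is identical without opening them by comparing file hashes. The sketch below is a minimal example, assuming both files are on your local disk; the function name `Test-SameContent` and the sample paths are hypothetical.

```powershell
# Minimal sketch: check whether two local files have byte-identical content.
# Test-SameContent is a hypothetical helper name; the paths you pass in are your own.
function Test-SameContent {
    param(
        [Parameter(Mandatory)] [string] $PathA,
        [Parameter(Mandatory)] [string] $PathB
    )
    # Files of different sizes can never be identical, so skip hashing in that case
    if ((Get-Item -LiteralPath $PathA).Length -ne (Get-Item -LiteralPath $PathB).Length) {
        return $false
    }
    # Compare SHA256 hashes of the two files
    $hashA = (Get-FileHash -LiteralPath $PathA -Algorithm SHA256).Hash
    $hashB = (Get-FileHash -LiteralPath $PathB -Algorithm SHA256).Hash
    return $hashA -eq $hashB
}

# Usage (hypothetical file names):
# Test-SameContent -PathA 'C:\Temp\Report.docx' -PathB 'C:\Temp\Report (1).docx'
```

Checking the size first is a cheap shortcut: only same-size files need to be hashed.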

Method #4. Microsoft’s SharePoint Duplicate Analysis Tool (DeDup)

Microsoft offered its own duplicate file finder for SharePoint Online, available on the Azure Marketplace. Keep in mind that it is a paid tool and requires some technical expertise to operate. Here is how to use DeDup:

  • Step 1. Download the DeDup tool to your machine, then log in with your Microsoft credentials.
  • Step 2. Hit the Accept button.
  • Step 3. From the User menu, open Credentials and add new credentials.
  • Step 4. Open the User menu, then Add Sites, and press the Rescan button to find duplicate files in SharePoint Online.
  • Step 5. Now, select the site for scanning and track the progress information through the dashboard.
  • Step 6. Open the More Details option to get a detailed overview of the duplicate files.
  • Step 7. Click the Audit option to save DeDup's report as an Excel file for further analysis.

Method #5. List All the Duplicate Files Using a PowerShell Script

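If you prefer to automate the search, the PowerShell script below scans every visible, non-empty document library on a site, computes an MD5 hash of each file, exports the results to CSV, and then lists duplicates by hash, file name, and file size. Run it as a single script rather than command by command, and make sure the SharePoint CSOM assemblies are installed at the paths shown: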

#Load SharePoint CSOM assemblies (SharePoint Online Client Components SDK)
Add-Type -Path "C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\16\ISAPI\Microsoft.SharePoint.Client.dll"
Add-Type -Path "C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\16\ISAPI\Microsoft.SharePoint.Client.Runtime.dll"

#Parameters
$URLofSharePointSite = "enter here"   # Full URL of the site to scan
$CSV_Loc = "C:\Temp\Duplicates.csv"
$Batch_Size = 1000                    # Items fetched per request

$DataCollection = @()

#Get credentials to connect
#Note: SharePointOnlineCredentials uses legacy authentication and does not work with MFA-enabled accounts
$AllCredt = Get-Credential

Try {
    #Set up the context
    $Ctx = New-Object Microsoft.SharePoint.Client.ClientContext($URLofSharePointSite)
    $Ctx.Credentials = New-Object Microsoft.SharePoint.Client.SharePointOnlineCredentials($AllCredt.UserName, $AllCredt.Password)

    #Get the web and its lists
    $Web = $Ctx.Web
    $ListsContainer = $Web.Lists
    $Ctx.Load($Web)
    $Ctx.Load($ListsContainer)
    $Ctx.ExecuteQuery()

    ForEach ($Li in $ListsContainer) {
        #Only visible, non-empty document libraries; skip system libraries
        If ($Li.BaseType -eq "DocumentLibrary" -and $Li.Hidden -eq $False -and $Li.ItemCount -gt 0 -and $Li.Title -NotIn ("Site Pages", "Style Library", "Preservation Hold Library")) {

            #Page through all items, including those inside folders
            $Query = New-Object Microsoft.SharePoint.Client.CamlQuery
            $Query.ViewXml = "<View Scope='RecursiveAll'><Query><OrderBy><FieldRef Name='ID' Ascending='TRUE'/></OrderBy></Query><RowLimit Paged='TRUE'>$Batch_Size</RowLimit></View>"
            $Cnt = 1

            Do {
                $LiItems = $Li.GetItems($Query)
                $Ctx.Load($LiItems)
                $Ctx.ExecuteQuery()

                ForEach ($Item in $LiItems) {
                    #Filter files (skip folders)
                    If ($Item.FileSystemObjectType -eq "File") {
                        $File = $Item.File
                        $Ctx.Load($File)
                        $Ctx.ExecuteQuery()
                        Write-Progress -PercentComplete ($Cnt / $Li.ItemCount * 100) -Activity "Processing file $Cnt of $($Li.ItemCount) in $($Li.Title) of $($Web.Url)" -Status "Scanning '$($File.Name)'"

                        #Get the file hash (MD5 of the file contents)
                        $Bytes = $File.OpenBinaryStream()
                        $Ctx.ExecuteQuery()
                        $MD5 = New-Object -TypeName System.Security.Cryptography.MD5CryptoServiceProvider
                        $HashCode = [System.BitConverter]::ToString($MD5.ComputeHash($Bytes.Value))

                        #Collect data
                        $Data = New-Object PSObject
                        $Data | Add-Member -MemberType NoteProperty -Name "Name_ofFile" -Value $File.Name
                        $Data | Add-Member -MemberType NoteProperty -Name "File_Hash_Code" -Value $HashCode
                        $Data | Add-Member -MemberType NoteProperty -Name "URL_ofFile" -Value $File.ServerRelativeUrl
                        $Data | Add-Member -MemberType NoteProperty -Name "Size_ofFile" -Value $File.Length
                        $DataCollection += $Data
                    }
                    $Cnt++
                }
                #Move to the next page of results
                $Query.ListItemCollectionPosition = $LiItems.ListItemCollectionPosition
            } While ($Query.ListItemCollectionPosition -ne $null)
        }
    }

    #Export all data to CSV
    $DataCollection | Export-Csv -Path $CSV_Loc -NoTypeInformation
    Write-Host -f Green "Exported to CSV file $CSV_Loc"

    #Group based on file hash (identical content)
    $SharePointDuplicates = $DataCollection | Group-Object -Property File_Hash_Code | Where-Object {$_.Count -gt 1} | Select-Object -ExpandProperty Group
    Write-Host "Duplicate files by hash code:"
    $SharePointDuplicates | Format-Table -AutoSize

    #Group based on file name
    $Duplicates_FileName = $DataCollection | Group-Object -Property Name_ofFile | Where-Object {$_.Count -gt 1} | Select-Object -ExpandProperty Group
    Write-Host "Duplicate files in SharePoint Online by file name:"
    $Duplicates_FileName | Format-Table -AutoSize

    #Group based on file size
    $Duplicates_FileSize = $DataCollection | Group-Object -Property Size_ofFile | Where-Object {$_.Count -gt 1} | Select-Object -ExpandProperty Group
    Write-Host "Duplicate files by file size:"
    $Duplicates_FileSize | Format-Table -AutoSize
}
Catch {
    Write-Host -f Red "Error:" $_.Exception.Message
}
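Because the script exports its results to CSV, you can re-check the duplicates later without rescanning the whole site. The sketch below assumes the CSV column names written by the script (`Name_ofFile`, `File_Hash_Code`, `URL_ofFile`, `Size_ofFile`); the function name `Get-DuplicateGroupsFromCsv` is hypothetical.

```powershell
# Minimal sketch: summarise duplicate sets from the exported CSV, one row per shared hash.
# Assumes the column names produced by the scan script; the function name is hypothetical.
function Get-DuplicateGroupsFromCsv {
    param([Parameter(Mandatory)] [string] $CsvPath)

    Import-Csv -LiteralPath $CsvPath |
        Group-Object -Property File_Hash_Code |
        Where-Object { $_.Count -gt 1 } |
        ForEach-Object {
            [PSCustomObject]@{
                Hash  = $_.Name
                Count = $_.Count
                # All server-relative URLs whose content shares this hash
                Urls  = ($_.Group | ForEach-Object { $_.URL_ofFile })
            }
        } |
        Sort-Object -Property Count -Descending
}

# Usage: Get-DuplicateGroupsFromCsv -CsvPath 'C:\Temp\Duplicates.csv' | Format-List
```

Grouping by hash rather than by name or size is the most reliable signal, since two files can share a name or size without having the same content.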

Delete Duplicate Files in SharePoint Online

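Once you have identified duplicates, the script below removes them from a local copy of your files (here, files stored under C:\Temp\New). It works on the local file system, not on SharePoint Online directly. It groups files by size, hashes only the files that share a size, and keeps the most recently modified copy in each set of identical files. This is faster and far less error-prone than deleting duplicates by hand.

# Define the source path (a local folder containing the files to check)
$sourceLocation = "C:\Temp\New"

# Find files that share a size, newest first; only same-size files can be identical
$AllFiles = Get-ChildItem -Path $sourceLocation -File -Recurse | Sort-Object LastWriteTime -Descending | Group-Object -Property Length | Where-Object {$_.Count -gt 1}

# Hash the same-size files and group them by hash to find true duplicates
$AllduplicateFiles = @()
if ($AllFiles) {
    $AllduplicateFiles = @($AllFiles | Select-Object -ExpandProperty Group | Get-FileHash | Group-Object -Property Hash | Where-Object {$_.Count -gt 1})
}

# Delete all but one file in each duplicate set
if ($AllduplicateFiles.Count -eq 0) {
    Write-Output "No duplicate files found in $sourceLocation."
} else {
    Write-Output "Deleting duplicate files:"
    $AllduplicateFiles | ForEach-Object {
        # The first file in each group is the most recently modified; keep it and delete the rest
        $AllfilesforDelete = $_.Group | Select-Object -Skip 1
        $AllfilesforDelete | ForEach-Object {
            Write-Output "Deleting: $($_.Path)"
            Remove-Item -LiteralPath $_.Path -Force
        }
    }
}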
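Because the script deletes files immediately, a dry run is a sensible first step. The sketch below wraps the same size-then-hash logic in a function that supports PowerShell's built-in `-WhatIf` switch, so you can list what would be removed before removing anything. The function name `Remove-DuplicateLocalFile` is hypothetical, and it assumes the files are in a local folder.

```powershell
# Minimal sketch: size-then-hash duplicate removal with -WhatIf support.
# Remove-DuplicateLocalFile is a hypothetical name; $Path is a local folder.
function Remove-DuplicateLocalFile {
    [CmdletBinding(SupportsShouldProcess)]
    param([Parameter(Mandatory)] [string] $Path)

    # Newest first, so the copy kept in each set is the most recently modified one
    $sameSize = Get-ChildItem -LiteralPath $Path -File -Recurse |
        Sort-Object -Property LastWriteTime -Descending |
        Group-Object -Property Length |
        Where-Object { $_.Count -gt 1 }

    # Flatten the same-size groups; nothing to do if there are none
    $candidates = foreach ($g in $sameSize) { $g.Group }
    if (-not $candidates) { return }

    # Only same-size files are hashed, then grouped by identical hash
    $duplicateSets = $candidates |
        Get-FileHash -Algorithm SHA256 |
        Group-Object -Property Hash |
        Where-Object { $_.Count -gt 1 }

    foreach ($set in $duplicateSets) {
        # Keep the first (newest) file in each set and remove the rest
        foreach ($dup in ($set.Group | Select-Object -Skip 1)) {
            if ($PSCmdlet.ShouldProcess($dup.Path, 'Remove duplicate file')) {
                Remove-Item -LiteralPath $dup.Path -Force
            }
        }
    }
}

# Usage: preview first, then run for real
# Remove-DuplicateLocalFile -Path 'C:\Temp\New' -WhatIf
# Remove-DuplicateLocalFile -Path 'C:\Temp\New'
```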

With the methods above, you can identify duplicate files in SharePoint Online. However, deleting duplicates without first backing up SharePoint Online to local storage or another SharePoint tenant is risky, because an accidental deletion can cause permanent data loss. To reduce this risk, I always back up or migrate important files before making changes.
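If you are cleaning up a local copy of your files, a simple safeguard is to copy the whole folder to a timestamped backup folder before running any deletion. The sketch below is a minimal example under that assumption; the function name `Backup-LocalFolder` and the paths are hypothetical, and it does not replace a proper SharePoint backup.

```powershell
# Minimal sketch: copy a local folder into a new timestamped backup folder.
# Backup-LocalFolder is a hypothetical name; point -Source and -BackupRoot at your own paths.
function Backup-LocalFolder {
    param(
        [Parameter(Mandatory)] [string] $Source,
        [Parameter(Mandatory)] [string] $BackupRoot
    )
    $stamp  = Get-Date -Format 'yyyyMMdd-HHmmss'
    $target = Join-Path $BackupRoot "Backup-$stamp"
    # Creates the backup folder (and BackupRoot, if it does not exist yet)
    New-Item -ItemType Directory -Path $target -Force | Out-Null
    # Copy everything in the source folder, including subfolders and hidden files
    Copy-Item -Path (Join-Path $Source '*') -Destination $target -Recurse -Force
    # Return the backup path so you can check it before cleaning up
    return $target
}

# Usage: Backup-LocalFolder -Source 'C:\Temp\New' -BackupRoot 'D:\Backups'
```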

Move SharePoint Online Data to Another Site or Tenant

Instead of permanently deleting files, you can first move your SharePoint data to another account or site as a precaution. For this step, I recommend SysTools SharePoint Migrator. It doesn’t scan for duplicates, but it serves as a reliable tool for backing up SharePoint data to another site or tenant before cleanup. If anything goes wrong during duplicate removal, you’ll have a safe copy of your content to restore.


You can quickly create a backup to another site or tenant using these steps:

  1. Open the solution and choose both source and target platforms as Microsoft 365.
  2. Select the Site option and enter the required platform details.
  3. Add your Users and Sites, then hit Start Migration.

This backup step is essential before any deletion, because once a site or its content has been permanently deleted, it can't be restored.

Best Practices to Prevent Future Duplicates

To avoid having to find and delete duplicate files in SharePoint Online in the future, follow this checklist:

  • Establish clear file naming conventions.
  • Use SharePoint's co-authoring feature instead of saving separate copies of a file.
  • Enable versioning and retention policies to avoid manual duplicates.
  • Regularly audit your libraries using PowerShell.
  • Train team members on proper upload and editing practices.
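As one example of a regular PowerShell audit, the sketch below flags file names that look like ad-hoc copies. The name patterns are assumptions: " - Copy" suffixes (as Windows Explorer produces) and "(1)"-style counters (as browsers add to repeated downloads). Adjust them to your own naming conventions; the function name `Find-CopyLikeName` is hypothetical.

```powershell
# Minimal sketch of a naming audit: emit file names that look like ad-hoc copies.
# The regex patterns are assumptions; Find-CopyLikeName is a hypothetical name.
function Find-CopyLikeName {
    param([Parameter(Mandatory, ValueFromPipeline)] [string] $Name)
    begin {
        $patterns = @(
            ' - Copy( \(\d+\))?\.[^.]+$',  # e.g. "Report - Copy.docx", "Report - Copy (2).docx"
            ' \(\d+\)\.[^.]+$'             # e.g. "Report (1).docx"
        )
    }
    process {
        foreach ($p in $patterns) {
            # Emit each matching name once, then stop checking further patterns
            if ($Name -match $p) { $Name; break }
        }
    }
}

# Usage against the CSV exported by the scan script:
# Import-Csv 'C:\Temp\Duplicates.csv' | ForEach-Object Name_ofFile | Find-CopyLikeName
```

Names flagged this way are only candidates; confirm with a hash comparison before deleting anything.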

Author’s Verdict

You now know how to find duplicate files in SharePoint Online, and you have several methods to choose from. Whichever you use, don't forget to back up your data first, whether offline or online. Please share this guide with anyone who needs it.

People Also Ask

Q1. Can SharePoint Online automatically remove duplicates?
No, SharePoint Online doesn’t have a built-in de-duplication feature. I use PowerShell for automation.

Q2. Will version history count as duplicate files?
No, version history stores multiple versions within one file, not as separate duplicates.

Q3. Can I use Microsoft 365 Compliance tools to find duplicate files in SharePoint Online?
Yes, I sometimes use Microsoft Purview Content Search to identify duplicate file names or metadata patterns.


By Mohit Kumar Jha

Mohit is a Microsoft Certified expert for all things Microsoft. He brings a unique perspective gained from nearly a decade of active participation in IT forums, blogs, and social media, and is known in admin circles as the go-to guru for solving user queries about cloud migration, data backup, and digital forensics. The secret to his expertise lies in solving problems practically; through this hands-on experience, he has gained knowledge across Microsoft 365 Cloud, on-premises Exchange Server, AD, and Entra ID.