I recently transitioned to a newer PC for my main computer. It’s considerably faster (and lighter and cooler) than the 5+ year old Dell desktop it’s replacing. But it has a considerably smaller drive. I just did not want to pay a premium for a big SSD. So that means I need to be smart(er) about my storage usage. Sure, I can keep everything “in the cloud” and I do. But I am old enough that I still like to keep a copy of my most precious data local on my PC. I can then back up the local copy to any number of external drives and even back those up again to another cloud service like Backblaze. It’s an illness, I know.
As I started to set up my precious photo collection on the new PC, I noticed that it was consuming nearly 100 GB of my scarce 256 GB drive. No bueno – as we say in the InfoSec business.
The cure turned out to be sweeping duplicate files from my photo library. I won’t bore you with the details, but let’s just say I’ve been promiscuous in my use of photo apps and services – very promiscuous. Enough so that I know that I have duplicate copies of the same photos stored in various sub-directories on my drive. So I knew that I wanted to discover these dupes and deal with them. The question of course is how would I find them?
There are any number of applications you can download that claim to be the answer to your duplicate file woes. But I have to say that many of the ones I found were hosted on dodgy looking websites, and I feared they would be crawling with spyware, adware and perhaps even worse bits. So I decided to use my Google-fu to look for any PowerShell scripts that might serve my needs.
And sure enough I found a great resource at a site called “Read Only Maio” http://www.readonlymaio.org/rom/2017/10/09/finding-duplicated-identical-files-with-powershell-the-fast-way/. This person had already done all the heavy lifting for me. I just needed to apply a minor tweak here or there and create a workflow for myself. It was really very easy.
For each file location I wanted to review, I went through the following process. So for example, I cleaned up my photos by opening up the PowerShell console and changing directory to c:\Users\Kevin\OneDrive\Pictures\ and then ran the following steps:
Stage one:
gci -file -recurse | Group-Object Length | Where-Object { $_.Count -gt 1 } | select -ExpandProperty group | foreach {get-filehash -literalpath $_.fullname} | group -property hash | where { $_.count -gt 1 } | select -ExpandProperty group | select hash, path | Out-File c:\dupe\duplicated_files.txt -width 510
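This groups the files by size, hashes only the files that share a size with at least one other file, and then lists the files whose hashes match. The result is a text file in c:\dupe\ (the folder needs to exist before you run the command) that shows the detected duplicate files. After reviewing and sanity checking the list I then moved on to Stage two.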
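If you want a rough idea of how much space is at stake before moving anything, the same pipeline can be turned into a quick estimate. This is only a sketch of my own, not part of the original script; the variable names are made up for illustration, and it assumes you run it from the same folder as Stage one.

```powershell
# Sketch: estimate the space taken up by redundant copies.
# Run from the folder you are reviewing, as in Stage one.
$groups = @(
    Get-ChildItem -File -Recurse |
        Group-Object Length |
        Where-Object { $_.Count -gt 1 } |
        ForEach-Object { $_.Group } |
        ForEach-Object { Get-FileHash -LiteralPath $_.FullName } |
        Group-Object Hash |
        Where-Object { $_.Count -gt 1 }
)

$extraFiles = 0
$extraBytes = 0
foreach ($g in $groups) {
    # Files in a group share a hash (and a size), so any one gives the size.
    $size = (Get-Item -LiteralPath $g.Group[0].Path).Length
    # One copy stays; the rest count as redundant.
    $extraFiles += $g.Count - 1
    $extraBytes += $size * ($g.Count - 1)
}

'{0} redundant files, about {1:N1} MB' -f $extraFiles, ($extraBytes / 1MB)
```

Nothing is moved or deleted here; it only reads the files and prints a summary line.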
Stage two:
gci -file -recurse | Group-Object Length | Where-Object { $_.Count -gt 1 } | select -ExpandProperty group | foreach {get-filehash -literalpath $_.fullname} | group -property hash | where { $_.count -gt 1 } | foreach { $_.group | select -skip 1 } | select -ExpandProperty path | foreach {Move-Item -LiteralPath $_ -Destination C:\dupe}
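Now, for each set of detected duplicates, every copy but the first is moved to the c:\dupe directory. Note that Move-Item will not overwrite a file that already exists in the destination, so if several of the copies being moved share the same file name, only one of them will be moved and you will see error messages in the PowerShell console advising you that a file cannot be created in the c:\dupe folder with the same name. This means that if you have more than two copies of the same file under the same name, you will need to repeat Stage two multiple times, emptying c:\dupe in between as described in Stage three.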
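One way to avoid those name clashes, and the repeat runs that come with them, is to give each moved file a unique name in c:\dupe. The following is a sketch under my own assumptions rather than something from the original script: the `$dest` variable and the naming scheme (first 8 characters of the hash, a counter, then the original file name) are just illustrative choices, and c:\dupe must already exist.

```powershell
# Sketch: move duplicates into the holding folder under unique names,
# so that copies sharing a file name do not collide.
$dest = 'C:\dupe'   # holding folder; must already exist

Get-ChildItem -File -Recurse |
    Group-Object Length |
    Where-Object { $_.Count -gt 1 } |
    ForEach-Object { $_.Group } |
    ForEach-Object { Get-FileHash -LiteralPath $_.FullName } |
    Group-Object Hash |
    Where-Object { $_.Count -gt 1 } |
    ForEach-Object {
        $i = 0
        # Keep the first copy in place, move the rest.
        $_.Group | Select-Object -Skip 1 | ForEach-Object {
            $i++
            $leaf = [System.IO.Path]::GetFileName($_.Path)
            # Hypothetical naming scheme: <hash prefix>_<counter>_<original name>
            $name = '{0}_{1}_{2}' -f $_.Hash.Substring(0, 8), $i, $leaf
            Move-Item -LiteralPath $_.Path -Destination (Join-Path $dest $name)
        }
    }
```

The hash prefix keeps different photos with the same file name apart, and the counter keeps identical copies apart. An 8-character prefix could in principle clash between two different files, in which case Move-Item would report an error for that file just as before and leave it where it is.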
Stage three:
Review the files in c:\dupe and spot check if you want. If you are comfortable that these are indeed dupes, you can then empty c:\dupe and you will have freed up some space on your drive.
You can repeat Stage two and three as many times as it takes to eliminate your duplicate files.
Please note, this process does not take into account your preferred location for files. If you want to make sure that you keep the primary copy of the file in a certain location this process may not be right for you. But this worked a treat for me and eliminated thousands of duplicate files that were just wasting space on my drive. Hopefully this can do you some good as well.
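If you do care which copy survives, one possible tweak is to sort each group of duplicates so that files under a preferred folder come first, since the first file in each group is the one that stays put. This is a sketch I have not folded into the workflow above; the `$keep` folder is a hypothetical example, and the snippet only lists the files that would be moved rather than moving them.

```powershell
# Sketch: choose which copy to keep based on a preferred folder.
# $keep is a hypothetical example path; note the trailing backslash.
$keep = 'C:\Users\Kevin\OneDrive\Pictures\Albums\'

Get-ChildItem -File -Recurse |
    Group-Object Length |
    Where-Object { $_.Count -gt 1 } |
    ForEach-Object { $_.Group } |
    ForEach-Object { Get-FileHash -LiteralPath $_.FullName } |
    Group-Object Hash |
    Where-Object { $_.Count -gt 1 } |
    ForEach-Object {
        $_.Group |
            # $false sorts before $true, so files under $keep come first.
            Sort-Object { -not $_.Path.StartsWith($keep, [System.StringComparison]::OrdinalIgnoreCase) } |
            Select-Object -Skip 1
    } |
    Select-Object -ExpandProperty Path
```

If the list looks right, the output can be piped into the same Move-Item step used in Stage two. When none of the copies live under the preferred folder, one of them is still kept, as before.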