The Money Mustache Community

Learning, Sharing, and Teaching => Ask a Mustachian => Topic started by: Sibley on December 09, 2016, 08:32:05 AM

Title: Tech question - finding duplicate file folders
Post by: Sibley on December 09, 2016, 08:32:05 AM
So, I'm working on some stuff at work and am hitting a tech issue, hoping someone can suggest something. Old data, with duplicates. Except it's not so easy to tell sometimes and there's thousands of folders.

Example:
DMC-2007-012-089
DMC-2007-012-091

These are actually duplicate file folders. But because of the last 3 digits not matching, they won't be picked up as duplicates. Any ideas on how to easily find these things? They're all in one root directory, but multiple sub folders, etc.
Title: Re: Tech question - finding duplicate file folders
Post by: bobechs on December 09, 2016, 08:58:54 AM
If you are willing to use a program that looks at content of files rather than the names of folders to detect duplicates, there is a plethora of free-share-pay solutions out there.

An example of a rundown of some of them:  http://www.howtogeek.com/200962/how-to-find-and-remove-duplicate-files-on-windows/

or:  http://lifehacker.com/the-best-duplicate-file-finder-for-windows-1696492476

I have used the free version of Duplicate File Finder with satisfaction, but there may be better ones out there.
Title: Re: Tech question - finding duplicate file folders
Post by: Sibley on December 09, 2016, 10:13:18 AM
I tried the duplicate file finder based on the content of the folders. Unfortunately, based on what's actually in these folders, the results I'm getting are invalid. They're not word/excel files or anything like that, these are extracts of projects from our project management software, so they do have the same file names in many cases.

I'm just manually removing the last suffix from the file names for now, since I don't have anything else to do and I don't need those suffixes anyway. And have found duplicates :(
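
The manual suffix-stripping described above could also be done read-only, without renaming anything. Here is a minimal sketch, assuming the folder names always end in a dash followed by digits (as in DMC-2007-012-089) and that this last block is the part to ignore. The function name `group_by_stem` and the pattern are illustrative assumptions, not something from the project management software.

```python
import os
import re
from collections import defaultdict

# Assumption: the suffix to ignore is a trailing "-<digits>" block,
# e.g. the "-089" in "DMC-2007-012-089".
SUFFIX = re.compile(r"-\d+$")

def group_by_stem(root):
    """Return {stem: [folder paths]} for stems shared by more than one folder."""
    groups = defaultdict(list)
    for dirpath, dirnames, _ in os.walk(root):
        for name in dirnames:
            stem = SUFFIX.sub("", name)
            if stem != name:  # skip folders that have no numeric suffix
                groups[stem].append(os.path.join(dirpath, name))
    # Only stems that occur more than once are candidate duplicates.
    return {stem: sorted(paths) for stem, paths in groups.items() if len(paths) > 1}
```

Calling `group_by_stem` on the root directory only lists candidate duplicates; it does not modify or delete anything, so each group still needs a manual check.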
Title: Re: Tech question - finding duplicate file folders
Post by: GuitarStv on December 09, 2016, 10:21:14 AM
Never, ever do anything manually on a computer.  Software engineers are lazy as fuck.  If you have a computer problem that seems to require manual work, my experience is that someone out there has already solved it. :P

Use UltraEdit to find duplicate files: https://www.ultraedit.com/support/tutorials_power_tips/ultracompare/find-duplicates.html

Download UltraEdit for free: https://www.ultraedit.com/downloads/ultraedit_download.html
Title: Re: Tech question - finding duplicate file folders
Post by: jeromedawg on December 09, 2016, 11:24:31 AM
So it seems like what you're saying is that there are multiple folders which have duplicate data?


Try CloneSpy - I've had good success with that. Now if you're trying to just delete particular folder(s) containing the duplicate data that might be tricky. But if Folders A, B, and C all contain Files A, B, C, D, E then you should be able to easily delete those files from Folders B and C. Though I'm not sure you would just be able to delete Folders B and C unless you manually did that in Explorer or perhaps from the duplicate finder program/app itself... even then, if you want to be extra-careful, marking the folders you want to delete first and then manually verifying/deleting is the "safest" way to go but you will have to sacrifice the time to do so. If it's pretty important data though, it's probably warranted. It would really suck if you checked auto-delete and deleted a certain folder or contents that you weren't intending to.
Title: Re: Tech question - finding duplicate file folders
Post by: AZDude on December 09, 2016, 11:27:42 AM
Do you have an IT department? Even in the most complicated scenario, this seems like something that would take 10 minutes for a skilled Sys Admin or programmer.

Otherwise, others have good suggestions.
Title: Re: Tech question - finding duplicate file folders
Post by: Heroes821 on December 09, 2016, 01:23:56 PM
If the files have different names but are identical on the inside, you can run file hashes (MD5 or SHA) on them; if the hashes match, the files are duplicates. If even one letter is different, the hashes won't match and this won't work.
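
A minimal sketch of the hash comparison, assuming Python is available and using SHA-256 from the standard library. The function names `file_digest` and `find_identical_files` are made up for illustration.

```python
import hashlib
import os
from collections import defaultdict

def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_identical_files(root):
    """Return lists of paths under root whose contents hash to the same digest."""
    by_digest = defaultdict(list)
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_digest[file_digest(path)].append(path)
    # A digest seen only once has no duplicate.
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

This only reports byte-for-byte identical files, whatever they are named. Files that differ by a single byte land in different groups.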

I also second asking IT whether they have a solution. What system the files are on (e.g. Windows, Linux, Unix, Mac) is also important for the limited assistance we can give over the forum.
Title: Re: Tech question - finding duplicate file folders
Post by: Sibley on December 09, 2016, 06:54:44 PM
Yes, there is a significant IT department, however getting their involvement isn't easy. And they're way overburdened right now, so this would never get done. I've been waiting a year for them to work on a bigger project, not asking for this right now :)  Although, ironically I'm doing this in prep for what they're working on.

This is critical data. I can de-dup and reorganize, but can't delete anything completely. In fact, I've already deleted 150 GB of duplicate data, but the pieces I'm working on right now are not that easy. Also, I have nothing else to do right now (project based, in between projects, and new projects won't start until Jan), so manual checking is not a problem.

jplee3 - I'll check out CloneSpy, thanks. Sounds like that might help. The other dup finders I tried weren't working for what I need, this sounds different.
Title: Re: Tech question - finding duplicate file folders
Post by: With This Herring on December 09, 2016, 07:21:27 PM
Just checking - you backed up everything before starting to media that won't be overwritten after a five-day cycle, right?  If not, please do so now.  (I don't have an answer to your question, though.)

OldJob had an automatic backup of all network OldJob files twice a day, but I think the backups were only kept for a certain number of days and then overwritten.  So, if we deleted a folder on Jan 1 but didn't realize that we still needed it until Jan 31, it would have been...bad.
Title: Re: Tech question - finding duplicate file folders
Post by: Sibley on December 09, 2016, 07:40:20 PM
Oh yes. IT backs everything up nightly. Beyond that, I actually copied all 68 GB of data from the network drive to my local and am working on that copy. Original is not being touched until I'm sure that we're ok!
Title: Re: Tech question - finding duplicate file folders
Post by: swick on December 09, 2016, 07:46:06 PM
I've had great progress with the paid version (I think 30.00?) of Duplicate Cleaner. http://www.duplicatecleaner.com/

You can set a variety of different parameters to compare against and it goes by the data and not just file names. I had to go through 3 TB of mostly duplicate data. Saved me hours and hours and hours. Best 30.00 I've spent in a long while.
Title: Re: Tech question - finding duplicate file folders
Post by: jeromedawg on December 09, 2016, 08:38:42 PM
Yep, CloneSpy is relatively flexible. You can compare files between different folders and have the option to auto-delete or not. I *think* there might even be an option to move the dupes into a different folder as well, if that'll make the initial clean-up easier. There are various ways you can set it up to find dupes too: via metadata such as date and name, but also via MD5 hash. It doesn't make the overall work much less tedious but definitely helps with the potential reorganization.
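
For anyone scripting the "move instead of delete" idea rather than using a tool, here is a rough sketch. It is not how CloneSpy works internally; it assumes you already have groups of paths you have judged to be duplicates, and `quarantine_duplicates` and `holding_dir` are hypothetical names. The holding folder is assumed to sit outside the root being cleaned.

```python
import os
import shutil

def quarantine_duplicates(groups, root, holding_dir):
    """Keep the first path (in sorted order) of each group; move the rest
    under holding_dir, preserving each path's location relative to root.
    Returns a list of (old_path, new_path) pairs so moves can be reversed."""
    moved = []
    for paths in groups:
        for path in sorted(paths)[1:]:
            rel = os.path.relpath(path, root)
            target = os.path.join(holding_dir, rel)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            shutil.move(path, target)
            moved.append((path, target))
    return moved
```

Nothing is deleted: the extra copies end up in the holding folder with their original sub-folder layout, and the returned list records where each one came from.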
Title: Re: Tech question - finding duplicate file folders
Post by: pablos on August 09, 2018, 08:17:21 AM
I think if you are looking to remove duplicate images, here are some tools listed that you can use:
https://youprogrammer.com/duplicate-photo-finders-remove-duplicate-photos/