Python check if files are identical

How to compare files in Python

The filecmp module in Python can be used to compare files and directories.

cmp(file1, file2[, shallow])

Compares the files file1 and file2 and returns True if they appear identical, False otherwise. If shallow is not provided (or is True), files that have the same os.stat() signature (file type, size, and modification time) are considered equal without their contents being read.

cmpfiles(dir1, dir2, common[, shallow])

Compares the files named in the list common in the two directories dir1 and dir2. cmpfiles returns a tuple of three lists of filenames: match, mismatch, and errors.

  • match: the files that are the same in both directories.
  • mismatch: the files that don't match.
  • errors: the files that could not be compared for some reason.

dircmp(dir1, dir2[, ignore[, hide]])

Creates a directory comparison object that can be used to perform various comparison operations on the directories dir1 and dir2.

  • ignore: a list of filenames to ignore; defaults to filecmp.DEFAULT_IGNORES, which includes 'RCS', 'CVS', and 'tags'.
  • hide: a list of filenames to hide; defaults to [os.curdir, os.pardir] (['.', '..'] on UNIX).
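
To make the ignore parameter concrete, here is a minimal sketch. The directory layout, the file names (data.txt, build.log), and the write helper are invented for illustration; only dircmp, its ignore argument, and filecmp.DEFAULT_IGNORES come from the module itself.

```python
import filecmp
import os
import tempfile


def write(path, text):
    """Create a small text file (hypothetical helper for this sketch)."""
    with open(path, "w") as f:
        f.write(text)


with tempfile.TemporaryDirectory() as root:
    left = os.path.join(root, "left")
    right = os.path.join(root, "right")
    os.mkdir(left)
    os.mkdir(right)

    # The same data file on both sides, plus a log file that differs in size.
    write(os.path.join(left, "data.txt"), "payload")
    write(os.path.join(right, "data.txt"), "payload")
    write(os.path.join(left, "build.log"), "short")
    write(os.path.join(right, "build.log"), "a much longer log")

    default_cmp = filecmp.dircmp(left, right)
    # Passing ignore replaces the default list, so extend it explicitly.
    ignoring_cmp = filecmp.dircmp(
        left, right, ignore=filecmp.DEFAULT_IGNORES + ["build.log"]
    )

    # Attributes are computed lazily: read them while the directories exist.
    diff_default = default_cmp.diff_files
    diff_ignoring = ignoring_cmp.diff_files
    common_ignoring = ignoring_cmp.common

print(diff_default)     # ['build.log']
print(diff_ignoring)    # []
print(common_ignoring)  # ['data.txt']
```

Extending DEFAULT_IGNORES rather than passing a bare list keeps the built-in exclusions in place.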

Instances of filecmp.dircmp implement the following methods, which print detailed reports to sys.stdout:

  • report(): Prints a comparison between the two directories.
  • report_partial_closure(): Prints a comparison of the two directories as well as of their common immediate subdirectories.
  • report_full_closure(): Prints a comparison of the two directories, all of their common subdirectories, all the subdirectories of those subdirectories, and so on (i.e., recursively).

Instances also expose the following attributes:

  • left_list: files and subdirectories found in dir1, filtered by hide and ignore.
  • right_list: files and subdirectories found in dir2, filtered by hide and ignore.
  • common: files and subdirectories that are in both dir1 and dir2.
  • left_only: files and subdirectories that are in dir1 only.
  • right_only: files and subdirectories that are in dir2 only.
  • common_dirs: subdirectories that are in both dir1 and dir2.
  • common_files: files that are in both dir1 and dir2.
  • same_files: files that are identical in both dir1 and dir2.
  • diff_files: files that are in both dir1 and dir2 but whose contents differ.
  • funny_files: files that are in both dir1 and dir2 but could not be compared for some reason.
  • subdirs: a dictionary that maps names in common_dirs to dircmp objects.
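
The attributes can also be read directly instead of printing a report. The following is a minimal sketch: the temporary directory layout, the file names, and the write helper are made up for illustration, and the results are sorted only to make the printed order predictable.

```python
import filecmp
import os
import tempfile


def write(path, text):
    """Create a small text file (hypothetical helper for this sketch)."""
    with open(path, "w") as f:
        f.write(text)


with tempfile.TemporaryDirectory() as root:
    left = os.path.join(root, "left")
    right = os.path.join(root, "right")
    os.makedirs(os.path.join(left, "shared_dir"))
    os.makedirs(os.path.join(right, "shared_dir"))

    write(os.path.join(left, "same.txt"), "same")
    write(os.path.join(right, "same.txt"), "same")
    # Different sizes, so even a shallow comparison reports a difference.
    write(os.path.join(left, "changed.txt"), "old")
    write(os.path.join(right, "changed.txt"), "newer")
    write(os.path.join(left, "only_left.txt"), "x")
    write(os.path.join(right, "only_right.txt"), "y")

    dc = filecmp.dircmp(left, right)
    # Attributes are computed lazily: read them while the directories exist.
    summary = {
        "left_only": sorted(dc.left_only),
        "right_only": sorted(dc.right_only),
        "common_dirs": sorted(dc.common_dirs),
        "common_files": sorted(dc.common_files),
        "same_files": sorted(dc.same_files),
        "diff_files": sorted(dc.diff_files),
    }

for name, value in summary.items():
    print(f"{name}: {value}")
```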

Preparing test data for comparison.

    import os


    # prepare test data
    def makefile(filename, text=None):
        """
        Function: make some files
        params : input file, body
        """
        with open(filename, 'w') as f:
            # Use the file name as the body when no text is given
            f.write(text or filename)


    # prepare test data
    def makedirectory(directory_name):
        """
        Function: make directories
        params : input directory
        """
        if not os.path.exists(directory_name):
            os.mkdir(directory_name)

        # Get current working directory
        present_directory = os.getcwd()

        # Change to the directory provided
        os.chdir(directory_name)

        # Make two directories
        os.mkdir('dir1')
        os.mkdir('dir2')

        # Make the same subdirectory in both
        os.mkdir('dir1/common_dir')
        os.mkdir('dir2/common_dir')

        # Make two different subdirectories
        os.mkdir('dir1/dir_only_in_dir1')
        os.mkdir('dir2/dir_only_in_dir2')

        # Make a unique file in each directory
        makefile('dir1/file_only_in_dir1')
        makefile('dir2/file_only_in_dir2')

        # Make a file with the same name and content in each directory
        makefile('dir1/common_file', 'Hello, Writing Same Content')
        makefile('dir2/common_file', 'Hello, Writing Same Content')

        # Make a file with the same name but different content in each directory
        makefile('dir1/not_the_same')
        makefile('dir2/not_the_same')

        # Make a name that is a file in dir1 but a directory in dir2
        makefile('dir1/file_in_dir1', 'This is a file in dir1')
        os.mkdir('dir2/file_in_dir1')

        # Return to the original working directory
        os.chdir(present_directory)


    if __name__ == '__main__':
        makedirectory('example')
        makedirectory('example/dir1/common_dir')
        makedirectory('example/dir2/common_dir')

Running the filecmp example. The shallow argument tells cmp() whether to look at the contents of the file, in addition to its metadata.

The default is to perform a shallow comparison using the information available from os.stat(). If the stat signatures (file type, size, and modification time) are the same, the files are considered the same. Thus, files of the same size that were last modified at the same time are reported as the same, even if their contents differ.

When shallow is False, the contents of the file are always compared.
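
The difference between the two modes is easiest to see when two files differ in content but share a stat signature. The sketch below forces that situation with os.utime; the file names and the timestamp value are arbitrary choices for illustration.

```python
import filecmp
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    a = os.path.join(root, "a.txt")
    b = os.path.join(root, "b.txt")

    # Same size, different contents.
    with open(a, "w") as f:
        f.write("AAAA")
    with open(b, "w") as f:
        f.write("BBBB")

    # Give both files the same access and modification time
    # (an arbitrary timestamp chosen for this sketch).
    stamp = 1_600_000_000
    os.utime(a, (stamp, stamp))
    os.utime(b, (stamp, stamp))

    # Shallow: identical type, size, and mtime, so contents are not read.
    shallow_result = filecmp.cmp(a, b)
    # Deep: contents are compared byte by byte.
    deep_result = filecmp.cmp(a, b, shallow=False)

print(shallow_result)  # True
print(deep_result)     # False
```

When correctness matters more than speed, passing shallow=False is the safer choice.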

    import filecmp

    print('Output \n *** Common File :', end=' ')
    print(filecmp.cmp('example/dir1/common_file', 'example/dir2/common_file'), end=' ')
    print(filecmp.cmp('example/dir1/common_file', 'example/dir2/common_file', shallow=False))

    print(' *** Different Files :', end=' ')
    print(filecmp.cmp('example/dir1/not_the_same', 'example/dir2/not_the_same'), end=' ')
    print(filecmp.cmp('example/dir1/not_the_same', 'example/dir2/not_the_same', shallow=False))

    print(' *** Identical Files :', end=' ')
    print(filecmp.cmp('example/dir1/file_only_in_dir1', 'example/dir1/file_only_in_dir1'), end=' ')
    print(filecmp.cmp('example/dir1/file_only_in_dir1', 'example/dir1/file_only_in_dir1', shallow=False))

Output

    *** Common File : True True
    *** Different Files : False False
    *** Identical Files : True True

Use cmpfiles() to compare a set of files in two directories without recursing.

    import filecmp
    import os

    # Determine the items that exist in both directories.
    dir1_contents = set(os.listdir('example/dir1'))
    dir2_contents = set(os.listdir('example/dir2'))
    common = list(dir1_contents & dir2_contents)
    common_files = [f for f in common if os.path.isfile(os.path.join('example/dir1', f))]
    print(f' *** Common files are : {common_files}')

    # Now, let us compare the directories
    match, mismatch, errors = filecmp.cmpfiles(
        'example/dir1',
        'example/dir2',
        common_files,
    )
    print(f' *** Matched files are : {match}')
    print(f' *** mismatch files are : {mismatch}')
    print(f' *** errors files are : {errors}')

Output

    *** Common files are : ['file_in_dir1', 'not_the_same', 'common_file']
    *** Matched files are : ['common_file']
    *** mismatch files are : ['file_in_dir1', 'not_the_same']
    *** errors files are : []

Use dircmp to print detailed comparison reports for the two directories.

    import filecmp

    dc = filecmp.dircmp('example/dir1', 'example/dir2')
    print("output \n *** Printing detailed report: \n ")
    # report() and report_full_closure() print directly and return None,
    # so call them without wrapping them in print().
    dc.report()
    print("\n")
    dc.report_full_closure()

Output

    *** Printing detailed report:
    diff example/dir1 example/dir2
    Only in example/dir1 : ['dir_only_in_dir1', 'file_only_in_dir1']
    Only in example/dir2 : ['dir_only_in_dir2', 'file_only_in_dir2']
    Identical files : ['common_file']
    Differing files : ['not_the_same']
    Common subdirectories : ['common_dir']
    Common funny cases : ['file_in_dir1']

    diff example/dir1 example/dir2
    Only in example/dir1 : ['dir_only_in_dir1', 'file_only_in_dir1']
    Only in example/dir2 : ['dir_only_in_dir2', 'file_only_in_dir2']
    Identical files : ['common_file']
    Differing files : ['not_the_same']
    Common subdirectories : ['common_dir']
    Common funny cases : ['file_in_dir1']

    diff example/dir1\common_dir example/dir2\common_dir
    Common subdirectories : ['dir1', 'dir2']

    diff example/dir1\common_dir\dir1 example/dir2\common_dir\dir1
    Identical files : ['common_file', 'file_in_dir1', 'file_only_in_dir1', 'not_the_same']
    Common subdirectories : ['common_dir', 'dir_only_in_dir1']

    diff example/dir1\common_dir\dir1\common_dir example/dir2\common_dir\dir1\common_dir

    diff example/dir1\common_dir\dir1\dir_only_in_dir1 example/dir2\common_dir\dir1\dir_only_in_dir1

    diff example/dir1\common_dir\dir2 example/dir2\common_dir\dir2
    Identical files : ['common_file', 'file_only_in_dir2', 'not_the_same']
    Common subdirectories : ['common_dir', 'dir_only_in_dir2', 'file_in_dir1']

    diff example/dir1\common_dir\dir2\common_dir example/dir2\common_dir\dir2\common_dir

    diff example/dir1\common_dir\dir2\dir_only_in_dir2 example/dir2\common_dir\dir2\dir_only_in_dir2

    diff example/dir1\common_dir\dir2\file_in_dir1 example/dir2\common_dir\dir2\file_in_dir1

You can also try the other dircmp methods and attributes listed above to see how each one behaves.


filecmp — File and Directory Comparisons

The filecmp module defines functions to compare files and directories, with various optional time/correctness trade-offs. For comparing files, see also the difflib module.

The filecmp module defines the following functions:

filecmp.cmp(f1, f2, shallow=True)

Compare the files named f1 and f2, returning True if they seem equal, False otherwise.

If shallow is true and the os.stat() signatures (file type, size, and modification time) of both files are identical, the files are taken to be equal.

Otherwise, the files are treated as different if their sizes or contents differ.

Note that no external programs are called from this function, giving it portability and efficiency.

This function uses a cache for past comparisons and the results, with cache entries invalidated if the os.stat() information for the file changes. The entire cache may be cleared using clear_cache().

filecmp.cmpfiles(dir1, dir2, common, shallow=True)

Compare the files in the two directories dir1 and dir2 whose names are given by common.

Returns three lists of file names: match, mismatch, errors. match contains the list of files that match, mismatch contains the names of those that don’t, and errors lists the names of files which could not be compared. Files are listed in errors if they don’t exist in one of the directories, the user lacks permission to read them or if the comparison could not be done for some other reason.

The shallow parameter has the same meaning and default value as for filecmp.cmp().

For example, cmpfiles('a', 'b', ['c', 'd/e']) will compare a/c with b/c and a/d/e with b/d/e. 'c' and 'd/e' will each be in one of the three returned lists.
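
A small sketch of how names land in each of the three lists, including a name that is missing on one side. The directory names, file names, and the write helper are hypothetical; shallow=False is passed so that the result depends only on file contents.

```python
import filecmp
import os
import tempfile


def write(path, text):
    """Create a small text file (hypothetical helper for this sketch)."""
    with open(path, "w") as f:
        f.write(text)


with tempfile.TemporaryDirectory() as root:
    dir_a = os.path.join(root, "a")
    dir_b = os.path.join(root, "b")
    os.mkdir(dir_a)
    os.mkdir(dir_b)

    write(os.path.join(dir_a, "same.txt"), "identical")
    write(os.path.join(dir_b, "same.txt"), "identical")
    write(os.path.join(dir_a, "diff.txt"), "one")
    write(os.path.join(dir_b, "diff.txt"), "another")
    # missing.txt exists only in dir_a, so it cannot be compared.
    write(os.path.join(dir_a, "missing.txt"), "orphan")

    match, mismatch, errors = filecmp.cmpfiles(
        dir_a, dir_b, ["same.txt", "diff.txt", "missing.txt"], shallow=False
    )

print(match)     # ['same.txt']
print(mismatch)  # ['diff.txt']
print(errors)    # ['missing.txt']
```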

filecmp.clear_cache()

Clear the filecmp cache. This may be useful if a file is compared so quickly after it is modified that it is within the mtime resolution of the underlying filesystem.
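
The following sketch imitates that situation. It uses os.utime to reset the modification time after a rewrite, which stands in for a change that falls within the filesystem's mtime resolution. The file names and timestamp are arbitrary, and the stale result relies on how CPython's filecmp keys its cache (on the paths and stat signatures), so treat it as an illustration rather than guaranteed behavior.

```python
import filecmp
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    a = os.path.join(root, "a.txt")
    b = os.path.join(root, "b.txt")
    stamp = 1_600_000_000  # arbitrary timestamp for this sketch

    for path in (a, b):
        with open(path, "w") as f:
            f.write("AAAA")
        os.utime(path, (stamp, stamp))

    # The first deep comparison reads both files and caches the outcome.
    first = filecmp.cmp(a, b, shallow=False)

    # Rewrite b with different content of the same size, then restore its
    # mtime so that its os.stat() signature is unchanged.
    with open(b, "w") as f:
        f.write("BBBB")
    os.utime(b, (stamp, stamp))

    # The signature did not change, so the cached outcome is reused.
    stale = filecmp.cmp(a, b, shallow=False)

    # After clearing the cache, the contents are read again.
    filecmp.clear_cache()
    fresh = filecmp.cmp(a, b, shallow=False)

print(first, stale, fresh)  # True True False
```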

The dircmp class

class filecmp.dircmp(a, b, ignore=None, hide=None)

Construct a new directory comparison object, to compare the directories a and b. ignore is a list of names to ignore, and defaults to filecmp.DEFAULT_IGNORES. hide is a list of names to hide, and defaults to [os.curdir, os.pardir].

The dircmp class compares files by doing shallow comparisons as described for filecmp.cmp() .

The dircmp class provides the following methods:

report()

Print (to sys.stdout) a comparison between a and b.

report_partial_closure()

Print a comparison between a and b and common immediate subdirectories.

report_full_closure()

Print a comparison between a and b and common subdirectories (recursively).

The dircmp class offers a number of interesting attributes that may be used to get various bits of information about the directory trees being compared.

Note that via __getattr__() hooks, all attributes are computed lazily, so there is no speed penalty if only those attributes which are lightweight to compute are used.

left_list

Files and subdirectories in a, filtered by hide and ignore.

right_list

Files and subdirectories in b, filtered by hide and ignore.

common

Files and subdirectories in both a and b.

left_only

Files and subdirectories only in a.

right_only

Files and subdirectories only in b.

common_dirs

Subdirectories in both a and b.

common_funny

Names in both a and b, such that the type differs between the directories, or names for which os.stat() reports an error.

same_files

Files which are identical in both a and b, using the class's file comparison operator.

diff_files

Files which are in both a and b, whose contents differ according to the class's file comparison operator.

funny_files

Files which are in both a and b, but could not be compared.

subdirs

A dictionary mapping names in common_dirs to dircmp instances (or MyDirCmp instances if this instance is of type MyDirCmp, a subclass of dircmp).

Changed in version 3.10: Previously entries were always dircmp instances. Now entries are the same type as self, if self is a subclass of dircmp.

filecmp.DEFAULT_IGNORES

List of directories ignored by dircmp by default.

Here is a simplified example of using the subdirs attribute to search recursively through two directories and show the common files that differ:

    >>> from filecmp import dircmp
    >>> def print_diff_files(dcmp):
    ...     for name in dcmp.diff_files:
    ...         print("diff_file %s found in %s and %s" % (name, dcmp.left,
    ...               dcmp.right))
    ...     for sub_dcmp in dcmp.subdirs.values():
    ...         print_diff_files(sub_dcmp)
    ...
    >>> dcmp = dircmp('dir1', 'dir2')
    >>> print_diff_files(dcmp)
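
The same recursion can collect results instead of printing them. The sketch below is one possible variant, not part of the module: iter_diff_files, the write helper, and the directory tree are hypothetical names built for illustration, and the differing files have different sizes so that the shallow comparison used by dircmp detects them.

```python
import filecmp
import os
import tempfile


def write(path, text):
    """Create a small text file (hypothetical helper for this sketch)."""
    with open(path, "w") as f:
        f.write(text)


def iter_diff_files(dcmp, prefix=""):
    """Yield relative paths of differing files, descending into common subdirectories."""
    for name in dcmp.diff_files:
        yield os.path.join(prefix, name)
    for sub_name, sub_dcmp in dcmp.subdirs.items():
        yield from iter_diff_files(sub_dcmp, os.path.join(prefix, sub_name))


with tempfile.TemporaryDirectory() as root:
    left = os.path.join(root, "left")
    right = os.path.join(root, "right")
    os.makedirs(os.path.join(left, "sub"))
    os.makedirs(os.path.join(right, "sub"))

    # Differing files at the top level and one level down.
    write(os.path.join(left, "top.txt"), "one")
    write(os.path.join(right, "top.txt"), "three")
    write(os.path.join(left, "sub", "nested.txt"), "a")
    write(os.path.join(right, "sub", "nested.txt"), "bb")
    # An identical file that should not be reported.
    write(os.path.join(left, "sub", "same.txt"), "same")
    write(os.path.join(right, "sub", "same.txt"), "same")

    # Consume the generator while the directories still exist.
    differing = sorted(iter_diff_files(filecmp.dircmp(left, right)))

print(differing)
```

Returning relative paths keeps the result independent of where the two trees live on disk.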
