File compare with Python

filecmp — File and Directory Comparisons

The filecmp module defines functions to compare files and directories, with various optional time/correctness trade-offs. For comparing files, see also the difflib module.

The filecmp module defines the following functions:

filecmp.cmp(f1, f2, shallow=True)

Compare the files named f1 and f2, returning True if they seem equal, False otherwise.

If shallow is true and the os.stat() signatures (file type, size, and modification time) of both files are identical, the files are taken to be equal.

Otherwise, the files are treated as different if their sizes or contents differ.

Note that no external programs are called from this function, giving it portability and efficiency.

This function uses a cache for past comparisons and the results, with cache entries invalidated if the os.stat() information for the file changes. The entire cache may be cleared using clear_cache().
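To make the effect of shallow concrete, here is a minimal sketch. The scratch files and their contents are invented for illustration: two files of the same size but different bytes are given identical modification times, then compared both ways.

```python
import filecmp
import os
import tempfile

# Hypothetical scratch files, created only for this illustration.
workdir = tempfile.mkdtemp()
path_a = os.path.join(workdir, "a.txt")
path_b = os.path.join(workdir, "b.txt")

with open(path_a, "w") as f:
    f.write("spam\n")
with open(path_b, "w") as f:
    f.write("eggs\n")  # same length as "spam\n", different content

# Force identical access/modification times so that the os.stat()
# signatures (file type, size, mtime) of the two files are equal.
os.utime(path_a, (1_000_000_000, 1_000_000_000))
os.utime(path_b, (1_000_000_000, 1_000_000_000))

shallow_result = filecmp.cmp(path_a, path_b)              # shallow=True is the default
deep_result = filecmp.cmp(path_a, path_b, shallow=False)  # contents are compared

print(shallow_result, deep_result)  # True False
```

The shallow call reports the files as equal because only the signatures are checked, which is why shallow=False is the safer choice when correctness matters more than speed.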

filecmp.cmpfiles(dir1, dir2, common, shallow=True)

Compare the files in the two directories dir1 and dir2 whose names are given by common.

Returns three lists of file names: match, mismatch, errors. match contains the list of files that match, mismatch contains the names of those that don’t, and errors lists the names of files which could not be compared. Files are listed in errors if they don’t exist in one of the directories, the user lacks permission to read them or if the comparison could not be done for some other reason.

The shallow parameter has the same meaning and default value as for filecmp.cmp().

For example, cmpfiles('a', 'b', ['c', 'd/e']) will compare a/c with b/c and a/d/e with b/d/e. 'c' and 'd/e' will each be in one of the three returned lists.
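The three returned lists can be seen in a small sketch. The directory layout and file names here are hypothetical and are created in a temporary directory purely for demonstration.

```python
import filecmp
import os
import tempfile

# Hypothetical directory layout, created only for this illustration.
root = tempfile.mkdtemp()
dir1 = os.path.join(root, "dir1")
dir2 = os.path.join(root, "dir2")
os.mkdir(dir1)
os.mkdir(dir2)


def write(directory, name, text):
    """Create a small text file inside the given directory."""
    with open(os.path.join(directory, name), "w") as f:
        f.write(text)


write(dir1, "same.txt", "identical\n")
write(dir2, "same.txt", "identical\n")
write(dir1, "changed.txt", "version one\n")
write(dir2, "changed.txt", "version two, longer\n")
write(dir1, "only_in_dir1.txt", "no counterpart\n")  # missing from dir2

match, mismatch, errors = filecmp.cmpfiles(
    dir1,
    dir2,
    ["same.txt", "changed.txt", "only_in_dir1.txt"],
    shallow=False,
)

print("match:   ", match)     # ['same.txt']
print("mismatch:", mismatch)  # ['changed.txt']
print("errors:  ", errors)    # ['only_in_dir1.txt']
```

The file that exists in only one directory lands in errors rather than mismatch, so callers usually need to inspect both lists.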

filecmp.clear_cache()

Clear the filecmp cache. This may be useful if a file is compared so quickly after it is modified that it is within the mtime resolution of the underlying filesystem.
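A minimal sketch of that situation follows, using invented scratch files: a file is rewritten immediately after a comparison with content of the same length, and the cache is cleared before comparing again so that no stale result can be reused.

```python
import filecmp
import os
import tempfile

# Hypothetical scratch files, created only for this illustration.
workdir = tempfile.mkdtemp()
path_a = os.path.join(workdir, "a.txt")
path_b = os.path.join(workdir, "b.txt")
for path in (path_a, path_b):
    with open(path, "w") as f:
        f.write("first\n")

# The result of this comparison is stored in the module-level cache.
before = filecmp.cmp(path_a, path_b, shallow=False)

# Rewrite one file right away with content of the same length; on a
# filesystem with coarse mtime resolution its os.stat() signature
# may not change at all.
with open(path_b, "w") as f:
    f.write("other\n")

# Drop all cached results so the next call re-reads the files.
filecmp.clear_cache()
after = filecmp.cmp(path_a, path_b, shallow=False)

print(before, after)  # True False
```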

The dircmp class

class filecmp.dircmp(a, b, ignore=None, hide=None)

Construct a new directory comparison object, to compare the directories a and b. ignore is a list of names to ignore, and defaults to filecmp.DEFAULT_IGNORES. hide is a list of names to hide, and defaults to [os.curdir, os.pardir].

The dircmp class compares files by doing shallow comparisons as described for filecmp.cmp().

The dircmp class provides the following methods:

report()

Print (to sys.stdout) a comparison between a and b.

report_partial_closure()

Print a comparison between a and b and common immediate subdirectories.

report_full_closure()

Print a comparison between a and b and common subdirectories (recursively).
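As a rough sketch of how the report methods might be used, the example below builds two small hypothetical directories and captures the text that report() prints. The directory and file names are assumptions made for the demonstration; the two versions of changed.txt deliberately differ in size so that the shallow comparison detects the difference.

```python
import contextlib
import filecmp
import io
import os
import tempfile

# Hypothetical directory trees, created only for this illustration.
root = tempfile.mkdtemp()
left = os.path.join(root, "left")
right = os.path.join(root, "right")
os.mkdir(left)
os.mkdir(right)


def write(directory, name, text):
    """Create a small text file inside the given directory."""
    with open(os.path.join(directory, name), "w") as f:
        f.write(text)


write(left, "same.txt", "identical\n")
write(right, "same.txt", "identical\n")
write(left, "changed.txt", "short\n")
write(right, "changed.txt", "a much longer version\n")  # different size
write(left, "only_left.txt", "no counterpart\n")

dcmp = filecmp.dircmp(left, right)

# report() writes to sys.stdout, so redirect stdout to keep the text.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    dcmp.report()
report_text = buffer.getvalue()

print(report_text)
```

Capturing the output this way is optional; calling dcmp.report() directly simply prints the same text to the console.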

The dircmp class offers a number of interesting attributes that may be used to get various bits of information about the directory trees being compared.

Note that via __getattr__() hooks, all attributes are computed lazily, so there is no speed penalty if only those attributes which are lightweight to compute are used.

left_list

Files and subdirectories in a, filtered by hide and ignore.

right_list

Files and subdirectories in b, filtered by hide and ignore.

common

Files and subdirectories in both a and b.

left_only

Files and subdirectories only in a.

right_only

Files and subdirectories only in b.

common_dirs

Subdirectories in both a and b.

common_funny

Names in both a and b, such that the type differs between the directories, or names for which os.stat() reports an error.

same_files

Files which are identical in both a and b, using the class's file comparison operator.

diff_files

Files which are in both a and b, whose contents differ according to the class's file comparison operator.

funny_files

Files which are in both a and b, but could not be compared.

subdirs

A dictionary mapping names in common_dirs to dircmp instances (or MyDirCmp instances if this instance is of type MyDirCmp, a subclass of dircmp).

Changed in version 3.10: Previously entries were always dircmp instances. Now entries are the same type as self, if self is a subclass of dircmp.

filecmp.DEFAULT_IGNORES

List of directories ignored by dircmp by default.
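The attributes described above can be read directly from a dircmp instance. The sketch below assumes a hypothetical pair of directories built in a temporary location; none of the names come from the library itself.

```python
import filecmp
import os
import tempfile

# Hypothetical directory trees, created only for this illustration.
root = tempfile.mkdtemp()
left = os.path.join(root, "left")
right = os.path.join(root, "right")
for base in (left, right):
    os.makedirs(os.path.join(base, "shared_dir"))


def write(path, text):
    """Create a small text file at the given path."""
    with open(path, "w") as f:
        f.write(text)


write(os.path.join(left, "same.txt"), "identical\n")
write(os.path.join(right, "same.txt"), "identical\n")
write(os.path.join(left, "changed.txt"), "short\n")
write(os.path.join(right, "changed.txt"), "a much longer version\n")
write(os.path.join(left, "only_left.txt"), "left\n")
write(os.path.join(right, "only_right.txt"), "right\n")

dcmp = filecmp.dircmp(left, right)

# Each attribute is computed lazily the first time it is accessed.
print("left_only:  ", dcmp.left_only)
print("right_only: ", dcmp.right_only)
print("common_dirs:", dcmp.common_dirs)
print("same_files: ", dcmp.same_files)
print("diff_files: ", dcmp.diff_files)
print("subdirs:    ", sorted(dcmp.subdirs))
```

Each value in subdirs is itself a dircmp instance, so the same attributes are available one level down.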

Here is a simplified example of using the subdirs attribute to search recursively through two directories and show the common files that differ:

```python
>>> from filecmp import dircmp
>>> def print_diff_files(dcmp):
...     for name in dcmp.diff_files:
...         print("diff_file %s found in %s and %s" % (name, dcmp.left,
...               dcmp.right))
...     for sub_dcmp in dcmp.subdirs.values():
...         print_diff_files(sub_dcmp)
...
>>> dcmp = dircmp('dir1', 'dir2')
>>> print_diff_files(dcmp)
```


How to Compare Two Files in Python Line by Line

This tutorial examines the various methods of how to compare two files in Python. We’ll cover reading two files and comparing them line by line, as well as using available modules to complete this common task.

There are many ways of comparing two files in Python. Python comes with modules for this very purpose, including the filecmp and difflib modules.

The following Python 3 examples contrast the various methods of determining whether or not two files contain the same data. We’ll use functions and modules that come built-in with Python 3, so there’s no need to download additional packages.

Compare Two Text Files Line by Line

We can compare two text files using the open() function to read the data contained in the files. When given a bare file name, open() looks for the file in the current working directory and opens it for reading.

For this example, we’ll compare two files that contain email data. These two lists of emails, we’re told, may not be identical. We’ll let Python check the files for us. Using the readlines() method, it’s possible to extract the lines from the text file.

Once the data is extracted, a for loop is used to compare the files line by line. If the lines don’t match, the user receives a message telling them where the mismatch occurred. We’ll include the data itself so the user can easily track down the different lines.

Example: Using Python to compare email lists

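```python
# Read both email lists; the with statement closes the files automatically.
with open("emails_A.txt", "r") as file1, open("emails_B.txt", "r") as file2:
    file1_lines = file1.readlines()
    file2_lines = file2.readlines()

# zip() stops at the shorter file, so files of unequal length
# cannot raise an IndexError.
for i, (line1, line2) in enumerate(zip(file1_lines, file2_lines), start=1):
    if line1 != line2:
        print("Line " + str(i) + " doesn't match.")
        print("------------------------")
        print("File1: " + line1.rstrip("\n"))
        print("File2: " + line2.rstrip("\n"))
```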
```
Line 1 doesn't match.
------------------------
File1: [email protected]
File2: [email protected]
Line 3 doesn't match.
------------------------
File1: [email protected]
File2: [email protected]
Line 4 doesn't match.
------------------------
File1: [email protected]
File2: [email protected]
```
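A line-by-line loop only covers the lines that both files have. One possible way to also report lines that exist in only one file is itertools.zip_longest, sketched below. The helper name find_mismatches and the sample addresses are made up for illustration; in practice the two lists would come from readlines().

```python
from itertools import zip_longest


def find_mismatches(lines_a, lines_b):
    """Return (line_number, a, b) for every position where the inputs differ.

    A missing line (one input is shorter) is reported as None.
    """
    mismatches = []
    pairs = zip_longest(lines_a, lines_b)  # pads the shorter input with None
    for number, (a, b) in enumerate(pairs, start=1):
        if a != b:
            mismatches.append((number, a, b))
    return mismatches


# Hypothetical sample data standing in for the contents of two files.
list_a = ["alice@example.com", "bob@example.com", "carol@example.com"]
list_b = ["alice@example.com", "bobby@example.com"]

mismatches = find_mismatches(list_a, list_b)
for number, a, b in mismatches:
    print(f"Line {number}: {a!r} != {b!r}")
```

Here the third line is reported with None on the right-hand side, which signals that the second list ended early.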

Using the filecmp Module to Compare Files

The filecmp module includes functions for comparing files and directories. To check whether two files match, we can use the filecmp.cmp() function, which returns True if the files appear equal and False if they don't. By default the comparison is shallow: files whose os.stat() signatures (file type, size, and modification time) are identical are treated as equal without their contents being read. Pass shallow=False to force a comparison of the contents.

This example uses three files. The first and third are identical, while the second is slightly different. We'll use the filecmp.cmp() function to compare the files using Python.

punctuation1.txt
Eat your dinner.
I’d like to thank my parents, Janet and God.
I’m sorry I care about you.
She’s really into cooking, her family, and her cats.

punctuation2.txt
Eat. You’re dinner!
I’d like to thank my parents, Janet, and God.
I’m sorry. I care about you.
She’s really into cooking her family and her cats.

punctuation3.txt
Eat your dinner.
I’d like to thank my parents, Janet and God.
I’m sorry I care about you.
She’s really into cooking, her family, and her cats.

Before we can use the filecmp module, we'll need to import it. The files are referenced by their full Windows paths, so each backslash in a path must be doubled to keep Python from treating it as the start of an escape sequence. For this example, a custom function was used to complete the comparison.

After we compare the files, we can see whether the data matches. Finally, we'll alert the user to the outcome.

Example: Compare two files with filecmp.cmp()

```python
import filecmp

# notice the two backslashes: every backslash in the path must be doubled
file1 = "C:\\Users\\jpett\\Desktop\\PythonForBeginners\\2Files\\punctuation1.txt"
file2 = "C:\\Users\\jpett\\Desktop\\PythonForBeginners\\2Files\\punctuation2.txt"
file3 = "C:\\Users\\jpett\\Desktop\\PythonForBeginners\\2Files\\punctuation3.txt"

def compare_files(file1, file2):
    # filecmp.cmp() returns True when the files appear equal
    if filecmp.cmp(file1, file2):
        print("The files are the same.")
    else:
        print("The files are different.")

compare_files(file1, file2)
compare_files(file1, file3)
```
```
The files are different.
The files are the same.
```

Compare Two Files Using the difflib Module

The difflib module is useful for comparing texts and finding the differences between them. This Python 3 module comes pre-packaged with the language. It contains many useful functions for comparing bodies of texts.

We'll use the unified_diff() function to pinpoint mismatches between two data files. These files contain records for fictitious students, including their names and grade point averages.

By comparing these student records, we can see how each student's grade changed from 2019 to 2020. The following example uses the with statement to read the file data, which ensures each file is closed once we've finished reading it.

student_gpa_2019.txt
Chelsea Walker 3.3
Caroline Bennett 2.8
Garry Holmes 3.7
Rafael Rogers 3.6
Patrick Nelson 2.1

student_gpa_2020.txt
Chelsea Walker 3.6
Caroline Bennett 2.7
Garry Holmes 3.7
Rafael Rogers 3.7
Patrick Nelson 2.1

Example: Comparing Student GPAs

```python
import difflib

# splitlines() drops the trailing newlines, which pairs with lineterm=''
with open("student_gpa_2019.txt", 'r') as file1:
    file1_contents = file1.read().splitlines()
with open("student_gpa_2020.txt", 'r') as file2:
    file2_contents = file2.read().splitlines()

diff = difflib.unified_diff(
    file1_contents, file2_contents,
    fromfile="file1.txt", tofile="file2.txt", lineterm='')
for line in diff:
    print(line)
```
```
--- file1.txt
+++ file2.txt
@@ -1,5 +1,5 @@
-Chelsea Walker 3.3
-Caroline Bennett 2.8
+Chelsea Walker 3.6
+Caroline Bennett 2.7
 Garry Holmes 3.7
-Rafael Rogers 3.6
+Rafael Rogers 3.7
 Patrick Nelson 2.1
```

Looking at the output, we can see that the difflib module does much more than compare text files line by line. The unified_diff() function also provides some context about the differences found.
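The amount of context is controlled by the n parameter of unified_diff(), which defaults to 3 lines. As a small sketch, the student records from the example are repeated below as in-memory lists (an assumption made so the snippet runs without the text files), and n=0 is passed so that only the changed lines are reported.

```python
import difflib

# The student records from the example above, held in memory so the
# snippet runs without the two text files.
gpa_2019 = [
    "Chelsea Walker 3.3",
    "Caroline Bennett 2.8",
    "Garry Holmes 3.7",
    "Rafael Rogers 3.6",
    "Patrick Nelson 2.1",
]
gpa_2020 = [
    "Chelsea Walker 3.6",
    "Caroline Bennett 2.7",
    "Garry Holmes 3.7",
    "Rafael Rogers 3.7",
    "Patrick Nelson 2.1",
]

# n=0 requests zero lines of surrounding context around each change.
diff_lines = list(
    difflib.unified_diff(
        gpa_2019,
        gpa_2020,
        fromfile="student_gpa_2019.txt",
        tofile="student_gpa_2020.txt",
        n=0,
        lineterm="",
    )
)

for line in diff_lines:
    print(line)
```

With no context requested, the unchanged records for Garry Holmes and Patrick Nelson no longer appear in the output.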

Compare Two .csv Files in Python Line by Line

Comma-separated values (CSV) files are used for exchanging data between programs. Python provides tools for working with these files as well. By using the csv module, we can quickly access the data within a CSV file.

Using the csv module, we'll compare two files of data and identify the lines that don't match. These files contain employee records, including the first name, last name, and email of each employee. This data was generated randomly, but we'll pretend our employer urgently needs us to complete the comparison.

Once we have the employee data, we can read it using the reader() function. Contained within the csv module, the reader() function can interpret csv data. With the data collected, we can use Python to convert the data to a list.

Finally, using a for loop, we’ll compare the elements of the two lists. Each element will hold a line from the employee data files. This way, we can iterate over the lists and discover which lines aren’t identical.

The Python program will compare the files line by line. As a result, we can identify all the differences between the employee data files.

Example: Using the csv module to compare employee data files

```python
import csv

# newline='' is what the csv module documentation recommends for files
with open("employeesA.csv", 'r', newline='') as file1, \
     open("employeesB.csv", 'r', newline='') as file2:
    # convert the data to a list of rows
    data1 = list(csv.reader(file1))
    data2 = list(csv.reader(file2))

# i is the zero-based row index; zip() stops at the shorter file
for i, (row1, row2) in enumerate(zip(data1, data2)):
    if row1 != row2:
        print("Line " + str(i) + " is a mismatch.")
        print(f"{row1} doesn't match {row2}")
```
```
Line 1 is a mismatch.
['David', 'Crawford', '[email protected]'] doesn't match ['Andrew', 'Crawford', '[email protected]']
Line 4 is a mismatch.
['Aida', 'Alexander', '[email protected]'] doesn't match ['Agata', 'Anderson', '[email protected]']
Line 5 is a mismatch.
['Valeria', 'Douglas', '[email protected]'] doesn't match ['Miley', 'Holmes', '[email protected]']
```
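When the files have a header row, it can be more useful to know which field changed rather than only which row. The sketch below uses csv.DictReader for that. The column names (first_name, last_name, email), the sample records, and the helper changed_fields are all assumptions for illustration, and io.StringIO stands in for real files.

```python
import csv
import io

# Hypothetical CSV data with a header row, held in memory for the demo.
csv_a = io.StringIO(
    "first_name,last_name,email\n"
    "David,Crawford,david@example.com\n"
    "Aida,Alexander,aida@example.com\n"
)
csv_b = io.StringIO(
    "first_name,last_name,email\n"
    "Andrew,Crawford,andrew@example.com\n"
    "Aida,Alexander,aida@example.com\n"
)


def changed_fields(file_a, file_b):
    """Return (row_number, field, value_a, value_b) for each differing field.

    Rows are numbered from 1, not counting the header row.
    """
    changes = []
    rows = zip(csv.DictReader(file_a), csv.DictReader(file_b))
    for number, (row_a, row_b) in enumerate(rows, start=1):
        for field in row_a:
            if row_a[field] != row_b.get(field):
                changes.append((number, field, row_a[field], row_b.get(field)))
    return changes


changes = changed_fields(csv_a, csv_b)
for number, field, old, new in changes:
    print(f"Row {number}, {field}: {old!r} -> {new!r}")
```

Like the list-based version, this pairs rows by position, so an inserted or deleted row would shift every comparison after it.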

In Conclusion

Python provides many tools for comparing two text files, including csv files. In this post, we’ve discussed many of the functions and modules that come with Python 3. Moreover, we’ve seen how to use them to compare files line by line in Python.

By discovering new modules, we can write programs that make our lives easier. Many of the programs and web apps that we use on a daily basis are powered by Python.
