- How To Merge Large CSV files Into A Single File With Python
- Dive into Python and learn how to automate tasks like merging chunky CSV or Excel files using a few lines of code.
- Introduction
- How to Join Two CSV Files in Python Using Pandas? 3 Steps Only
- Step by Step to Merge Two CSV Files
- Step 1: Import the Necessary Libraries
- Step 2: Load the Dataset
- Step 3: Merge the Sheets
- Other Things You Can Do
- Conclusion
- Other Questions
- 1. You are getting the error "Columns not found in either dataset: …"
- 2. Getting "MergeError: No common columns to perform merge on"
- calebrob6/csv-joins
- README.md
- How to combine CSV files using Python?
- Combining Multiple CSV Files together
- Different Ways to Combine CSV Files in Python
- Method 1: append()
- Method 2: concat()
- Method 3: merge()
- Conclusion
- References
How To Merge Large CSV files Into A Single File With Python
Dive into Python and learn how to automate tasks like merging chunky CSV or Excel files using a few lines of code.
Introduction
Believe it or not, in 2022 there are still companies that hire external data consultants to perform tasks that would require minimal effort (even for a newbie) using a small Python script.
Funnily enough, the same consultants pretend to use some black magic to perform straightforward jobs and charge unbelievably high fees. Money that could definitely be invested more wisely.
For instance, picture this: a big sales team having to merge multiple CSV or Excel files each month, coming from different departments, to create a unified performance report.
Although these files often share a similar format, they are at times so chunky that manual copy and paste is not even an option, and attempting it could lead to errors or missing data.
If this sounds familiar and you wish to learn how to automate such tasks with Python, you are in the right place!
How to Join Two CSV Files in Python Using Pandas? 3 Steps Only
Sometimes a dataset is not provided as a single CSV file but is split across separate files or Excel sheets. It is usually better to run all computational and preprocessing tasks on a single dataset rather than on several, because doing so saves time. If that is what you want to do, this post is for you. In this tutorial, you will learn how to join or merge two CSV files using the popular Python pandas library.
Step by Step to Merge Two CSV Files
Step 1: Import the Necessary Libraries
Everything here is done with the pandas library, so pandas is the only import needed.
Step 2: Load the Dataset
I have created two CSV datasets of stock data: one is a set of stocks and the other holds the turnover of those stocks. Read them using the pandas read_csv() function. All the datasets are included in the Conclusion section.
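The loading step might look like the sketch below. The article's actual files are not reproduced here, so the column names (Symbol, Name, Turnover) and every value are assumptions made purely for illustration, and in-memory text stands in for the files on disk.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the two CSV files. The column names and values
# are assumptions for illustration, not the article's data.
stocks_csv = io.StringIO(
    "Symbol,Name\n"
    "AAA,Alpha Corp\n"
    "BBB,Beta Ltd\n"
    "CCC,Gamma Inc\n"
)
turnover_csv = io.StringIO(
    "Symbol,Turnover\n"
    "AAA,1200\n"
    "BBB,850\n"
    "DDD,430\n"
)

# pd.read_csv accepts a file path or any file-like object.
stocks = pd.read_csv(stocks_csv)
turnover = pd.read_csv(turnover_csv)

# shape is a (rows, columns) tuple, useful for a quick sanity check.
print(stocks.shape)
print(turnover.shape)
```

With real files, the io.StringIO objects would simply be replaced by the file paths.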
Step 3: Merge the Sheets
To merge the two CSV files, use the DataFrame.merge() method and specify the column to merge on. By default merge() performs an inner join, so rows whose key value does not appear in both files are dropped from the result. You can verify this by comparing the shape attribute of the data frames before and after the merge.
Other Things You Can Do
There are also cases where you only want to append the rows of one sheet to another. For this, use the pandas concat() function. Suppose I have two sheets of the same dataset and want to work on a single sheet: I first add all the rows of one sheet to the other, and after that I can do anything with the combined dataset.
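As a minimal sketch of the merge step, assuming two small hypothetical data frames that share a Symbol column (the names and values are illustrative, not the article's data):

```python
import pandas as pd

# Hypothetical data: "CCC" has no turnover row and "DDD" has no stock row.
stocks = pd.DataFrame(
    {"Symbol": ["AAA", "BBB", "CCC"], "Name": ["Alpha Corp", "Beta Ltd", "Gamma Inc"]}
)
turnover = pd.DataFrame({"Symbol": ["AAA", "BBB", "DDD"], "Turnover": [1200, 850, 430]})

# Default how="inner": only keys present in both frames survive.
merged = stocks.merge(turnover, on="Symbol")

# how="outer" keeps every key and fills the gaps with NaN.
merged_outer = stocks.merge(turnover, on="Symbol", how="outer")

# shape is an attribute, not a method.
print(stocks.shape, turnover.shape, merged.shape, merged_outer.shape)
```

Comparing the shapes shows how many rows the inner join dropped. Switching to how="outer", "left" or "right" is one way to keep the unmatched rows.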
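Appending the rows of one sheet to another could be sketched as follows. The two frames are hypothetical and are assumed to share the same columns.

```python
import pandas as pd

# Two hypothetical sheets of the same dataset with identical columns.
sheet_1 = pd.DataFrame({"Symbol": ["AAA", "BBB"], "Turnover": [1200, 850]})
sheet_2 = pd.DataFrame({"Symbol": ["CCC", "DDD"], "Turnover": [640, 430]})

# pd.concat stacks the frames row-wise; ignore_index=True renumbers the rows.
combined = pd.concat([sheet_1, sheet_2], ignore_index=True)

print(combined)
```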

Conclusion
Most data scientists do their analysis on a single sheet. When you search online for a dataset, you will mostly find it provided as a single sheet. You should do the same, as analysing a single sheet is more efficient and reduces the computational work.
I hope you have understood how to join two CSV files in Python using pandas. If you have any queries, please contact us for more information. Below is the dataset for all the examples taken here.
Other Questions
1. You are getting the error "Columns not found in either dataset: …"
You may get this error while joining two CSV files. It occurs when one of the two CSV files does not have the column you are merging on. To solve it, make sure the column exists in both CSV files before joining.
2. Getting "MergeError: No common columns to perform merge on"
This error means that the CSV files you want to join do not share any column names. To solve it, either specify the columns to merge on explicitly or add a common column to the CSV files before merging.
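One way to guard against both errors is to check which columns the two data frames share before merging. The sketch below uses hypothetical frames, and common_columns is an illustrative helper, not a pandas function.

```python
import pandas as pd
from pandas.errors import MergeError


def common_columns(left, right):
    """Return the column names present in both frames, in the left frame's order."""
    return [col for col in left.columns if col in right.columns]


# Hypothetical frames whose key columns have different names.
left = pd.DataFrame({"Symbol": ["AAA"], "Name": ["Alpha Corp"]})
right = pd.DataFrame({"Ticker": ["AAA"], "Turnover": [1200]})

shared = common_columns(left, right)  # empty list: nothing to merge on

error_message = ""
try:
    left.merge(right)  # no `on` given and no shared columns
except MergeError as exc:
    error_message = str(exc)

# Fix 1: name the key column of each frame explicitly.
fixed_explicit = left.merge(right, left_on="Symbol", right_on="Ticker")

# Fix 2: rename so that both frames share a common column.
fixed_renamed = left.merge(right.rename(columns={"Ticker": "Symbol"}), on="Symbol")

print(shared)
print(error_message)
```

Checking shared before the merge makes it possible to report a missing key column with a clearer message than the one pandas raises.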
Python script for performing different join operations on CSV files.
calebrob6/csv-joins
README.md
Small Python script for performing different join operations on CSV files.
This script can perform left, right, inner, and full outer joins on two arbitrary CSV files. It takes two filenames, and the column names of the columns to use as the primary key values. It requires all the values in each primary key column to be unique.
    usage: join.py [-h] [-v] [-t ] leftFn leftPK rightFn rightPK outputFn

    CSV join script

    positional arguments:
      leftFn         Filename of left table
      leftPK         Name of column to use as the primary key for the left table
      rightFn        Filename of the right table
      rightPK        Name of column to use as the primary key for the right table
      outputFn       Output filename

    optional arguments:
      -h, --help     show this help message and exit
      -v, --verbose  Show output from latex and dvipng commands
      -t , --type    Type of join (default is 'left join')
The following are examples that operate on the "a.csv" and "b.csv" files (shown below).

a.csv:

    a,b,c,d,e
    97,38,15,7,23
    8,41,15,85,50
    83,94,10,84,21
    43,29,68,87,4
    85,54,37,7,24

b.csv:

    l,m,n,o,p
    12,18,9,54,76
    24,92,61,42,9
    26,72,62,14,23
    53,61,49,92,26
    16,83,53,41,75
Left join:

    python join.py a.csv "a" b.csv "m" output.csv -t left

    a,b,c,d,e,l,m,n,o,p
    97,38,15,7,23,null,null,null,null,null
    8,41,15,85,50,null,null,null,null,null
    83,94,10,84,21,16,83,53,41,75
    43,29,68,87,4,null,null,null,null,null
    85,54,37,7,24,null,null,null,null,null

Right join:

    python join.py a.csv "a" b.csv "m" output.csv -t right

    a,b,c,d,e,l,m,n,o,p
    null,null,null,null,null,12,18,9,54,76
    null,null,null,null,null,24,92,61,42,9
    null,null,null,null,null,26,72,62,14,23
    null,null,null,null,null,53,61,49,92,26
    83,94,10,84,21,16,83,53,41,75

Inner join:

    python join.py a.csv "a" b.csv "m" output.csv -t inner

    a,b,c,d,e,l,m,n,o,p
    83,94,10,84,21,16,83,53,41,75

Full outer join:

    python join.py a.csv "a" b.csv "m" output.csv -t full

    a,b,c,d,e,l,m,n,o,p
    null,null,null,null,null,12,18,9,54,76
    43,29,68,87,4,null,null,null,null,null
    null,null,null,null,null,53,61,49,92,26
    null,null,null,null,null,26,72,62,14,23
    8,41,15,85,50,null,null,null,null,null
    83,94,10,84,21,16,83,53,41,75
    null,null,null,null,null,24,92,61,42,9
    97,38,15,7,23,null,null,null,null,null
    85,54,37,7,24,null,null,null,null,null
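For comparison, the same four joins can be approximated with pandas. This is a sketch of an equivalent approach, not the script's own implementation. The join_tables helper and the JOIN_TYPES mapping are illustrative names, and validate="one_to_one" is used here to mirror the script's requirement that primary key values be unique.

```python
import io

import pandas as pd

# Contents of a.csv and b.csv from the examples above, held in memory.
A_CSV = """a,b,c,d,e
97,38,15,7,23
8,41,15,85,50
83,94,10,84,21
43,29,68,87,4
85,54,37,7,24
"""
B_CSV = """l,m,n,o,p
12,18,9,54,76
24,92,61,42,9
26,72,62,14,23
53,61,49,92,26
16,83,53,41,75
"""

# Script join type -> pandas `how` value ("full" is called "outer" in pandas).
JOIN_TYPES = {"left": "left", "right": "right", "inner": "inner", "full": "outer"}


def join_tables(left, right, left_pk, right_pk, join_type="left"):
    """Join two frames on differently named key columns (hypothetical helper)."""
    return left.merge(
        right,
        left_on=left_pk,
        right_on=right_pk,
        how=JOIN_TYPES[join_type],
        validate="one_to_one",  # raises MergeError if a key value repeats
    )


left_table = pd.read_csv(io.StringIO(A_CSV))
right_table = pd.read_csv(io.StringIO(B_CSV))

# Number of result rows for each join type.
row_counts = {
    name: len(join_tables(left_table, right_table, "a", "m", name))
    for name in JOIN_TYPES
}
print(row_counts)
```

Unmatched cells appear as NaN in pandas rather than the literal null written by the script, and the row order of the outer join may differ from the script's output.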
How to combine CSV files using Python?
Often while working with CSV files, we need to deal with large datasets. Depending on the requirements of the data analysis, we may find that the required data is not all present in a single CSV file. The need then arises to merge multiple files to get the desired data. However, copy-pasting the required columns from one file to another, especially with large datasets, is not the best way to go about it.
To solve this problem, we will learn how to use the append, merge and concat methods from pandas to combine CSV files.
Combining Multiple CSV Files together
To begin with, let’s create sample CSV files that we will be using.
Notice that all three files have the same columns or headers, i.e. 'name', 'age' and 'score'. Also, file 1 and file 3 have a common entry in the 'name' column, Sam, but the rest of the values in these files are different.
Note that the examples below assume that all the CSV files are in the same folder as your Python code file. If this is not the case for you, specify the paths accordingly when trying out the examples yourself.
All the examples were executed in a Jupyter notebook.
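The sample files themselves are not reproduced here. The sketch below creates three stand-in files that match the description above: the same three columns, and Sam appearing in files 1 and 3 with different values. Every age and score, and the names in file 2, are assumptions for illustration. The files are written to a temporary folder, so out_dir would need to be changed to the folder of your code file to follow the later examples exactly.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Temporary folder for the sketch; use Path(".") to write next to your code.
out_dir = Path(tempfile.mkdtemp())

# Hypothetical contents: only the column names and the shared name "Sam"
# come from the description above, all other values are made up.
samples = {
    "csv_file_1.csv": pd.DataFrame(
        {"name": ["Sam", "John"], "age": [14, 13], "score": [90, 78]}
    ),
    "csv_file_2.csv": pd.DataFrame(
        {"name": ["Alice", "Tom"], "age": [15, 14], "score": [88, 72]}
    ),
    "csv_file_3.csv": pd.DataFrame(
        {"name": ["Sam", "Bob"], "age": [15, 12], "score": [85, 69]}
    ),
}

for filename, frame in samples.items():
    # index=False keeps the row numbers out of the CSV file.
    frame.to_csv(out_dir / filename, index=False)

written_files = sorted(path.name for path in out_dir.glob("*.csv"))
print(written_files)
```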
Different Ways to Combine CSV Files in Python
Before starting, we will be creating a list of the CSV files that will be used in the examples below as follows:
    import glob

    # list all csv files only
    csv_files = glob.glob('*.{}'.format('csv'))
    csv_files
['csv_file_1.csv', 'csv_file_2.csv', 'csv_file_3.csv']
Method 1: append()
Let’s look at the append method here to merge the three CSV files.
    import pandas as pd

    df_csv_append = pd.DataFrame()

    # append the CSV files one by one
    # (DataFrame.append requires pandas < 2.0)
    for file in csv_files:
        df = pd.read_csv(file)
        df_csv_append = df_csv_append.append(df, ignore_index=True)

    df_csv_append

The append method, as the name suggests, appends each file's data frame to the end of the previous one. In the above code, we first create a data frame named df_csv_append to store the result. Then we iterate through the list, read each CSV file and append it to df_csv_append. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so this code only runs on older versions; on current versions, use concat() (Method 2) instead.
Method 2: concat()
Another way to combine CSV files is the pandas concat() function. It takes a sequence of objects, such as a list, as its first argument, so we first build a list of the data frames read from each CSV file and then pass it to concat().

    import pandas as pd

    df_csv_concat = pd.concat(
        [pd.read_csv(file) for file in csv_files], ignore_index=True
    )
    df_csv_concat
An easier-to-understand way of writing this code is:
    frames = []

    # read each CSV file into its own data frame
    for f in csv_files:
        frames.append(pd.read_csv(f))

    # concatenate all the data frames in one call
    df_res = pd.concat(frames, ignore_index=True)
    df_res

Both versions of the code produce the same output when executed.
Notice that the resulting data frame is the same as that of the append() method.
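The effect of the ignore_index=True argument used in these calls can be seen in a small sketch with two hypothetical data frames (the values are made up):

```python
import pandas as pd

# Two hypothetical frames with the article's column names.
first = pd.DataFrame({"name": ["Sam", "John"], "age": [14, 13], "score": [90, 78]})
second = pd.DataFrame({"name": ["Alice", "Tom"], "age": [15, 14], "score": [88, 72]})

# Without ignore_index each frame keeps its own row labels, so they repeat.
kept_labels = pd.concat([first, second])

# With ignore_index=True the result gets a fresh 0..n-1 index.
fresh_labels = pd.concat([first, second], ignore_index=True)

print(list(kept_labels.index))
print(list(fresh_labels.index))
```

Repeated row labels are easy to overlook and can make label-based lookups such as .loc[0] return several rows, which is one reason to renumber the rows when combining files.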
Method 3: merge()
The merge method performs database-style joins between data frames. A join is performed on two data frames at a time, and we can specify the key on which the join is based.
It is good practice to choose a key that is unique for each entry in the data frame, in order to avoid duplicated rows. We can also specify the type of join we wish to perform: one of 'inner', 'outer', 'left', 'right' or 'cross'.
We need to first read each CSV file into a separate data frame.
    import pandas as pd

    df1 = pd.read_csv('csv_file_1.csv')
    df2 = pd.read_csv('csv_file_2.csv')
    df3 = pd.read_csv('csv_file_3.csv')

Joining df1 and df2. Since no key is specified, pandas joins on all the columns the two data frames have in common:

    df_merged = df1.merge(df2, how='outer')
    df_merged
Joining df1 and df3 based on the key ‘name’.
    df_merged = df1.merge(df3, on="name", how='outer')
    df_merged

Both df1 and df3 have an entry for the name 'Sam', but with different age and score values. Hence the resulting data frame has separate columns for the entries from df1 and from df3, which pandas distinguishes with the default suffixes _x and _y. Since John and Bob each appear in only one of the two data frames, the columns coming from the other data frame are NaN for them.
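The column layout of such a merge can be reproduced with a short sketch. The two frames below are hypothetical stand-ins for df1 and df3: only the names Sam, John and Bob come from the text, and the age and score values are made up.

```python
import pandas as pd

# Hypothetical stand-ins for df1 and df3.
df1 = pd.DataFrame({"name": ["Sam", "John"], "age": [14, 13], "score": [90, 78]})
df3 = pd.DataFrame({"name": ["Sam", "Bob"], "age": [15, 12], "score": [85, 69]})

# Overlapping non-key columns get the default suffixes "_x" and "_y".
df_merged = df1.merge(df3, on="name", how="outer")

# The suffixes parameter gives the columns more descriptive names.
df_named = df1.merge(df3, on="name", how="outer", suffixes=("_file1", "_file3"))

print(list(df_merged.columns))
print(list(df_named.columns))
```

Passing explicit suffixes is optional, but it can make it easier to tell which file each value came from.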
Conclusion
In this article, we learned about the pandas methods concat, merge and append, and how to use them to combine CSV files using Python.