MATLAB - Batch Processing a Group of Files
09 Mar 2009 Quan Quach
If there’s one thing that computers are good at, it’s performing a task many times over. In my experience, being able to write code that automates a process is a gigantic time saver. In this tutorial, we will discuss how to take a specific process and loop it effectively so that it can be repeated as many times as necessary.
Contents
- The Task: Renaming 265 Files
- Do it Once, but Do it Perfectly
- The Input and Output Directory
- The DIR Command
- Creating the Wrapper
- Alternative: The UIGETFILE Command
- Next Time
The Task: Renaming 265 Files
My boss recently gave me the less-than-enviable task of renaming 265 files. I was told to append “_test_data” to each filename. For example, if the original data file was called water_02252009.txt, it was supposed to be renamed to water_02252009_test_data.txt.
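The renaming rule itself is plain string manipulation. As a quick sketch, MATLAB's built-in fileparts splits a name into its folder, base name, and extension (the extension keeps its leading dot), and the suffix goes between the last two:

```matlab
% Split the example file name from the task into its parts.
% fileparts returns the folder (empty here), the base name, and the
% extension including its leading dot.
[~, name, ext] = fileparts('water_02252009.txt');

% Insert the suffix between the base name and the extension.
newFileName = [name '_test_data' ext];

disp(newFileName)   % displays water_02252009_test_data.txt
```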
You can imagine that doing such a thankless task would require a lot of time if it were done manually. Luckily, MATLAB is available for sticky situations such as these.
Do it Once, but Do it Perfectly
Computers let us repeat a task as many times as we need. Because any mistake will be repeated just as many times, it’s important to write a function that performs the desired task on a single file, and it’s equally important to make sure that it works perfectly. Let’s write a function that appends our desired text to a given file name.
function processFile(fileName,inputDir,outputDir)
%fileName is the name of the input file
%inputDir is the directory where the input file is located
%outputDir is the directory where we'll put the renamed file
%the fullfile command makes sure that the file path is built with
%the correct file separator ("/" or "\") for your platform
fullFilePath_input = fullfile(inputDir,fileName);
%get the different parts of the file name (ext includes the leading dot)
[~,name,ext] = fileparts(fullFilePath_input);
%append the suffix to the base name, then restore the extension
newFileName = [name '_test_data' ext];
%create the full path of the output file name
fullFilePath_output = fullfile(outputDir,newFileName);
%create a copy in the output folder, so that the original
%file remains intact
copyfile(fullFilePath_input,fullFilePath_output);
end
As you may have noticed, the CD command is never used in this code. There is no need to change directories if you use the absolute path name of a file instead of its relative path name. The code would have worked just fine with the CD command, but I find that approach sloppy and unnecessary. More on this in a later post!
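A cheap way to check the single-file step before looping is to run it on a throwaway file. The sketch below uses only base MATLAB: it creates temporary folders with tempname, writes one made-up sample file, and repeats the single-file steps (build the new name, copy the file) inline rather than calling processFile, so it runs on its own.

```matlab
% Create throwaway input and output folders under the system temp folder.
inputDir  = tempname;
outputDir = tempname;
mkdir(inputDir);
mkdir(outputDir);

% Write one made-up sample file to rename.
fileName = 'water_02252009.txt';
fid = fopen(fullfile(inputDir, fileName), 'w');
fprintf(fid, 'sample data\n');
fclose(fid);

% The single-file steps: build the new name and copy the file.
[~, name, ext] = fileparts(fileName);
newFileName = [name '_test_data' ext];
copyfile(fullfile(inputDir, fileName), fullfile(outputDir, newFileName));

% Check that the renamed copy exists and the original is untouched.
copyExists     = exist(fullfile(outputDir, newFileName), 'file') == 2;
originalExists = exist(fullfile(inputDir, fileName), 'file') == 2;

% Remove the temporary files and folders.
delete(fullfile(inputDir, fileName));
delete(fullfile(outputDir, newFileName));
rmdir(inputDir);
rmdir(outputDir);
```

If either check fails on the test file, it will fail on all 265 real ones too, so this is the place to catch it.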
The Input and Output Directory
Since the input and output directories are two of the inputs to the function we just wrote, we need a way to ask the user for this information.
This part is easily taken care of with the following two commands:
%prompt the user to select the input directory
inputDir = uigetdir;
%prompt the user to select the output directory
outputDir = uigetdir;
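One case worth guarding against: if the user clicks Cancel, uigetdir returns the number 0 rather than a folder path, and that value would likely cause a confusing error later in fullfile or dir. A minimal check, assuming you want to stop in that case; wasCancelled is an illustrative name, and stand-in values replace the dialog so the block runs without user input:

```matlab
% uigetdir returns the numeric value 0 on Cancel, otherwise a char path.
% isequal(x, 0) is true only for that cancel value (a char path is never
% equal to the number 0), so it separates the two cases safely.
wasCancelled = @(selection) isequal(selection, 0);

% Stand-in values: a cancelled dialog and a real folder path.
cancelledResult = wasCancelled(0);
folderResult    = wasCancelled(tempdir);

% In the real script this would read:
%   inputDir = uigetdir;
%   if wasCancelled(inputDir)
%       error('No input directory was selected.');
%   end
```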
The DIR Command
Next, we need to get a list of the files within the input directory. Let’s assume that we have the following files located in some folder:
water_02212009.txt
water_02222009.txt
water_02232009.txt
water_02242009.txt
water_02252009.txt
water_02262009.txt
.
.
.
water_08142009.txt
Let’s say that we wanted to process each file within that folder.
%obtain a structure array describing all the text files in a directory
%(pwd is the present working directory); dir takes a single argument,
%so the folder and the wildcard are joined with fullfile
fileList = dir(fullfile(pwd,'*.txt'));
%to keep only the text files whose names start with "water",
%add the prefix to the wildcard:
fileList = dir(fullfile(pwd,'water*.txt'));
%this gives added flexibility in your file selection
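To see what DIR actually returns, the sketch below builds a temporary folder holding three made-up files and lists only those matching water*.txt. dir returns a struct array whose name field holds each file name without its folder; since the order of the entries may depend on the operating system, the names are sorted before use.

```matlab
% Build a temporary folder with three made-up (empty) files.
demoDir = tempname;
mkdir(demoDir);
sampleNames = {'water_02212009.txt', 'water_02222009.txt', 'notes.txt'};
for k = 1:numel(sampleNames)
    fid = fopen(fullfile(demoDir, sampleNames{k}), 'w');
    fclose(fid);
end

% dir takes one argument: the folder and wildcard joined into one path.
fileList = dir(fullfile(demoDir, 'water*.txt'));

% Collect the matching names into a cell array and sort them, because
% dir does not promise any particular order.
matchedNames = sort({fileList.name});

% Clean up the temporary files and folder.
for k = 1:numel(sampleNames)
    delete(fullfile(demoDir, sampleNames{k}));
end
rmdir(demoDir);
```

notes.txt is left out because it does not start with "water".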
Creating the Wrapper
Now that we have all the inputs to our processFile function, we’re ready to complete the rest of the code. This remaining code is called the “wrapper”, since it is the high-level code that runs the processFile function. Writing a wrapper function is convenient because it can be reused in the future, saving you from reinventing the wheel. Let’s say your boss decided that he wanted you to do 10 different operations to each of the 265 files. With the wrapper function intact, it’s just a matter of writing more modular functions to perform each of those 10 operations.
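One way to keep the wrapper fixed while the per-file work changes is to pass each operation in as a function handle. The sketch below illustrates the idea on file names only (no disk access); addSuffix and addPrefix are hypothetical stand-ins for the 10 operations, and strrep is used for brevity where fileparts would be more robust:

```matlab
% Hypothetical per-file operations, each taking a name and returning a new one.
addSuffix  = @(n) strrep(n, '.txt', '_test_data.txt');
addPrefix  = @(n) ['processed_' n];
operations = {addSuffix, addPrefix};

fileNames = {'water_02212009.txt', 'water_02222009.txt'};

% The loop never changes: it applies each operation in turn to every
% file name. Adding an 11th operation only means adding a handle.
results = fileNames;
for k = 1:numel(operations)
    results = cellfun(operations{k}, results, 'UniformOutput', false);
end

disp(results{1})   % displays processed_water_02212009_test_data.txt
```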
My wrapper function looks like this:
function wrapperFunction
%prompt the user to select the input directory
inputDir = uigetdir;
%prompt the user to select the output directory
outputDir = uigetdir;
%pull the list of matching files from the input directory
%(dir takes a single argument, so join the folder and the pattern)
fileList = dir(fullfile(inputDir,'water*.txt'));
for x = 1:numel(fileList)
    %get the name of the x-th file within the input directory
    fileName = fileList(x).name;
    processFile(fileName,inputDir,outputDir);
end
end
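For testing, or for unattended runs, the wrapper can take the two directories as values instead of prompting for them. This sketch runs the whole batch inside temporary folders with three made-up files, inlining the single-file steps so the block is self-contained; the folder and file names are illustrative only.

```matlab
% Temporary input and output folders with three made-up data files.
inputDir  = tempname;
outputDir = tempname;
mkdir(inputDir);
mkdir(outputDir);
for day = 21:23
    fid = fopen(fullfile(inputDir, sprintf('water_02%d2009.txt', day)), 'w');
    fclose(fid);
end

% The wrapper loop: list the matching files, then process each one.
fileList = dir(fullfile(inputDir, 'water*.txt'));
for x = 1:numel(fileList)
    fileName = fileList(x).name;            % name of the x-th file
    [~, name, ext] = fileparts(fileName);
    copyfile(fullfile(inputDir, fileName), ...
             fullfile(outputDir, [name '_test_data' ext]));
end

% Record the output names for checking, then remove the temporary data.
outputList = dir(fullfile(outputDir, '*.txt'));
renamed = sort({outputList.name});
delete(fullfile(inputDir, '*.txt'));
delete(fullfile(outputDir, '*.txt'));
rmdir(inputDir);
rmdir(outputDir);
```

Because the loop depends only on inputDir and outputDir, swapping uigetdir for hard-coded paths changes nothing else.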
Alternative: The UIGETFILE Command
Another great way to obtain a list of files that you want to process is the UIGETFILE command. It has a cool option called “MultiSelect” that lets you select as many files as you want.
[files2get path2get] = uigetfile('*.txt','MultiSelect','on');
This is an extremely flexible command that lets the user choose which files to process. If you want greater control over which files are selected, this is probably your best bet. Note that when several files are selected, the file names are returned in a cell array, and the second output (path2get) holds the folder they came from. However, if only ONE file is selected, its name is NOT returned as a cell array but as a string, so be careful!
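One way to handle the one-file case is to wrap a lone string in a cell array so the rest of the code only ever sees cell arrays; uigetfile also returns 0 if the user clicks Cancel, so that is checked first. In this sketch a stand-in value replaces the real uigetfile output so it runs without a dialog:

```matlab
% Stand-in for the first output of uigetfile('*.txt','MultiSelect','on')
% when the user picks exactly one file: a char array, not a cell array.
files2get = 'water_02252009.txt';

% uigetfile returns the number 0 on Cancel; stop here in that case.
if isequal(files2get, 0)
    error('No files were selected.');
end

% Wrap a single file name in a cell array so the loop below works the
% same way whether one file or many were selected.
if ischar(files2get)
    files2get = {files2get};
end

for k = 1:numel(files2get)
    fprintf('Would process %s\n', files2get{k});
end
```

cellstr(files2get) gives the same result in one call, since it leaves a cell array of strings unchanged; it would fail on the cancel value 0, which is another reason to check for Cancel first.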
Next Time
Did you notice that the command cd was never used? In my experience, using this command leads to more trouble than it’s worth. By using full file paths, you can prevent a lot of heartache in the future. More to come on that topic next time!
18 Responses to “MATLAB - Batch Processing a Group of Files”
How exactly is this desirable as opposed to using a simple shell command? Even if you’re unlucky enough to be doing this on a win32 platform, wouldn’t it save you far more pain in the long run to install a sane command shell rather than use math software for file maintenance?
Again, how exactly is this desirable to:
“Those who don’t understand UNIX are condemned to reinvent it, poorly”
You are absolutely right, Josh. Shell scripting would be ideal in this case. Others might choose to use Perl, Python, and the list goes on…
The post was meant to illustrate that this functionality can be achieved in MATLAB. As MATLAB becomes more prominent even outside of the engineering field, not everyone is familiar with shell scripting, Perl, Python, etc.
Paired with MATLAB’s easy GUI building, this functionality lets an average MATLAB user who has read 3-4 tutorials on our site quickly build a GUI that changes file formats, processes the data, and visualizes the results, all in one package. That is a convenience many people like to have.
Hello Josh,
Thanks for the input. This tutorial used the simple example of changing file names, but it can be extended to many other tasks.
Let’s say, for instance, that instead of renaming each of those files, you had to read in the data, perform some calculations, and then write the data out in Excel format. The wrapper would remain the same, but the inner function would change to reflect the new process.
This method allows for coding flexibility for a wide range of tasks.
Quan
good site! I learn a lot from you. Thank you!
I recently used the command ‘uigetfile’. You are right: there are some tricks we should pay attention to.
Hey Quan,
I love modular code. But you know me, I’m a purist when it comes to keeping all the file handling stuff in “the wrapper”. This is a nice, quick way to start off new code that you know will handle multiple files.
Just FYI, I thought I’d offer up my standard wrapper below. Notice that I always use uigetfile with multiselect on. You already caught that it returns a cell array unless only one file is picked. Note how I account for this below, then all my code can be written to handle cell arrays.
HTH,
Rob
OHHHHH, elegant solution. I like it Robmeister.
Hey,
I am kinda new to MATLAB, so I am going to need some help altering this code. If I wanted to rename files so I can sort them numerically, how would I change the code? Say I want to have files like:
test 1.txt
test 2.txt
.
.
.
test 128.txt
This is an amazing piece of code that you guys wrote!!
I just solved the question that I asked
Here are the changes that I made:
The processFile function:
The wrapper function:
If anybody finds an error in this, please reply back here. This code may not be perfect yet.
It’s cool to see that MATLAB has this kind of flexibility for file handling, and to see an example of it… That said, though, I can’t help but mention that Z-Shell (zsh) comes with a ‘zmv’ command that can do this directly without even a loop:
autoload zmv
zmv -W "*.txt" "*_test_data.txt"
It supports multiple wildcards and can do some very complex renaming with very few keystrokes…
Why can’t I use the DIR command the way you described? (I am using MATLAB 2008a)
D = DIR('directory_name') returns the results in an M-by-1 structure
fileList = dir(inputDir,'water*.txt'); %%wrong command
I think I see what you’re talking about Jia. I notice it in Arif’s post as well.
‘dir’ can only take one input hence your error. To get the desired files, I think this should work:
to replace:
HTH,
Zane
Yea, the dir command takes in only one input. I had to change it, so I noticed my code only works when the m-files are in the same folder as the original files.
@Ron
That’s a neat trick! MATLAB can be clunky at times, but learning how to file handle in MATLAB can save you from learning a whole new language!
@Arif
I’m not sure your solution would work because the file list might not be arranged in the manner that you’re expecting. Also, I think there might be some syntax errors in your processFile function, but I’d have to look closer to be sure.
Hey Quan,
So yea, this isn’t going to be for universal application.
Yeah, you’re right about the files having to be in sequence to begin with. I am using this code to rename files from TV shows and other stuff into a format I like.
Hey Quan,
I got inspired over the weekend and wrote a file RENaming utility for you. It’s called “REN”. I know it’s not as sleek as 2 lines of UNIX or shell code, but for those of us who do this with MATLAB, it works. I felt sorry for you renaming hundreds of files at a time, so I also put it in GUI format (with GUIDE).
Here’s a link to the File Exchange where people can download it. Let me know if you find any bugs, since my testing of it was cursory.
http://www.mathworks.com/matlabcentral/fileexchange/23401
Enjoy,
Rob
Your assignment was to append “test_data” while your code appends “text_data”
“newFileName = [name 'text_data.' ext];”
Not that it changes what the code does, just a slight inconsistency.
I have this code that creates an XML file and then writes docNode to it:
xmlFileName = fullfile('/home/2008/aaslam1', 'dataOutFile.xml');
xmlwrite(xmlFileName,docNode);
Every time I run my MATLAB program, it overwrites the same file. Is there any way I can append to the existing file so that previous data is not lost?
Thanks
I was wondering if I can use the above code to do the following.
Say that there are some files like : abc0001.tif, abc0002.tif, abc0003.tif ….so on, and they all reside in a folder named “abc_imagefiles”. How can I define an array like “nimage” that will spit out the tif files like:
nimage(1) = abc0001.tif
nimage(2) = abc0002.tif
nimage(3) = abc0003.tif, and so on.
Hope you all can help me out
Thanks in advance,
Chari