If there’s one thing that computers are good at, it’s performing a task many times over. In my experience, being able to write code that automates a process is a gigantic time saver. In this tutorial, we will discuss how to take a specific process, and how to loop it effectively so that the process can be repeated as many times as necessary.
The Task: Renaming 265 Files
My boss recently gave me the less-than-enviable task of renaming 265 files. I was told to append “_test_data” to each filename. For example, if the original data file was called water_02252009.txt, it was supposed to be renamed to water_02252009_test_data.txt.
You can imagine that doing such a thankless task would require a lot of time if it were done manually. Luckily, MATLAB is available for sticky situations such as these.
Do it Once, but Do it Perfectly
The power of computers allows us to repeat a task as many times as we need. Thus, it’s important to create a function that performs the desired task on a single file, and it’s equally important to make sure that it works perfectly. Let’s write a function that appends our desired text to a given file name and saves a copy under the new name in the output directory.
function processFile(fileName,inputDir,outputDir)
% Copy one file from inputDir to outputDir, adding '_test_data' to its name
fullFilePath_input = fullfile(inputDir,fileName);
[pathstr,name,ext] = fileparts(fullFilePath_input);
% ext already includes the leading dot (e.g. '.txt'), so no extra '.' is needed
newFileName = [name '_test_data' ext];
fullFilePath_output = fullfile(outputDir,newFileName);
copyfile(fullFilePath_input,fullFilePath_output)
Notice that the CD command was never used in the code. There is no need to change directories if you use the absolute path name of a file instead of the relative path name. The code would have worked just as well with the CD command, but I find that it can be sloppy and unnecessary. More on this in a later post!
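To see why the extension needs no extra dot, here is a small sketch of how FILEPARTS splits a path. The folder name 'data' is only a placeholder for illustration, not a folder from the original task.

```matlab
% Build a sample path in a platform-independent way ('data' is a hypothetical folder)
samplePath = fullfile('data', 'water_02252009.txt');

% fileparts splits the path into folder, base name, and extension
[folder, name, ext] = fileparts(samplePath);

% ext is '.txt' (the dot is included), so the pieces can be joined directly
newName = [name '_test_data' ext];
disp(newName)
```

Because the dot travels with the extension, the same concatenation should work for other file types as well.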
The Input and Output Directory
Since the input and output directory are two of the inputs to the function we just wrote, we’re going to need a way to query the user for this information.
This part is easily taken care of with the following two commands:
inputDir = uigetdir;
outputDir = uigetdir;
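One thing worth guarding against: UIGETDIR returns 0 when the user cancels the dialog. The sketch below shows one possible check. The helper name isValidSelection is made up for this example, and the actual dialog call is left as a comment so the snippet can run without opening a window.

```matlab
% Hypothetical helper: true only when the selection is a path to an existing folder
isValidSelection = @(sel) ischar(sel) && exist(sel, 'dir') == 7;

% A cancelled dialog returns the number 0, which fails the check
cancelledResult = isValidSelection(0);

% An existing folder (here, the current one) passes the check
folderResult = isValidSelection(pwd);

% Possible use with the real dialog:
% inputDir = uigetdir(pwd, 'Select the input folder');
% if ~isValidSelection(inputDir)
%     error('No input folder was selected.');
% end
```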
The DIR Command
Next, we need to get a list of the files within the input directory. Let’s assume that we have the following files located in some folder:
water_02212009.txt
water_02222009.txt
water_02232009.txt
water_02242009.txt
water_02252009.txt
water_02262009.txt
.
.
.
water_08142009.txt
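Let’s say that we wanted to process each file within that folder. The DIR command takes a single argument, so we build the search pattern with FULLFILE. The first call below lists every text file in the current folder; the second lists only the text files whose names start with “water”.
fileList = dir(fullfile(pwd,'*.txt'));
fileList = dir(fullfile(pwd,'water*.txt'));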
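To make the difference between the two patterns concrete, here is a minimal sketch that builds a throwaway folder, fills it with a few empty files, and counts what each pattern matches. The folder and the file notes.txt are made up for this demonstration.

```matlab
% Create a unique temporary folder for the demonstration
demoDir = tempname;
mkdir(demoDir);

% Two "water" files plus one unrelated text file (all empty)
names = {'water_02212009.txt', 'water_02222009.txt', 'notes.txt'};
for k = 1:numel(names)
    fid = fopen(fullfile(demoDir, names{k}), 'w');
    fclose(fid);
end

% dir returns a struct array; each element has a .name field
allTxt = dir(fullfile(demoDir, '*.txt'));
waterTxt = dir(fullfile(demoDir, 'water*.txt'));
fprintf('%d text files, %d water files\n', numel(allTxt), numel(waterTxt));

% Remove the temporary folder and its contents
rmdir(demoDir, 's');
```

The wildcard pattern does the filtering, so no extra string matching is needed inside the loop later on.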
Creating the Wrapper
Now that we have all the inputs to our processFile function, we’re ready to complete the rest of the code. The rest of the code is referred to as the “Wrapper”, as it’s the high level code used to run the processFile function. Writing a wrapper function is convenient because it can be used again in the future, saving you from re-inventing the wheel. Let’s say your boss decided that he wanted you to do 10 different operations to each of the 265 files. With the wrapper function intact, it’s just a matter of writing some more modular functions that can perform each of those 10 operations.
My wrapper function looks like this:
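function wrapperFunction
% Ask the user for the input and output folders
inputDir = uigetdir;
outputDir = uigetdir;
% dir takes one argument, so combine the folder and the pattern with fullfile
fileList = dir(fullfile(inputDir,'water*.txt'));
% Process every matching file, one at a time
for x=1:length(fileList)
fileName = fileList(x).name;
processFile(fileName,inputDir,outputDir);
end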
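If more operations are needed later, one possible approach is to keep them in a cell array of function handles and let the loop call each one in turn. This is only a sketch: the two operations below (copyWithSuffix and reportName) are hypothetical stand-ins, and temporary folders replace the ones picked with UIGETDIR so that the example runs on its own.

```matlab
% Temporary folders stand in for the ones chosen with uigetdir
inputDir = tempname;  mkdir(inputDir);
outputDir = tempname; mkdir(outputDir);

% Create two empty sample files to process
sampleNames = {'water_02212009.txt', 'water_02222009.txt'};
for k = 1:numel(sampleNames)
    fid = fopen(fullfile(inputDir, sampleNames{k}), 'w');
    fclose(fid);
end

% Hypothetical operations, each taking (fileName, inputDir, outputDir)
copyWithSuffix = @(f, inDir, outDir) copyfile(fullfile(inDir, f), ...
    fullfile(outDir, strrep(f, '.txt', '_test_data.txt')));
reportName = @(f, inDir, outDir) fprintf('Processed %s\n', f);
operations = {copyWithSuffix, reportName};

% Same loop as the wrapper, with an inner loop over the operations
fileList = dir(fullfile(inputDir, 'water*.txt'));
for x = 1:numel(fileList)
    for k = 1:numel(operations)
        operations{k}(fileList(x).name, inputDir, outputDir);
    end
end

% Count the renamed copies, then clean up the temporary folders
outputList = dir(fullfile(outputDir, '*_test_data.txt'));
numOutputs = numel(outputList);
rmdir(inputDir, 's');
rmdir(outputDir, 's');
```

Adding an eleventh operation would then mean adding one more handle to the cell array, with the loop left untouched.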
Alternative: The UIGETFILE Command
Another great way to obtain a list of files that you want to process is by using the UIGETFILE command. It has a cool feature called “multiselect” which allows you to select as many files as you desire.
[files2get, path2get] = uigetfile('*.txt','MultiSelect','on');
This is an extremely flexible command that lets the user choose which files to process. If you want greater control over which files are selected, this is probably your best bet. Note that the list of file names returned by this command is stored in a cell array. However, if only ONE file is selected, then the name is NOT stored in a cell array but as a string, so be careful!
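One way to handle the single-file case is to wrap a string result in a cell array before looping, so the rest of the code only ever sees a cell array. The sketch below uses a simulated return value instead of opening the dialog. It also covers the case where the user cancels, in which UIGETFILE returns 0.

```matlab
% Simulated result of selecting exactly one file in the uigetfile dialog
files2get = 'water_02252009.txt';

if isequal(files2get, 0)
    % The user cancelled: nothing to process
    files2get = {};
elseif ischar(files2get)
    % A single selection comes back as a string: wrap it in a cell array
    files2get = {files2get};
end

% From here on, files2get is always a cell array
for x = 1:numel(files2get)
    disp(files2get{x});
end
```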
Next Time
Did you notice that the command cd was never used? In my experience, using this command leads to more trouble than it’s worth. By using full file paths, you can prevent a lot of heartache in the future. More to come on that topic next time!