If there’s one thing that computers are good at, it’s performing a task many times over. In my experience, being able to write code that automates a process is a gigantic time saver. In this tutorial, we will discuss how to take a specific process and loop it effectively, so that it can be repeated as many times as necessary.

The Task: Renaming 265 Files

My boss recently gave me the less than enviable task of renaming 265 files. I was told to append “_test_data” to each filename. For example, if the original data file was called water_02252009.txt, it was supposed to be renamed to water_02252009_test_data.txt.

You can imagine that doing such a thankless task would require a lot of time if it were done manually. Luckily, MATLAB is available for sticky situations such as these.

Do it Once, but Do it Perfectly

The power of computers allows us to replicate a task as many times as we need. Thus, it’s important to create a function that performs the desired task on a single file, and it’s equally important to make sure that it works perfectly. Let’s write a function that will append our desired text to a given file name.

function processFile(fileName,inputDir,outputDir)
%fileName is the name of the input file
%inputDir is the directory where the input file is located
%outputDir is the directory where we'll put the renamed file

%the fullfile command makes sure that the file path is created correctly with
%the appropriate file separators for the operating system
fullFilePath_input = fullfile(inputDir,fileName);

%get the different parts of the file name; the folder part is not needed
[~,name,ext] = fileparts(fullFilePath_input);

%append the new text to the file name (ext already includes the leading ".")
newFileName = [name '_test_data' ext];

%create the full path of the output file name
fullFilePath_output = fullfile(outputDir,newFileName);

%create a copy in the output folder, so that the original
%file remains intact
copyfile(fullFilePath_input,fullFilePath_output)
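
As a quick check of the renaming step, the snippet below is a minimal sketch of what fileparts returns for one of the file names. The 'data' folder is a made-up example; the point is that the extension comes back with its leading period, so the new text can be concatenated directly in front of it.

```matlab
%build an example path; 'data' is a hypothetical folder name
examplePath = fullfile('data','water_02252009.txt');

%split the path into folder, base name, and extension
[folder,name,ext] = fileparts(examplePath);

%ext is '.txt' (period included), so no extra '.' is needed here
newFileName = [name '_test_data' ext];

disp(newFileName)   %displays water_02252009_test_data.txt
```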

You may have noticed that the CD command was never used in the code. There is no need to change directories if you use the absolute path name of a file instead of the relative path name. The code would have worked just as well with the CD command, but I find that it can be sloppy and unnecessary. More on this in a later post!

The Input and Output Directory

Since the input and output directories are two of the inputs to the function we just wrote, we’re going to need a way to query the user for this information.

This part is easily taken care of with the following two commands:

%prompt the user to select the input directory
inputDir= uigetdir;

%prompt the user to select the output directory
outputDir = uigetdir;
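
One thing to watch for: uigetdir returns the number 0 instead of a path if the user cancels the dialog. Here is a minimal sketch of guarding against that. The dialog titles and error messages are just examples, and stopping with an error is only one way to handle a cancelled dialog.

```matlab
%prompt for the input directory, starting in the current folder
inputDir = uigetdir(pwd,'Select the input directory');
if isequal(inputDir,0)
    error('No input directory was selected.');
end

%prompt for the output directory, starting where the input files live
outputDir = uigetdir(inputDir,'Select the output directory');
if isequal(outputDir,0)
    error('No output directory was selected.');
end
```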

The DIR Command

Next, we need to get a list of the files within the input directory. Let’s assume that we have the following files located in some folder:

water_02212009.txt
water_02222009.txt
water_02232009.txt
water_02242009.txt
water_02252009.txt
water_02262009.txt
.
.
.
water_08142009.txt

Let’s say that we wanted to process each file within that folder.

%obtain a structure that contains all the text files within the directory
%pwd is the present working directory; dir takes a single argument, so the
%folder and the file pattern are joined with fullfile
fileList = dir(fullfile(pwd,'*.txt'));

%the fileList structure will only contain the text files that start with "water"
%if we do the following:
fileList = dir(fullfile(pwd,'water*.txt'));

%this gives added flexibility in your file selection
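
The structure array returned by dir has one element per match, with fields such as name and isdir. The following is a small sketch of inspecting it before processing anything. It assumes the files sit in the present working directory, and the variable fileNames is introduced here only for illustration.

```matlab
%list the matching entries in the current folder
fileList = dir(fullfile(pwd,'water*.txt'));

%drop any folders that happen to match the pattern
fileList = fileList(~[fileList.isdir]);

%collect the names into a cell array and report how many were found
fileNames = {fileList.name};
fprintf('Found %d matching file(s)\n',numel(fileNames));
```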

Creating the Wrapper

Now that we have all the inputs to our processFile function, we’re ready to complete the rest of the code. The rest of the code is referred to as the “Wrapper”, as it’s the high level code used to run the processFile function. Writing a wrapper function is convenient because it can be used again in the future, saving you from re-inventing the wheel. Let’s say your boss decided that he wanted you to do 10 different operations to each of the 265 files. With the wrapper function intact, it’s just a matter of writing some more modular functions that can perform each of those 10 operations.

My wrapper function looks like this:

function wrapperFunction

%prompt the user to select the input directory
inputDir= uigetdir;

%prompt the user to select the output directory
outputDir = uigetdir;

%pull the list of files from the input directory
fileList = dir(fullfile(inputDir,'water*.txt'));

for x=1:length(fileList)

    %get the filename of the x-th file within the input directory
    fileName = fileList(x).name;

    processFile(fileName,inputDir,outputDir);
end
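
To illustrate the point about adding more operations, here is one possible sketch of a wrapper that accepts a list of functions to run on each file. The name runOperations and its operations argument are my own inventions for this example, and every operation is assumed to take the same three inputs as processFile.

```matlab
function runOperations(inputDir,outputDir,operations)
%operations is a cell array of function handles; each one is called as
%operation(fileName,inputDir,outputDir), the same way as processFile

%pull the list of files from the input directory
fileList = dir(fullfile(inputDir,'water*.txt'));

for x=1:length(fileList)
    fileName = fileList(x).name;

    %apply every operation to the current file, in order
    for k=1:length(operations)
        operation = operations{k};
        operation(fileName,inputDir,outputDir);
    end
end
```

Calling runOperations(inputDir,outputDir,{@processFile}) would then reproduce the renaming task, and further handles can be added to the cell array as new operations are written.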

Alternative: The UIGETFILE Command

Another great way to obtain a list of files that you want to process is by using the UIGETFILE command. It has a cool feature called “multiselect” which allows you to select as many files as you desire.

[files2get,path2get] = uigetfile('*.txt','MultiSelect','on');

This is an extremely flexible command that lets the user choose which files to process. If you want to have greater control over which files are selected, this is probably your best bet. Note that when several files are selected, the list of file names returned by this command is stored in a cell array. If only ONE file is selected, however, the name is NOT stored in a cell array but as a string, so be careful!
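
One way to sidestep the string-versus-cell-array difference is to pass the result through cellstr, which wraps a single string in a cell and leaves a cell array of strings unchanged. The sketch below uses made-up return values in place of a real dialog so that both cases are visible side by side.

```matlab
%example return values (made up) for a single and a multiple selection
singleSelection = 'water_02252009.txt';
multiSelection = {'water_02252009.txt','water_02262009.txt'};

%cellstr wraps a string in a cell and leaves a cell array of strings as is
singleAsCell = cellstr(singleSelection);
multiAsCell = cellstr(multiSelection);

%the same loop now works for both cases
for x=1:length(singleAsCell)
    disp(singleAsCell{x});
end
for x=1:length(multiAsCell)
    disp(multiAsCell{x});
end
```

In practice this would be a single line, files2get = cellstr(files2get);, placed right after the call to uigetfile. Keep in mind that uigetfile returns 0 when the dialog is cancelled, so check isequal(files2get,0) before converting.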

Next Time

Did you notice that the command cd was never used? In my experience, using this command leads to more trouble than it’s worth. By using full file paths, you can prevent a lot of heartache in the future. More to come on that topic next time!