Matlab - Parsing Data Files With Header Content
25 Oct 2007 Daniel Sutoyo 37 comments 8,077 views
Part 1: Preparing Data Set to be Read Into Matlab
Often in Matlab we want to read data from a *.txt file. The importdata function is convenient as long as your data set contains no text and has a consistent number of columns. If you’re dealing with large volumes of data, however, deleting the header content by hand is impractical. One option is to use PHP to strip the headers from all your files automatically. But most data sets generated by data acquisition systems include header content that provides key information about the measurement parameters, such as the number of points and the sampling rate, so we need to be able to extract that information as well! In this tutorial I will show you how you can easily read in data files with header content.
Suppose your data file looks something along the lines of
n := 9
d := 5
k := 4
param p:1 2 3 4 5 :=
1 0.4 0.5 0.6 0.12 0.6
2 0.4 0.5 0.6 0.12 0.6
3 0.1 0.2 0.4 0.22 0.3
4 0.2 0.3 0.3 0.12 0.6
5 0.3 0.4 0.2 0.42 0.4
6 0.2 0.5 0.6 0.12 0.6
... data set continues
Here n is the number of data points, d is the number of data columns, and k stands for whatever other parameters you might need. You can run importdata on the input file as it is, but be warned that Matlab will not store the data the way you would expect: it will try to treat the whole file as one matrix, including your text! The way to get around this is as follows:
% Read in File
id = fopen('filename.txt');

% Read in Header Content (the n, d and k lines)
for i=1:3
    readin = fgetl(id);
    para(i) = str2num(readin(6:length(readin)));   % the value starts at character 6
end
fgetl(id);   % skip the 'param p:...' line

% Read in Data
for i=1:para(1)
    data(i,:) = str2num(fgetl(id));
end

fclose(id);   % close only the file opened above
Don’t worry, I’ll explain how the code works now.
id = fopen('filename.txt');
The function fopen creates a file identifier (a numerical name tag) that Matlab uses in its file functions. It essentially “opens” the file within Matlab, allowing Matlab to start reading it. The file is read line by line using the function fgetl. You can remember it as “get line”. To use fgetl, just call it on the file identifier of the input file. In this case, the file identifier is id. Each time you call fgetl, Matlab automatically moves on to the next line in the file, so that you don’t keep reading in the same line!
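As a quick check of how fgetl advances through a file, the sketch below writes a three-line scratch file and reads it back. The scratch file created with tempname and the variable names are assumptions made for illustration only; they are not part of the tutorial's data set.

```matlab
% Write a small scratch file (made-up content, for illustration only).
fname = [tempname '.txt'];
id = fopen(fname, 'w');
fprintf(id, 'n := 9\nd := 5\nk := 4\n');
fclose(id);

% Open for reading; fopen returns -1 if the file cannot be opened.
id = fopen(fname, 'r');
if id == -1
    error('Could not open %s', fname);
end

first  = fgetl(id);   % first line of the file
second = fgetl(id);   % second line: the position advanced automatically
third  = fgetl(id);   % third line
last   = fgetl(id);   % nothing left to read, so fgetl returns -1

fclose(id);
delete(fname);
```

Checking the return value of fopen before reading is a cheap way to catch a mistyped file name early.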
37 Responses to “Matlab - Parsing Data Files With Header Content”
Also, if you don’t know how many rows your numerical data set consists of, you can use a while loop instead of a for loop.
r = 1;
while true
    readin = fgetl(id);
    if ~ischar(readin), break; end   % fgetl returns -1 (not text) at the end of the file
    data(r,:) = str2num(readin);
    r = r + 1;
end
I can only say
You Made My day
thanx a lot
Hi,
your matlab help is super useful for me. thank you.
Q: i have a file that has a row of text every 1000 lines of numeric data. looks like that:
text
int1
int2
…
int1000
text
int1
int2
…
int1000
text
etc etc.
is there a way to read the last line of text (it has useful information) and get the numeric data? well, i’m sure there is a way, i’m wondering if you know how to do it.
thank you
for the previous post-
i forgot to mention a few things.
1. the last line of text is not at the very end of the file
2. the original file is a .txt file
Misha,
the examples in the tutorial should lay out the framework you need… Since you already know there are 1000 numeric lines, this makes it straightforward
nblocks = number of blocks, where each block is the text plus its 1000 numeric lines (if you don’t know this number, change the outer loop to a while loop that keeps running until no more lines are read in)
for j=1:nblocks
1. call fgetl to read in the text (I don’t know how many lines of text you have or what they look like)
2. then call fgetl 1000 times to read in the numeric data
end
You can switch the order around if necessary. If you want your code to be more dynamic, you can always add ‘if’ statements to check whether a line is a text header or numeric data.
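The outline above can be sketched as follows, assuming exactly one text line per block and one number per numeric line. The scratch file, the block size of 3 (instead of 1000) and all variable names are made up to keep the example short.

```matlab
% Scratch file: one text line followed by nPerBlock numbers, repeated.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'run A\n1\n2\n3\nrun B\n4\n5\n6\n');
fclose(fid);

nPerBlock = 3;     % 1000 in the question; 3 here for brevity
headers = {};      % one text line per block
values  = [];      % one column of numbers per block

fid = fopen(fname, 'r');
while true
    txt = fgetl(fid);
    if ~ischar(txt), break; end          % end of file: no more blocks
    headers{end+1} = txt;                % step 1: the text line
    block = zeros(nPerBlock, 1);
    for k = 1:nPerBlock                  % step 2: the numeric lines
        block(k) = str2double(fgetl(fid));
    end
    values(:, end+1) = block;
end
fclose(fid);
delete(fname);

lastHeader = headers{end};   % the last text line in the file
```

Because every header is kept in a cell array, the last text line is available even though it is not at the very end of the file.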
hi
thanks for the previous postings…very helpful.
however, i have some data in a .txt file. i don’t know when the data row finish (i.e. i don’t know which row is the last row!) and i have some lines of text in between every (for instance) 10 or 20 rows of data. could you please help me with that?
cheerZ
behzad
behzad–
I’m doing something very similar, where the # of lines of numbers between headers is variable. What I do is use str2double to check if the current line is data or a string. I also put in state variables to keep track of where to put the data, ignore whitespace and empty lines, etc., but this is the core of it.
while ~feof(fid) % go until EOF
    line = fgetl(fid);
    if isnan(str2double(line)) % it’s a text header
        disp(line); % do something with it
    else % else, it’s data
        disp(str2double(line)); % do something
    end
end
Or if you need to input matrices, not just doubles and ints, use str2num:
while ~feof(fid) % go until EOF
    line = fgetl(fid);
    [x, status] = str2num(line);
    if ~status % it’s the text header
        disp(line); % do something with it
    else % else, it’s data
        disp(x); % do something
    end
end
Cheers!
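Building on the str2num status check, one possible way to keep the data instead of just displaying it is to collect each run of numeric rows into its own cell. This is only a sketch: the scratch file, the '#' header style and the variable names are assumptions. Note that str2num evaluates its input, so a header line that happens to be a valid Matlab expression would be misread as data.

```matlab
% Scratch file with two headers and a varying number of data rows.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '# group 1\n1 2 3\n4 5 6\n\n# group 2\n7 8 9\n');
fclose(fid);

headers = {};   % header text, one entry per block
blocks  = {};   % numeric rows that follow each header

fid = fopen(fname, 'r');
while true
    txt = fgetl(fid);
    if ~ischar(txt), break; end                 % end of file
    if isempty(strtrim(txt)), continue; end     % ignore empty lines
    [x, ok] = str2num(txt);
    if ok                                       % numeric row
        if isempty(blocks)                      % data before any header
            headers{end+1} = '';
            blocks{end+1}  = [];
        end
        blocks{end} = [blocks{end}; x];
    else                                        % text header starts a new block
        headers{end+1} = txt;
        blocks{end+1}  = [];
    end
end
fclose(fid);
delete(fname);
```

Each blocks{k} then holds the rows that followed headers{k}, however many there were.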
Another Parsing Question: I have a text file where the first column of data is a string of the date ('2008-08-01 12:00:00'). I can import the data no problem but then I want to make this array into 6 different numerical arrays. I can’t seem to be able to figure out how to do this. How do I split a text array of time stamps into a year, month, day, hour, min, and sec numerical arrays? Any help will be greatly appreciated.
Hi Ellen
It sounds like you have quite a few dates in that format.
one quick way is to simply use indices
Now if your date format changes for some reason, or not all the values are in the same dimension you would probably use some regexp which is a little bit more complicated.
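A minimal sketch of the index approach, assuming every stamp has exactly the fixed-width form 'yyyy-mm-dd HH:MM:SS' and is stored as one row of a char matrix. The sample stamps and variable names are made up; a cell array of stamps could first be converted with char().

```matlab
% Made-up time stamps, one per row of a char matrix.
stamps = ['2008-08-01 12:00:00'; '2008-08-02 13:30:15'];

% Slice fixed column ranges and convert each slice to numbers.
yr = str2num(stamps(:,  1:4));
mo = str2num(stamps(:,  6:7));
dy = str2num(stamps(:,  9:10));
hr = str2num(stamps(:, 12:13));
mn = str2num(stamps(:, 15:16));
sc = str2num(stamps(:, 18:19));
```

Each result is a column vector with one entry per time stamp.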
Thanks for the tutorial.
Is there any tutorial describing the fileopen-fileclose procedures in basic level?
great!
@ Ellen and Dan,
For the parsing of the date/time string, what about applying textscan? Have I mentioned how much I like that function 8^)
Something like this:
… will give you an array ‘a’ where the elements are year, month, day, hour, minute, second. If the number of elements per line changes, you just need to adjust how many ‘%f’ s there are.
HTH,
Rob
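A textscan call along those lines might look like the sketch below. The format string with literal '-' and ':' characters, the CollectOutput option and the sample stamps are assumptions; this is not necessarily the call the comment had in mind.

```matlab
% Made-up time stamps, one per line of a single string.
stamps = sprintf('2008-08-01 12:00:00\n2008-08-02 13:30:15\n');

% The literal '-' and ':' in the format are matched and discarded.
c = textscan(stamps, '%f-%f-%f %f:%f:%f', 'CollectOutput', true);
a = c{1};   % one row per stamp: [year month day hour minute second]
```

With CollectOutput set, the six numeric fields are gathered into a single matrix rather than six separate cells.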
Text scan owns! LONG LIVE MATLAB!
I copied your code and saved it and ran but it gives me the following error
In an assignment A(I) = B, the number of elements in B and I must be the same.
Error in ==> process at 7
para(i) = str2num(readin(6:length(readin)));
I am sorry, I did a mistake. It is working now
sumit,
are you using the sample text file as well?
the problem is that the characters in readin from position 6 onward did not convert to exactly one number, so the result could not be stored in para(i). It can be easily fixed by making the loop over i cover exactly the header lines in your file
np
I have files with header rows and data with date and time in the first columns. e.g.
15-10-2008 15:20:16 10 -5 3 8
I can use Daniel’s code to solve the header problem, but then have problems reading the data. I tried textscan and it generally works on the date and time columns. But the delimiter in the date string is - and therefore textscan considers the minus of negative numbers as a delimiter and turns negative numbers in the data into positive ones.
I would like to use it in a loop in a script to process several hundred files at a time. Therefore importing manually is not an option.
Has anybody an idea how I can get around this?
Cheers,
Maike
Can’t say for certain, but maybe you can make the delimiter a space?
hi
can i ask how i can use parsing data when the text file is in this form
157.557 -333.854 -16.455 -409.143 -339.403 0.168
157.600 -333.897 -16.497 -408.748 -339.045 0.169
157.570 -333.928 -16.490 -408.781 -339.440 0.173
157.539 -333.824 -16.460 -409.163 -339.866 0.179
………………………………………………………………….
thx
@ Quan Quach
I solved it. Thanks.
I used fscanf instead of textscan which allows me to type delimiters in the format definition.
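A sketch of what such an fscanf call could look like for rows of the form shown earlier (date, time, then four values). The format string, the scratch file and the variable names are assumptions; the actual code used is not shown in the comment.

```matlab
% Scratch file with two made-up rows in the dd-mm-yyyy HH:MM:SS layout.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '15-10-2008 15:20:16 10 -5 3 8\n');
fprintf(fid, '16-10-2008 09:05:00 -2 7 0 -1\n');
fclose(fid);

% The '-' and ':' separators are written into the format itself, so a
% minus sign in front of a data value is never treated as a delimiter.
fmt = '%d-%d-%d %d:%d:%d %f %f %f %f';

fid = fopen(fname, 'r');
raw = fscanf(fid, fmt, [10 Inf])';   % fscanf fills column-wise, so transpose
fclose(fid);
delete(fname);

dateParts = raw(:, 1:6);    % day month year hour minute second
values    = raw(:, 7:10);   % data columns, signs preserved
```

Any header lines would still need to be consumed with fgetl before the fscanf call.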
Hi ,
I have an Excel file which contains 10 columns and 1000 rows. I am trying to write a program that will read all the data one row at a time and display it one row at a time automatically once a command is initiated. Can anybody help me?
Hello,
I am trying to read in large data files but only take specific data. I have 52 lines of initial file header which I want to ignore. Then I have an initial header for 34 rows of data which I want to ignore where the initial header is like the following:
000 10 15 00 72 RELHUM_2 2mg
The 34 rows of subsequent data look like:
78.99 86.17 85.88 85.22 85.50 92.87 96.86 96.58 100.00 96.73
…
93.16 87.24 93.65
So, the last row is not complete.
These 35 lines are repeated for different data parameters. I want to take the 2nd, 3rd, 4th, 17th, 18th, and 20th set of 35 rows of data (minus the header row) and store them.
Does anyone have any suggestions?
I am completely stuck. THANK YOU IN ADVANCE!
@dizz: use fopen to open the txt file, and use fgetl() to get the first four lines only
@Hrishi: fgetl() reads the file line by line. You can use display() and pause() so you can see each row on your output screen
@Cara: Use fopen() to open the file and fgetl() to read the lines.
for i = 1:nSkip   % nSkip = number of lines you want to skip
    fgetl(id);
end
Now the tricky part is how to get those specified sets of data. One way to do it is based on the functions mentioned in the tutorial. I will just give you the overall structure.
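One possible overall structure, sketched on a scaled-down, made-up file: 2 file-header lines instead of 52, blocks of 1 header line plus 3 data rows instead of 34, and blocks 2 and 4 kept instead of 2, 3, 4, 17, 18 and 20. All names and the file contents are assumptions.

```matlab
% Build a scratch file with 4 blocks; the last row of each block is short.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'file header 1\nfile header 2\n');
for b = 1:4
    fprintf(fid, '000 10 15 00 72 PARAM_%d\n', b);
    fprintf(fid, '%d %d %d\n%d %d %d\n%d %d\n', b*10 + (1:8));
end
fclose(fid);

nFileHeader  = 2;       % 52 in the question
rowsPerBlock = 3;       % 34 in the question
wanted       = [2 4];   % [2 3 4 17 18 20] in the question

kept = cell(1, numel(wanted));
fid = fopen(fname, 'r');
for i = 1:nFileHeader
    fgetl(fid);                       % skip the initial file header
end
for b = 1:max(wanted)
    fgetl(fid);                       % skip this block's header line
    vals = [];
    for r = 1:rowsPerBlock            % rows may differ in length
        vals = [vals; sscanf(fgetl(fid), '%f')];
    end
    idx = find(wanted == b);
    if ~isempty(idx)
        kept{idx} = vals;             % store only the requested blocks
    end
end
fclose(fid);
delete(fname);
```

Storing each block as one long column sidesteps the incomplete last row; the values can be reshaped afterwards if a fixed layout is needed.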
Thanks
Can any one help?
Glad to see the tutorials and helping to whoever who needs help.
Hi, I have to read a txt file with different types of inputs. It is to be used to draw a graph, so the data represent vertexes, edges and names of vertexes. The text file will look like the one below. (remark: the ( <<<< notes ) are not a part of the text file )
5 (<<<< total # of vertexes) …. want to keep it in “v”
7 (<<<< total # of edges) ….. want to keep it in “e”
1 a (<<< name of vertexes start from this line , total lines = # of vertexes)
2 c ……….. want to keep them in “nodesID”. any idea to keep as it’s relations?
3 e
4 b
5 z
1 2 (<<< edges .. ( n by 2 array ) total lines = # of edges)
3 4 …………… want to keep them in “nodelist” ( n by 2 array )
3 5
1 4
2 5
2 4
1 5
I have to read above data and want to keep them in associated variables. Frankly saying that I am a beginner for matlab. So please give me some idea along with codes.
Thanks to everyone.
Nay Zaw
Hi, I forgot one thing to mention about name of vertexes. It will be string, not a char. By the way, I’ve tried according to the tutorial shown. I just could read the first two lines rightly. And there was also a problem when I tried to call this read file from main program by passing “myfile.txt”. But It itself did work.
Thanks for the tutorials and helps.
Nay Zaw
try textscan or importdata
Hi Anonymous,
Thanks you for your advice. I guess that importdata method needs uniform table. textscan method might work. I have been trying now.
Thank you.
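For a file laid out like the one in the question, a line-by-line read with fgetl is one option. The sketch below assumes the notes are absent from the real file, that each name line is "index name", and that the variable names v, e, nodesID and nodelist from the question are used; the scratch file is made up to match the example.

```matlab
% Scratch file matching the layout in the question (without the notes).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '5\n7\n1 a\n2 c\n3 e\n4 b\n5 z\n');
fprintf(fid, '1 2\n3 4\n3 5\n1 4\n2 5\n2 4\n1 5\n');
fclose(fid);

fid = fopen(fname, 'r');
v = str2double(fgetl(fid));            % total number of vertexes
e = str2double(fgetl(fid));            % total number of edges

nodesID = cell(v, 1);                  % names, indexed by vertex number
for i = 1:v
    [tok, rest] = strtok(fgetl(fid));  % split "index name"
    nodesID{str2double(tok)} = strtrim(rest);
end

nodelist = zeros(e, 2);                % one edge per row
for i = 1:e
    nodelist(i, :) = sscanf(fgetl(fid), '%d')';
end
fclose(fid);
delete(fname);
```

A cell array keeps each name as a string of any length, and nodesID{k} gives the name of vertex k.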
Hi,
What is the best way to import and parse this text file in Matlab?
$GPRMC,141801.00,A,4654.13098,N,09648.49429,W,0.010,,140409,,,D*62
$GPVTG,,T,,M,0.010,N,0.018,K,D*2E
$GPGGA,141801.00,4654.13098,N,09648.49429,W,2,12,0.78,284.9,M,-27.5,M,,*67
$GPGSA,A,3,24,29,30,48,51,02,26,10,21,18,15,16,1.35,0.78,1.10*02
$GPGSV,4,1,15,01,11,214,42,02,17,094,42,07,01,034,38,10,56,058,48*7E
$GPGSV,4,2,15,15,31,149,43,16,11,325,41,18,08,228,38,21,33,283,46*7A
$GPGSV,4,3,15,24,76,319,48,25,01,015,35,26,20,158,42,29,73,250,49*77
$GPGSV,4,4,15,30,12,218,39,48,26,225,41,51,35,194,44*4A
$GPGLL,4654.13098,N,09648.49429,W,141801.00,A,D*72
$GPRMC,141802.00,A,4654.13094,N,09648.49427,W,0.009,,140409,,,D*6B
$GPVTG,,T,,M,0.009,N,0.017,K,D*29
$GPGGA,141802.00,4654.13094,N,09648.49427,W,2,12,0.78,284.9,M,-27.5,M,,*66
$GPGSA,A,3,24,29,30,48,51,02,26,10,21,18,15,16,1.35,0.78,1.10*02
$GPGSV,4,1,15,01,11,214,42,02,17,094,42,07,01,034,38,10,56,058,48*7E
$GPGSV,4,2,15,15,31,149,43,16,12,324,41,18,08,228,38,21,33,283,46*78
$GPGSV,4,3,15,24,76,319,48,25,01,015,35,26,20,158,42,29,73,250,49*77
$GPGSV,4,4,15,30,12,218,39,48,26,225,41,51,35,194,44*4A
$GPGLL,4654.13094,N,09648.49427,W,141802.00,A,D*73
To read a txt file with different types of inputs. Text format is described as above or you can see in code.
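For comma-separated sentences like these, one approach is to split each line on commas and pick fields by position. The sketch below keeps only the $GPGGA sentences and pulls out the time, latitude, longitude and altitude fields as raw numbers. The choice of sentence type, the field positions used and the variable names are assumptions for illustration; in a real script the lines would come from fgetl.

```matlab
% Three sentences copied from the sample above.
sentences = { ...
    '$GPRMC,141801.00,A,4654.13098,N,09648.49429,W,0.010,,140409,,,D*62', ...
    '$GPGGA,141801.00,4654.13098,N,09648.49429,W,2,12,0.78,284.9,M,-27.5,M,,*67', ...
    '$GPGGA,141802.00,4654.13094,N,09648.49427,W,2,12,0.78,284.9,M,-27.5,M,,*66'};

gga = [];   % rows of [time latitude longitude altitude], as written in the file
for i = 1:numel(sentences)
    % Splitting with regexp keeps empty fields (",,") in place.
    f = regexp(sentences{i}, ',', 'split');
    if strcmp(f{1}, '$GPGGA')
        gga(end+1, :) = str2double(f([2 3 5 10]));
    end
end
```

Other sentence types can be handled the same way with their own field positions.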
Hey QQ
thanks!
Could you tell me how to pass handles through a function ?
I am looking at an 14 bit image file (2048 x 2048) saved as a .txt file.
using tic and toc using your code works to get rid of the 1 line header however, the time is 21.45 seconds.
Removing the header manually inside my windows directory and then using the load function takes
5.1 seconds.
is there any chance I can just dump the first line and “load” the remainder of my .txt file (i.e. increasing the speed)?
this code only took 1.6 sec!! me happy now!!
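One way to discard a single header line and read everything else in one call is fgetl followed by fscanf. This is a sketch under assumptions, not necessarily the code referred to in the comment: the scratch file is a made-up 4 x 4 matrix standing in for the 2048 x 2048 image, and nCols must be known in advance.

```matlab
% Scratch file: one header line, then a 4 x 4 matrix written row by row.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'image header line\n');
fprintf(fid, '%d %d %d %d\n', magic(4)');
fclose(fid);

nCols = 4;                               % 2048 in the question
fid = fopen(fname, 'r');
fgetl(fid);                              % discard the one-line header
img = fscanf(fid, '%f', [nCols Inf])';   % bulk read, then transpose to row order
fclose(fid);
delete(fname);
```

Reading all the numbers in a single fscanf call avoids the per-line str2num conversion, which is usually where the time goes on large files.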
Thanks a lot! very useful tutorial.
I have a text file with the format:
HIS
1
[Stimulus Def'n]
0
8296 8296
0 0
0 0
0 0
0 0
[Period,Lag,Duration,Count_Window,Rise_Time,Presentations]
250 10 75 150 5 250
[Tickle_type,freq,atten,timing]
0 8296 29 0
[IPD,incrment,fixed,incremented]
0 0 0 0
[ILD,incrment,incremented]
0 0 0
[Attenuation,increment,incremented]
29 0 0
[Ear,0=L,1=R,2=Both]
0
[Response_to_stimulus1]
426 Spikes
0
….4356….
0
UNIT INFORMATION
0.1
STIMS PRESENTED
250
Where I want to extract the list of data after the word ‘Spikes’ and stop at ‘UNIT INFORMATION’. The 0…4356…0 is variable from 100-2000 lines with values from 0 to 150000.
I can read in the txt file but parsing the headers & footers with the while loop is above my knowledge. Can you help please?
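A while loop with a simple flag is one way to do this: ignore lines until one contains 'Spikes', then collect numbers until 'UNIT INFORMATION' appears. The sketch below uses a shortened, made-up version of the file, and the variable names are assumptions.

```matlab
% Shortened scratch file with the same markers as the question.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '[Ear,0=L,1=R,2=Both]\n0\n[Response_to_stimulus1]\n');
fprintf(fid, '426 Spikes\n0\n4356\n0\nUNIT INFORMATION\n0.1\n');
fclose(fid);

spikes = [];
inData = false;                       % true once the 'Spikes' line has passed
fid = fopen(fname, 'r');
while true
    txt = fgetl(fid);
    if ~ischar(txt), break; end       % end of file
    if ~inData
        inData = ~isempty(strfind(txt, 'Spikes'));
    elseif ~isempty(strfind(txt, 'UNIT INFORMATION'))
        break;                        % footer reached, stop reading
    else
        spikes(end+1, 1) = str2double(txt);
    end
end
fclose(fid);
delete(fname);
```

The number of data lines does not need to be known in advance, since the loop stops at the footer marker.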
hi
does anyone know how to parse other formats of files which contain text and data..such as a *.dbc file??? what type of function could be written for the same.??