2024년 5월 21일 화요일

using binary file read/write between fortran and python

Reference: post in stackoverflow. 

Problem: Let us suppose the fortran program stores integer*4, real*8, complex*16 type of data as a binary files. How one can read the binary files in python? 


(1) In Fortran : store data as a binary file   

   In this example, an real array 'fn' with dimension(100,100) is stored in a file. 

   Because the double precision real number is 8 byte, writing 'fn' corresponds to writing 8*100*100= 80000 bytes. When writing binary file ('direct' access), one have to specify a length of data and position to write the data. 'rec_array' is a length of data and 'rec' in write command tells where the 'rec_array' length have to be written in a file. For example, rec = 3 corresponds to the position after 2*80000 bytes.  

program binary_out
implicit none
integer :: i,j,t,rec_array
double precision, dimension(100,100) :: fn
double precision, parameter :: p=2*3.1415929

INQUIRE(IOLENGTH=rec_array) fn
write(*,*) 'rec_array=',rec_array ! 100*100*8 byte
open(unit=10,file='temp.dat',status='unknown', &
     form='unformatted',access='direct',recl=rec_array)
fn=0
write(10,rec=1) fn
do t=1,3
  do i=1,100 ! filling all fn(i,j)
    do j=1,100
      fn(i,j)=sin(i*p*t/100)*cos(j*p*t/100)
    enddo
  enddo
  write(10,rec=t+1) fn ! t-th rec_array size
enddo
! rec=1-4 fn
! Thus total size of file is 80000*4 bytes
close(10)
end program binary_out

Note: Writing to disk is actually done when the buffer is full or when the file is closed. 

        To write in real-time, one have to use "flush(10)" 


(2) In python, one can open/read the file as a binary. One have to specify the type/size of the data to read (using 'np.fromfile') and position to start reading the data(using 'f.seek') . (Note that one cannot read the same position twice. One have to specify the starting position for each reading.)   


#!/usr/bin/env python
import scipy
import glob
import numpy as np
import matplotlib.pyplot as plt
import os, sys
from pylab import *

# binary file is written as (8byte real)*(100*100 elements)*(4 cycles)
def readslice(inputfilename,field,nx,ny,timeslice):
   f = open(inputfilename,'rb') # open as binary
   f.seek(8*timeslice*nx*ny) # find starting byte position for reading
   field = np.fromfile(f,dtype='float64',count=nx*ny) # 8byte = 64 bit
   field = np.reshape(field,(nx,ny))
   f.close()
   return field

a=np.dtype('d')
a=readslice('temp.dat',a,100,100,1)
print(a[0])
im=plt.imshow(a)
plt.show()

In other words, the binary file have to be write/read by specifying the starting position of the data and length and type of the data. ( Since binary is a collection of '0' and '1', same binary can be read in different ways depending on the length and format.) 

It is important to know the exact type/order of the data to read/write. 


Note: In case of multi-dimensional array from fortran, reshaping could be different from original one because of the convention difference. To have the same indexing, one can use '"order='F' " in the np.reshape. For example, fortran wrote a array 'f(1:Nx,1:Ny,1:Nz)' to get the same array in python 'f(0:Nx-1,0:Ny-1,0:Nz-1)', use 

a = np.fromfile(f,dtype='float64',count=Nx*Ny*Nz)
a = np.reshape(a,(Nx,Ny,Nz),order='F')

In case of c, order='C'

댓글 없음:

댓글 쓰기