Quickly parsing Cobol fixed-length data from Copybook definitions into Python lists

Here is a really simple module for converting fixed-length Cobol data into a Python list … you can find this code and related modules at:

You can pipe the output of copybook2csv.py into this module to parse the data into a list, or, once you already know the structure, call parse_data(struct_fmt, lines) directly with the struct format string. If the copybook field lengths and the actual record lengths don't match, the module still parses the data, but it writes a warning to stderr because each record is padded or truncated to fit the field definitions.
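To make the struct format string concrete, here is a minimal sketch of how a list of field lengths becomes a format string and how one record is split with it. The field lengths and the sample record are made up for illustration, and the sketch uses Python 3 byte strings rather than the Python 2 strings of the module below.

```python
import struct

# Hypothetical field lengths, as they might come out of a copybook:
# name (10 bytes), id (5 bytes), state (2 bytes).
field_lengths = [10, 5, 2]

# Join the lengths with 's' (the struct code for a fixed-size byte string).
struct_fmt = 's'.join(str(n) for n in field_lengths) + 's'

# A sample 17-byte record that matches the layout exactly.
record = b'JOHN SMITH00042CA'
fields = struct.unpack(struct_fmt, record)

print(struct_fmt)
print(fields)
```

Each field comes back as a byte string of exactly the declared width, so any trailing spaces in the data are preserved.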

#!/usr/bin/env python
# -*- coding: utf-8 -*-
__version__ = """COBOL Fixed-length Data Parser ver 0.2
Note: This version does not work with OCCURS in Copybook files,
but is a lot faster than the variable length data parser modules.

License: GPLv3, Copyright (C) 2010 Brian Peterson
This is free software.  There is NO warranty; 
not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
"""
USAGE = """copybook2list.py CopybookFile"""

import load
import csv, struct, sys

def parse_data(struct_fmt, lines):
    try:
        return [ struct.unpack(struct_fmt, i) for i in lines ]
    except struct.error:
        # At least one record does not match the layout size: warn, then
        # pad (ljust) or truncate every record to the size the layout expects.
        sys.stderr.write('Record layout vs. record size mismatch\n')
        size = sum([ int(i) for i in struct_fmt.split('s')[:-1] ])
        return [ struct.unpack(struct_fmt, i.ljust(size)[:size])
          for i in lines ]

def main(args):  
    copybook = load.csv_(args.copybook.readlines(), strip_=True)[1:]
    field_lengths = [ int(i[2]) for i in copybook ]
    struct_fmt = 's'.join([ str(i) for i in field_lengths ]) + 's'
    if args.struct:
        print(struct_fmt)
    else:
        for record in parse_data(struct_fmt, load.lines(args.datafile)):
            print(record)

if __name__ == '__main__':
    from cmd_line_args import Args
    args = Args(USAGE, __version__)
    args.allow_stdin()
    args.add_files('datafile', 'copybook')
    args.parser.add_argument('-s', '--struct', action='store_true',
        help='show structure format')
    main(args.parse())
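The padding and truncation fallback can also be tried on its own, without the load and cmd_line_args helper modules. The following is a rough sketch, not the module's code: parse_records is a hypothetical name, it uses struct.calcsize to get the expected record size, and unlike parse_data above it checks each record individually instead of retrying the whole batch after the first error. It assumes Python 3 byte strings.

```python
import struct
import sys

def parse_records(struct_fmt, records):
    """Unpack each record, padding or truncating any whose length is off."""
    size = struct.calcsize(struct_fmt)  # total bytes the layout expects
    parsed = []
    for number, record in enumerate(records, start=1):
        if len(record) != size:
            sys.stderr.write('Record %d: expected %d bytes, got %d\n'
                             % (number, size, len(record)))
            # Pad short records with spaces, cut long ones down to size.
            record = record.ljust(size)[:size]
        parsed.append(struct.unpack(struct_fmt, record))
    return parsed

# One exact record, one short record, one long record.
rows = parse_records('4s2s', [b'ABCDEF', b'ABC', b'ABCDEFGH'])
for row in rows:
    print(row)
```

Checking lengths per record means the warning can name the offending record, at the cost of one length comparison per record.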

2 Responses to “Quickly parsing Cobol fixed-length data from Copybook definitions into Python lists”

  1. Ravi Says:

    Thank you very much for sharing this.

    When I tried to read data, the cobol2py does not recognize the BCD keyword (for packed decimal?) as supported. Can you please help with that?

    • bpeterso2000 Says:

      I see that the copybook parser supports it, but it appears I never got around to coding it in the data parser module, probably because I don’t have any BCD fields in my data to test with. If you send me some sample BCD data, I can update the code so that it supports it.
