I have started releasing the COBOL data conversion tools to Sourceforge. Cobol2dms.py is not included in the package yet, but I plan to include it within the next few weeks. The project is located here.
Archive for the ‘Django’ Category
pyCOBOL is now available on Sourceforge
July 6, 2010Troubleshooting Django SQL & wildcards in QuerySet filters
July 1, 2010This post uses the MySQL Sample database. The following items are just to setup a quick test environment for demonstration purposes.
Imports the database:
c:\> mysql -u root -p < c:\work\sampledatabase.sql
Set the DATABASE_NAME in Django settings.py
c:\> notepad settings.py DATABASE_NAME = 'classicmodels'
Automatically builds all the Django models from the classicmodels database
c:\> python manage.py inspectdb >> models.py
Loads the Python shell with the Django environment variables
c:\> python manage.py dbshell
Troubleshooting database queries in Django
connection.queries contains a list of the SQL statements run by Django, including auto-generated SQL from Django filters, etc.
>>> from myproject.myapp.models import *
>>> from django.db import connection
>>> from pprint import pprint
>>> pprint(list(Products.objects.values_list('productname').filter(productname__icontains='Harley')))
[(u'1969 Harley Davidson Ultimate Chopper',),
(u'2003 Harley-Davidson Eagle Drag Bike',),
(u'1936 Harley Davidson El Knucklehead',)]
>>> print connection.queries[-1]['sql']
SELECT `products`.`productName` FROM `products` WHERE `products`.`productName` LIKE %Harley% LIMIT 21
Issues with wildcards in Django filters
The Django docs state that it handles the wildcards for you, so you don’t need to worry about them. What it should say is that you don’t need to worry about them if you’re not allowing wildcards in your filters. The % and _ are automatically escaped by Django and will match the literal % and _ characters, they will not be treated like standard SQL wildcard characters. I haven’t found a way to un-escape them.
>>> Products.objects.values_list('productname').filter(productname__icontains='H%y')
[(u'1969 H%y',)]
>>> print connection.queries[-1]['sql']
SELECT `products`.`productName` FROM `products` WHERE `products`.`productName` LIKE %H\%y% LIMIT 21
>>>
c:\> python manage.py dbshell
mysql> use classicmodels;
mysql> SELECT `products`.`productName` FROM `products` WHERE `products`.`productName` LIKE '%h\%y%' LIMIT 21;
+-------------+
| productName |
+-------------+
| 1969 H%y |
+-------------+
1 row in set (0.00 sec)
mysql> SELECT `products`.`productName` FROM `products` WHERE `products`.`productName` LIKE '%H%y%' LIMIT 21;
+---------------------------------------+
| productName |
+---------------------------------------+
| 1969 Harley Davidson Ultimate Chopper |
| 2003 Harley-Davidson Eagle Drag Bike |
| 1957 Chevy Pickup |
| 1998 Chrysler Plymouth Prowler |
| 1936 Harley Davidson El Knucklehead |
| 18th Century Vintage Horse Carriage |
| 1948 Porsche Type 356 Roadster |
| 1970 Chevy Chevelle SS 454 |
| 1966 Shelby Cobra 427 S/C |
| 1928 British Royal Navy Airplane |
| 18th century schooner |
| 1958 Chevy Corvette Limited Edition |
| 2002 Chevy Corvette |
| 1992 Porsche Cayenne Turbo Silver |
| 1936 Chrysler Airflow |
| 2002 Yamaha YZR M1 |
| The Mayflower |
| HMS Bounty |
| The Queen Mary |
| 1969 H%y |
+---------------------------------------+
20 rows in set (0.00 sec)
mysql> SELECT `products`.`productName` FROM `products` WHERE `products`.`productName` LIKE '%Harl_y%' LIMIT 21;
+---------------------------------------+
| productName |
+---------------------------------------+
| 1969 Harley Davidson Ultimate Chopper |
| 2003 Harley-Davidson Eagle Drag Bike |
| 1936 Harley Davidson El Knucklehead |
+---------------------------------------+
3 rows in set (0.00 sec)
>>> Products.objects.values_list('productname').filter(productname__icontains='Harl_y')
[]
If you want to allow % or _ wildcards in a search you’ll need to build a work-around using QuerySet methods like filter(regex__fieldname), extra(where=’???’), or raw(‘SQL statement’). Here is a sample of allowing % as a wildcard to search a description field using a QuerySet regex filter method. By default I try to use the contains filter option first since it appears to be faster than regex:
import re
from myproject.myapp.models import *
MIN_NUM_CHARS = 3
LEGAL_SEARCH_CHARS_RE = re.compile(r'[\w%, ]')
def search_desc(search_str):
model, kwds = MyModel(), {}
if len(search_str) >= MIN_NUM_CHARS:
if '%' in search_str:
search = '%{0}%'.format(search_str.strip('%'))
search = ''.join([ i.replace('%', '.*') for i in search
if LEGAL_SEARCH_CHARS_RE.match(i) ])
kwds.update({'desc__iregex': search})
else:
kwds.update({'desc__icontains': search})
return model.objects.filter(**kwds)
A better method is to use the Django search filter
For example:
Entry.objects.filter(desc__search="+Django -jazz Python")
Django isn’t just for web frameworks, it’s great for database scripting
June 27, 2010Thanks to Django, within a couple short weeks we were able to literally convert scores of tangled Copybooks files, thousands of fields, and millions of inter-agency records over to our new LDAP role-based data viewer. The challenge was not in creating Cobol & ADABAS data parsers and importing data, but in creating loosely coupled command-line tools that allowed for detailed data analysis when Copybooks don’t match actual data, when data files are truncated, when data records are littered with non-printable characters, when cryptic fields names hide the true functions, i.e. ‘flag-1′, ‘flag-2′, ‘flag-3′. With free-form text fields and 30-years of code mutation things get interesting: Several free-form fields were repurposed in creative ways during different time periods
… and no, I’m not just telling fish stories.

Note: The models.py module shown in the above diagram ties into post-data-conversion functions, but for the sake of this post I have limited it to just project phase #1 data-conversion functions.
In the coming months I’ll blog about some of the pre-existing tools we used, and then open-source the Django tools we developed for the project. We kept our solution very simple and de-coupled from application specific details; learning and repurposing the tools should be a cinch.
Django Memory Error – How-to work with large databases
June 26, 2010If you’re working with resource intensive operations on large databases in Django, chances are you’ll encounter a ”Memory Error”. I’ve seen this error numerous times over the last couple of weeks as part of a large data conversion project and wanted to share some the tips I learned. Number one, DON’T PANIC!!! It’s not Django’s fault, Django’s just doing some caching for you to speed things up. This is a good thing, seriously… you want Django optimized for performance by default since most resource intense operations are one-off tasks, like data imports, bulk updates, special ad-hoc reports, etc. In these types of tasks all we need to do is slow Django down a bit and ease up on the memory, here’s how…
- Use MyModel.objects.all().iterator() instead of MyModel.objects.all()
- If you don’t need all of the fields in a query use values_list
- Use the Memory efficient Django Queryset Iterator, written by Thierry Schellenbach
- set DEBUG = False in your settings.py file. If DEBUG is True, then Django saves a copy of every SQL statement it has executed. See Django FAQ: Why is Django leaking memory? (spoiler: Django’s not really leaking memory).
Ranked in the order of highest performance, but also in the order of most memory intensive.
- MyModel.objects.all() – fastest, for standard queries
- MyModel.objects.all().iterator() – medium, for large queries
- queryset_iterator(MyModel.objects.all()) – slowest, for monster queries
One size doesn’t fit all, use the best method for your particular task. By the way, the Django Snippet: “Memory Efficient Django Queryset Iterator” is solid. I’ve been using it a lot over the past couple of weeks and been validating the results along the way – works like a champ
Windows quick & easy Django + MySql installation recipe
May 28, 2010The following instructions are written for Windows Django development installations. Have fun…