etlutils package

Module contents

etlutils root contains utility functions that do not currently have a home in any submodules.

etlutils.dict_to_ordereddict(unordered_dict)

Converts a dict into an OrderedDict object sorrted by key

gets the keys from the dict, sorts them and then inserts them in order into a new OrderedDict object

Parameters:unordered_dict – the dictionary to sort
Returns:An OrderedDict object with the keys/values from the input in key order
etlutils.get_interactive_history()

Prints all the history from the Python interpreter

Submodules

etlutils.datafiles module

etlutils.datafiles contains methods useful for producing and consuming data for structured data

etlutils.datafiles.dump_to_daily_json_file(directory, year, month, day, data, datatype='')

saves data to a daily file stored in JSON

this function will create the directories if needed and will set up the filename in a consistent way (using the get_daily_file_path method). the format is of the structure <directory>/<year>/<year>-<month>-<day>/<datatype>.<file_extension>

Parameters:
  • directory – the directory under which the path to the day is created
  • year – the integer year to append, will expand to four digits if fewer, does not validate
  • month – the integer month to append, will expand to two digits if fewer, does not validate
  • day – the integer day to append, will expand to two digits if fewer, does not validate
  • data – a JSON serializable data structure
  • datatype – the root of the filename
Returns
The path to the filename that was created
etlutils.datafiles.dump_to_monthly_json_file(directory, year, month, data, datatype='')

saves data to a monthly file stored in JSON

will create the directories if needed and set up the filename in a consistent way. doesn’t do any error checking, so empty strings are not illegal, but will produce crummy filenames

the format of the path is <directory>/<year>/<datatype>_<year>-<month>.json

Parameters:
  • directory – the directory under which the file will be created
  • year – the integer year to append, will expand to four digits if fewer, does not validate
  • month – the integer month to append, will expand to two digits if fewer, does not validate
  • data – a JSON serializable data structure
  • datatype – the root of the filename
Returns:

The path to the file that was created

etlutils.datafiles.dump_to_yearly_json_file(directory, year, data, datatype='')

saves data to a yearly file stored in JSON

will create the directories if needed and set up the filename in a consistent way, doesn’t do any error checking, so empty strings are not illegal, but will product crummy filenames

the format of the path is <directory>/<datatype>_<year>.json

Parameters:
  • directory – the directory under which the file will be created
  • year – the integer year to append, will expand to four digits if fewer, does not validate
  • data – a JSON serializable data structure
  • datatype – the root of the filename
Returns:

The path to the file that was created

etlutils.datafiles.find_newest_saved_month(directory, end_year, datatype='')

finds the last saved file for data stored by month

walks backwards in time starting from the current date looking for the last saved data by month

Parameters:
  • directory – the directory under which to look
  • end_year – the year to end at
  • datatype – the root of the filename
Returns:

The year and month of the last saved file, if end_year is reached, None, None is returned

etlutils.datafiles.get_daily_file_path(directory, datatype, year, month, day, file_extension='json', make_directories=True)

puts together a path to a file for data that is saved by day

will create the directories if needed and set up the filename in a consistent way. doesn’t do any error checking, so empty strings are not illegal, but will produce crummy filenames

the format is of the structure <directory>/<year>/<year>-<month>-<day>/<datatype>.<file_extension>

Parameters:
  • directory – the directory under which the file will be created
  • datatype – the root of the filename
  • year – the integer year to append, will expand to four digits if fewer, does not validate
  • month – the integer month to append, will expand to two digits if fewer, does not validate
  • day – the integer day to append, will expand to two digits if fewer, does not validate
  • file_extension – the extention (without the .)
  • make_directories – will create directory if it does not exist when True
Returns:

A string path composed of the arguments

etlutils.datafiles.get_monthly_file_path(directory, datatype, year, month, file_extension='json', make_directories=True)

puts together a path to a file for data that is saved by month

will create the directory if needed and set up the filename in a consistent way. doesn’t do any error checking, so empty strings are not illegal, but will produce crummy filenames

the format is of the path is <directory>/<year>/<datatype>_<year>-<month>.<file_extension>

Parameters:
  • directory – the directory under which the file will be created
  • datatype – the root of the filename
  • year – the integer year to append, will expanded to four digits if fewer, does not validate
  • month – the integer month to append, will expanded to two digits if fewer, does not validate
  • file_extension – the extention (without the .)
  • make_directories – will create directory if it does not exist when True
Returns:

A string path composed of the arguments

etlutils.datafiles.get_yearly_file_path(directory, datatype, year, file_extension='json', make_directories=True)

puts together a path to a file for data that is saved by year

will create the directory if needed and set up the filename in a consistent way. doesn’t do any error checking, so empty strings are not illegal, but will produce crummy filenames

the format is of the structure <directory>/<datatype>_<year>.<file_extension>

Parameters:
  • directory – the directory under which the file will be created
  • datatype – the root of the filename
  • year – the integer year to append, will expanded to four digits if fewer, does not validate
  • file_extension – the extention (without the .)
  • make_directories – will create directory if it does not exist when True
Returns:

A string path composed of the arguments

etlutils.date module

etlutils.date consists of functions useful for manipulating or converting dates between different formats

etlutils.date.datetime_from_zulutime_string(utc_time_string)

Given a utc_time_string, create a datetime.datetime object

creates a datetime object from a utc-formatted time string

Parameters:utc_time_string – utc formatting string
Returns:a datetime.datetime object
etlutils.date.get_date_from_timestamp(timestamp, offset=0)

creates a datetime.datetime object from a timestamp with offset

Using a twitter-style timestamp and offset return a datetime object

Parameters:
  • timestamp – a unix-type timestamp
  • offset – a timezone offset in minutes from GMT
Returns:

a datetime.datetime object

etlutils.date.mkdate(datestr)

Creates a date object for a string in the format YYYY-MM-DD

useful for parsing a data from the command line or from a filename or in a file

Parameters:datestr – a string in the format “YYYY-MM-DD”
Returns:a datetime.date object
Raises:ValueError – if the string is in the wrong format