OFAC compliance with ElasticSearch

19 01 2015

The mission: To create a search engine capable of detecting possible matches between our contact entries and the OFAC Specially Designated Nationals (SDN) List.

The background: I was hired to develop highly compliant software that tracks and facilitates the movement of funds between the U.S. and Israel. Recently, the U.S. government has taken a hard line against corporations moving US dollars offshore on behalf of US citizens. Compliance demands that customers wishing to move funds be checked against the OFAC SDN list, as well as the new consolidated non-SDN list, which contains smaller lists such as the Palestinian Legislative Council (NS-PLC) list.

OFAC does not provide an API for querying this list, but it has launched a search page alongside its publication (and regular updates) of plain text files. The search page description provides adequate detail regarding its matching algorithms and basic expectations, those being character and word matching, plus phonetics. Phonetics allows for basic spelling mistakes or discrepancies, particularly when dealing with non-English names.

Given our development phase and low starting volume, I developed the front-end against a free API available online. Our testing, however, showed that its fuzzy logic was not sufficient for our purposes (and perhaps there were other issues too, such as alias matching, list updates, and closed source). A cup of coffee and some Google/Stack Overflow searches later, I was sold on ElasticSearch.

Getting started: I put my frugality aside and decided to test development on a new AWS instance. I was happy to find that Bitnami had just the machine image I needed to get up and running in a snap.

For some reason, Pageant wasn’t working for me, so I loaded my private key directly into PuTTY and was able to log in with the “bitnami” user.

Going a bit out of development order, let’s first create our script to download the text files:

vi /opt/bitnami/elasticsearch/scripts/fetch_ofac.sh

#! /bin/sh
cd /opt/bitnami/elasticsearch/scripts/
wget http://www.treasury.gov/ofac/downloads/sdn.csv -O sdn_prim.csv
wget http://www.treasury.gov/ofac/downloads/add.csv -O sdn_add.csv
wget http://www.treasury.gov/ofac/downloads/alt.csv -O sdn_alt.csv
wget http://www.treasury.gov/ofac/downloads/sdn_comments.csv -O sdn_comments.csv
wget http://www.treasury.gov/ofac/downloads/consolidated/cons_prim.csv -O cons_prim.csv
wget http://www.treasury.gov/ofac/downloads/consolidated/cons_add.csv -O cons_add.csv
wget http://www.treasury.gov/ofac/downloads/consolidated/cons_alt.csv -O cons_alt.csv
wget http://www.treasury.gov/ofac/downloads/consolidated/cons_comments.csv -O cons_comments.csv
echo "Done fetching OFAC files"

We are downloading the files right into the ES scripts folder, renaming them as needed so we get consistent “list_type.csv” file names.
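
Since the CSV river will be executing this script for us (via script_before_all, below), it most likely needs to be executable:

chmod +x /opt/bitnami/elasticsearch/scripts/fetch_ofac.sh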

Configuring ElasticSearch

The first step was installing the plugins I needed, namely the CSV river plugin and the phonetic analysis plugin. The first allows us to run our download script and index our CSV files, repeating as often as we need. The latter lets us conduct searches using our choice of the most popular “sounds-like” algorithms, not just the Soundex used by the OFAC search tool.

cd /opt/bitnami/elasticsearch/
sudo bin/plugin -install elasticsearch/elasticsearch-analysis-phonetic/2.4.1
sudo bin/plugin -install river-csv -url https://github.com/AgileWorksOrg/elasticsearch-river-csv/releases/download/2.1.1/elasticsearch-river-csv-2.1.1.zip

*As of the time of this writing, elasticsearch-river-csv-2.1.1 did not support parent-child relationships, but luckily I found an old fork of the code which did, without breaking my code, so I later ran:

sudo /opt/bitnami/elasticsearch/bin/plugin -uninstall river-csv
sudo /opt/bitnami/elasticsearch/bin/plugin -install river-csv -url https://github.com/Trax-air/elasticsearch-river-csv/releases/download/2.1.1/elasticsearch-river-csv-2.1.1-SNAPSHOT.zip

To modify the ES settings and make the phonetic search the default, I ran
sudo vi /opt/bitnami/elasticsearch/config/elasticsearch.yml

Bitnami had the “network.publish_host” setting configured to a specific IP, so I just commented it out to have that setting derived automatically. Then I added to the end of the file:

index.analysis.analyzer.default.type: custom
index.analysis.analyzer.default.tokenizer: standard
index.analysis.analyzer.default.filter: [standard, lowercase, my_metaphone]
index.analysis.filter.my_metaphone.type: phonetic
index.analysis.filter.my_metaphone.encoder : doublemetaphone
index.analysis.filter.my_metaphone.replace: false

which is probably the same as the nested YAML below, but I wasn’t sure the YAML form was getting picked up, and figured I’d best stick to the conventions already in the file.

index :
  analysis :
    analyzer :
      default :
        type : custom
        tokenizer : standard
        filter : [standard, lowercase, my_metaphone]
    filter :
      my_metaphone :
        type : phonetic
        encoder : doublemetaphone
        replace : false

Restart the server using
sudo /opt/bitnami/ctlscript.sh restart

or just ES with
sudo /opt/bitnami/elasticsearch/scripts/ctl.sh stop
sudo /opt/bitnami/elasticsearch/scripts/ctl.sh start
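
Once ES is back up, it’s worth confirming that both plugins actually loaded. The node info API lists installed plugins:

curl -XGET 'http://localhost:9200/_nodes/plugins?pretty'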

Since the SDN and non-SDN lists have the same format and do not repeat unique IDs, I was able to load both lists into the same types.
My mappings are:

curl -XPOST 'http://localhost:9200/ofac' -d '{
    "settings" : {
        "number_of_shards" : 1,
        "number_of_replicas" : 0
    },
    "mappings" : {
        "prim" : {
            "properties" : {
                "ent_num" : {"type" : "integer"},
                "SDN_Name" : {
                    "type" : "string",
                    "fields" : {
                        "raw" : {"type" : "string", "index" : "not_analyzed"}
                    }
                },
                "SDN_Type" : {"type" : "string", "index" : "not_analyzed"}
            }
        },
        "alt" : {
            "_parent" : {
                "type" : "prim"
            },
            "properties" : {
                "ent_num" : {"type" : "integer", "index" : "not_analyzed"},
                "alt_num" : {"type" : "integer"},
                "alt_type" : {"type" : "string"},
                "alt_name" : {
                    "type" : "string",
                    "fields" : {
                        "raw" : {"type" : "string", "index" : "not_analyzed"}
                    }
                },
                "alt_remarks" : {"type" : "string"}
            }
        },
        "add" : {
            "_parent" : {
                "type" : "prim"
            },
            "properties" : {
                "ent_num" : {"type" : "integer", "index" : "not_analyzed"},
                "add_num" : {"type" : "integer"},
                "address" : {"type" : "string"},
                "citystatezip" : {"type" : "string"},
                "country" : {"type" : "string"},
                "add_remarks" : {"type" : "string"}
            }
        },
        "comments" : {
            "_parent" : {
                "type" : "prim"
            },
            "properties" : {
                "ent_num" : {"type" : "integer", "index" : "not_analyzed"},
                "comment" : {"type" : "string"}
            }
        }
    }
}'

A few points here on the mappings:

  1. I’m using a single index, “ofac”, and the different files get their own types: prim (primary records), alt (aliases), add (address details), and comments.
  2. All types besides prim have prim set as their parent. This tells ES that we will be specifying a parent ID for each of those documents, which the forked version of CSV-River supplies.
  3. The prim.SDN_Name field will have a sub-field called “raw” that holds the non-tokenized, non-analyzed name data. We will use this for our aggregation grouping later.
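
With the index created, a quick sanity check that the phonetic analysis is actually active doesn’t hurt. The _analyze API accepts text as a query parameter in ES 1.x and returns the tokens it produces; with “replace: false” you should see double metaphone codes alongside the lowercased originals:

curl -XGET 'http://localhost:9200/ofac/_analyze?text=Muhammad&pretty'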

I then set up my CSV rivers to download and grab the files. Since the river renames each file after it indexes it, further downloads will not cause collisions.

curl -XPUT localhost:9200/_river/prim/_meta -d '
{
    "type" : "csv",
    "csv_file" : {
        "folder" : "/opt/bitnami/elasticsearch/scripts",
        "filename_pattern" : ".*prim\\.csv$",
        "poll" : "6h",
        "fields" : [ "ent_num","SDN_Name","SDN_Type","Program","Title","Call_Sign","Vess_type","Tonnage","GRT","Vess_flag","Vess_owner","Remarks" ],
        "first_line_is_header" : "false",
        "field_id" : "ent_num",
        "script_before_all" : "/opt/bitnami/elasticsearch/scripts/fetch_ofac.sh"
    },
    "index" : {
        "index" : "ofac",
        "type" : "prim"
    }
}'
curl -XPUT localhost:9200/_river/alt/_meta -d '
{
    "type" : "csv",
    "csv_file" : {
        "folder" : "/opt/bitnami/elasticsearch/scripts",
        "filename_pattern" : ".*alt\\.csv$",
        "poll" : "6h",
        "fields" : [ "ent_num","alt_num","alt_type","alt_name","alt_remarks" ],
        "first_line_is_header" : "false",
        "field_id" : "alt_num",
        "field_parent" : "ent_num"
    },
    "index" : {
        "index" : "ofac",
        "type" : "alt"
    }
}'
curl -XPUT localhost:9200/_river/add/_meta -d '
{
    "type" : "csv",
    "csv_file" : {
        "folder" : "/opt/bitnami/elasticsearch/scripts",
        "filename_pattern" : ".*add\\.csv$",
        "poll" : "6h",
        "fields" : [ "ent_num","add_num","address","citystatezip","country","add_remarks" ],
        "first_line_is_header" : "false",
        "field_id" : "add_num",
        "field_parent" : "ent_num"
    },
    "index" : {
        "index" : "ofac",
        "type" : "add"
    }
}'
curl -XPUT localhost:9200/_river/comments/_meta -d '
{
    "type" : "csv",
    "csv_file" : {
        "folder" : "/opt/bitnami/elasticsearch/scripts",
        "filename_pattern" : ".*comments\\.csv$",
        "poll" : "6h",
        "fields" : [ "ent_num","comment" ],
        "first_line_is_header" : "false",
        "field_parent" : "ent_num"
    },
    "index" : {
        "index" : "ofac",
        "type" : "comments"
    }
}'

For better or worse, I think script_before_all is asynchronous, which is a strange choice for a “before” hook; it means the downloads usually complete after the data is already indexed. But whether my data is actually refreshed every 6 hours or every 12 doesn’t make a big difference to our needs.
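
Either way, a quick count per type will confirm whether the rivers actually indexed anything:

curl -XGET 'http://localhost:9200/ofac/prim/_count?pretty'
curl -XGET 'http://localhost:9200/ofac/alt/_count?pretty'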

The Search

There is much of the Query DSL that I didn’t learn in my crash course in ElasticSearch, but the docs and examples online finally got me to the correct result. The most challenging part was returning the alias data along with the matching results. The latest stable release I was using was version 1.4. I believe 1.5 will have some JOIN functionality, but for now I had to choose between recreating the index with the aliases, addresses, and comments nested in the primary data, or using aggregations to pull out the information I needed. I chose aggregations, and the result looked like this:

{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must_not": {
            "term": {
              "SDN_Type": "individual"
            }
          }
        }
      },
      "query": {
        "bool": {
          "should": [
            {
              "match": {
                "prim.SDN_Name": {
                  "query": "cubanator",
                  "operator": "and"
                }
              }
            },
            {
              "has_child": {
                "type": "alt",
                "query": {
                  "match": {
                    "alt.alt_name": {
                      "query": "cubanator",
                      "operator": "and"
                    }
                  }
                }
              }
            }
          ],
          "minimum_number_should_match": "65%"
        }
      }
    }
  },
  "aggs": {
    "fullmatches": {
      "terms": {
        "field": "prim.SDN_Name.raw",
        "size": "10"
      },
      "aggs": {
        "aliases": {
          "children": {
            "type": "alt"
          },
          "aggs": {
            "names": {
              "top_hits": {
                "size": "10"
              }
            }
          }
        }
      }
    }
  }
}

The filter dynamically removes either individual (“must”) or non-individual (“must_not”) results, depending on our needs. Then we run our tokenized, phonetic matching against both the names and the aliases. Finally, we also return a top-hits aggregation of at most 10 matching alias records per name match.
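
To experiment with the DSL directly, you can save the query above to a file (query.json, say) and POST it against the index:

curl -XPOST 'http://localhost:9200/ofac/prim/_search?pretty' -d @query.json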

Our PHP code using the official ElasticSearch client looks like:

$q = strtolower( trim($_REQUEST['last_name'].' '.$_REQUEST['first_name'].' '.$_REQUEST['name']));
if(empty($q)) exit('[]');

require 'vendor/autoload.php';

$params = array();
$params['hosts'] = array ( 'YOUR IP HERE');
$client = new Elasticsearch\Client($params);

$searchParams = array();
$searchParams['index'] = 'ofac';
$searchParams['type']  = 'prim';
$searchParams['body']['query']['filtered']['filter']['bool'][($_REQUEST['api']=='companySearch'?'must_not':'must')]['term']['prim.SDN_Type'] = 'individual';
$searchParams['body']['query']['filtered']['query']['bool']['should'][0]['match']['prim.SDN_Name']=array('operator'=>'and', 'query'=>$q);
$searchParams['body']['query']['filtered']['query']['bool']['should'][1]['has_child'] = 
	array('type'=>'alt', 'query'=>
		array('match'=>
			array('alt.alt_name'=>
				array('operator'=>'and', 'query'=>$q) 
			)
		)
	);
$searchParams['body']['query']['filtered']['query']['bool']['minimum_number_should_match'] = '65%';

$searchParams['body']['aggs']['fullmatches'] = 
	array('terms' =>
			array('field' => "prim.SDN_Name.raw",
				"size"=> "10"
			),
		  'aggs' =>
			array('aliases' => array(
				'children' => array('type'=>'alt'),
				'aggs' => array(
					'names' => array('top_hits'=> array("size"=> "10")),
					//'top_hit' => array('max' => array('script' => '_score'))
				)
			  )
			)
	);


$retDoc = $client->search($searchParams);

if($retDoc['hits']['total'] > 0) {
	
	if(count($retDoc['aggregations']['fullmatches']['buckets'])>0){
		$aliases=array();
		foreach($retDoc['aggregations']['fullmatches']['buckets'] as $i=>$b){
			if($b['aliases']['doc_count'] > 0) {
				$prim_aliases = array();
				foreach($b['aliases']['names']['hits']['hits'] as $j=> $alt) {
					$prim_aliases[] = $alt['_source']['alt_name'];
				}
				$aliases[$b['key']] = implode(';',$prim_aliases);
			}
		}

	}
	
	$json = array();
	foreach($retDoc['hits']['hits'] as $i=>$result) {
		$s = str_replace('-0- ','',$result['_source']);
		$json[] = array('ofac_sdn_ent_num'	=> $result['_id'],
						'ofac_sdn_name' 	=> $s['SDN_Name'],
						'ofac_sdn_type'		=> $s['SDN_Type'],
						'ofac_sdn_program'	=> $s['Program'],
						'ofac_sdn_title'	=> $s['Title'],
						'ofac_sdn_call_sign'=> $s["Call_Sign"],
						'ofac_sdn_vess_type'=> $s["Vess_type"],
						'ofac_sdn_tonnage'	=> $s["Tonnage"],
						'ofac_sdn_grt'		=> $s["GRT"],
						'ofac_sdn_vess_flag'=> $s["Vess_flag"],
						'ofac_sdn_vess_owner'=>$s["Vess_owner"],
						'ofac_sdn_remarks'	=> trim($s["Remarks"].(isset($aliases[$s['SDN_Name']])?'     AKAs: '.$aliases[$s['SDN_Name']]:'')),
						'type'				=> 'Ofac'.($result['_type']=='prim' ? "Sdn":"Alt")
				);
	}
	exit(json_encode($json));
}		
else
	exit('[]');

We set up our client, pass in the query, run through the aliases to build an easy local lookup, and return the same response structure as the free API. And my front-end was never the wiser.
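
For reference, a single match in our JSON response comes out roughly like this (the values here are made up for illustration, not real list data):

[{"ofac_sdn_ent_num":"12345","ofac_sdn_name":"EXAMPLE TRADING CO.","ofac_sdn_type":"","ofac_sdn_program":"SDGT","ofac_sdn_title":"","ofac_sdn_call_sign":"","ofac_sdn_vess_type":"","ofac_sdn_tonnage":"","ofac_sdn_grt":"","ofac_sdn_vess_flag":"","ofac_sdn_vess_owner":"","ofac_sdn_remarks":"AKAs: EXAMPLE TRADING COMPANY","type":"OfacSdn"}]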






[Fixed] jQuery News Ticker doesn’t show overflow text

10 07 2013

The ticker at http://www.jquerynewsticker.com/ is great, but what do you do when your text extends beyond the viewing window of the ticker? Shouldn’t the ticker scroll so you can see the rest of the text?

Yup!


Github Fork: https://github.com/IsraelWebDev/jQuery-News-Ticker

Demo: http://jsfiddle.net/israelwebdev/YsSpY/8/





Local JavaScript Scripting with Node.JS

11 03 2013

It’s not much, but it’s my first little scriptlet with Node.JS (move_slow.js).

Its specific purpose is to move files from one directory tree to an empty directory, one at a time, while another script processes each file as it enters the empty directory.


var fs = require('fs');
var path = require('path');

// Check if the target dir is empty every 10 seconds
var walk_scan = function(dir1, dir2) {
	var ws = this;
	this.timer = setInterval(function() {
		walk(dir2, function(err, results) {
			if (results.length > 0)
				return; // target dir is still busy with the last file
			walk(dir1, function(err, results) {
				var file;
				if (err) throw err;
				// move 1 zip file over at a time
				for (var i in results) {
					file = results[i];
					if (file.slice(-4) == '.zip') {
						console.log('zip', file, path.basename(file));
						fs.rename(file, dir2 + "/" + path.basename(file), function(err) {
							if (err) throw err;
						});
						err = true; // reuse err as a "moved a file" flag
						break;
					}
				}
				// end script when source directory has no zips
				if (!err)
					clearInterval(ws.timer);
			});
		});
	}, 10000);
};
// usage:
// var ws = walk_scan("D:\\Scan", "D:\\RecordingsInput");

// walk() taken from http://stackoverflow.com/questions/5827612/node-js-fs-readdir-recursive-directory-search
var walk = function(dir, done) {
	var results = [];
	fs.readdir(dir, function(err, list) {
		if (err) return done(err);
		var i = 0;
		(function next() {
			var file = list[i++];
			if (!file) return done(null, results);
			file = dir + '/' + file;
			fs.stat(file, function(err, stat) {
				if (stat && stat.isDirectory()) {
					walk(file, function(err, res) {
						results = results.concat(res);
						next();
					});
				} else {
					results.push(file);
					next();
				}
			});
		})();
	});
};






BuddyPress Metas & Title using xprofile field data

17 08 2011

Populating the BuddyPress user Profile page’s title, meta description, and meta keywords with some of the user-generated content, delivered via xprofile fields, seems so much better than the default options, or even what’s available in seopress, the only current BuddyPress SEO plugin.

While we’re at it, setting the on-page SEO data for Activities based on the content is a great move too.

You’ll have to modify the fields based on the data you have available, and what fields feed into which page data, but this code, placed in your theme’s functions.php file, is a good start.
Want to make a plugin based on this code for the laymen? Feel free.

Note: extractCommonWords function based on the function here.

function bp_xprofile_title($title=null, $without_site_name=null){
	global $bp, $current_blog;

	if(defined('BP_ENABLE_MULTIBLOG')){
		$blog_title=get_blog_option($current_blog->blog_id,'blogname');
	}else {
		$blog_title=get_blog_option(BP_ROOT_BLOG,'blogname');
	}

	if(bp_is_user_profile()){
		$title_array = array();
		if($data = xprofile_get_field_data( 'Company Name', $bp->displayed_user->id))
			$title_array[] = "$data ({$bp->displayed_user->fullname})";
		if($data = xprofile_get_field_data( 'Business Overview', $bp->displayed_user->id))
			$title_array[] = $data;
		$title_array[] = $blog_title;
		$title = mb_strimwidth(strip_tags( esc_attr( implode( ' | ', $title_array))), 0, 70, '...');
	}
	if (bp_is_activity_component() && is_numeric( $bp->current_action )) {
		$activity = bp_activity_get_specific( array( 'activity_ids' => $bp->current_action ) );
		if ( $activity = $activity['activities'][0])
			if(!empty($activity->content))
				$title = mb_strimwidth(preg_replace("/[^A-Za-z0-9\s\s+\-]/", "", strip_tags( $activity->content)), 0, 70-3-3-strlen($blog_title), '...') . " | $blog_title";
	}
	return $title;
}
add_filter('bp_page_title', 'bp_xprofile_title',0);

function extractCommonWords($string){
	$stopWords = array('i','a','about','an','and','are','as','at','be','by','com','de','en','for','from','how','in','is','it','la','of','on','or','that','the','this','to','was','what','when','where','who','will','with','und','the','www');
    
	$string = strip_tags($string);
	$string = trim($string); // trim the string
	$string = preg_replace('/[^a-zA-Z0-9\s\s+\-]/', '', $string); // only take alphanumerical characters, but keep the spaces and dashes too…
	$string = strtolower($string); // make it lowercase
    
	preg_match_all('/\b.*?\b/i', $string, $matchWords);
	$matchWords = $matchWords[0];
       
	foreach ( $matchWords as $key=>$item ) {
		if ( $item == '' || in_array(strtolower($item), $stopWords) || strlen($item) <= 3 ) {
			unset($matchWords[$key]);
		}
	}   
	$wordCountArr = array();
	if ( is_array($matchWords) ) {
		foreach ( $matchWords as $key => $val ) {
			$val = strtolower($val);
			if ( isset($wordCountArr[$val]) ) {
				$wordCountArr[$val]++;
			} else {
				$wordCountArr[$val] = 1;
			}
		}
	}
	arsort($wordCountArr);
	$wordCountArr = array_slice($wordCountArr, 0, 10);
	return array_keys($wordCountArr);
}

function bp_xprofile_meta()
{
	global $bp;

	if(bp_is_user_profile()){
		$array = array();
		if($data = xprofile_get_field_data( 'Business Category', $bp->displayed_user->id))
			$array[] = htmlspecialchars_decode($data);
		if($data = xprofile_get_field_data( 'Years of Experience', $bp->displayed_user->id))
			$array[] = "$data years experience";
		if($data = xprofile_get_field_data( 'Location', $bp->displayed_user->id))
			$array[] = $data;
		if($data = xprofile_get_field_data( 'Business Overview', $bp->displayed_user->id))
			$array[] = $data;

		$desc = mb_strimwidth(preg_replace("/[^A-Za-z0-9\s\s+\-&]/", "", strip_tags( implode( ' - ', $array))), 0, 150, '...');

		echo '<meta name="description" content="'.$desc.'" />';

		if($data = xprofile_get_field_data( 'Profile Details', $bp->displayed_user->id))
			$array[] = $data;
		if($data = xprofile_get_field_data( 'Company Name', $bp->displayed_user->id))
			$array[] = $data;
		$array[] = $bp->displayed_user->fullname;

		$key_text = preg_replace("/[^A-Za-z0-9\s\s+\-]/", "", strip_tags( implode( ' ', $array)));
		$keys = extractCommonWords($key_text);
		echo '<meta name="keywords" content="'.implode(',', $keys).'" />';
	}

	if (bp_is_activity_component() && is_numeric( $bp->current_action )) {
		$activity = bp_activity_get_specific( array( 'activity_ids' => $bp->current_action ) );
		if ( $activity = $activity['activities'][0])
			if(!empty($activity->content)){
				$content = preg_replace("/[^A-Za-z0-9\s\s+\-]/", "", strip_tags( $activity->content));
				echo '<meta name="description" content="'. mb_strimwidth($content, 0, 150, '...').'" />';
	
				$keys = extractCommonWords($content);
				echo '<meta name="keywords" content="'.implode(',', $keys).'" />';
			}
	}
}
add_action('wp_head', 'bp_xprofile_meta',1);




WordPress Network store in an instant!

28 03 2011

I’m getting really excited to use WPMU‘s MarketPress e-commerce plugin on the new JobShuk website.

As I wrote in this Tweet to the guy behind WordPress’ most popular e-commerce plugin for standalone installs, the 2 most exciting features of MarketPress are the unified marketplace across all users and revenue sharing. That means visitors can search/browse products contained in all the individual stores. Additionally, the Marketplace admin (me) can take a small percentage of the purchase price without getting into drop-shipping, affiliate payments, etc. That’s some easy money.

Now if only I could get MarketPress free for life. That would be a dream…





Android Spinner, ArrayAdapter, and NullPointerException

12 01 2011

This might be obvious if you’re bothering to read this, but when populating an Android Spinner with an ArrayAdapter sourced by an array, the source array must not contain uninitialized elements.

For example, if you’re doing this:

String[] colors = new String[colors_json.length()+1];
for(int m=1; m<=colors_json.length(); m++){
	JSONObject item = colors_json.getJSONObject(m-1); // m is 1-based; the JSONArray is 0-based
	if(item.has("label")){
		colors[m] = item.getString("label");
	}
}

then you might encounter a case where there are uninitialized (null) elements in colors.

Instead, you can try something like this:

String[] colors0 = new String[colors_json.length()+1];
int index = 1;
for(int m=1; m<=colors_json.length(); m++){
	JSONObject item = colors_json.getJSONObject(m-1);
	if(item.has("label")){
		colors0[index] = item.getString("label");
		index++;
	}
}
colors = new String[index];
System.arraycopy(colors0, 1, colors, 1, index-1);

You can then continue safely to set:

colors[0] = "Color";
color_adapter = new ArrayAdapter<String>(this, android.R.layout.simple_spinner_item, colors);
color_adapter.setDropDownViewResource(android.R.layout.simple_spinner_dropdown_item);
color_spinner.setAdapter(color_adapter);

without concern of hitting a nasty NullPointerException.





Automatically update your php_browscap.ini

19 10 2010

My website uses the PHP function get_browser to see if a client is Javascript-enabled. (I use this primarily to see whether the visitor is a search engine bot that should see a long list of data, or a real person who can handle AJAX-loaded data in tabs – most real people these days are using Javascript-enabled browsers.)

get_browser relies on the php_browscap.ini file, and the problem is that keeping php_browscap.ini up-to-date can be a chore, and should certainly be done automatically. (I notice the problem now that I use Chrome: with every major release, the browscap file has no clue about my new browser, which is silly.)
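
For context, my bot check boils down to something like this sketch (capability keys like “crawler” and “javascript” come from the browscap project’s full INI; double-check them against your own php_browscap.ini):

$caps = get_browser(null, true); // null = use $_SERVER['HTTP_USER_AGENT'], true = return an array
$is_bot = !empty($caps['crawler']) || empty($caps['javascript']);
if ($is_bot) {
	// render the full, crawlable list
} else {
	// render the AJAX-loaded tabs
}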

So I finally sat down and wrote a basic script to fetch the updated file and restart my web server, which I’ll run weekly. This works for my FreeBSD box – change the paths and download command to fit your installation.

#!/bin/sh
/usr/local/bin/wget -Ophp_browscap.ini "http://browsers.garykeith.com/stream.asp?PHP_BrowsCapINI"
mv -f /usr/local/etc/php/php_browscap.ini /usr/local/etc/php/php_browscap.old.ini
mv -f php_browscap.ini /usr/local/etc/php/
/usr/local/etc/rc.d/apache22 restart

I saved it to: /usr/local/www/mysite.com/get_php_browscap.sh
Then add it to your crontab:

# php_browscap update
5       4       *       *       6       root    /usr/local/www/mysite.com/get_php_browscap.sh >> /var/log/cron.log

I run it Saturday morning.
Any questions?





How to browse PHP’s print_r output the right way

16 06 2010

I’ve had the pleasure recently of working with some complex object oriented PHP with massive objects or lists of objects. The easiest way to have a look at your data for analyzing and debugging is to print_r it. Unfortunately, skimming through this data can be tedious, especially if you want to skip a couple of large, nested objects that are irrelevant to your data analysis.
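
(Tip: if you’d rather not copy the output from the browser, print_r’s second parameter makes it return a string instead of printing it, so you can dump straight to a file and open that in Notepad++. $data here is whatever monster you’re inspecting.)

file_put_contents('/tmp/dump.txt', print_r($data, true));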

When programming, I always keep Notepad++ open so I can keep a bunch of data accessible in tabs, such as notes, texts, data files, etc. (and my tabs of saved files are seamlessly preserved between sessions, which is critical, of course). Notepad++ can handle bunches of different file formats out of the box, but PHP print_r output isn’t one of them.

So I’ve pasted my print_r output into Notepad++, and the new file’s language is “Normal Text”. Immediately, bracket matching/highlighting works, which is great: I can go to an opening paren and see where it ends. So that’s it, right? Well, Notepad++ can also do code folding – like when you’re browsing a class and you want to see all the class functions without all that pesky function code cluttering your view. So if you’ve got a PHP file open, for example, you can either click the [-] to the left of the code to fold that function, or go to View->Fold All, and then just click [+] to open the class and see its top-level items in plain view.

So what about the print_r? Matlab! I haven’t used Matlab since my first or second year in engineering school, and never thought it would be useful, but here it is. I don’t remember what the language looks like, and I don’t care. All I know is that when I select Language->Matlab, it lets me do code folding on my print_r output, and that’s all that matters. A bunch of the other languages work well too, or the same as Matlab, but it was the first and best for my needs.

FYI, this breaks if you have variable data containing a chunk of text with newlines and whatnot, but so it goes.





(PHP) Timestamp for a WORKING date/time a number of minutes/days from a given start date

3 12 2009

That title long enough? It’s hard to describe this function, but you get it now, right?

Based on this function here.

/**
* getTimePlusOffsetDuringWork
*
* The function returns the timestamp of a date + business minutes
* @param timestamp|string startDate Beginning date/time as base for offset
* @param numeric offsetMinutes Work minutes to add to the startDate
* @param mixed Array of holiday dates, preferably in the 'MM-DD' format
* @return integer Timestamp for the resulting time during Work hours,
*     given the simple equation StartDate + offsetMinutes.
*     Ignores after-hours, weekends, and holidays [for the first year after the startDate].
* Limitations: too many to list entirely.
*     9am-5pm work day only
*     1 year of holidays only
*     no week/weekday dependent holidays
*     Need to set timezone prior to calling this function if used outside of the server's timezone
*     Relies on English locale setting
* @author Zvi Landsman
**/
function getTimePlusOffsetDuringWork($startDate,$offsetMinutes,$holidays=array("12-25","12-26","01-01","07-04")){
    $j = $i = 0; //changed this to 0 or you were always starting one day ahead
    $tmp1 = validation::is_timestamp($startDate)?$startDate:strtotime($startDate); //worked out a timestamp to start with
    $offsetMinutes = round($offsetMinutes);
    $simple_time = strtotime("+$offsetMinutes minutes", $tmp1);
    if($simple_time <= strtotime('17:00', $tmp1))
    {
        //offset is still today
        return $simple_time;
    }
    //Check if start time is after-hours 
    if ($tmp1 > strtotime('17:00', $tmp1))
    {
        $tmp1 = strtotime("09:00 +1 day", $tmp1);
    }
    //This checks if the minute offset puts us into tomorrow
    $newdate = $tmp1;
    $minutes_left = $offsetMinutes%480;
    $newtime = strtotime("+ $minutes_left minutes", $newdate);
    if($newtime > strtotime('17:00', $newdate))
    {
        $j++;
        $minutes_left = round(($newtime - strtotime('17:00', $newdate))/60);
        $tmp1 = strtotime("+ $minutes_left minutes", strtotime("09:00 +1 day", $tmp1));
    }
    else
    {
        $tmp1 = strtotime("+ $minutes_left minutes", $tmp1);
    }

    //create the holiday list for the first upcoming occurrences only
    $holidayList = array();
    foreach ($holidays as $holiday)
    {
        $time_stamp=strtotime($holiday, $tmp1);

        $date=($time_stamp < $tmp1)?
            strftime("%d-%m-%Y",strtotime($holiday.' +1 year')):
            strftime("%d-%m-%Y",$time_stamp);
        $holidayList[] = $date;
    }

    $days_to_offset = floor($offsetMinutes/480);
    while($i < $days_to_offset)
    {
        $tmp2 = strtotime("+$j day", $tmp1);
        $day = strftime("%A",$tmp2);

        $tmp = strftime("%d-%m-%Y",$tmp2);
        if(($day != "Sunday") && ($day != "Saturday" )&&(!in_array($tmp, $holidayList)))
        {
            $i = $i + 1;
            $j = $j + 1;

        }
        else
        {
            $j = $j + 1;

        }
    }
    //$j = $j -1;
    return strtotime("+$j days",$tmp1);
}
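
For example, a call would look like this (note that the function leans on a validation::is_timestamp helper from my codebase – swap in your own check, or just pass a date string as shown here):

// add 960 business minutes (two 8-hour work days) to a start date
echo date('Y-m-d H:i', getTimePlusOffsetDuringWork('2009-12-03 16:00', 960));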

I’m completely open to improvements in the comments.





How to play an .asf stream on the iPhone/iPod Touch

24 10 2009

I loved adding my Shoutcast, AOL Radio and Last.fm apps to my iPod, but I was bummed when I couldn’t play my favorite O&A stream on my Touch, because it was in .asf format. I’ve never been one to pay attention to file format. I mean, if my Winamp Lite couldn’t play some audio, I was sure there was a plug-in for it. But this darn iPhone…

Anyway, I looked for a bunch of solutions, but none of them worked. I gave up and tried again a few weeks later, and noticed something out the corner of my eye.

My video re-streamer, the only damn solution I could find to being able to browse my network and play networked files, supported re-streaming Internet Radio too, in any standard format my computer can play/decode. Thank you again Orb/Orblive !

It took me a while to figure out how to add it – I did it through the browser, but then I realized I could do it from my iPhone app. Anyway, thanks for almost getting it right, iPhone.

[Yeah, I upgraded from my $12 mp3 player to a Touch – it just made sense for replacing my Palm Pilot too]