Tricks for language agnosticism
When evaluating different approaches to a non-trivial problem like NLP, I've found the libraries created by others to be invaluable for benchmarking different techniques. Unfortunately these libraries, kits, and code snippets are written in every language under the sun and are in various stages of broken, so some programming in each library's native language is generally required. For me, the ability to try out different algorithms in any language is critical to working as the sole technical resource on challenging problems. Here are some general tricks I've learned:
1. Develop an "interlingua"
In the spirit of proto-Esperanto and the machine translation concept, an interlingua in programming is, for me, a set of coding constructs broad enough to cover most programming tasks yet generic enough to be applied in most languages. Though language-unique techniques are often critical for tuning a production-ready app, straightforward code expressions are ideal for iterative programming cycles and for ramping up newer programmers. I usually use simple loops and conditions and break multi-stage tasks apart onto their own lines. Basic classes and methods in a standard MVC structure work in most environments. One exception to this approach: ORM. When using a language which can handle my SQL for me, I'm happy to do whatever it takes to get that off my plate.
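As a rough illustration of this plain style, here is a minimal Python sketch of a word-frequency count written with nothing but loops, conditions, and one step per line. The function name `count_words` and the sample sentence are made up for this example; the point is only that nearly the same lines could be retyped in Perl, Ruby, Java, or C++ without rethinking the logic.

```python
def count_words(text):
    """Count word occurrences using only constructs most languages share."""
    counts = {}
    # One stage per line, so each step maps onto a line in another language.
    lowered = text.lower()
    words = lowered.split()
    for word in words:
        cleaned = word.strip(".,!?;:")
        if cleaned == "":
            continue
        if cleaned in counts:
            counts[cleaned] = counts[cleaned] + 1
        else:
            counts[cleaned] = 1
    return counts


if __name__ == "__main__":
    sample = "The cat sat. The dog sat, too!"  # hypothetical sample input
    result = count_words(sample)
    for word in sorted(result):
        print(word, result[word])
```

A Python regular would probably reach for `collections.Counter` here, and that is exactly the kind of language-unique shortcut I skip until the approach has proven itself.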
2. Use batch processes liberally
One of the hardest elements of creating a multi-lingual application is the "putty" layer - making code in one language talk to code in another. Interoperability is eventually necessary for threaded or synchronous tasks, but if you're just testing an approach you can often skip this layer. Try to find a way for both codebases to talk to the same data layer, and then run one of them as a batch process. MySQL extensions are available for most languages these days, which makes a shared database an easy meeting point.
3. Get good at scripting
This goes right along with number two... the better you are at scripting Perl, Python, or Ruby, the more you can massage the data going in and out of the unfamiliar languages' code and the less native programming you'll need to do for testing purposes.
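As an example of the kind of massaging I mean, the sketch below assumes a hypothetical command-line tool that reads `id<TAB>text` lines and writes `id<TAB>label` lines. The file formats and the function names `to_tool_input` and `from_tool_output` are invented for illustration; a real tool will have its own quirks to script around.

```python
import csv
import io


def to_tool_input(csv_text):
    """Turn CSV rows of (id, text) into 'id<TAB>text' lines, one per row."""
    reader = csv.reader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed rows
        doc_id = row[0].strip()
        # Collapse embedded newlines and runs of spaces into single spaces.
        text = " ".join(row[1].split())
        lines.append(doc_id + "\t" + text)
    return "".join(line + "\n" for line in lines)


def from_tool_output(tsv_text):
    """Parse 'id<TAB>label' lines from the tool back into a dictionary."""
    labels = {}
    for line in tsv_text.splitlines():
        if line.strip() == "":
            continue
        doc_id, label = line.split("\t", 1)
        labels[doc_id] = label.strip()
    return labels


if __name__ == "__main__":
    raw = '1,"Hello,\n  world"\n2,Second doc\n'
    print(to_tool_input(raw), end="")
    print(from_tool_output("1\tPOS\n2\tNEG\n"))
```

With the two conversions living in a script, the foreign tool can usually be run untouched.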
4. Invest in the native runtime
When experimenting with a new language, it is important to be able to work by trial and error and iterate quickly. It is worth the investment to write good make/ant/rake files and check the source into Subversion for testing. This will sound silly, but also make sure you write down the directory structures and execution commands for each environment, since you'll need to remember them later.
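One way to "write it down" is to keep the notes in a small script instead of a text file, so the notes can also run the thing. This is only a sketch: the `ENVIRONMENTS` table, its keys, and the single demo entry are hypothetical, and the demo entry just runs the current Python interpreter so the example works anywhere. Real entries would hold your own working directories and build or run commands.

```python
import subprocess
import sys

# Hypothetical registry: one entry per environment, recording where to
# run from and the exact command line that worked last time.
ENVIRONMENTS = {
    "python-demo": {
        "workdir": ".",
        "command": [sys.executable, "-c", "print('hello from python')"],
    },
}


def run_environment(name):
    """Run the recorded command for one environment and return its output."""
    env = ENVIRONMENTS[name]
    completed = subprocess.run(
        env["command"],
        cwd=env["workdir"],
        capture_output=True,
        text=True,
        check=True,  # raise if the command exits with an error
    )
    return completed.stdout


if __name__ == "__main__":
    for name in sorted(ENVIRONMENTS):
        print(name, "->", run_environment(name).strip())
```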
5. Have an Escape Plan
Obviously you don't want to end up with a seven-headed hydra of a production application, so you should have a sense of how you're going to untangle your technology stack. Generally this means rewriting the modules, using the approach you end up with, in one of your home languages. Personally I prefer a two-language stack - one interpreted language which is fast to develop in and one compiled language which performs well.
There's no one language which offers the libraries that all languages do, and for more intensive research-related tasks a multi-language approach can save a huge amount of time. If you follow the KISS principle and the "build one to throw away" approach, you'll be able to get hands-on experience with many more techniques in areas like computational linguistics, image mining, or AI than you could with a single stack.
