A fellow attendee at the German Hadoop user meeting (thanks to Isabel, who organized it again) pointed out to me that the GPUs on graphics cards basically work like server grids.
He mentioned that there are some research papers in this field. I spent some time reading through what I could find, and it was quite interesting. Let me cite some facts from the two most interesting papers:

+ “A Map Reduce Framework for Programming Graphics Processors” by Bryan Catanzaro, Narayanan Sundaram and Kurt Keutzer, UC Berkeley
+ “Mars: A MapReduce Framework on Graphics Processors” by Bingsheng He, Wenbin Fang, Qiong Luo, Naga K. Govindaraju, Tuyong Wang

First, let's compare some facts (via Wenbin Fang's slides).

--------------------------------------------------------------------------------------------
What                       | GPU                           | CPU                           |
--------------------------------------------------------------------------------------------
Memory bandwidth           | ~80 GB/s                      | ~10 GB/s                      |
Floating point performance | ~500 GFLOPS                   | ~50 GFLOPS                    |
Parallelism                | ~10,000 lightweight threads   | Optimized for sequential code |
Performance improvement    | ~2.5x to 3x per year          | ~1.5x per year                |
--------------------------------------------------------------------------------------------

Obviously GPUs provide a lot of horsepower. The problem so far has been that programming GPUs is quite difficult. For example, higher-level language constructs such as variable-length data types and recursion do not exist. Also, all GPU APIs are highly vendor specific, but things are moving forward, as I found out.
Both papers try to validate their claims by implementing MapReduce, but Mars looks much further along and is still under development (last Mars release: August 2008). Both groups use NVIDIA CUDA as their development platform, which means they require an NVIDIA graphics card.

From my limited perspective, a GPU looks quite a lot like a Hadoop cluster.
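
To make the comparison a bit more concrete, here is a minimal sketch of the map/shuffle/reduce pattern in plain Python. It is not code from either paper and it does not run on a GPU; the names `map_reduce`, `word_count_map` and `sum_reduce` are made up for this illustration. The point is only that every map call and every per-key reduce call is independent of the others, which is what lets a Hadoop cluster spread them over nodes and what a GPU framework could spread over threads.

```python
from collections import defaultdict


def map_reduce(records, map_fn, reduce_fn):
    """Sequential stand-in for a MapReduce run (illustrative only)."""
    # Map phase: every record is processed independently of the others.
    intermediate = []
    for record in records:
        intermediate.extend(map_fn(record))

    # Shuffle phase: group the intermediate values by key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce phase: every key is reduced independently of the others.
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def word_count_map(line):
    # Emit a (word, 1) pair for every word in the line.
    return [(word.lower(), 1) for word in line.split()]


def sum_reduce(key, values):
    # Add up all counts collected for one word.
    return sum(values)


counts = map_reduce(["GPU and CPU", "gpu grid"], word_count_map, sum_reduce)
print(counts)
```

Because the two loops over records and over keys have no dependencies between iterations, they are the parts a parallel runtime would distribute.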

[Image: GpuGrid.jpg]

Both papers compare performance against CPU-based processing. The results have to be taken with a grain of salt, but they sound pretty impressive.

[Image: gputrainingstime.jpg]

[Image: gpuclassificationtime.jpg]

[Image: marsspeedup.jpg]

Of course I checked whether there are any Java bindings for CUDA, and the great news is that there are: JCublas. I haven't had time to try it out, though.

Unfortunately there is too much on my to-do list, but it would be very interesting either to integrate Mars with Hadoop, so that compute-intensive map and reduce tasks can run on a GPU, or to port Lucene indexing to Mars.

I will keep watching this topic, and who knows, maybe a project will come up where I can find time to investigate this field further.

Bryan and Vivek from Rapleaf invited me to talk at their Hadoop event this Tuesday. Thanks again.
I presented an experience report on a production system we built over the last year with Hadoop and Katta (Lucene, the grid style). It was a fun event and quite a lot of people showed up.
Here you can find my Katta, Pig and Hadoop in production experience report slides.

And here are the videos:

Part 1 (Bryan): http://www.vimeo.com/2084824

Part 2 (Stefan): http://www.vimeo.com/2085140

Part 3 (Arun): http://www.vimeo.com/2085477

 

UPDATE: 

I managed to upload the script by changing its extension to .doc. You can download the OmniFocusToThings script here, but you need to rename it to .applescript or copy it into the AppleScript editor.

UPDATE: there is a new version of this script from robotii; see the comments for the link.

 

I decided to give Cultured Code's Things another try after Jyri reported that he is pretty happy with it. I tried it a couple of months ago; at that time I liked the simple interface, but it did not have the features I was looking for. Things has gotten better, but I am still missing good mail app integration. OmniFocus, for example, lets you select any text and press a hot key, and voilà, you have a new task in the inbox.

Anyhow, my biggest problem was migrating my hundreds of tasks from OmniFocus to Things, so I glued two AppleScripts together and improved them until I had what I was looking for. Thanks to Karel's MailToThings script and Robinfrancistrew's OF2TaskpaperMail script.

Consider the script alpha, and make sure you update OmniFocus and Things before you try it! Please give me feedback and fix the bugs! 🙂

How to:

1.) Copy the text into a new script document (Script Editor).

2.) Open Things and OmniFocus.

3.) Switch to a project view in OmniFocus and select all tasks you want to import into Things.

4.) Start the script and get a beer; the script is super slow since I need to use key events to communicate with Things.

5.) Clean up the import. The script tries to import all OmniFocus metadata as tags, which can get a little messy since due dates etc. also end up as tags. Feel free to change it in whatever way makes sense to you.
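
To give an idea of what the cleanup in step 5 will be dealing with, here is a small Python sketch of how the tag text is put together. It mirrors the order of the checks in the ExportTrees handler below and the `yyyy-mm-dd hh:mm` format noted in DateString; the function names `build_tags` and `date_string` and the sample values are made up for this illustration, and the script itself remains AppleScript.

```python
from datetime import datetime


def date_string(dte):
    """Format as yyyy-mm-dd, adding hh:mm only when the time is not midnight."""
    text = dte.strftime("%Y-%m-%d")
    if dte.hour > 0 or dte.minute > 0:
        text += dte.strftime(" %H:%M")
    return text


def build_tags(context=None, start=None, due=None, minutes=None,
               flagged=False, done=False, project=None):
    """Assemble the tag text in the same order as the script does."""
    tags = ""
    if context is not None:
        tags += " @" + context + ","
    if start is not None:
        tags += " @start(" + date_string(start) + "),"
    if due is not None:
        tags += " @due(" + date_string(due) + "),"
    if minutes is not None:
        tags += " @mins(" + str(minutes) + "),"
    if flagged:
        tags += " @flag,"
    if done:
        tags += " @done,"
    if project is not None:
        tags += " @" + project + ","
    return tags


# Hypothetical task: context "office", due date set, flagged, in project "Katta".
example = build_tags(context="office", due=datetime(2008, 11, 3, 17, 30),
                     flagged=True, project="Katta")
print(example)
```

Every piece of metadata becomes its own comma-terminated tag, which is why dates and durations show up in the Things tag list after the import.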

 


-- -------------------------------------
-- the general run method
-- -------------------------------------
on run

display dialog "Should we start? Make sure all tasks you want to import into Things are selected in OmniFocus." buttons {"OK", "Cancel"} default button 1

tell application "OmniFocus"
tell default document
if number of document window is 0 then
make new document window with properties {bounds:{0, 0, 1000, 500}}
end if
end tell

tell document window 1 of front document
set lstTrees to selected trees of content
if (count of lstTrees) = 0 then
try
display dialog "Nothing selected in the right-hand panel." & return & return & "Select material to export, and try again." & return
end try
else
-- Generate a TaskPaper string of the selected content
set blnContext to (selected view mode identifier is not equal to "project")
set lngIndent to 0
my ExportTrees(lstTrees, lngIndent, blnContext)

end if
end tell
end tell
end run
-- -------------------------------------
-- Walks the omni focus tree
-- -------------------------------------
on ExportTrees(lstTrees, lngIndent, blnContextView)
-- if the tree is a task give full detail
-- else just name and any note
-- set strTP to ""

using terms from application "OmniFocus"
repeat with oTree in lstTrees
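-- initialize task string, note text and tags
-- (strNote is reset as well, so it is always defined when it is read further down)
set strTP to ""
set strNote to ""
set notes to ""
set tags to ""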
set oValue to value of oTree
try
set strName to name of oValue
on error
set strName to "Inbox"
end try
if length of strName > 0 then
set strName to my Esc(strName)
end if

if strName ≠ "Inbox" then
set strNote to note of oValue
if length of strNote > 0 then
set strNote to my Esc(strNote)
end if
end if

set clValue to class of oValue
if (clValue is not equal to task) and (clValue is not equal to inbox task) then

-- Project or Folder
if clValue is not equal to folder then
if clValue is not equal to project then
--Inbox (No details)
set strTP to strTP & "Inbox:" & return

else
-- Project (Name and possibly note)
if length of strName > 0 then
set strTP to strTP & strName & ":" & return
if length of strNote > 0 then
set notes to strNote & return
end if
end if
end if
else
-- Folder (Just name - no note)
set strTP to strTP & strName & ":" & return
end if

else -- Task (with details from specified columns)

-- set recFields to {fldName:name of oValue, fldNote:note of oValue, fldDone:completed of oValue, fldContext:strContext, fldStartDate:start date of oValue, flddueDate:due date of oValue, fldDoneDate:completion date of oValue, fldDuration:estimated minutes of oValue, fldFlagged:flagged of oValue}

-- write first line of task, followed by tags
set lstLines to paragraphs of strName

set strTP to strTP & item 1 of lstLines

-- Add any tags
set oContext to context of oValue
if oContext is not equal to missing value then
set tags to " @" & name of oContext & ","
end if

set dteStart to start date of oValue
if dteStart is not equal to missing value then
set tags to tags & " @start(" & my DateString(dteStart) & ")" & ","
end if

set dteDue to due date of oValue
if dteDue is not equal to missing value then
set tags to tags & " @due(" & my DateString(dteDue) & ")" & ","
end if

set lngDurn to estimated minutes of oValue
if lngDurn is not equal to missing value then
set tags to tags & " @mins(" & (lngDurn as string) & ")" & ","
end if

if flagged of oValue then
set tags to tags & " @flag" & ","
end if

if completed of oValue then
set tags to tags & " @done" & ","
end if

-- project if we know
set aProject to containing project of oValue
if aProject is not equal to missing value then
set tags to tags & " @" & name of aProject & ","
end if

set strTP to strTP & return

-- write any remaining lines of task as note text
if length of lstLines > 1 then
repeat with strLine in rest of lstLines
set strLine to my RTrim(strLine)
if length of strLine > 0 then
-- change any trailing : to :-, to avoid misinterpretation as a header
if last character of strLine ≠ ":" then
set notes to notes & strLine & return
else
set notes to notes & strLine & "-" & return
end if
end if
end repeat
end if

-- append any attached note text
set lstLines to paragraphs of strNote

repeat with strLine in lstLines
set strLine to my RTrim(strLine)
if length of strLine > 0 then
-- change any trailing : to :-
if last character of strLine ≠ ":" then
set notes to notes & strLine & return
else
set notes to notes & strLine & "-" & return
end if
end if
end repeat

end if

-- if the current node has sub-trees then recurse
set lstSubTrees to trees of oTree
if (count of lstSubTrees) > 0 then
if (clValue ≠ project) and (clValue ≠ item) then
set lngNewIndent to lngIndent + 1
else
set lngNewIndent to lngIndent
end if
set strTP to strTP & ExportTrees(lstSubTrees, lngNewIndent, blnContextView)
end if
-- my log_event(my Esc(strTP))
my createThingTask(my Esc(strTP), my Esc(notes), tags)
end repeat
end using terms from

end ExportTrees
-- -------------------------------------
-- trims a text
-- -------------------------------------
on RTrim(someText)
local someText

repeat until someText does not end with return
if length of someText > 1 then
set someText to text 1 thru -2 of someText
else
set someText to ""
end if
end repeat

return someText
end RTrim
-- -------------------------------------
-- converts dates into a string
-- -------------------------------------
on DateString(dte)
-- yyyy-mm-dd hh:mm
set strDate to ""
if dte is not equal to missing value then
set lngMonth to month of dte as integer
set strMonth to lngMonth as string
if lngMonth < 10 then set strMonth to "0" & strMonth

set lngDay to day of dte as integer
set strDay to lngDay as string
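if lngDay < 10 then set strDay to "0" & strDay

set strDate to (year of dte as string) & "-" & strMonth & "-" & strDay

set lngHrs to hours of dte
set lngmins to minutes of dte
if (lngHrs > 0) or (lngmins > 0) then
set strHrs to lngHrs as string
if lngHrs < 10 then set strHrs to "0" & strHrs

set strMins to lngmins as string
if lngmins < 10 then set strMins to "0" & strMins
set strDate to strDate & " " & strHrs & ":" & strMins
end if
end if
return strDate
end DateString
-- -------------------------------------
-- NOTE: the end of DateString above, the Esc handler and the head of
-- EscAmpersand were lost when this post was published (the text between
-- "<" and ">" was swallowed). Everything from here down to
-- "if lngParts > 1 then" is a reconstruction, not the original code.
-- -------------------------------------
-- escapes a text before it is handed to Things
-- (assumption: pasting via the clipboard needs no escaping,
-- so the text is passed through unchanged)
-- -------------------------------------
on Esc(str)
return str as string
end Esc
-- -------------------------------------
-- replaces every ampersand in str with the given replacement
-- (assumption: signature and delimiter are guessed from the handler name)
-- -------------------------------------
on EscAmpersand(str, replace)
set strOldDelim to text item delimiters
set text item delimiters to "&"
set lstParts to text items of str
set lngParts to count of lstParts
if lngParts > 1 then
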
set strNew to item 1 of lstParts
repeat with n from 2 to lngParts
set strNew to strNew & replace & item n of lstParts
end repeat
set text item delimiters to strOldDelim
return strNew
else
set text item delimiters to strOldDelim
return str
end if
end EscAmpersand
-- -------------------------------------
-- A simple logging mechanism
-- -------------------------------------

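on log_event(theMessage)
-- prefix the message with a timestamp
set theLine to (do shell script "date +'%Y-%m-%d %H:%M:%S'") & " " & theMessage
-- quoted form keeps quotes and other shell characters in the message from breaking the command
do shell script "echo " & quoted form of theLine & " >> /import-events.log"
end log_event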

-- -------------------------------------
-- import the task into Things; we have to use key events
-- and the clipboard since Things does not have AppleScript support yet
-- -------------------------------------
on createThingTask(subject, notes, tags)
activate application "Things"

-- jump to inbox
delay 1
set timeoutSeconds to 1.0
set uiScript to "keystroke \"0\" using {option down, command down}"
my doWithTimeout(uiScript, timeoutSeconds)

-- create new task
delay 1
set timeoutSeconds to 1.0
set uiScript to "keystroke \"n\" using command down"
my doWithTimeout(uiScript, timeoutSeconds)

-- set subject
set the clipboard to subject

delay 1
set timeoutSeconds to 1.0
set uiScript to "keystroke \"v\" using command down"
my doWithTimeout(uiScript, timeoutSeconds)

-- jump to tags
delay 1
set timeoutSeconds to 1.0
set uiScript to "keystroke \" \""
my doWithTimeout(uiScript, timeoutSeconds)

set the clipboard to tags

delay 1
set timeoutSeconds to 1.0
set uiScript to "keystroke \"v\" using command down"
my doWithTimeout(uiScript, timeoutSeconds)

-- jump to notes
delay 1
set timeoutSeconds to 1.0
set uiScript to "keystroke \" \""
my doWithTimeout(uiScript, timeoutSeconds)

set the clipboard to notes

delay 1
set timeoutSeconds to 1.0
set uiScript to "keystroke \"v\" using command down"
my doWithTimeout(uiScript, timeoutSeconds)

delay 1
set timeoutSeconds to 1.0
set uiScript to "keystroke \" \""
my doWithTimeout(uiScript, timeoutSeconds)

end createThingTask

-- -------------------------------------
-- DO SOMETHING WITH A TIMEOUT
-- -------------------------------------
on doWithTimeout(uiScript, timeoutSeconds)
set endDate to (current date) + timeoutSeconds
repeat
try
run script "tell application \"System Events\"
" & uiScript & "
end tell"
exit repeat
on error errorMessage
if ((current date) > endDate) then
error "Can not " & uiScript
end if
end try
end repeat
end doWithTimeout
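
For readers who do not speak AppleScript, the retry idea behind doWithTimeout can be sketched in a few lines of Python. This is an illustration and not part of the script: `do_with_timeout`, `flaky` and the `pause` parameter are made-up names, and unlike the handler above this sketch sleeps briefly between attempts instead of retrying in a tight loop.

```python
import time


def do_with_timeout(action, timeout_seconds, pause=0.05):
    """Call action() until it succeeds or the deadline has passed."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        try:
            return action()
        except Exception as exc:
            # Give up only once the deadline is behind us, as the handler above does.
            if time.monotonic() > deadline:
                raise TimeoutError("Can not complete the action") from exc
            time.sleep(pause)


# Hypothetical action that fails twice before it works,
# like a UI that is not ready for keystrokes yet.
attempts = {"count": 0}


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("UI not ready")
    return "ok"


result = do_with_timeout(flaky, timeout_seconds=1.0)
print(result, attempts["count"])
```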

I visited the Inside Web 2.0 Strategy panel discussion today, and Christa S. Quarles, Dave Morin and Seth Sternberg gave some interesting numbers:

Google: CPM 120 USD per 1,000 searches.

Yahoo: CPM 1 USD per 1,000 page views on Yahoo properties.

Super Bowl: CPM 30 USD per 1,000 people.

Facebook: 25,000 apps today.

Facebook: 400,000 developers.

Facebook: some apps make up to 1 million dollars a month.

Meebo: every day, Meebo users together spend 300 years of time on Meebo.
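
CPM is the price per thousand impressions, so the three CPM figures above are easy to compare with a bit of arithmetic. The sketch below is only an illustration: the function name `ad_revenue` and the volume of one million impressions are my own assumptions, not numbers from the panel.

```python
def ad_revenue(impressions, cpm_usd):
    """Revenue in USD for a number of impressions at a given CPM."""
    return impressions / 1000 * cpm_usd


# Hypothetical volume, identical for all three, so only the CPM differs.
volume = 1_000_000

for name, cpm in [("Google search", 120), ("Super Bowl", 30), ("Yahoo page view", 1)]:
    print(f"{name}: {ad_revenue(volume, cpm):,.0f} USD")
```

At the same volume, a search impression is worth 120 times a Yahoo page view by these numbers.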

Hi all,
as requested, here are the two slide decks from my Katta presentation at the Hadoop user group meeting at Yahoo last week.

Here are the katta overview slides: katta-overview 

And here are the hadoop survey slides: hadoopsurvey

 

Feedback is always helpful and very welcome!

Yeah!!! Yesterday we released our first Katta version!!
My colleagues worked very hard on this, and yesterday we reached our first big milestone. I also gave a Katta talk at the Hadoop conference and will post the slides as soon as I can.

Here goes the release announcement:

 

 

After 5 months of work we are happy to announce the first developer preview release of Katta. This release contains all the functionality needed to serve a large, sharded Lucene index on many servers. Katta stands on the shoulders of giants: Lucene, Hadoop and ZooKeeper.

 

Main features:

+ Plays well with Hadoop
+ Apache Version 2 License.
+ Node failure tolerance
+ Master failover
+ Shard replication
+ Pluggable network topologies (shard distribution and selection policies)
+ Node load balancing at client
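
To illustrate what shard replication and client-side load balancing buy you, here is a toy Python sketch. It is not Katta's code and not its actual distribution policy; `assign_shards`, `nodes_for_search`, the round-robin placement and the node names are all assumptions made up for this example.

```python
def assign_shards(shards, nodes, replication=2):
    """Place each shard on `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds node count")
    placement = {}
    for i, shard in enumerate(shards):
        placement[shard] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement


def nodes_for_search(placement, alive):
    """Pick one live node per shard, preferring the node with the least work so far."""
    load = {node: 0 for node in alive}
    chosen = {}
    for shard, replicas in placement.items():
        live = [node for node in replicas if node in load]
        if not live:
            return None  # every replica of this shard is down
        node = min(live, key=lambda n: load[n])
        load[node] += 1
        chosen[shard] = node
    return chosen


placement = assign_shards(["s0", "s1", "s2", "s3"], ["n1", "n2", "n3"], replication=2)
# Node n2 has failed; every shard still has a live replica.
plan = nodes_for_search(placement, alive=["n1", "n3"])
print(placement)
print(plan)
```

With a replication factor of 2, losing any single node still leaves a copy of every shard, and the client simply routes around the failure.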

Please give katta a test drive and give us some feedback!

Download:
http://sourceforge.net/project/platformdownload.php?group_id=225750

website:
http://katta.sourceforge.net/

Getting started in less than 3 min:
http://katta.wiki.sourceforge.net/Getting+started

Installation on a grid:
http://katta.wiki.sourceforge.net/Installation

Katta presentation today (09/17/08) at the Hadoop user group, Yahoo Mission College:
http://upcoming.yahoo.com/event/1075456/
* slides will be available online later

Many thanks for the hard work:
Johannes Zillmann, Marko Bauhardt, Martin Schaaf (101tec)

Maven 1 was terrible, since the XML scripting engine was buggy. Maven 2 was great, I thought, and I used it heavily. But now, a couple of years later, I understand why people are still using Ant. I had great hopes for Maven, but reality proved that Maven is one of the biggest time wasters of my developer life.
It is nice and low effort for simple one-project-just-build-me-a-jar kinds of projects. But as soon as you start working on serious projects, there is no chance of being productive with Maven.
So what is good about Maven:
* convention over configuration
* transitive dependency management
* a lot of plugins
The bad parts, though, are:
* all those plugins are very buggy
* it uses a declarative language (XML), so there is no real possibility to script logic
* doing custom logic is a pain
** as soon as you need any kind of non-standard behavior, your XML blows up
** custom plugins need to be external projects

And I could go on for hours. So it is time for something new. Sure, there is Buildr, but I don't want to learn yet another language just for builds. So why not use Java? Right, use Groovy.
So here is my new star in the build sky: Gradle.
It has all the nice features I want:
* very lightweight
* convention over configuration
* transitive dependency management (we can still use all our Maven repositories)
* a scripting language for my custom logic
* Java language syntax
* it works great with all build servers, and moving to it is super easy since it comes with a script that downloads the release if needed

So you should give it a try.

If you are a software developer you might already know this: when there is a performance problem, you should profile your application first and then improve performance at the hot spots. Unfortunately, in my career I have seen many developers do it wrong and optimize where they guess the problem is. Of course they end up optimizing in the wrong place. Joshua Bloch talks about this problem in Effective Java.
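
As a small illustration of "measure first", here is a sketch using Python's built-in cProfile and pstats modules. The workload (`parse_lines`, `sum_lengths`, `work`) is made up for this example; the point is only that the report shows where the time actually goes, so nobody has to guess.

```python
import cProfile
import io
import pstats


def parse_lines(n):
    # Hypothetical workload: build n small strings and split them again.
    return [("item %d value %d" % (i, i * 2)).split() for i in range(n)]


def sum_lengths(rows):
    # Hypothetical workload: add up the number of fields over all rows.
    return sum(len(row) for row in rows)


def work():
    rows = parse_lines(20000)
    return sum_lengths(rows)


# Profile one run of the workload.
profiler = cProfile.Profile()
profiler.enable()
total = work()
profiler.disable()

# Print the measured functions, sorted by cumulative time.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats()
print(report.getvalue())
```

Only the functions at the top of such a report are worth optimizing.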

Anyhow, different topic: being a productivity freak who tries to optimize all the work I do on my computer, I am always looking for tune-ups. How can I save myself from typing in passwords, autocomplete text, etc.?

But wait, didn't I just say profile before you optimize? Right. So I applied this approach to my life as a computer user. I installed Slife a couple of weeks ago and profiled my daily work. So where do I really spend my time?

Email! 

I learned that I frequently spend over 2 of my 8 to 12 working hours just reading through all my mail. Wow!

Definitely time to start improving this, and Merlin Mann's talk looks like a very good starting point:

Do I have all the keywords to get a good search score? Good! I just spent an hour figuring out a problem in which we use the JAVA_HOME environment variable in a unit test. The test works fine when run with Maven but did not in Eclipse on OS X. The problem is, of course, that environment variables set in the shell are not available to apps like Eclipse. So it is worth spreading the word on how to solve this.

Here is the best write-up I found: http://www.digitaledgesw.com/node/31
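
Independent of that write-up, a test can also be made less dependent on the shell environment by falling back to another source when the variable is missing. The Python sketch below shows the idea only; it is not the fix from the linked article, `resolve_java_home` is a made-up name, and the use of the macOS helper `/usr/libexec/java_home` as the fallback is my assumption about what is installed.

```python
import os
import subprocess


def resolve_java_home(env=None):
    """Return JAVA_HOME from the environment, else ask the macOS helper, else None."""
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if java_home:
        return java_home

    # GUI-launched processes may not see shell variables, so try the helper tool.
    helper = "/usr/libexec/java_home"  # assumption: only present on macOS
    if os.path.exists(helper):
        try:
            return subprocess.check_output([helper], text=True).strip() or None
        except (subprocess.CalledProcessError, OSError):
            return None
    return None


print(resolve_java_home({"JAVA_HOME": "/opt/jdk"}))
```

A test that needs a JDK location could call such a helper and skip itself when nothing is found, instead of failing only in the IDE.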
