Wednesday, December 30, 2009

Solving Asterisk DTMF callerid issues - or why i love Open Source

I have been busy the last few days with an issue that has come up, with a new batch of GSM FCT's we need to interface with some of our deployed Hermes e-IPBX units.

Hermes e-IPBX is a solution we have worked on for sometime now and basically is a build-from-scratch linux distro tailored for use with Asterisk as a PBX.
It's a "compact" solution, requiring around 32MB of flash for the entire system including a custom web based UI we have developed.

For a number of reasons that would take a lot of time to explain (and make excellent material for another blog post) our current version of Hermes e-ipbx uses Asterisk 1.2 and not the latest stable 1.4 or 1.6

The problem we faced, was that we could not get callerid working with these new FCT units.
As it turned out these units were using dtmf and not fsk to pass the callerid info between the first and second ring.

So at the begging we though, "hey that's not a problem, asterisk supports dtmf for callerid, all we have to do is change the cidsignalling variable in zapata.conf to use dtmf."
We could not be more wrong...

No matter what configuration we tried in zapata.conf, asterisk was throughing errors and we could not get the callerid info.


-- Starting simple switch on 'Zap/1-1'
Dec 28 00:07:29 ERROR[3896]: callerid.c:276 callerid_feed: fsk_serie
made mylen < 0 (-1)
Dec 28 00:07:29 WARNING[3896]: chan_zap.c:6627 ss_thread: CallerID feed
failed: Success
Dec 28 00:07:29 WARNING[3896]: chan_zap.c:6671 ss_thread: CallerID
returned with error on channel 'Zap/1-1'
-- Executing Wait("Zap/1-1", "5") in new stack
Dec 28 00:07:29 DEBUG[3896]: chan_zap.c:4001 zt_handle_dtmfup: DTMF
digit: 9 on Zap/1-1
Dec 28 00:07:29 DEBUG[3896]: chan_zap.c:4001 zt_handle_dtmfup: DTMF
digit: 1 on Zap/1-1
Dec 28 00:07:29 DEBUG[3896]: chan_zap.c:4001 zt_handle_dtmfup: DTMF
digit: 0 on Zap/1-1
Dec 28 00:07:29 DEBUG[3896]: chan_zap.c:4001 zt_handle_dtmfup: DTMF
digit: 5 on Zap/1-1
Dec 28 00:07:31 DEBUG[3896]: chan_zap.c:4907 __zt_exception: Exception
on 14, channel 1
Dec 28 00:07:31 DEBUG[3896]: chan_zap.c:4092 zt_handle_event: Got event
Ring Begin(18) on channel 1 (index 0)
Dec 28 00:07:32 DEBUG[3896]: chan_zap.c:4907 __zt_exception: Exception
on 14, channel 1
Dec 28 00:07:32 DEBUG[3896]: chan_zap.c:4092 zt_handle_event: Got event
Ring/Answered(2) on channel 1 (index 0)
Dec 28 00:07:32 DEBUG[3896]: chan_zap.c:4441 zt_handle_event: Setting
IDLE polarity due to ring. Old polarity was 0
Dec 28 00:07:34 DEBUG[3896]: pbx.c:1548
pbx_substitute_variables_helper_full: Function result is '"" <>'
-- Executing NoOp("Zap/1-1", "CALLERID="" <>") in new stack
Dec 28 00:07:36 DEBUG[3896]: chan_zap.c:4907 __zt_exception: Exception
on 14, channel 1
Dec 28 00:07:36 DEBUG[3896]: chan_zap.c:4092 zt_handle_event: Got event
Ring Begin(18) on channel 1 (index 0)



As you can see asterisk does capture part of the callerid (which is 2114019105 in this case) *after* the initial first ring, but misses the rest of it (last 4 digits).

Trying to figure out what the issue was, the first though was that the dtmf send by the device was somehow distorted,
What was need it, was a way to capture what the device was sending and check it.
So i decided to used one of what i call, asterisk's hidden super-weapons, ztmonitor.
ztmonitor is a utility that comes with asterisk and allows you to monitor a Zap (aka Dahdi) channel for signal level and also save a 'raw' image of what comes in or goes out through that interface.
It has proven a valuable tool in the past when we were trying to solve some echo issues a customer had.

The following command can be used to record any given zap channel, mixing input and output streams in a single file

ztmonitor "channel number" -f "file_name.raw"


ztmonitor has also options to capture input and output streams in different files and also get the stream without echo canceling applied.

Once you have captured the data, you can convert them to a wav file for further processing using sox.

sox -r 8000 -s -w -c 1 "file_name.raw" "file_name.wav">


What that gives you is a wav file of both streams ready for further processing and investigation.

First thing to do is to have a visual view of the captured signal.

There are several open source tools but i use Audacity for this task, as it provides several interesting features.

Following is what the captured streams looked like when loaded in Audacity.






By measuring the different parts of the stream the following info was gathered.

After the 1st ring ends, there is a 640ms delay before the dtmf starts.
Dtmf pulses have a length of 84ms and there is inter-digit delay of 120ms.

Things looked "normal",there are a few glitches in the captured stream but nothing too big and the level of noise was not big, so next we had to test if the dtmf pulses were valid.

And the tool of choice for this kind of work is none other than multimon.

Multimon can decode a variety of digital transmission modes commonly found on UHF radio using the soundcard or a wav file as input and DTMF is one of the supported.

It's also pretty straight forward to use

multimon -a DTMF -t wav ./streamtx.raw.wav


would produce an output like that.

multimod (C) 1996/1997 by Tom Sailer HB9JNX/AE4WA
available demodulators: POCSAG512 POCSAG1200 POCSAG2400 EAS AFSK1200 AFSK2400 AFSK2400_2 HAPN4800 FSK9600 DTMF ZVEI SCOPE
Enabled demodulators: DTMF
DTMF: 2
DTMF: 1
DTMF: 1
DTMF: 4
DTMF: 0
DTMF: 1
DTMF: 9
DTMF: 1
DTMF: 0
DTMF: 5



That proved that the dtmf send by the device was OK, so where is the problem ?

After looking at source code and searching for people with similar problems, it was clear that dtmf callerid in asterisk 1.2 is "partially" working.
What partially means, is that there are basically 3 ways callerid is send.
One is between the gap of the first and second ring, second is by signaling a polarity reversal before the first ring and as i found out in some countries callerid is send before the ring without any signalling (i.e polarity reversal)
Also in the case dtmf is used to send the callerid a start-of-callerid is send before the number in the means of a letter (which is different depending on country)

In Asterisk 1.2 when the callerid is send with dtmf there is only support for the polarity reversal method and not the other two.
The good thing was, that a lot of Indian telcos where using dtmf between the first and second ring so there was a patch available for 1.2 that never made it in the svn, but got incorporated in 1.4 and 1.6.

So i got the patch,removed some of the debug lines and rebuilded chan_zap.

Now i could get the last 8 digits of the callerid the FCT send, but some reason i was missing the initial two...
It looked like the dtmf detection routine was kicking-in a bit too late, missing the first 2 dtmf digits.
After a bit more of head-scratching i remembered there was definition in the zaptel card driver (wctdm.c) that was defining the time the line would take to settle after the ring.

#define DEFAULT_RING_DEBOUNCE 64 /* Ringer Debounce (64 ms) */


Setting this to 32 ms solved the issue of the two first missing digits.

The whole procedure took several days, till we managed to solve the issue, but consider the alternative if this was a closed source system, where we would have to file a bug report to the company supplying the software.
We would probably had to wait for an answer for much longer provided they had enough interest (and resources) to fix the issue.