Discussion:
Segfault in "fossil sqlite"
Warren Young
2015-05-13 20:25:30 UTC
I got this backtrace from running “fossil sqlite” under gdb on CentOS 5:

0x0000000000455784 in StrNLen32 (z=0x18820 <Address 0x18820 out of bounds>, N=-1) at ./src/printf.c:202
202 while( (N-- != 0) && *(z++)!=0 ){ n++; }
(gdb) bt
#0 0x0000000000455784 in StrNLen32 (z=0x18820 <Address 0x18820 out of bounds>, N=-1) at ./src/printf.c:202
#1 0x0000000000456a63 in vxprintf (pBlob=0x7fffffffce60, fmt=<value optimized out>, ap=<value optimized out>) at ./src/printf.c:665
#2 0x0000000000455c52 in vmprintf (zFormat=0xffffffff <Address 0xffffffff out of bounds>, ap=0x7fffffffceb8) at ./src/printf.c:842
#3 0x0000000000455dcb in mprintf (zFormat=0x18820 <Address 0x18820 out of bounds>) at ./src/printf.c:836
#4 0x000000000045fcaa in search_init (zPattern=<value optimized out>, zMarkBegin=0x824a10 "", zMarkEnd=0x0, zMarkGap=0x18820 <Address 0x18820 out of bounds>, fSrchFlg=3) at ./src/search.c:128
#5 0x00002aaaaaadf212 in el_init (prog=0x2aaaaacfbdb8 "", fin=0x36301536a0, fout=0x3630153780, ferr=0x3630153860) at el.c:109
#6 0x00002aaaaaaeebb9 in rl_initialize () at readline.c:301
#7 0x00002aaaaaaef9e5 in read_history (filename=0x81dcc0 "/home/tangent/.sqlite_history") at readline.c:1305
#8 0x0000000000515329 in sqlite3_shell (argc=1, argv=0x7fffffffe7f0) at ./src/shell.c:4794
#9 0x0000000000466a08 in cmd_sqlite3 () at ./src/sqlcmd.c:202
#10 0x0000000000445390 in main (argc=<value optimized out>, argv=0x2) at ./src/main.c:789

This doesn’t happen on CentOS 7 or OS X.

The “N” value is -1, and “n” is optimized out.

Running a standalone version of sqlite3 on the DB file succeeds.
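
For readers following the backtrace: the loop at printf.c:202 counts bytes up to a limit N, and the one line shown suggests that a negative N never counts down to zero, so the scan is effectively unbounded. The sketch below is a hypothetical stand-in (bounded_len is an invented name, not Fossil's StrNLen32) that mirrors that behavior while handling the negative case explicitly. It is meant to show why N=-1 on its own is probably harmless, and why the out-of-bounds z pointer is the more likely trigger.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the loop shown at printf.c:202; this is NOT
 * Fossil's actual StrNLen32. Returns the number of bytes before the
 * terminating NUL, examining no more than `limit` bytes. A negative
 * `limit` means "no limit", matching a countdown that starts at -1 and
 * so never reaches zero.
 */
static int bounded_len(const char *z, int limit){
  int n = 0;
  if( z==NULL ) return 0;   /* assumption: treat NULL as an empty string */
  while( (limit<0 || n<limit) && z[n]!=0 ){
    n++;
  }
  return n;
}
```

With a valid string, bounded_len("hello", -1) walks to the terminator and returns 5. With a wild pointer such as the 0x18820 in frame #0, the first dereference can fault whatever the limit is.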
Richard Hipp
2015-05-13 20:29:47 UTC
Post by Warren Young
0x0000000000455784 in StrNLen32 (z=0x18820 <Address 0x18820 out of bounds>,
N=-1) at ./src/printf.c:202
202 while( (N-- != 0) && *(z++)!=0 ){ n++; }
(gdb) bt
#0 0x0000000000455784 in StrNLen32 (z=0x18820 <Address 0x18820 out of
bounds>, N=-1) at ./src/printf.c:202
#1 0x0000000000456a63 in vxprintf (pBlob=0x7fffffffce60, fmt=<value
optimized out>, ap=<value optimized out>) at ./src/printf.c:665
#2 0x0000000000455c52 in vmprintf (zFormat=0xffffffff <Address 0xffffffff
out of bounds>, ap=0x7fffffffceb8) at ./src/printf.c:842
#3 0x0000000000455dcb in mprintf (zFormat=0x18820 <Address 0x18820 out of
bounds>) at ./src/printf.c:836
#4 0x000000000045fcaa in search_init (zPattern=<value optimized out>,
zMarkBegin=0x824a10 "", zMarkEnd=0x0, zMarkGap=0x18820 <Address 0x18820 out
of bounds>, fSrchFlg=3) at ./src/search.c:128
#5 0x00002aaaaaadf212 in el_init (prog=0x2aaaaacfbdb8 "", fin=0x36301536a0,
fout=0x3630153780, ferr=0x3630153860) at el.c:109
This stack seems to have been smashed. el_init() never calls search_init().

Can you run in valgrind and see what that tells you?
_______________________________________________
fossil-users mailing list
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
--
D. Richard Hipp
***@sqlite.org
Warren Young
2015-05-13 20:55:28 UTC
Post by Richard Hipp
el_init() never calls search_init().
It does in the copy of libedit I’m using here. src/el.c, line 109.

I updated to the latest libedit, and it still happens:

http://thrysoee.dk/editline/
Post by Richard Hipp
Can you run in valgrind and see what that tells you?
It’s a Heisenbug. :)

$ valgrind fossil sql .quit
==17280== Memcheck, a memory error detector
==17280== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==17280== Using Valgrind-3.5.0 and LibVEX; rerun with -h for copyright info
==17280== Command: fossil sql .quit
==17280==
==17280==
==17280== HEAP SUMMARY:
==17280== in use at exit: 74,004 bytes in 77 blocks
==17280== total heap usage: 1,434 allocs, 1,357 frees, 658,013 bytes allocated
==17280==
==17280== LEAK SUMMARY:
==17280== definitely lost: 302 bytes in 3 blocks
==17280== indirectly lost: 0 bytes in 0 blocks
==17280== possibly lost: 73,512 bytes in 67 blocks
==17280== still reachable: 190 bytes in 7 blocks
==17280== suppressed: 0 bytes in 0 blocks
==17280== Rerun with --leak-check=full to see details of leaked memory
==17280==
==17280== For counts of detected and suppressed errors, rerun with: -v
==17280== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 4 from 4)

Note the lack of a segfault.

And to answer Stephan’s comments here, I did “make clean” and reconfigure.

Also, “configure --disable-lineedit” fixes the symptom, so it’s definitely either libedit itself causing the problem, or Fossil’s use of it.
Warren Young
2015-05-13 20:59:25 UTC
Post by Warren Young
Also, “configure --disable-lineedit” fixes the symptom, so it’s definitely either libedit itself causing the problem, or Fossil’s use of it.
Another discovery: My standalone sqlite3 binary is from the platform sqlite3 package, which links to the platform libreadline, not to libedit.

libedit is something we have installed from source, for licensing reasons.
Richard Hipp
2015-05-13 21:20:40 UTC
Post by Warren Young
Post by Richard Hipp
el_init() never calls search_init().
It does in the copy of libedit I’m using here. src/el.c, line 109.
Maybe it's a symbol conflict between libel and the search_init()
function of Fossil. Try rebuilding Fossil without command-line
editing and see if that clears the problem.

I have to be afk for an hour or so. When I come back, if you say that
this cleared the problem, I'll change the name of the search_init()
routine in Fossil. :-)
Warren Young
2015-05-13 21:37:19 UTC
Post by Richard Hipp
Post by Richard Hipp
el_init() never calls search_init().
It does in the copy of libedit I’m using here. src/el.c, line 109.
Maybe it's a symbol conflict between libel and the search_init()
function of Fossil.
Nailed it. I renamed that function, and the symptom is gone. See the attached trivial patch.

I’m surprised the linker even allows that.

By the way, I’m talking about NetBSD libedit here. I don’t know anything about a “libel” package. That may not have the same problem.
Post by Richard Hipp
Try rebuilding Fossil without command-line
editing and see if that clears the problem.
I already tried that. (See previous message.) And yes, it does allow “fossil sql” to run without crashing.
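
The clash diagnosed above comes down to two externally visible functions sharing one name: when the dynamic linker resolves libedit's call to search_init(), it can bind to the executable's definition instead of the library's own. What follows is a minimal sketch of two common ways to keep an application function out of that shared namespace. All names in it (count_pattern_bytes, app_search_init) are hypothetical and are not taken from the Fossil source or from the patch mentioned in this message.

```c
#include <string.h>

/*
 * Option 1: internal linkage. A `static` function is visible only inside
 * this translation unit, so a shared library's reference to the same
 * name cannot bind to it. This works only when every caller lives in
 * this file.
 */
static size_t count_pattern_bytes(const char *zPattern){
  return zPattern ? strlen(zPattern) : 0;
}

/*
 * Option 2: a project-specific prefix. The function keeps external
 * linkage so other files can call it, but the "app_" prefix (a
 * hypothetical choice) makes a collision with a library's internal
 * helper much less likely.
 */
size_t app_search_init(const char *zPattern){
  return count_pattern_bytes(zPattern);
}
```

On ELF systems, comparing the output of `nm -D` for the executable and for the shared library is one way to check whether both export the same name.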
bch
2015-05-13 21:58:33 UTC
I applied this.

Thanks.

-bch
Richard Hipp
2015-05-13 22:41:21 UTC
Post by Warren Young
Post by Richard Hipp
Post by Warren Young
Post by Richard Hipp
el_init() never calls search_init().
It does in the copy of libedit I’m using here. src/el.c, line 109.
Maybe it's a symbol conflict between libel and the search_init()
function of Fossil.
Nailed it. I renamed that function, and the symptom is gone. See the
attached trivial patch.
I checked in an even simpler fix. Please verify that tip of trunk
works. Thanks.
Warren Young
2015-05-13 22:57:58 UTC
Post by Richard Hipp
I checked in an even simpler fix. Please verify that tip of trunk
works. Thanks.
Yes, it does. I should have thought of that myself.

Thank you!

Stephan Beal
2015-05-13 20:31:43 UTC
Post by Warren Young
#4 0x000000000045fcaa in search_init (zPattern=<value optimized out>,
zMarkBegin=0x824a10 "", zMarkEnd=0x0, zMarkGap=0x18820 <Address 0x18820 out
of bounds>, fSrchFlg=3) at ./src/search.c:128
#5 0x00002aaaaaadf212 in el_init (prog=0x2aaaaacfbdb8 "",
fin=0x36301536a0, fout=0x3630153780, ferr=0x3630153860) at el.c:109
This doesn’t happen on CentOS 7 or OS X.
The “N” value is -1, and “n” is optimized out.
Running a standalone version of sqlite3 on the DB file succeeds.
If you built from source, make sure you run 'make clean' after updating.
Failing to do so sometimes causes this kind of weirdness (as Richard
mentioned), such as apparent calls to functions that are never really called.
--
----- stephan beal
http://wanderinghorse.net/home/stephan/
http://gplus.to/sgbeal
"Freedom is sloppy. But since tyranny's the only guaranteed byproduct of
those who insist on a perfect world, freedom will have to do." -- Bigby Wolf