-
Notifications
You must be signed in to change notification settings - Fork 4
/
README
122 lines (76 loc) · 3.92 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
Damerau-Levenshtein Distance UDF for MySQL
Supports upper bounding for fast searching and UTF-8 case
insensitive throught iconv.
Copyright (C) 2013 Diego Torres <diego dot torres at gmail dot com>
Implementing
https://github.com/torvalds/linux/blob/8a72f3820c4d14b27ad5336aed00063a7a7f1bef/tools/perf/util/levenshtein.c
Redistribute as you wish, but leave this information intact.
Compiling
Run make. It will try to autodetect architecture and generate 32 or x86_64 bit
version. Enable testing with "make debug". An executable instead of
a library will be built. Execute it to run some tests regarding
memory allocation and levenshtein core calculations.
Errors you may get when compiling:
In file included from ./src/damlevlim.c:16:0:
./src/damlevlim.h:26:23: fatal error: my_global.h: No such file or directory
#include <my_global.h>
^
compilation terminated.
Makefile:30: recipe for target 'mysqldamlevlim' failed
make: *** [mysqldamlevlim] Error 1
Development libraries from mysql are missing.
Do "sudo apt-get install libmysqlclient-dev"
/usr/include/gnu/stubs.h:7:27: fatal error: gnu/stubs-32.h: No such file or directory
You're missing the 32 bit libc dev package. This is answered here:
http://stackoverflow.com/questions/7412548/gnu-stubs-32-h-no-such-file-or-directory
- On Ubuntu it's called libc6-dev-i386 - do "sudo apt-get install"
libc6-dev-i386. See below for extra instructions for Ubuntu 12.04.
- On Red Hat distros, the package name is glibc-devel.i686 (Thanks to
David Gardner's comment)
- On CentOS 5.8, the package name is glibc-devel.i386 (Thanks to JimKleck's comment)
- On CentOS 6.3, the package name is glibc-devel.i686.
- On SLES it's called glibc-devel-32bit - do zypper in glibc-devel-32bit
Loading UDFs in MySQL STEP 1
Copy compiled udf to plugin dir. To know where exactly, ask MySQL:
mysql> SHOW VARIABLES LIKE 'plugin_dir';
and use result directory.
To modify plugin_dir to point to a custom location:
my.cnf: plugin_dir = /usr/lib/mysql/plugin
Don't forget to restart MySQL!!
Loading UDFs in MySQL STEP 2
Bind the library to an UDF inside MySQL (only do it once,
survives MySQL reboot). Depending your MySQL binaries (32
or 64 bits), insert the right library.
mysql> DROP FUNCTION IF EXISTS damlevlim;
mysql> CREATE FUNCTION damlevlim RETURNS INT SONAME 'mysqldamlevlim.so';
Check if UDF was loaded OK.
mysql> SELECT * FROM mysql.func;
+-----------+-----+-------------------+----------+
| name | ret | dl | type |
+-----------+-----+-------------------+----------+
| damlevlim | 2 | mysqldamlevlim.so | function |
+-----------+-----+-------------------+----------+
1 row in set (0.00 sec)
Using damlevlim
mysql> select damlevlim();
ERROR: DAMLEVLIM() requires three arguments (string, string, int)
First string is compared against second string. Compare only the first
int chars (upper bound).
mysql> select damlevlim('coche', 'çóche', 5);
+--------------------------------+
| damlevlim('coche', 'çòche', 5) |
+--------------------------------+
| 0 |
+--------------------------------+
1 row in set (0.00 sec)
Further reading about MySql, UTF8 and charset encoding
http://www.oreillynet.com/onlamp/blog/2006/01/turning_mysql_data_in_latin1_t.html
How to search for multibyte characters in a column
Use MySql CHAR_LENGTH function to find rows with multi-byte characters:
mysql> SELECT name FROM users WHERE LENGTH(name) != CHAR_LENGTH(name);
How to search for strange multibyte characters in a column
(don't remember source of original tip)
Use MySql HEX and PHP bin2hex functions
mysql> SELECT name, HEX(name) FROM users;
Get the result back into PHP, and run a bin2hex on the string, compare
it to MySQL’s hex of that same string.