[Sphinx] Sphinx Search Engine, Part 2: Configuration and Usage

HMHA 2023. 2. 6. 11:17

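Configuration

In the previous post we looked at how to install and configure Sphinx.

This time we will look at how to actually use it.

My current database setup is shown below. Note up front that it differs slightly from what I had at the time of the previous post.

(The server has changed.)

Database table structure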

mysql> desc words_kr;
+-------+------------------+------+-----+---------+----------------+
| Field | Type             | Null | Key | Default | Extra          |
+-------+------------------+------+-----+---------+----------------+
| id    | int(11) unsigned | NO   | PRI | NULL    | auto_increment |
+-------+------------------+------+-----+---------+----------------+
2 rows in set (0.01 sec)

Table records

mysql> select * from words_kr limit 5;
+----+-----------+
| id | word      |
+----+-----------+
|  1 | 사람      |
|  2 | 소망      |
|  3 | 인력      |
|  4 | 소나기    |
|  5 | 바람      |
+----+-----------+
5 rows in set (0.00 sec)

The sphinx.conf file

source words_kr
{
    type = mysql
    sql_host = localhost
    sql_user = und3r
    sql_pass = my_password
    sql_db = und3r
    sql_port = 3306 # optional, default is 3306
    sql_query_pre = SET NAMES UTF8
    sql_query = SELECT id, word FROM words_kr
    sql_field_string = word
}

index words_kr
{
    source = words_kr
    path = /var/lib/sphinx/words_kr
}

indexer
{
    mem_limit = 128M
}

searchd
{
    listen = 9312
    listen = 9306:mysql41
    log = /var/log/sphinx/searchd.log
    query_log = /var/log/sphinx/query.log
    read_timeout = 5
    max_children = 30
    pid_file = /var/run/sphinx/searchd.pid
    seamless_rotate = 1
    preopen_indexes = 1
    unlink_old = 1
    workers = threads # for RT to work
    binlog_path = /var/lib/sphinx/
}

If the database password contains special characters, they must be escaped. In sphinx.conf a '#' starts a comment, so for example a password of 'und3r#' has to be written as follows.

sql_pass = und3r\#
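If the config file is generated by a script, the escaping can be done in code. The following is a minimal sketch that only handles the '#' case described above; the function name escape_sphinx_value is hypothetical and not part of Sphinx.

```python
def escape_sphinx_value(value: str) -> str:
    """Prefix every '#' with a backslash so sphinx.conf does not read it as a comment.

    Assumption: only '#' is handled here, matching the case in the post.
    """
    return value.replace("#", "\\#")


# Build the sql_pass line for the example password from the post.
print("sql_pass = " + escape_sphinx_value("und3r#"))  # prints: sql_pass = und3r\#
```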

To get Unicode output right, the "sql_query_pre = SET NAMES UTF8" line must be included. :)

The configuration above is the most basic form. In fact, searching with Korean, Japanese, or Chinese text requires a little more configuration. I will come back to that later. :)

Running

All the configuration is now done. What comes next?

Let's think about how Sphinx works.

1. Sphinx connects to the database and pulls the data.

2. Sphinx builds an index from the data it pulled.

3. When a request comes in from a user, Sphinx has to respond to it.

4. Unless RT is used, the index has to be refreshed periodically from the database records.

That is about it.

What we have done so far covers roughly the "connect to the database" part.

So now let's pull the data in.

Building the index with indexer

[und3r@sungwook ~]$ sudo /usr/bin/indexer -c /etc/sphinx/sphinx.conf --all
[sudo] password for und3r:
Sphinx 2.2.8-id64-release (rel22-r4942)
using config file '/etc/sphinx/sphinx.conf'...
indexing index 'words_kr'...
collected 58 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 58 docs, 131 bytes
total 0.003 sec, 37654 bytes/sec, 16671.45 docs/sec
total 64 reads, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
total 12 writes, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg

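This is done with indexer, which lives in /usr/bin. The absolute path is therefore not required; I included it just to be explicit. :)

-c specifies which config file to use. Finally, --all means that every index should be built. In my case sphinx.conf contains only the single index words_kr, so the command is equivalent to the one below.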

[und3r@sungwook ~]$ sudo indexer -c /etc/sphinx/sphinx.conf words_kr

If the sphinx daemon is already running, the --rotate option must be passed as well. This is explained later.
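The option combinations described above can be sketched as a small helper that assembles the indexer command line. This is only an illustration: build_indexer_command is a hypothetical name, and the helper uses just the flags that appear in this post (--config, --all, --rotate).

```python
from typing import List, Optional


def build_indexer_command(config: str,
                          index: Optional[str] = None,
                          rotate: bool = False) -> List[str]:
    """Return the argument list for an indexer run.

    index=None means "build every index" (--all); rotate=True adds --rotate,
    which the post says is needed while searchd is running.
    """
    cmd = ["indexer", "--config", config]
    if rotate:
        cmd.append("--rotate")
    cmd.append(index if index is not None else "--all")
    return cmd


# First build, no daemon running yet.
print(" ".join(build_indexer_command("/etc/sphinx/sphinx.conf")))
# Rebuild of a single index while the daemon is serving queries.
print(" ".join(build_indexer_command("/etc/sphinx/sphinx.conf", "words_kr", rotate=True)))
```

The returned list can be handed to subprocess.run as-is, which avoids shell quoting issues.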

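Now let's check that the data came in properly. With Sphinx, this can be checked by connecting with a mysql client.

But to connect, the sphinx server has to be running, doesn't it?

The indexer run above did nothing more than pull the data from the database. (Strictly speaking it did more than pull it: it built the index.)

So let's start the server. Serving requests requires a server.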

[und3r@sungwook ~]$ sudo searchd -c /etc/sphinx/sphinx.conf
Sphinx 2.2.8-id64-release (rel22-r4942)
using config file '/etc/sphinx/sphinx.conf'...
listening on all interfaces, port=9312
listening on all interfaces, port=9306
precaching index 'words_kr'
precached 1 indexes in 0.001 sec

As with indexer, the -c option specifies the config file.

Let's check the process.

[und3r@sungwook ~]$ ps aux | grep searchd
root 3163 0.0 0.0 92680 1652 ? S 15:32 0:00 searchd -c /etc/sphinx/sphinx.conf
root 3164 0.1 0.2 102136 5068 ? Sl 15:32 0:00 searchd -c /etc/sphinx/sphinx.conf

Now let's connect.

[und3r@sungwook ~]$ mysql -h0 -P9306
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 2.2.8-id64-release (rel22-r4942)

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

The host is given as 0, and the port is 9306, the one set in sphinx.conf with "listen = 9306:mysql41".
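To avoid hard-coding the port in a client script, it can be read back from the config. The sketch below is a simplified, hypothetical helper (sphinxql_port is not a Sphinx API): it assumes the listen value has the form port:mysql41 or host:port:mysql41, and it strips comments naively, which is enough for the listen lines shown in this post.

```python
from typing import Optional


def sphinxql_port(conf_text: str) -> Optional[int]:
    """Return the port of the first 'listen' line that uses the mysql41 protocol."""
    for raw in conf_text.splitlines():
        # Naive comment stripping; does not handle escaped '\#' in values.
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key != "listen":
            continue
        parts = value.split(":")
        # The protocol is the last part, the port is the part just before it.
        if len(parts) >= 2 and parts[-1] == "mysql41" and parts[-2].isdigit():
            return int(parts[-2])
    return None


sample = """
searchd
{
    listen = 9312
    listen = 9306:mysql41
}
"""
print(sphinxql_port(sample))  # prints: 9306
```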

Let's check the data.

mysql> show tables;
+----------+-------+
| Index    | Type  |
+----------+-------+
| words_kr | local |
+----------+-------+
1 row in set (0.00 sec)

mysql> desc words_kr;
+-------+--------+
| Field | Type   |
+-------+--------+
| id    | bigint |
| word  | field  |
| word  | string |
+-------+--------+
3 rows in set (0.00 sec)

mysql> select * from words_kr limit 5;
+------+-----------+
| id   | word      |
+------+-----------+
|    1 | 사람      |
|    2 | 소망      |
|    3 | 인력      |
|    4 | 소나기    |
|    5 | 바람      |
+------+-----------+
5 rows in set (0.00 sec)

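As you may have noticed, Sphinx has no step for selecting a database. You can think of it as already being set. Listing the tables shows the index that was configured in the conf file. :)

One thing to watch out for: when a select statement has no LIMIT clause, Sphinx applies a default of LIMIT 20. Even if there are more than 20 rows, SELECT * FROM words_kr; returns only 20 of them. To print everything, the only way seems to be to read the record count with COUNT(*) first and then select with that count as the LIMIT.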
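The two-step approach can be sketched as a pair of query builders. This is an assumption-laden illustration: count_query and select_all_query are hypothetical names, the index name is taken as trusted input, and server-side settings such as max_matches may still cap the result, which this sketch does not handle.

```python
def count_query(index: str) -> str:
    """Step 1: ask how many documents the index holds."""
    return f"SELECT COUNT(*) FROM {index}"


def select_all_query(index: str, total: int) -> str:
    """Step 2: select with an explicit LIMIT so the default of 20 is not applied."""
    if total < 1:
        raise ValueError("total must be a positive row count")
    return f"SELECT * FROM {index} LIMIT 0, {total}"


# The index in this post holds 58 documents (see the indexer output).
print(count_query("words_kr"))           # prints: SELECT COUNT(*) FROM words_kr
print(select_all_query("words_kr", 58))  # prints: SELECT * FROM words_kr LIMIT 0, 58
```

Both strings would be sent over the same connection used for the mysql client session above, using whichever MySQL client library is available.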

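That covers all of the basic usage. So far, though, we have only built the index once and are using that single build.

Changed data has to be read from the database periodically so that the index can be refreshed.

As covered in an earlier post, an RT index does not read values from another database; it is the database itself, so it does not have this problem. Merging, or reading values from an external database as in the configuration above, is a different story.

Linux cron is commonly used to solve this.

First, write a script that updates the index. Don't forget to give it the x (execute) permission. :)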

[words_kr_updater.sh]

#!/bin/bash
/usr/bin/indexer --config /etc/sphinx/sphinx.conf --rotate words_kr

Then register it with cron. I simply put it in root's crontab, because adjusting the various values in sphinx.conf was a hassle.

[und3r@sungwook ~]$ sudo crontab -e

When the editor opens, register the index update script as shown below.

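*/5 * * * * /home/und3r/sphinx/words_kr_updater.sh

With */5 in the minute field, the index is refreshed once every 5 minutes. (A plain 5 would run the script only at minute 5 of each hour.) For the details of the schedule syntax, see the cron documentation.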
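The difference between a minute field of 5 and one of */5 is easy to get wrong, so here is a small sketch that expands a minute field into the minutes it matches. It is a simplified illustration, not a full cron parser: expand_minute_field is a hypothetical helper that only understands numbers, ranges, *, step values, and comma-separated lists.

```python
from typing import List


def expand_minute_field(field: str) -> List[int]:
    """Expand a crontab minute field into the sorted list of matching minutes."""
    minutes = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = 0, 59
        elif "-" in base:
            low, high = base.split("-", 1)
            start, end = int(low), int(high)
        else:
            start = end = int(base)
        minutes.update(range(start, end + 1, step))
    return sorted(minutes)


print(expand_minute_field("5"))         # prints: [5]
print(len(expand_minute_field("*/5")))  # prints: 12
```

A field of 5 matches one minute per hour, while */5 matches twelve (0, 5, 10, and so on up to 55).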

If you want to be on the safe side, you can also restart the cron daemon as shown below. ^^

[und3r@sungwook ~]$ sudo /etc/init.d/crond restart

The system now periodically (every 5 minutes in this case) reads the records from the database and rebuilds the index. Search and complex queries can now be handled by Sphinx.

Finally, the next post will cover Korean-specific handling and searching.

Source: https://crystalcube.co.kr/166
