I have been working on a property listing website based on RETS (Real Estate Transaction Standard) for a long time. A single RETS server alone accounts for more than 300,000 listings, and we are planning to add more RETS servers to the system. The number of listings grows day by day, so scalability is becoming a more pressing question.
Solr makes it easy to run a full-featured search server. In fact, it's so easy that you can set it up in 10-15 minutes.
- Installing Solr
- Starting Solr
- Indexing Data
- Searching
- Shutdown
Installing Solr
For the purposes of this tutorial, I'll assume you're in a Linux or Mac environment.
You should also have JDK 5 or above installed.
wget http://archive.apache.org/dist/lucene/solr/3.4.0/apache-solr-3.4.0.zip
unzip apache-solr-3.4.0.zip
cd apache-solr-3.4.0
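If you'd rather script the download and unpacking, here is a minimal Python sketch of the same two steps. It assumes Python 3.9+ and only the standard library; `download` and `extract` are hypothetical helper names, not part of Solr. One detail is worth handling: Python's `zipfile` does not restore Unix permission bits by itself, so the sketch reapplies them. Otherwise scripts such as `example/exampledocs/post.sh` may lose their execute bit.

```python
import os
import shutil
import urllib.request
import zipfile
from pathlib import Path

# Same archive as the wget command above.
SOLR_ZIP_URL = "http://archive.apache.org/dist/lucene/solr/3.4.0/apache-solr-3.4.0.zip"


def download(url: str, dest: Path) -> Path:
    """Hypothetical helper: stream the archive at `url` into `dest` (the wget step)."""
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest


def extract(archive: Path, target_dir: Path) -> list[str]:
    """Hypothetical helper: unpack `archive` into `target_dir` (the unzip step)."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
        for info in zf.infolist():
            # Unix permission bits live in the high 16 bits of external_attr;
            # extractall ignores them, so scripts would otherwise lose +x.
            mode = (info.external_attr >> 16) & 0o777
            if mode:
                os.chmod(target_dir / info.filename, mode)
        return zf.namelist()
```

Calling `extract(download(SOLR_ZIP_URL, Path("apache-solr-3.4.0.zip")), Path("."))` should leave you with the same `apache-solr-3.4.0` directory as the shell commands.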
Starting Solr
Solr comes with an example directory which contains some sample files we can use.
We start this example server with java -jar start.jar.
cd example
java -jar start.jar
You should see something like this in the terminal.
2011-10-02 05:20:27.120:INFO::Logging to STDERR via org.mortbay.log.StdErrLog
2011-10-02 05:20:27.212:INFO::jetty-6.1-SNAPSHOT
....
2011-10-02 05:18:27.645:INFO::Started SocketConnector@0.0.0.0:8983
Solr is now running! You can now access the Solr Admin webapp by loading http://localhost:8983/solr/admin/ in your web browser.
Indexing Data
We're now going to add some sample data to our Solr instance.
The exampledocs folder contains some XML files we can use.
A quick glance at one of the XML files reveals that each Solr document consists of multiple fields. Each field has a name and a value. For example:
<doc>
<field name="id">9885A004</field>
<field name="name">Canon PowerShot SD500</field>
<field name="manu">Canon Inc.</field>
...
<field name="inStock">true</field>
</doc>
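Because each document is just a set of named fields, you can generate these XML messages from your own data (for example, listings pulled from a RETS feed) instead of writing them by hand. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`; `build_add_xml` is a hypothetical helper, not part of Solr. It wraps the documents in the `<add>` element that Solr's XML update messages use (the files in exampledocs do the same), and writes multi-valued fields such as `cat` as repeated `<field>` elements.

```python
import xml.etree.ElementTree as ET


def build_add_xml(docs: list[dict]) -> str:
    """Hypothetical helper: serialize documents into a Solr <add> message.

    Each dict maps a field name to a value, or to a list of values for
    multi-valued fields such as "cat" or "features".
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, list) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", {"name": name})
                # Write booleans as lowercase true/false, like the example files.
                field.text = str(v).lower() if isinstance(v, bool) else str(v)
    return ET.tostring(add, encoding="unicode")
```

For example, `build_add_xml([{"id": "9885A004", "name": "Canon PowerShot SD500", "inStock": True}])` produces a compact one-line `<add><doc>...</doc></add>` string that you can POST to Solr.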
The post.sh shell script in the same folder provides a convenient way to add files to Solr via HTTP POST.
cd exampledocs
./post.sh monitor.xml
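Roughly speaking, post.sh POSTs each file to the update URL with curl and finishes with a `<commit/>` so the changes become visible to searches. Here is a rough Python equivalent using only the standard library, assuming the same update URL that post.sh reports (`http://localhost:8983/solr/update`). `update_request` and `post_files` are hypothetical helper names.

```python
import urllib.request

# Default update URL of the example server (the one post.sh reports).
UPDATE_URL = "http://localhost:8983/solr/update"


def update_request(body: bytes, url: str = UPDATE_URL) -> urllib.request.Request:
    """Hypothetical helper: build an HTTP POST carrying an XML update message."""
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def post_files(paths: list[str], url: str = UPDATE_URL) -> list[str]:
    """Hypothetical helper: POST each XML file, then a single <commit/>."""
    responses = []
    for path in paths:
        with open(path, "rb") as f:
            body = f.read()
        with urllib.request.urlopen(update_request(body, url)) as resp:
            responses.append(resp.read().decode("utf-8"))
    # The commit makes the newly added documents visible to searches.
    with urllib.request.urlopen(update_request(b"<commit/>", url)) as resp:
        responses.append(resp.read().decode("utf-8"))
    return responses
```

Run from the exampledocs folder, `post_files(["monitor.xml"])` should do the same job as `./post.sh monitor.xml`.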
That produces:
Posting file monitor.xml to http://localhost:8983/solr/update
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">89</int></lst>
</response>
This response tells us that the POST operation was successful.
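If you post from your own code rather than post.sh, you'll want to check that status programmatically. A small sketch using Python's standard `xml.etree.ElementTree`; `update_status` is a hypothetical helper name, and it assumes the response layout shown above (a `responseHeader` list holding `status` and `QTime`).

```python
import xml.etree.ElementTree as ET


def update_status(response_xml: bytes) -> tuple[int, int]:
    """Hypothetical helper: return (status, QTime) from a Solr XML update response."""
    root = ET.fromstring(response_xml)
    header = root.find("lst[@name='responseHeader']")
    if header is None:
        raise ValueError("no responseHeader in Solr response")
    status = int(header.find("int[@name='status']").text)
    qtime = int(header.find("int[@name='QTime']").text)  # milliseconds
    return status, qtime
```

A status of 0 means the request succeeded. Solr generally signals failures with an HTTP error code as well, which `urllib` raises as `HTTPError` before you ever get to parse the body.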
Note that there are 2 main ways of adding data to Solr:
- HTTP
- Native client
We'll explore these in greater detail in a subsequent tutorial.
Searching
Let's see if we can retrieve the document we just added.
Since Solr accepts HTTP requests, you can use your web browser to communicate with Solr: http://localhost:8983/solr/select?q=*:*&wt=json
This returns the following JSON result:
{
"responseHeader": {
"status": 0,
"QTime": 0,
"params": {
"wt": "json",
"q": "*:*"
}
},
"response": {
"numFound": 1,
"start": 0,
"docs": [
{
"id": "3007WFP",
"name": "Dell Widescreen UltraSharp 3007WFP",
"manu": "Dell, Inc.",
"includes": "USB cable",
"weight": 401.6,
"price": 2199,
"popularity": 6,
"inStock": true,
"store": "43.17614,-90.57341",
"cat": [
"electronics",
"monitor"
],
"features": [
"30\" TFT active matrix LCD, 2560 x 1600, .25mm dot pitch, 700:1 contrast"
]
}
]
}
}
Nice! A quick comparison with the fields in monitor.xml should confirm that everything is in order.
Note that the previous search query used the special *:* syntax to retrieve all documents.
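The same queries are easy to issue from code. Here is a minimal Python sketch that builds `/select` URLs and decodes the JSON response; `urlencode` takes care of escaping characters such as spaces and `&` in query values. `select_url` and `search` are hypothetical helper names, and the base URL assumes the example server on its default port.

```python
import json
import urllib.parse
import urllib.request

# Assumes the example server on its default port.
SELECT_URL = "http://localhost:8983/solr/select"


def select_url(q: str, **params: str) -> str:
    """Hypothetical helper: build a /select URL for `q`, asking for JSON (wt=json)."""
    query = {"q": q, "wt": "json", **params}
    return SELECT_URL + "?" + urllib.parse.urlencode(query)


def search(q: str, **params: str) -> dict:
    """Hypothetical helper: run the query against Solr and decode the JSON body."""
    with urllib.request.urlopen(select_url(q, **params)) as resp:
        return json.load(resp)
```

At this point `search("*:*")["response"]["numFound"]` should return 1, matching the result above, and extra parameters pass straight through, e.g. `search("inStock:false", fl="id,name")`.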
Let's now add all the XML documents, and do some real searching.
./post.sh *.xml
Here's a demonstration of retrieving the name and id of all documents with inStock = false: http://localhost:8983/solr/select?q=inStock:false&wt=json&fl=id,name
{
"responseHeader": {
"status": 0,
"QTime": 2,
"params": {
"wt": "json",
"q": "inStock:false",
"fl": "id,name"
}
},
"response": {
"numFound": 4,
"start": 0,
"docs": [
{
"id": "F8V7067-APL-KIT",
"name": "Belkin Mobile Power Cord for iPod w/ Dock"
},
{
"id": "IW-02",
"name": "iPod & iPod Mini USB 2.0 Cable"
},
{
"id": "EN7800GTX/2DHTV/256M",
"name": "ASUS Extreme N7800GTX/2DHTV (256 MB)"
},
{
"id": "100-435805",
"name": "ATI Radeon X1900 XTX 512 MB PCIE Video Card"
}
]
}
}
You'll learn more about the various URL query parameters in a separate tutorial.
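One thing to keep in mind when reading these responses in code: `numFound` counts every match, while `docs` only holds the current page of results (Solr returns 10 rows by default). A minimal Python sketch that turns a response like the `fl=id,name` one above into `(id, name)` pairs; `id_name_pairs` is a hypothetical helper name.

```python
import json


def id_name_pairs(body: str) -> list[tuple[str, str]]:
    """Hypothetical helper: read (id, name) pairs from a /select JSON response."""
    data = json.loads(body)
    # Only the current page of results is in "docs"; compare with numFound
    # to tell whether more matches exist beyond this page.
    docs = data["response"]["docs"]
    return [(doc["id"], doc.get("name", "")) for doc in docs]
```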
Shutdown
To shut down Solr, press Ctrl+C in the terminal where you launched it. This shuts Solr down cleanly.
Solr is fairly robust, so even in situations of OS or disk crashes, it is unlikely that Solr's index will become corrupted.
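If you start Solr from a script rather than an interactive terminal, you can get the same clean shutdown by sending SIGINT (the signal Ctrl+C delivers) to the Java process. A POSIX-only Python sketch; `start_solr` and `stop_gracefully` are hypothetical helper names, and the 30-second timeout before falling back to a hard kill is an arbitrary assumption.

```python
import signal
import subprocess


def start_solr(example_dir: str) -> subprocess.Popen:
    """Hypothetical helper: run `java -jar start.jar` from the example directory."""
    return subprocess.Popen(["java", "-jar", "start.jar"], cwd=example_dir)


def stop_gracefully(proc: subprocess.Popen, timeout: float = 30.0) -> int:
    """Hypothetical helper: send SIGINT, as Ctrl+C would, and wait for exit."""
    proc.send_signal(signal.SIGINT)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # last resort if the clean shutdown hangs
        return proc.wait()
```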