Web Page Classification using Semantic Image-Blocks
													
Area 01 – Mathematical and Computer Sciences
						
                                                
            
             
                                                
                                                
                                                
             
                                                
                                                
                                                
                                                
                                                
            
            
            
            
                        
            
                 
            
            
                  
						
							
						
						
								
ABSTRACT
								
							
                                                                                                                
                                                        
                                                    
                                                    
                                                    
We present a web document classification system based on the assumption that the images on a web page are the elements that most attract the user's attention. This assumption implies that the text in the visual block containing an image, called a semantic image-block, should carry relevant information about the page's content. In this paper we propose a new metric, the Inverse Term Relevance Metric, which assigns higher weights to relevant terms contained in relevant image-blocks identified through visual layout analysis. The traditional TFxIDF model is modified accordingly and used in the classification task. The effectiveness of the new metric has been validated using different classification algorithms, both supervised and unsupervised.
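The abstract does not give the formula for the Inverse Term Relevance Metric, so the following is only a minimal sketch of the general idea of a TFxIDF weighting that favours terms found in image-blocks. It is not the authors' metric: the function name `weighted_tfidf`, the constant multiplier `boost`, and the input layout (token lists already split into page text and image-block text) are all assumptions made for illustration.

```python
import math
from collections import Counter


def weighted_tfidf(docs, boost=2.0):
    """Compute TFxIDF vectors, up-weighting terms seen in image-blocks.

    docs  : list of (page_tokens, image_block_tokens) pairs, where
            image_block_tokens are the tokens found in the page's
            semantic image-blocks (assumed to be extracted beforehand).
    boost : hypothetical constant multiplier for image-block terms;
            a stand-in for the paper's metric, which is not given here.
    """
    n_docs = len(docs)

    # Document frequency: number of pages in which each term occurs.
    df = Counter()
    for page_tokens, _ in docs:
        df.update(set(page_tokens))

    vectors = []
    for page_tokens, block_tokens in docs:
        tf = Counter(page_tokens)
        in_block = set(block_tokens)
        vec = {}
        for term, count in tf.items():
            # Plain TFxIDF weight: raw term count times log(N / df).
            weight = count * math.log(n_docs / df[term])
            # Assumed modification: scale up terms from image-blocks.
            if term in in_block:
                weight *= boost
            vec[term] = weight
        vectors.append(vec)
    return vectors
```

The resulting dictionaries can be passed to any classifier that accepts sparse term-weight vectors. A constant multiplier is the simplest possible choice; the metric proposed in the paper presumably derives the weight from the layout analysis itself.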
							
| pages: | 24 | 
| format: | 17 x 24 | 
| ISBN: | 978-88-548-1603-9 | 
| publication date: | February 2008 | 
| imprint: | Aracne | 
| series: | QD quaderni | 2 | 
						
						
						
						
												
						
                                                
						
						
												
						
						
						
						
						
												
							
							
						
						
						



